* [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR
@ 2015-03-20 23:17 ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

When a system has PAT support enabled you don't need to be
using MTRRs. Andy added arch_phys_wc_add() long ago to
help with this, but not all drivers were converted over. We
have to take care to only convert drivers where we know that
the proper ioremap_wc() API has been used. Doing this requires
a bit of work verifying that the driver split out the ioremap'd
areas, and doing that split ourselves if it did not. Verifying
that a driver uses the same areas can be hard, but with a bit of
love Coccinelle can help with that.

We're motivated to change drivers for a few reasons:

1) Take advantage of PAT when available

2) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

3) Bury MTRR code away from drivers as it is architecture specific

While working on the conversion I noticed a few things.

a) Run time disabling of MTRR

Some systems technically support both PAT and MTRR, and even
then a system may end up not enabling MTRR at run time.
There are a few reasons why this can happen but the code right now
doesn't address this well. This leads to another point: PAT code
right now is not a first class citizen on x86 -- pat_init() depends
on MTRR code so we can't actually enable PAT without building MTRR.
Decoupling the two requires quite a bit more work, so let this serve
as a starting point for conversation if we want to address that.

b) Driver work and required ioremap split

In order to take advantage of PAT, device drivers that were using
MTRR must make sure that the area that was using MTRR is ioremap'd
separately. Fortunately a lot of drivers already do this, but there
are quite a few drivers that require some love to make that happen.
This leaves us needing to expose a last-resort API to annotate this
and also avoid a performance regression on systems that may have
PAT but can't yet move away from using MTRR. To find the drivers that
need love check out __arch_phys_wc_add(). For a good example driver
where the work was done refer to the atyfb driver fixes.
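
As a rough illustration of the split described above, the user-space
sketch below models a single BAR that holds both registers and a
framebuffer. Everything in it is hypothetical and not taken from any
driver in this series: struct model_region, split_bar() and the
assumption that the registers sit in front of the framebuffer are made
up for the example. A real driver would map the first region uncached
and use ioremap_wc() plus arch_phys_wc_add() on the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapping types; stand-ins for an uncached and a WC ioremap. */
enum model_attr { MODEL_UNCACHED, MODEL_WRITE_COMBINING };

struct model_region {
	uint64_t start;		/* physical start address */
	uint64_t len;		/* length in bytes */
	enum model_attr attr;	/* how the region would be mapped */
};

/*
 * Split one BAR into a register window and a framebuffer so that each
 * can be mapped on its own. Assumes (for this sketch only) that the
 * registers occupy the first reg_len bytes and the framebuffer fills
 * the rest. Returns 0 on success, -1 if reg_len is zero or leaves no
 * room for a framebuffer.
 */
int split_bar(uint64_t bar_start, uint64_t bar_len, uint64_t reg_len,
	      struct model_region *regs, struct model_region *fb)
{
	assert(regs && fb);

	if (reg_len == 0 || reg_len >= bar_len)
		return -1;

	regs->start = bar_start;
	regs->len = reg_len;
	regs->attr = MODEL_UNCACHED;

	fb->start = bar_start + reg_len;
	fb->len = bar_len - reg_len;
	fb->attr = MODEL_WRITE_COMBINING;

	return 0;
}
```

The two regions never overlap, so the write-combining attribute only
ever applies to the framebuffer and the registers keep their uncached
mapping.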

c) Missing APIs for write-combining

A few API calls needed to take advantage of write-combining are
missing; this series adds them.
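
For a rough idea of what a helper of this kind does, the user-space
sketch below models a write-combining variant of pci_ioremap_bar():
refuse a BAR that is not a memory resource, otherwise describe a WC
mapping over the whole BAR. struct model_bar, struct model_mapping,
MODEL_RESOURCE_MEM and model_ioremap_wc_bar() are hypothetical
stand-ins for this illustration only; the real helpers are in the
patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RESOURCE_MEM 0x1u	/* hypothetical flag: BAR decodes memory */

struct model_bar {
	uint64_t start;		/* physical start of the BAR */
	uint64_t len;		/* BAR length in bytes */
	unsigned int flags;	/* MODEL_RESOURCE_MEM, or 0 for an I/O port BAR */
};

struct model_mapping {
	uint64_t phys;		/* physical address covered by the mapping */
	uint64_t len;		/* mapping length in bytes */
	bool write_combining;	/* true if mapped write-combining */
};

/*
 * Describe a write-combining mapping that covers the whole BAR.
 * Returns false and leaves *map untouched if the BAR is not memory.
 */
bool model_ioremap_wc_bar(const struct model_bar *bar,
			  struct model_mapping *map)
{
	assert(bar && map);

	if (!(bar->flags & MODEL_RESOURCE_MEM))
		return false;

	map->phys = bar->start;
	map->len = bar->len;
	map->write_combining = true;
	return true;
}
```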

d) Further framebuffer driver MTRR usage simplification

We can simplify MTRR usage by having the framebuffer core
add the MTRR by passing a flag when register_framebuffer()
is called. This could for instance be done on the very few drivers
where smem_len and smem_start are both used for the ioremap_wc()
and also for the arch_phys_wc_add(). Coccinelle can easily be used
to do a transformation here. I didn't do that here given that it
does not work for all device drivers *and* DRM drivers already
have something similar. Lastly, this technically could also be done
in some other generic helper, but I figured it's best we review that
here. One reason to *not* do this is that tons of framebuffer drivers
have mtrr options exposed -- we'd need to generalize those and provide
a port ... or deal with the fact that we are going to remove all that.

Luis R. Rodriguez (47):
  x86: mtrr: annotate mtrr_type_lookup() is only implemented on
    generic_mtrr_ops
  x86: mtrr: generalize run time disabling of MTRR
  devres: add devm_ioremap_wc()
  pci: add pci_ioremap_wc_bar()
  pci: add pci_iomap_wc() variants
  mtrr: add __arch_phys_wc_add()
  video: fbdev: atyfb: move framebuffer length fudging to helper
  video: fbdev: atyfb: clarify ioremap() base and length used
  vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
  IB/qib: add acounting for MTRR
  IB/qib: use arch_phys_wc_add()
  IB/ipath: add counting for MTRR
  IB/ipath: use __arch_phys_wc_add()
  [media] media: ivtv: use __arch_phys_wc_add()
  fusion: use __arch_phys_wc_add()
  video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB
  vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR
  video: fbdev: vesafb: use arch_phys_wc_add()
  mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index()
  ethernet: myri10ge: use arch_phys_wc_add()
  staging: sm750fb: use arch_phys_wc_add() and ioremap_wc()
  staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
  video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: gbefb: add missing mtrr_del() calls
  video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc()
  video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: matrox: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: neofb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
  video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: savagefb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: sisfb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: aty: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: i810: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: pm3fb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: rivafb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: tdfxfb: use arch_phys_wc_add() and ioremap_wc()
  video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
  video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer
  video: fbdev: geode gxfb: use ioremap_wc() for framebuffer
  video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
  mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()

 Documentation/driver-model/devres.txt            |  1 +
 arch/x86/include/asm/io.h                        |  6 ++
 arch/x86/include/asm/mtrr.h                      |  7 +-
 arch/x86/kernel/cpu/mtrr/cleanup.c               |  2 +-
 arch/x86/kernel/cpu/mtrr/generic.c               |  7 +-
 arch/x86/kernel/cpu/mtrr/if.c                    |  3 +
 arch/x86/kernel/cpu/mtrr/main.c                  | 73 +++++++++++++------
 drivers/gpu/drm/drm_ioctl.c                      | 14 +---
 drivers/infiniband/hw/ipath/ipath_driver.c       |  7 +-
 drivers/infiniband/hw/ipath/ipath_kernel.h       |  4 +-
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c    | 47 +++++--------
 drivers/infiniband/hw/qib/qib_wc_x86_64.c        | 31 ++------
 drivers/media/pci/ivtv/ivtvfb.c                  | 51 ++++----------
 drivers/message/fusion/mptbase.c                 | 19 ++---
 drivers/message/fusion/mptbase.h                 |  2 +-
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 36 +++-------
 drivers/pci/pci.c                                | 14 ++++
 drivers/staging/sm750fb/sm750.c                  | 34 ++-------
 drivers/staging/sm750fb/sm750.h                  |  3 -
 drivers/staging/sm750fb/sm750_hw.c               |  3 +-
 drivers/staging/xgifb/XGI_main_26.c              | 27 ++-----
 drivers/video/fbdev/arkfb.c                      | 36 ++--------
 drivers/video/fbdev/atmel_lcdfb.c                |  3 +-
 drivers/video/fbdev/aty/aty128fb.c               | 36 ++--------
 drivers/video/fbdev/aty/atyfb.h                  |  5 +-
 drivers/video/fbdev/aty/atyfb_base.c             | 90 ++++++++----------------
 drivers/video/fbdev/aty/radeon_base.c            | 29 ++------
 drivers/video/fbdev/aty/radeonfb.h               |  2 +-
 drivers/video/fbdev/gbefb.c                      | 18 +++--
 drivers/video/fbdev/geode/gxfb_core.c            |  3 +-
 drivers/video/fbdev/gxt4500.c                    |  2 +-
 drivers/video/fbdev/i740fb.c                     | 35 ++-------
 drivers/video/fbdev/i810/i810.h                  |  3 +-
 drivers/video/fbdev/i810/i810_main.c             | 11 +--
 drivers/video/fbdev/i810/i810_main.h             | 26 -------
 drivers/video/fbdev/intelfb/intelfb.h            |  4 +-
 drivers/video/fbdev/intelfb/intelfbdrv.c         | 38 ++--------
 drivers/video/fbdev/kyro/fbdev.c                 | 33 +++------
 drivers/video/fbdev/matrox/matroxfb_base.c       | 36 ++++------
 drivers/video/fbdev/matrox/matroxfb_base.h       | 27 +------
 drivers/video/fbdev/neofb.c                      | 26 ++-----
 drivers/video/fbdev/nvidia/nv_type.h             |  7 +-
 drivers/video/fbdev/nvidia/nvidia.c              | 37 ++--------
 drivers/video/fbdev/pm2fb.c                      | 31 ++------
 drivers/video/fbdev/pm3fb.c                      | 30 ++------
 drivers/video/fbdev/riva/fbdev.c                 | 39 ++--------
 drivers/video/fbdev/riva/rivafb.h                |  4 +-
 drivers/video/fbdev/s3fb.c                       | 35 ++-------
 drivers/video/fbdev/savage/savagefb.h            |  4 +-
 drivers/video/fbdev/savage/savagefb_driver.c     | 17 +----
 drivers/video/fbdev/sis/sis.h                    |  2 +-
 drivers/video/fbdev/sis/sis_main.c               | 27 ++-----
 drivers/video/fbdev/tdfxfb.c                     | 41 ++---------
 drivers/video/fbdev/vesafb.c                     | 77 +++++++-------------
 drivers/video/fbdev/vt8623fb.c                   | 31 ++------
 include/asm-generic/pci_iomap.h                  | 14 ++++
 include/linux/io.h                               | 12 ++++
 include/linux/pci.h                              |  1 +
 include/video/kyro.h                             |  4 +-
 include/video/neomagic.h                         |  5 +-
 include/video/tdfx.h                             |  2 +-
 lib/devres.c                                     | 29 ++++++++
 lib/pci_iomap.c                                  | 61 ++++++++++++++++
 63 files changed, 463 insertions(+), 901 deletions(-)

-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v1 01/47] x86: mtrr: annotate mtrr_type_lookup() is only implemented on generic_mtrr_ops
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani, bhelgaas, Roger Pau Monné,
	xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There are a few users of mtrr_type_lookup(), including PAT.
Note that PAT can in theory be enabled without MTRR fully
kicking in; such is the case with Xen. Annotate that the lookup
is currently only implemented for generic_mtrr_ops.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b..09c82de 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -230,6 +230,8 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
+	/* XXX: Currently only implemented on generic_mtrr_ops */
+
 	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
 
 	/*
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani, bhelgaas, Roger Pau Monné,
	xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

It is possible to enable CONFIG_MTRR and end up with it
disabled at run time while CONFIG_X86_PAT continues to
be fully functional. This can happen for instance on Xen,
where MTRR is not supported but PAT is; Linux allows
this as of commit 47591df50 by Juergen, introduced in
v3.19.

Technically we should assume the proper CPU
bits would be set to disable MTRR but we can't
always rely on this. On the Xen hypervisor, for
instance, only X86_FEATURE_MTRR was disabled
as of Xen 4.4 through Xen commit 586ab6a [0],
but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
or X86_FEATURE_CYRIX_ARR.

The x86 MTRR code relies on quite a few checks of
mtrr_if to see whether MTRR did get set up. Instead
of using that, let's provide a generic flag,
mtrr_enabled, which when set tells us MTRR is enabled.
This also adds a few checks where there were none
before, which could safeguard us against incorrect
usage of MTRR where it is not desirable.

Where possible, match the error codes used in
arch/x86/include/asm/mtrr.h for the case where MTRR
is disabled.

Lastly, since MTRR can end up disabled at run time
while PAT is enabled, record in the logs when MTRR
is disabled.

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
4.4.0-rc1~18
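
As a minimal user-space sketch of the guard this patch adds to
arch_phys_wc_add(), the model below returns early when PAT is enabled
or MTRR is disabled at run time. All model_* names are hypothetical
stand-ins for the kernel symbols, and the offset handling follows the
existing arch_phys_wc_add(), which the diff below only shows in part.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TO_PHYS_WC_OFFSET 1000	/* mirrors MTRR_TO_PHYS_WC_OFFSET */

/* User-space stand-ins for the kernel's pat_enabled and mtrr_enabled. */
bool model_pat_enabled;
bool model_mtrr_enabled;
int model_mtrr_add_calls;	/* how often the MTRR backend was reached */

/* Hypothetical backend: pretends to program a register, returns its index. */
int model_mtrr_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	assert(model_mtrr_enabled);	/* callers must check the flag first */
	return model_mtrr_add_calls++;
}

/*
 * Models the guard: with PAT enabled, or with MTRR disabled at run
 * time, there is nothing to do and 0 is returned. Otherwise the MTRR
 * index is returned shifted by the offset, or a negative error.
 */
int model_arch_phys_wc_add(unsigned long base, unsigned long size)
{
	int ret;

	if (model_pat_enabled || !model_mtrr_enabled)
		return 0;

	ret = model_mtrr_add(base, size);
	if (ret < 0)
		return ret;
	return ret + MODEL_TO_PHYS_WC_OFFSET;
}
```

A driver calling this never has to know which of the two mechanisms
is active: a return of 0 or more is success either way.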

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/mtrr.h        |  2 ++
 arch/x86/kernel/cpu/mtrr/cleanup.c |  2 +-
 arch/x86/kernel/cpu/mtrr/generic.c |  5 +++--
 arch/x86/kernel/cpu/mtrr/if.c      |  3 +++
 arch/x86/kernel/cpu/mtrr/main.c    | 31 ++++++++++++++++++++++---------
 5 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..cade917 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,6 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
+extern int mtrr_enabled;
 extern u8 mtrr_type_lookup(u64 addr, u64 end);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
@@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
+static const int mtrr_enabled;
 static inline u8 mtrr_type_lookup(u64 addr, u64 end)
 {
 	/*
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..784dc55 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
 	 * Make sure we only trim uncachable memory on machines that
 	 * support the Intel MTRR architecture:
 	 */
-	if (!is_cpu(INTEL) || disable_mtrr_trim)
+	if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
 		return 0;
 
 	rdmsr(MSR_MTRRdefType, def, dummy);
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 09c82de..df321b2 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
+	/* mtrr_state_set is only set for generic_mtrr_ops */
+	if (!mtrr_state_set || !mtrr_enabled)
 		return 0xFF;
 
 	if (!mtrr_state.enabled)
@@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
 
 void mtrr_save_fixed_ranges(void *info)
 {
-	if (cpu_has_mtrr)
+	if (mtrr_enabled && cpu_has_mtrr)
 		get_fixed_ranges(mtrr_state.fixed_ranges);
 }
 
diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
index d76f13d..e9e001a 100644
--- a/arch/x86/kernel/cpu/mtrr/if.c
+++ b/arch/x86/kernel/cpu/mtrr/if.c
@@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
+	if (!mtrr_enabled)
+		return 0;
+
 	if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
 	    (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
 	    (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..7db9c47 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,7 @@
 #define MTRR_TO_PHYS_WC_OFFSET 1000
 
 u32 num_var_ranges;
+int mtrr_enabled;
 
 unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
@@ -84,6 +85,9 @@ static int have_wrcomb(void)
 {
 	struct pci_dev *dev;
 
+	if (!mtrr_enabled)
+		return 0;
+
 	dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
 	if (dev != NULL) {
 		/*
@@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
 	int i, replace, error;
 	mtrr_type ltype;
 
-	if (!mtrr_if)
+	if (!mtrr_enabled)
 		return -ENXIO;
 
 	error = mtrr_if->validate_add_page(base, size, type);
@@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
 
 static int mtrr_check(unsigned long base, unsigned long size)
 {
+	if (!mtrr_enabled)
+		return -ENODEV;
 	if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
 		pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
 		pr_debug("mtrr: size: 0x%lx  base: 0x%lx\n", size, base);
@@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
 	unsigned long lbase, lsize;
 	int error = -EINVAL;
 
-	if (!mtrr_if)
-		return -ENXIO;
+	if (!mtrr_enabled)
+		return -ENODEV;
 
 	max = num_var_ranges;
 	/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
  */
 int mtrr_del(int reg, unsigned long base, unsigned long size)
 {
+	if (!mtrr_enabled)
+		return -ENODEV;
 	if (mtrr_check(base, size))
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled)
+	if (pat_enabled || !mtrr_enabled)
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
 	}
 
 	if (mtrr_if) {
+		mtrr_enabled = true;
 		set_num_var_ranges();
 		init_table();
 		if (use_intel()) {
@@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
 				mtrr_if->set_all();
 			}
 		}
-	}
+	} else
+		pr_info("mtrr: system does not support MTRR\n");
 }
 
 void mtrr_ap_init(void)
 {
-	if (!use_intel() || mtrr_aps_delayed_init)
+	if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
 		return;
 	/*
 	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -774,6 +784,9 @@ void mtrr_save_state(void)
 {
 	int first_cpu;
 
+	if (!mtrr_enabled)
+		return;
+
 	get_online_cpus();
 	first_cpu = cpumask_first(cpu_online_mask);
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -782,7 +795,7 @@ void mtrr_save_state(void)
 
 void set_mtrr_aps_delayed_init(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled)
 		return;
 
 	mtrr_aps_delayed_init = true;
@@ -810,7 +823,7 @@ void mtrr_aps_init(void)
 
 void mtrr_bp_restore(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled)
 		return;
 
 	mtrr_if->set_all();
@@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
 
 static int __init mtrr_init_finialize(void)
 {
-	if (!mtrr_if)
+	if (!mtrr_enabled)
 		return 0;
 
 	if (use_intel()) {
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 03/47] devres: add devm_ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We have devm_ioremap_nocache() but no devm_ioremap_wc(),
so add it. It will be used by later patches in this series.
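
As a rough illustration of the managed-mapping idea, here is a
user-space sketch, not kernel code. Every fake_* name is a
hypothetical stand-in (fake_ioremap_wc() for ioremap_wc(),
struct fake_device for struct device plus its devres list), and
malloc() merely plays the role of the mapping. The point is only
that the mapping is recorded against the device on success, so
that detach can undo it without the driver tracking it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_MAX_RES 8

/* Hypothetical stand-in for a device and its list of managed resources. */
struct fake_device {
	void *res[FAKE_MAX_RES];
	size_t nres;
};

/* Hypothetical stand-in for ioremap_wc(); fails for a zero-sized map. */
static void *fake_ioremap_wc(unsigned long offset, size_t size)
{
	(void)offset;
	return size ? malloc(size) : NULL;
}

/* Map, and on success record the mapping so that detach releases it. */
static void *fake_devm_ioremap_wc(struct fake_device *dev,
				  unsigned long offset, size_t size)
{
	void *addr;

	assert(dev != NULL);
	if (dev->nres == FAKE_MAX_RES)
		return NULL;

	addr = fake_ioremap_wc(offset, size);
	if (addr)
		dev->res[dev->nres++] = addr;

	return addr;
}

/* Models driver detach: release every recorded mapping. */
static size_t fake_device_detach(struct fake_device *dev)
{
	size_t released = 0;

	assert(dev != NULL);
	while (dev->nres) {
		free(dev->res[--dev->nres]);
		released++;
	}
	return released;
}
```

A failed map leaves nothing recorded, which corresponds to the
devres_free() branch in the real helper below.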

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 Documentation/driver-model/devres.txt |  1 +
 include/linux/io.h                    |  2 ++
 lib/devres.c                          | 29 +++++++++++++++++++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index e1e2bbd..831a536 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -276,6 +276,7 @@ IOMAP
   devm_ioport_unmap()
   devm_ioremap()
   devm_ioremap_nocache()
+  devm_ioremap_wc()
   devm_ioremap_resource() : checks resource, requests memory region, ioremaps
   devm_iounmap()
   pcim_iomap()
diff --git a/include/linux/io.h b/include/linux/io.h
index 4cc299c..91101a1 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -72,6 +72,8 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
 			   resource_size_t size);
 void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
 				   resource_size_t size);
+void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
+			      resource_size_t size);
 void devm_iounmap(struct device *dev, void __iomem *addr);
 int check_signature(const volatile void __iomem *io_addr,
 			const unsigned char *signature, int length);
diff --git a/lib/devres.c b/lib/devres.c
index 0f1dd2e..2eb2bfe 100644
--- a/lib/devres.c
+++ b/lib/devres.c
@@ -72,6 +72,35 @@ void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
 EXPORT_SYMBOL(devm_ioremap_nocache);
 
 /**
+ * devm_ioremap_wc - Managed ioremap_wc()
+ * @dev: Generic device to remap IO address for
+ * @offset: BUS offset to map
+ * @size: Size of map
+ *
+ * Managed ioremap_wc().  Map is automatically unmapped on driver
+ * detach.
+ */
+void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
+			      resource_size_t size)
+{
+	void __iomem **ptr, *addr;
+
+	ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
+	if (!ptr)
+		return NULL;
+
+	addr = ioremap_wc(offset, size);
+	if (addr) {
+		*ptr = addr;
+		devres_add(dev, ptr);
+	} else
+		devres_free(ptr);
+
+	return addr;
+}
+EXPORT_SYMBOL_GPL(devm_ioremap_wc);
+
+/**
  * devm_iounmap - Managed iounmap()
  * @dev: Generic device to unmap for
  * @addr: Address to unmap
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This lets drivers take advantage of PAT when available. This
should help with the transition of converting video drivers over
to ioremap_wc(), which in turn helps with the goal of eventually
using _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 for
ioremap_nocache() (de33c442e).
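
The check the helper performs can be sketched in user space as
follows. This is an illustration only: struct fake_bar,
struct fake_mapping and the FAKE_IORESOURCE_* values are
hypothetical stand-ins for the PCI resource accessors and flags,
and no real mapping is created. It shows the one decision that
matters, namely that only a memory BAR is mapped write-combining.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; stand-ins for the IORESOURCE_* flags. */
#define FAKE_IORESOURCE_IO	0x00000100UL
#define FAKE_IORESOURCE_MEM	0x00000200UL

/* Hypothetical stand-in for one BAR of a PCI device. */
struct fake_bar {
	unsigned long start;
	unsigned long len;
	unsigned long flags;
};

/* Hypothetical description of the mapping that would be created. */
struct fake_mapping {
	unsigned long start;
	unsigned long len;
	int write_combining;
};

/* Returns 0 and fills *out for a memory BAR, -1 for anything else. */
static int fake_ioremap_wc_bar(const struct fake_bar *bar,
			       struct fake_mapping *out)
{
	assert(bar != NULL && out != NULL);

	/* An I/O port BAR cannot be ioremap'd at all; refuse it. */
	if (!(bar->flags & FAKE_IORESOURCE_MEM))
		return -1;

	out->start = bar->start;
	out->len = bar->len;
	out->write_combining = 1;
	return 0;
}
```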

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/pci/pci.c   | 14 ++++++++++++++
 include/linux/pci.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 81f06e8..6afd507 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
 				     pci_resource_len(pdev, bar));
 }
 EXPORT_SYMBOL_GPL(pci_ioremap_bar);
+
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
+{
+	/*
+	 * Make sure the BAR is actually a memory resource, not an IO resource
+	 */
+	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
+		WARN_ON(1);
+		return NULL;
+	}
+	return ioremap_wc(pci_resource_start(pdev, bar),
+			  pci_resource_len(pdev, bar));
+}
+EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
 #endif
 
 #define PCI_FIND_CAP_TTL	48
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 211e9da..c235b09 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1667,6 +1667,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_ext_cfg_avail(void);
 
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
+void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Bjorn Helgaas, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
	Arnd Bergmann, Michael S. Tsirkin, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, jbeulich, toshi.kani,
	Roger Pau Monné,
	xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This allows drivers to take advantage of write-combining
when possible. Ideally we'd have pci_read_bases() just
peg an IORESOURCE_WC flag for us, but where exactly a
video device's memory lies varies *widely*, and at times
it is mixed with MMIO registers. Sometimes we can address
this in the driver; other times doing so requires
intrusive changes.

Although there is also arch_phys_wc_add(), which makes use of
architecture-specific write-combining alternatives (MTRR on
x86 when a system does not have PAT), we avoid polluting
pci_iomap() space with it and force drivers and subsystems
that want to use it to be explicit.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
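Note for reviewers: the offset/maxlen handling in pci_iomap_wc_range()
below can be checked in isolation. What follows is only a userspace
sketch of that arithmetic, not kernel code. The names bar_model,
map_window and wc_window() are made up for illustration, and the
ioremap_wc()/__pci_ioport_map() step is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the BAR fields the kernel helper reads. */
struct bar_model {
	uint64_t start;	/* what pci_resource_start() would return */
	uint64_t len;	/* what pci_resource_len() would return */
};

/* The physical window that would be handed to the mapping call. */
struct map_window {
	uint64_t start;
	uint64_t len;
};

/*
 * Mirrors the range arithmetic of pci_iomap_wc_range(): reject an unset
 * BAR or an offset at or past the end, then advance by offset and clamp
 * to maxlen. A maxlen of 0 means "from offset to the end of the BAR".
 */
static bool wc_window(const struct bar_model *bar, uint64_t offset,
		      uint64_t maxlen, struct map_window *out)
{
	uint64_t start = bar->start;
	uint64_t len = bar->len;

	if (len <= offset || !start)
		return false;
	len -= offset;
	start += offset;
	if (maxlen && len > maxlen)
		len = maxlen;

	out->start = start;
	out->len = len;
	return true;
}
```

For an 8 MB BAR, wc_window(&bar, 0x1000, 0, &w) yields everything after
the first page, while an offset equal to the BAR length is rejected.
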
 include/asm-generic/pci_iomap.h | 14 ++++++++++
 lib/pci_iomap.c                 | 61 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
index 7389c87..b1e17fc 100644
--- a/include/asm-generic/pci_iomap.h
+++ b/include/asm-generic/pci_iomap.h
@@ -15,9 +15,13 @@ struct pci_dev;
 #ifdef CONFIG_PCI
 /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
 extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
+extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
 extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 				     unsigned long offset,
 				     unsigned long maxlen);
+extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+					unsigned long offset,
+					unsigned long maxlen);
 /* Create a virtual mapping cookie for a port on a given PCI device.
  * Do not call this directly, it exists to make it easier for architectures
  * to override */
@@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
 	return NULL;
 }
 
+static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
+{
+	return NULL;
+}
 static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
 					    unsigned long offset,
 					    unsigned long maxlen)
 {
 	return NULL;
 }
+static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
+					       unsigned long offset,
+					       unsigned long maxlen)
+{
+	return NULL;
+}
 #endif
 
 #endif /* __ASM_GENERIC_IO_H */
diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index bcce5f1..30b65ae 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
 EXPORT_SYMBOL(pci_iomap_range);
 
 /**
+ * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @offset: map memory at the given offset in BAR
+ * @maxlen: max length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR from offset to the end, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
+				 int bar,
+				 unsigned long offset,
+				 unsigned long maxlen)
+{
+	resource_size_t start = pci_resource_start(dev, bar);
+	resource_size_t len = pci_resource_len(dev, bar);
+	unsigned long flags = pci_resource_flags(dev, bar);
+
+	if (len <= offset || !start)
+		return NULL;
+	len -= offset;
+	start += offset;
+	if (maxlen && len > maxlen)
+		len = maxlen;
+	if (flags & IORESOURCE_IO)
+		return __pci_ioport_map(dev, start, len);
+	if (flags & IORESOURCE_MEM)
+		return ioremap_wc(start, len);
+	/* What? */
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
+
+/**
  * pci_iomap - create a virtual mapping cookie for a PCI BAR
  * @dev: PCI device that owns the BAR
  * @bar: BAR number
@@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 	return pci_iomap_range(dev, bar, 0, maxlen);
 }
 EXPORT_SYMBOL(pci_iomap);
+
+/**
+ * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR
+ * @dev: PCI device that owns the BAR
+ * @bar: BAR number
+ * @maxlen: length of the memory to map
+ *
+ * Using this function you will get a __iomem address to your device BAR.
+ * You can access it using ioread*() and iowrite*(). These functions hide
+ * the details if this is a MMIO or PIO address space and will just do what
+ * you expect from them in the correct way. When possible write combining
+ * is used.
+ *
+ * @maxlen specifies the maximum length to map. If you want to get access to
+ * the complete BAR without checking for its length first, pass %0 here.
+ * */
+void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
+{
+	return pci_iomap_wc_range(dev, bar, 0, maxlen);
+}
+EXPORT_SYMBOL_GPL(pci_iomap_wc);
 #endif /* CONFIG_PCI */
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Ideally on systems using PAT we can expect a swift
transition away from MTRR. There can be a few exceptions
to this. One is where device drivers are known to be used
on systems whose PAT implementation has errata. Another is
observed on old device drivers where the device combines
MMIO registers and the area the driver later wants to
cover with MTRR on the same PCI BAR. This situation can
still be addressed by splitting the ioremap'd PCI BAR into
two ioremap calls, one for the MMIO registers and another
for whatever is desirable for write-combining, but
accomplishing this requires quite a bit of driver
restructuring.

Device drivers which are known to require a large
amount of rework in order to split ioremap'd areas
can use __arch_phys_wc_add() to avoid regressions
when PAT is enabled.

For a good example of a driver where things are neatly
split up on a PCI BAR refer to the infiniband qib
driver. For a good example of a driver where a good
amount of work is required refer to the infiniband
ipath driver.

This is *only* a transitional API, and as such no new
drivers are ever expected to use this.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
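Note for reviewers: the difference between the two entry points after
this patch comes down to which flags short-circuit the MTRR request.
The following is a userspace sketch of that decision logic only. The
wc_state struct and the model_*() names are hypothetical,
MODEL_WC_OFFSET is an assumed stand-in for MTRR_TO_PHYS_WC_OFFSET, and
model_mtrr_add() always succeeds, so the error path of the real code
is not modeled.

```c
#include <stdbool.h>

/* Assumed stand-in for MTRR_TO_PHYS_WC_OFFSET; the value is illustrative. */
#define MODEL_WC_OFFSET 1000

/* Hypothetical bundle of the two global flags plus a call counter. */
struct wc_state {
	bool pat_enabled;
	bool mtrr_enabled;
	int mtrr_calls;		/* how many MTRRs the model has handed out */
};

/* Pretend mtrr_add(): always succeeds and returns the next register. */
static int model_mtrr_add(struct wc_state *s)
{
	return s->mtrr_calls++;
}

/* Models __arch_phys_wc_add(): only a disabled MTRR skips the request. */
static int model_forced_wc_add(struct wc_state *s)
{
	if (!s->mtrr_enabled)
		return 0;
	return model_mtrr_add(s) + MODEL_WC_OFFSET;
}

/* Models arch_phys_wc_add(): an enabled PAT also skips the request. */
static int model_wc_add(struct wc_state *s)
{
	if (s->pat_enabled || !s->mtrr_enabled)
		return 0;
	return model_forced_wc_add(s);
}
```

With both PAT and MTRR enabled, model_wc_add() leaves the counter
untouched while model_forced_wc_add() consumes a register, which is
the regression-avoidance behaviour described above.
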
 arch/x86/include/asm/io.h       |  4 ++++
 arch/x86/kernel/cpu/mtrr/main.c | 36 +++++++++++++++++++++++++++++-------
 include/linux/io.h              |  4 ++++
 3 files changed, 37 insertions(+), 7 deletions(-)
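
Note for reviewers: the "split the BAR into two ioremap'd areas"
approach mentioned in the commit message can be pictured with a small
userspace sketch. Everything here is hypothetical: it assumes the
registers sit in the first regs_len bytes of the BAR and the
write-combining candidate is the remainder, and it assumes 4 KiB
pages. Real devices may lay things out differently, which is exactly
why some drivers need heavy rework.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* A physical address range that would get its own mapping call. */
struct region {
	uint64_t start;
	uint64_t len;
};

/*
 * Split one BAR into a register region (would be mapped uncached) and
 * a write-combining region. The boundary has to be page aligned in
 * this model because the caching attribute is applied per page.
 */
static bool split_bar(uint64_t bar_start, uint64_t bar_len,
		      uint64_t regs_len,
		      struct region *regs, struct region *wc)
{
	if (!bar_start || !regs_len || regs_len >= bar_len)
		return false;
	if (regs_len % MODEL_PAGE_SIZE)
		return false;

	regs->start = bar_start;
	regs->len = regs_len;
	wc->start = bar_start + regs_len;
	wc->len = bar_len - regs_len;
	return true;
}
```

A driver whose layout fits this model would map regs with its usual
uncached ioremap call and wc with ioremap_wc(); a driver whose
registers and buffer share pages cannot be split this way.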

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93..a144d05 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -338,6 +338,10 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 #define IO_SPACE_LIMIT 0xffff
 
 #ifdef CONFIG_MTRR
+extern int __must_check __arch_phys_wc_add(unsigned long base,
+					   unsigned long size);
+#define __arch_phys_wc_add __arch_phys_wc_add
+
 extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 7db9c47..5ae830b 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,23 +538,24 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
 EXPORT_SYMBOL(mtrr_del);
 
 /**
- * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * __arch_phys_wc_add - add a WC MTRR even if PAT is available
  * @base: Physical base address
  * @size: Size of region
  *
- * If PAT is available, this does nothing.  If PAT is unavailable, it
- * attempts to add a WC MTRR covering size bytes starting at base and
- * logs an error if this fails.
+ * We typically do not want to use MTRR if PAT is available but there
+ * are some drivers which require significant work to get this to work
+ * properly. This call should only be used by those drivers where it is
+ * clear that hard work is required to modify them to use arch_phys_wc_add()
  *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */
-int arch_phys_wc_add(unsigned long base, unsigned long size)
+int __arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled || !mtrr_enabled)
-		return 0;  /* Success!  (We don't need to do anything.) */
+	if (!mtrr_enabled)
+		return 0;
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
 	if (ret < 0) {
@@ -564,6 +565,27 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 	}
 	return ret + MTRR_TO_PHYS_WC_OFFSET;
 }
+EXPORT_SYMBOL_GPL(__arch_phys_wc_add);
+
+/**
+ * arch_phys_wc_add - add a WC MTRR and handle errors if PAT is unavailable
+ * @base: Physical base address
+ * @size: Size of region
+ *
+ * If PAT is available, this does nothing.  If PAT is unavailable, it
+ * attempts to add a WC MTRR covering size bytes starting at base and
+ * logs an error if this fails.
+ *
+ * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
+ * but drivers should not try to interpret that return value.
+ */
+int arch_phys_wc_add(unsigned long base, unsigned long size)
+{
+	if (pat_enabled || !mtrr_enabled)
+		return 0;  /* Success!  (We don't need to do anything.) */
+
+	return __arch_phys_wc_add(base, size);
+}
 EXPORT_SYMBOL(arch_phys_wc_add);
 
 /*
diff --git a/include/linux/io.h b/include/linux/io.h
index 91101a1..ecc51c3 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,10 @@ static inline void arch_phys_wc_del(int handle)
 }
 
 #define arch_phys_wc_add arch_phys_wc_add
+#ifndef __arch_phys_wc_add
+#define __arch_phys_wc_add arch_phys_wc_add
+#endif
+
 #endif
 
 #endif /* _LINUX_IO_H */
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 07/47] video: fbdev: atyfb: move framebuffer length fudging to helper
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The size of the framebuffer to be used needs to
be fudged to account for the different types of
devices that are out there. This moves what is
required into a helper so we can reuse it later.

This has no functional changes.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb_base.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8789e48..16936bb 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -427,6 +427,20 @@ static struct {
 #endif /* CONFIG_FB_ATY_CT */
 };
 
+/*
+ * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
+ * unless the auxiliary register aperture is used.
+ */
+static void aty_fudge_framebuffer_len(struct fb_info *info)
+{
+	struct atyfb_par *par = (struct atyfb_par *) info->par;
+
+	if (!par->aux_start &&
+	    (info->fix.smem_len == 0x800000 ||
+	     (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
+		info->fix.smem_len -= GUI_RESERVE;
+}
+
 static int correct_chipset(struct atyfb_par *par)
 {
 	u8 rev;
@@ -2603,14 +2617,7 @@ static int aty_init(struct fb_info *info)
 	if (par->pll_ops->resume_pll)
 		par->pll_ops->resume_pll(info, &par->pll);
 
-	/*
-	 * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
-	 * unless the auxiliary register aperture is used.
-	 */
-	if (!par->aux_start &&
-	    (info->fix.smem_len == 0x800000 ||
-	     (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
-		info->fix.smem_len -= GUI_RESERVE;
+	aty_fudge_framebuffer_len(info);
 
 	/*
 	 * Disable register access through the linear aperture
-- 
2.3.2.209.gd67f9d5.dirty
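
To make the rule in aty_fudge_framebuffer_len() concrete, here is a small
userspace model of it. This is only a sketch: the model_* names are
hypothetical, the bus enum is reduced to two values, and MODEL_GUI_RESERVE
is assumed to be one 4K page, matching the "last page" wording in the
comment.

```c
#include <stdint.h>

#define MODEL_GUI_RESERVE 0x1000u	/* assumption: one 4K page */

enum model_bus { MODEL_ISA, MODEL_PCI };

struct model_par {
	unsigned long aux_start;	/* non-zero if the auxiliary register aperture is used */
	enum model_bus bus_type;
};

/*
 * Same condition as the helper in the patch: without an auxiliary
 * aperture, an 8 MB aperture (or a 4 MB one on ISA) loses its last
 * page to MMIO. Any other length is returned unchanged.
 */
static uint32_t model_fudge_len(const struct model_par *par, uint32_t smem_len)
{
	if (!par->aux_start &&
	    (smem_len == 0x800000 ||
	     (par->bus_type == MODEL_ISA && smem_len == 0x400000)))
		smem_len -= MODEL_GUI_RESERVE;

	return smem_len;
}
```

With these assumptions an 8 MB PCI aperture without an auxiliary aperture
comes out as 0x7ff000, while a 4 MB length is only trimmed on ISA.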


* [PATCH v1 08/47] video: fbdev: atyfb: clarify ioremap() base and length used
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This has no functional changes. It just adjusts
the ioremap() call for the framebuffer to use
the same values we later use for the framebuffer,
which will make it easier to review the next change.

The size of the framebuffer varies but since this is
for PCI we *know* this defaults to 0x800000.
atyfb_setup_generic() is *only* used on PCI probe.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb_base.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 16936bb..8025624 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3489,7 +3489,9 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 
 	/* Map in frame buffer */
 	info->fix.smem_start = addr;
-	info->screen_base = ioremap(addr, 0x800000);
+	info->fix.smem_len = 0x800000;
+
+	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:17   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The atyfb driver uses an MTRR work around since some
cards use the same PCI BAR for the framebuffer and MMIO.
On such cards the last page is used for MMIO and the rest for
the framebuffer, so on those cards we ioremap() the MMIO
page alone, then ioremap() the full framebuffer again,
including the MMIO space, *and* ___then___ use an MTRR with
MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a
"hole" with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.

This is a terrible fucking work around and should by no means
be necessary. However, evidence from a large series of conversions
of drivers to ioremap_wc() for the framebuffer shows that around
the time MTRR started becoming popular, devices did not have things
lined up for easily separating the framebuffer and MMIO register
access. In some cases a driver requires significant intrusive
changes in order to make the split into an ioremap() for MMIO registers
and another ioremap_wc() for the framebuffer; at other times a
bit of careful study of the driver suffices. This driver
falls into the latter category.

We can replace the MTRR_TYPE_UNCACHABLE MTRR
work around by using ioremap_nocache(); the length of the
MMIO space should already be correct. The other part we
need to correct is ensuring we ioremap() only the required
size for the framebuffer. The ioremap() happens early
on probe for PCI devices, before aty_init(), which is where we
typically adjust the length and know how to do it. We can fix
this by pegging the bus type as PCI on PCI probe and then fudging
the framebuffer length just as we do in aty_init().

The last thing we must do to remain sane is ensure we
use info->fix.smem_start and info->fix.smem_len for
the framebuffer MTRR, as we know those are always well adjusted.
The *one* concern here would be if the MTRR size were not a
multiple of 4K, __but__ we already know that in the PCI case this
cannot happen: in the shared space setting the MTRR would cover
up to 0x7ff000, and assuming a 4K page:

; 0x7ff000 / 0x1000
	2047

Also, internally when MTRR is used mtrr_add() will use mtrr_check()
and that should splat a warning when the MTRR base and size are
not compatible with what is expected for MTRR usage.

This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  1 -
 drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
 2 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 1f39a62..89ec439 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -184,7 +184,6 @@ struct atyfb_par {
 	spinlock_t int_lock;
 #ifdef CONFIG_MTRR
 	int mtrr_aper;
-	int mtrr_reg;
 #endif
 	u32 mem_cntl;
 	struct crtc saved_crtc;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8025624..8875e56 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
 
 #ifdef CONFIG_MTRR
 	par->mtrr_aper = -1;
-	par->mtrr_reg = -1;
 	if (!nomtrr) {
-		/* Cover the whole resource. */
-		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
+		par->mtrr_aper = mtrr_add(info->fix.smem_start,
+					  info->fix.smem_len,
 					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr_aper >= 0 && !par->aux_start) {
-			/* Make a hole for mmio. */
-			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
-						 GUI_RESERVE, GUI_RESERVE,
-						 MTRR_TYPE_UNCACHABLE, 1);
-			if (par->mtrr_reg < 0) {
-				mtrr_del(par->mtrr_aper, 0, 0);
-				par->mtrr_aper = -1;
-			}
-		}
 	}
 #endif
 
@@ -2776,10 +2765,6 @@ aty_init_exit:
 	par->pll_ops->set_pll(info, &par->saved_pll);
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
@@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	}
 
 	info->fix.mmio_start = raddr;
-	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+	par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
 	if (par->ati_regbase == NULL)
 		return -ENOMEM;
 
@@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	info->fix.smem_start = addr;
 	info->fix.smem_len = 0x800000;
 
+	aty_fudge_framebuffer_len(info);
+
 	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
@@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	}
 	par = info->par;
+	par->bus_type = PCI;
 	info->fix = atyfb_fix;
 	info->device = &pdev->dev;
 	par->pci_id = pdev->device;
@@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
 #endif
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
-- 
2.3.2.209.gd67f9d5.dirty
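
The arithmetic in the commit message can be checked with a small userspace
sketch of the split layout on a shared 8 MB PCI BAR. This is only a model:
the model_* names are hypothetical, a 4K page is assumed as in the message,
and the check covers only the page-multiple condition discussed there, not
any other constraint the MTRR code may impose.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE	0x1000u		/* assumption: 4K pages */
#define MODEL_APERTURE	0x800000u	/* 8 MB PCI aperture from the patch */

struct model_layout {
	uint64_t fb_start;	/* start of the framebuffer mapping */
	uint32_t fb_len;	/* framebuffer length after fudging */
	uint64_t mmio_start;	/* start of the trailing MMIO page */
	uint32_t mmio_len;	/* one page of registers */
};

/* Split a shared BAR into framebuffer and last-page MMIO. */
static struct model_layout model_split(uint64_t bar_base)
{
	struct model_layout l;

	l.fb_start = bar_base;
	l.fb_len = MODEL_APERTURE - MODEL_PAGE_SIZE;
	l.mmio_start = bar_base + l.fb_len;
	l.mmio_len = MODEL_PAGE_SIZE;
	return l;
}

/* True when both base and size are multiples of the assumed page size. */
static bool model_page_multiple(uint64_t base, uint32_t size)
{
	return !(base & (MODEL_PAGE_SIZE - 1)) &&
	       !(size & (MODEL_PAGE_SIZE - 1));
}
```

For any page-aligned BAR base the framebuffer mapping then spans 0x7ff000
bytes, that is 2047 pages, and ends exactly where the MMIO page begins, so
the two mappings no longer overlap.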


* [PATCH v1 09/47] video: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (14 preceding siblings ...)
  (?)
@ 2015-03-20 23:17 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:17 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: Jean-Christophe Plagniol-Villard, linux-fbdev, Antonino Daplas,
	Daniel Vetter, Luis R. Rodriguez, x86, linux-kernel,
	Tomi Valkeinen, xen-devel, Ingo Molnar, Linus Torvalds

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The atyfb driver uses an MTRR work around since some
cards use the same PCI BAR for the framebuffer and MMIO.
On such cards the last page is used for MMIO and the rest for
the framebuffer, so on those cards we ioremap() the MMIO
page alone, then ioremap() the full framebuffer again,
including the MMIO space, *and* ___then___ use an MTRR with
MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a
"hole" with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.

This is a terrible fucking work around and should by no means
be necessary. However, evidence from a large series of conversions
of drivers to ioremap_wc() for the framebuffer shows that around
the time MTRR started becoming popular, devices did not have things
lined up for easily separating the framebuffer and MMIO register
access. In some cases a driver requires significant intrusive
changes in order to make the split into an ioremap() for MMIO registers
and another ioremap_wc() for the framebuffer; at other times a
bit of careful study of the driver suffices. This driver
falls into the latter category.

We can replace the MTRR_TYPE_UNCACHABLE MTRR
work around by using ioremap_nocache(); the length of the
MMIO space should already be correct. The other part we
need to correct is ensuring we ioremap() only the required
size for the framebuffer. The ioremap() happens early
on probe for PCI devices, before aty_init(), which is where we
typically adjust the length and know how to do it. We can fix
this by pegging the bus type as PCI on PCI probe and then fudging
the framebuffer length just as we do in aty_init().

The last thing we must do to remain sane is ensure we
use info->fix.smem_start and info->fix.smem_len for
the framebuffer MTRR, as we know those are always well adjusted.
The *one* concern here would be if the MTRR size were not a
multiple of 4K, __but__ we already know that in the PCI case this
cannot happen: in the shared space setting the MTRR would cover
up to 0x7ff000, and assuming a 4K page:

; 0x7ff000 / 0x1000
	2047

Also, internally when MTRR is used mtrr_add() will use mtrr_check()
and that should splat a warning when the MTRR base and size are
not compatible with what is expected for MTRR usage.

This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  1 -
 drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
 2 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 1f39a62..89ec439 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -184,7 +184,6 @@ struct atyfb_par {
 	spinlock_t int_lock;
 #ifdef CONFIG_MTRR
 	int mtrr_aper;
-	int mtrr_reg;
 #endif
 	u32 mem_cntl;
 	struct crtc saved_crtc;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8025624..8875e56 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
 
 #ifdef CONFIG_MTRR
 	par->mtrr_aper = -1;
-	par->mtrr_reg = -1;
 	if (!nomtrr) {
-		/* Cover the whole resource. */
-		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
+		par->mtrr_aper = mtrr_add(info->fix.smem_start,
+					  info->fix.smem_len,
 					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr_aper >= 0 && !par->aux_start) {
-			/* Make a hole for mmio. */
-			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
-						 GUI_RESERVE, GUI_RESERVE,
-						 MTRR_TYPE_UNCACHABLE, 1);
-			if (par->mtrr_reg < 0) {
-				mtrr_del(par->mtrr_aper, 0, 0);
-				par->mtrr_aper = -1;
-			}
-		}
 	}
 #endif
 
@@ -2776,10 +2765,6 @@ aty_init_exit:
 	par->pll_ops->set_pll(info, &par->saved_pll);
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
@@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	}
 
 	info->fix.mmio_start = raddr;
-	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+	par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
 	if (par->ati_regbase == NULL)
 		return -ENOMEM;
 
@@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	info->fix.smem_start = addr;
 	info->fix.smem_len = 0x800000;
 
+	aty_fudge_framebuffer_len(info);
+
 	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
@@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	}
 	par = info->par;
+	par->bus_type = PCI;
 	info->fix = atyfb_fix;
 	info->device = &pdev->dev;
 	par->pci_id = pdev->device;
@@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
 #endif
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
-- 
2.3.2.209.gd67f9d5.dirty

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 10/47] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. In order to
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.
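
As a rough illustration of why the driver no longer needs its own
"did we get an MTRR" bookkeeping, here is a userspace model of the
cookie convention. It is a sketch under assumptions, not the x86
implementation: every model_* name is hypothetical, and
WC_COOKIE_OFFSET is assumed to play the role of the kernel's
MTRR_TO_PHYS_WC_OFFSET.

```c
#include <stdbool.h>

/* Assumed offset added to an MTRR index to form the cookie. */
#define WC_COOKIE_OFFSET 1000

/* Model state; hypothetical stand-ins for the real machinery. */
bool model_pat_enabled;	/* true: PAT provides WC, no MTRR needed */
int model_mtrr_in_use;	/* MTRRs currently handed out by the model */

/* Stand-in for mtrr_add(): hands out the next register index. */
int model_mtrr_add(void)
{
	return model_mtrr_in_use++;
}

/* Stand-in for mtrr_del(): releases one register. */
void model_mtrr_del(int reg)
{
	(void)reg;
	model_mtrr_in_use--;
}

/* Returns 0 when nothing had to be done, else an offset cookie. */
int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (model_pat_enabled)
		return 0;	/* success, nothing to undo later */
	return model_mtrr_add() + WC_COOKIE_OFFSET;
}

/* Safe to call with any cookie; only real MTRR cookies are undone. */
void model_phys_wc_del(int cookie)
{
	if (cookie >= 1)
		model_mtrr_del(cookie - WC_COOKIE_OFFSET);
}
```

In this model a driver can store the cookie unconditionally and pass
it to the delete helper on every exit path, which is the shape the
conversion below gives atyfb.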

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already addresses, such as verbose messages
about when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  4 +---
 drivers/video/fbdev/aty/atyfb_base.c | 41 +++++++++---------------------------
 2 files changed, 11 insertions(+), 34 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 89ec439..63c4842 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -182,9 +182,7 @@ struct atyfb_par {
 	unsigned long irq_flags;
 	unsigned int irq;
 	spinlock_t int_lock;
-#ifdef CONFIG_MTRR
-	int mtrr_aper;
-#endif
+	int wc_cookie;
 	u32 mem_cntl;
 	struct crtc saved_crtc;
 	union aty_pll saved_pll;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8875e56..af278bb 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -98,9 +98,6 @@
 #ifdef CONFIG_PMAC_BACKLIGHT
 #include <asm/backlight.h>
 #endif
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 /*
  * Debug flags.
@@ -303,9 +300,7 @@ static struct fb_ops atyfb_ops = {
 };
 
 static bool noaccel;
-#ifdef CONFIG_MTRR
 static bool nomtrr;
-#endif
 static int vram;
 static int pll;
 static int mclk;
@@ -2628,14 +2623,9 @@ static int aty_init(struct fb_info *info)
 		aty_st_le32(BUS_CNTL, aty_ld_le32(BUS_CNTL, par) |
 			    BUS_APER_REG_DIS, par);
 
-#ifdef CONFIG_MTRR
-	par->mtrr_aper = -1;
-	if (!nomtrr) {
-		par->mtrr_aper = mtrr_add(info->fix.smem_start,
-					  info->fix.smem_len,
-					  MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (!nomtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 
 	info->fbops = &atyfb_ops;
 	info->pseudo_palette = par->pseudo_palette;
@@ -2763,13 +2753,8 @@ aty_init_exit:
 	/* restore video mode */
 	aty_set_crtc(par, &par->saved_crtc);
 	par->pll_ops->set_pll(info, &par->saved_pll);
+	arch_phys_wc_del(par->wc_cookie);
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_aper >= 0) {
-		mtrr_del(par->mtrr_aper, 0, 0);
-		par->mtrr_aper = -1;
-	}
-#endif
 	return ret;
 }
 
@@ -3478,7 +3463,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 
 	aty_fudge_framebuffer_len(info);
 
-	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
@@ -3652,7 +3638,8 @@ static int __init atyfb_atari_probe(void)
 		 * Map the video memory (physical address given)
 		 * to somewhere in the kernel address space.
 		 */
-		info->screen_base = ioremap(phys_vmembase[m64_num], phys_size[m64_num]);
+		info->screen_base = ioremap_wc(phys_vmembase[m64_num],
+					       phys_size[m64_num]);
 		info->fix.smem_start = (unsigned long)info->screen_base; /* Fake! */
 		par->ati_regbase = ioremap(phys_guiregbase[m64_num], 0x10000) +
 						0xFC00ul;
@@ -3719,12 +3706,8 @@ static void atyfb_remove(struct fb_info *info)
 		aty_bl_exit(info->bl_dev);
 #endif
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_aper >= 0) {
-		mtrr_del(par->mtrr_aper, 0, 0);
-		par->mtrr_aper = -1;
-	}
-#endif
+	arch_phys_wc_del(par->wc_cookie);
+
 #ifndef __sparc__
 	if (par->ati_regbase)
 		iounmap(par->ati_regbase);
@@ -3840,10 +3823,8 @@ static int __init atyfb_setup(char *options)
 	while ((this_opt = strsep(&options, ",")) != NULL) {
 		if (!strncmp(this_opt, "noaccel", 7)) {
 			noaccel = 1;
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "vram:", 5))
 			vram = simple_strtoul(this_opt + 5, NULL, 0);
 		else if (!strncmp(this_opt, "pll:", 4))
@@ -4013,7 +3994,5 @@ module_param(comp_sync, int, 0);
 MODULE_PARM_DESC(comp_sync, "Set composite sync signal to low (0) or high (1)");
 module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 11/47] IB/qib: add acounting for MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There is no good reason not to account for this MTRR by passing
increment=1 to mtrr_add(); we eventually delete it as well.
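
To show what the last argument of mtrr_add() is about, here is a
simplified userspace model of a per-register use count. It is a
sketch under the assumption that an already-present identical range
only has its count bumped when increment is true; the model_* names
and MODEL_NUM_REGS are hypothetical and this is not the kernel's
implementation.

```c
#include <stdbool.h>

#define MODEL_NUM_REGS 8	/* hypothetical number of registers */

struct model_mtrr {
	unsigned long base;
	unsigned long size;
	int usage;		/* 0 means the register is free */
};

struct model_mtrr model_regs[MODEL_NUM_REGS];

/* Returns the register index covering the range, or -1 if none is free. */
int model_mtrr_add(unsigned long base, unsigned long size, bool increment)
{
	int i, free_reg = -1;

	for (i = 0; i < MODEL_NUM_REGS; i++) {
		if (model_regs[i].usage > 0 &&
		    model_regs[i].base == base &&
		    model_regs[i].size == size) {
			/* Range already covered: only counted users bump usage. */
			if (increment)
				model_regs[i].usage++;
			return i;
		}
		if (model_regs[i].usage == 0 && free_reg < 0)
			free_reg = i;
	}
	if (free_reg < 0)
		return -1;

	model_regs[free_reg].base = base;
	model_regs[free_reg].size = size;
	model_regs[free_reg].usage = 1;
	return free_reg;
}

/* Drops one user; the register is released when the count hits zero. */
int model_mtrr_del(int reg)
{
	if (reg < 0 || reg >= MODEL_NUM_REGS || model_regs[reg].usage < 1)
		return -1;
	if (--model_regs[reg].usage == 0) {
		model_regs[reg].base = 0;
		model_regs[reg].size = 0;
	}
	return reg;
}
```

In this model a user that adds without incrementing and later deletes
takes away a count that belongs to someone else, so the counted user's
own delete then fails. Adding with increment set keeps add and delete
balanced.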

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/qib/qib_wc_x86_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_wc_x86_64.c b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
index 81b225f..fe0850a 100644
--- a/drivers/infiniband/hw/qib/qib_wc_x86_64.c
+++ b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
@@ -118,7 +118,7 @@ int qib_enable_wc(struct qib_devdata *dd)
 	if (!ret) {
 		int cookie;
 
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0);
+		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
 		if (cookie < 0) {
 			{
 				qib_devinfo(dd->pcidev,
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 12/47] IB/qib: use arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
  (?)
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Rickard Strandqvist, Dennis Dalessandro, Mike Marciniszyn,
	Roland Dreier, Ingo Molnar, Daniel Vetter, Bjorn Helgaas,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani, Roger Pau Monné,
	xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver already makes use of ioremap_wc() on PIO buffers,
so convert it to use arch_phys_wc_add().

Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/qib/qib_wc_x86_64.c | 31 ++++---------------------------
 1 file changed, 4 insertions(+), 27 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_wc_x86_64.c b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
index fe0850a..6d61ef9 100644
--- a/drivers/infiniband/hw/qib/qib_wc_x86_64.c
+++ b/drivers/infiniband/hw/qib/qib_wc_x86_64.c
@@ -116,21 +116,9 @@ int qib_enable_wc(struct qib_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
-
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				qib_devinfo(dd->pcidev,
-					 "mtrr_add()  WC for PIO bufs failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			dd->wc_cookie = cookie;
-			dd->wc_base = (unsigned long) pioaddr;
-			dd->wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie < 0)
+			ret = -EINVAL;
 	}
 
 	return ret;
@@ -142,18 +130,7 @@ int qib_enable_wc(struct qib_devdata *dd)
  */
 void qib_disable_wc(struct qib_devdata *dd)
 {
-	if (dd->wc_cookie) {
-		int r;
-
-		r = mtrr_del(dd->wc_cookie, dd->wc_base,
-			     dd->wc_len);
-		if (r < 0)
-			qib_devinfo(dd->pcidev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->wc_cookie, dd->wc_base,
-				 dd->wc_len, r);
-		dd->wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
 
 /**
-- 
2.3.2.209.gd67f9d5.dirty
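
For readers following the conversion above, here is a minimal userspace
sketch of the cookie contract the patch relies on. It assumes that
arch_phys_wc_add() returns 0 when PAT makes an MTRR unnecessary, a
negative errno on failure, and a positive cookie otherwise, and that
arch_phys_wc_del() ignores any cookie below 1. Every model_* name, the
slot pool, and the cookie offset are hypothetical stand-ins, not kernel
symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static bool model_pat_enabled;		/* true: PAT handles WC, no MTRR needed */
static int model_mtrr_free = 2;		/* pretend pool of free MTRR slots */

#define MODEL_COOKIE_OFFSET 1000	/* keeps valid cookies strictly positive */

/* Models the add side: 0 = nothing to do, < 0 = error, > 0 = cookie. */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (model_pat_enabled)
		return 0;
	if (model_mtrr_free == 0)
		return -ENOSPC;
	model_mtrr_free--;
	return MODEL_COOKIE_OFFSET + model_mtrr_free;
}

/* Models the del side: cookies below 1 are silently ignored. */
static void model_phys_wc_del(int cookie)
{
	if (cookie < 1)
		return;
	assert(cookie >= MODEL_COOKIE_OFFSET);
	model_mtrr_free++;
}

/* Hypothetical, cut-down device data: only the cookie is kept. */
struct model_devdata {
	int wc_cookie;
};

/* Same shape as qib_enable_wc() after the patch. */
static int model_enable_wc(struct model_devdata *dd,
			   unsigned long pioaddr, unsigned long piolen)
{
	dd->wc_cookie = model_phys_wc_add(pioaddr, piolen);
	return dd->wc_cookie < 0 ? -EINVAL : 0;
}

/* Same shape as qib_disable_wc() after the patch: no guard needed. */
static void model_disable_wc(struct model_devdata *dd)
{
	model_phys_wc_del(dd->wc_cookie);
}
```

Under these assumptions the driver no longer needs to record the base
and length, and the disable path can pass the stored cookie through
unconditionally, whichever of the three outcomes the enable path saw.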


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 13/47] IB/ipath: add counting for MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There is no good reason not to enable usage counting here: the
driver eventually deletes the MTRR as well.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 4ad0b93..70c1f3a 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -127,7 +127,7 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 			   "(addr %llx, len=0x%llx)\n",
 			   (unsigned long long) pioaddr,
 			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0);
+		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
 		if (cookie < 0) {
 			{
 				dev_info(&dd->pcidev->dev,
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
  (?)
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Bjorn Helgaas,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani, Roger Pau Monné,
	xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver unfortunately does not have the MMIO registers and
the areas that want WC (the PIO buffers in this case) properly
split up, and addressing the split is considerable work. As such,
this driver requires the __arch_phys_wc_add() call to ensure
write combining is enforced using MTRR on x86 even when PAT is
available.

Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/ipath_driver.c    |  7 ++--
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +--
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 47 ++++++++++-----------------
 3 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..464f39c 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -542,6 +542,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split pio on a separate ioremap_wc() */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +588,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..88709c1 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,26 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
 		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
 			   "(addr %llx, len=0x%llx)\n",
 			   (unsigned long long) pioaddr,
 			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = __arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie <= 0) {
+				/*
+				 * If MTRR is not available on an architecture
+				 * or if it could not be enabled at run time
+				 * folks who care should work towards the
+				 * ioremap_wc() split.
+				 */
+				if (!dd->wc_cookie)
+					ipath_dev_err(dd, "System does not support MTRR\n");
+				else {
+					ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+					ret = -EINVAL;
+				}
+		} else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +152,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
-- 
2.3.2.209.gd67f9d5.dirty
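
The three-way cookie check in ipath_enable_wc() above can be sketched
in isolation as follows. This is only a userspace model: it assumes,
as the patch's own comment and messages suggest, that a cookie of 0
from __arch_phys_wc_add() means MTRR is unavailable or could not be
enabled, a negative cookie means the request failed, and a positive
cookie means success. The model_* names are hypothetical, and
fprintf() stands in for ipath_dev_err().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical classification of a write-combining cookie. */
enum model_wc_result {
	MODEL_WC_OK,		/* cookie > 0: MTRR was set up */
	MODEL_WC_NO_MTRR,	/* cookie == 0: MTRR not available */
	MODEL_WC_FAILED,	/* cookie < 0: the request failed */
};

static enum model_wc_result model_classify_cookie(int cookie)
{
	if (cookie > 0)
		return MODEL_WC_OK;
	if (cookie == 0)
		return MODEL_WC_NO_MTRR;
	return MODEL_WC_FAILED;
}

/* Returns the value ipath_enable_wc() would hand back for a cookie. */
static int model_enable_wc_ret(int cookie)
{
	switch (model_classify_cookie(cookie)) {
	case MODEL_WC_NO_MTRR:
		/* Not an error: the driver simply runs without WC. */
		fprintf(stderr, "System does not support MTRR\n");
		return 0;
	case MODEL_WC_FAILED:
		fprintf(stderr, "Setting mtrr failed on PIO buffers\n");
		return -EINVAL;
	default:
		return 0;
	}
}
```

Only the negative case is reported as an error to the caller; the
zero case logs a message and carries on, which matches the intent
stated in the patch that people who care should work towards the
ioremap_wc() split.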


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Bjorn Helgaas,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver sadly does not have the MMIO registers and WC
desired areas (PIO buffers in this case) properly split up
and addressing a split is considerable work, as such this
such requires using the __arch_phys_wc_add() call to
ensure write combining is enforced using MTRR on x86
even when PAT is available.

Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/ipath_driver.c    |  7 ++--
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +--
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 47 ++++++++++-----------------
 3 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..464f39c 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -542,6 +542,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split pio on a separate ioremap_wc() */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +588,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..88709c1 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,26 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
 		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
 			   "(addr %llx, len=0x%llx)\n",
 			   (unsigned long long) pioaddr,
 			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = __arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie <= 0) {
+				/*
+				 * If MTRR is not available on an architecture
+				 * or if it could not be enabled at run time
+				 * folks who care should work towards the
+				 * ioremap_wc() split.
+				 */
+				if (!dd->wc_cookie)
+					ipath_dev_err(dd, "System does not support MTRR\n");
+				else {
+					ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+					ret = -EINVAL;
+				}
+		} else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +152,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
-- 
2.3.2.209.gd67f9d5.dirty

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 15/47] [media] media: ivtv: use __arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Andy Walls, Ingo Molnar, Daniel Vetter, Bjorn Helgaas,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, konrad.wilk, ville.syrjala, david.vrabel, jbeulich,
	toshi.kani, Roger Pau Monné,
	ivtv-devel, linux-media, xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Sadly this driver needs a bit of work before it can use
ioremap_wc() on the range it currently covers with an MTRR
write-combining entry: the mapping would have to be split into
two ioremap() calls. Until then, switch to __arch_phys_wc_add().
Annotate this.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: ivtv-devel@ivtvdriver.org
Cc: linux-media@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/media/pci/ivtv/ivtvfb.c | 51 +++++++++++------------------------------
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..ceefa6f 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,10 +44,6 @@
 #include <linux/ivtvfb.h>
 #include <linux/slab.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "ivtv-driver.h"
 #include "ivtv-cards.h"
 #include "ivtv-i2c.h"
@@ -155,12 +151,11 @@ struct osd_info {
 	/* Buffer size */
 	u32 video_buffer_size;
 
-#ifdef CONFIG_MTRR
 	/* video_base rounded down as required by hardware MTRRs */
 	unsigned long fb_start_aligned_physaddr;
 	/* video_base rounded up as required by hardware MTRRs */
 	unsigned long fb_end_aligned_physaddr;
-#endif
+	int wc_cookie;
 
 	/* Store the buffer offset */
 	int set_osd_coords_x;
@@ -1099,6 +1094,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
 static int ivtvfb_init_io(struct ivtv *itv)
 {
 	struct osd_info *oi = itv->osd_info;
+	/* Find the largest power of two that maps the whole buffer */
+	int size_shift = 31;
 
 	mutex_lock(&itv->serialize_lock);
 	if (ivtv_init_on_first_open(itv)) {
@@ -1132,29 +1129,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
 			oi->video_pbase, oi->video_vbase,
 			oi->video_buffer_size / 1024);
 
-#ifdef CONFIG_MTRR
-	{
-		/* Find the largest power of two that maps the whole buffer */
-		int size_shift = 31;
-
-		while (!(oi->video_buffer_size & (1 << size_shift))) {
-			size_shift--;
-		}
-		size_shift++;
-		oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
-		oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
-		oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
-		oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
-		if (mtrr_add(oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
-			     MTRR_TYPE_WRCOMB, 1) < 0) {
-			IVTVFB_INFO("disabled mttr\n");
-			oi->fb_start_aligned_physaddr = 0;
-			oi->fb_end_aligned_physaddr = 0;
-		}
-	}
-#endif
-
+	while (!(oi->video_buffer_size & (1 << size_shift)))
+		size_shift--;
+	size_shift++;
+	oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+	oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+	oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+	oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+	oi->wc_cookie = __arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+					   oi->fb_end_aligned_physaddr -
+					   oi->fb_start_aligned_physaddr);
 	/* Blank the entire osd. */
 	memset_io(oi->video_vbase, 0, oi->video_buffer_size);
 
@@ -1172,14 +1156,7 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
 
 	/* Release pseudo palette */
 	kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
-	if (oi->fb_end_aligned_physaddr) {
-		mtrr_del(-1, oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
-	}
-#endif
-
+	arch_phys_wc_del(oi->wc_cookie);
 	kfree(oi);
 	itv->osd_info = NULL;
 }
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 16/47] fusion: use __arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Nagalakshmi Nandigama,
	Praveen Krishnamoorthy, Sreekanth Reddy, Abhijit Mahajan,
	Antonino Daplas, Tomi Valkeinen,
	Jean-Christophe Plagniol-Villard, MPT-FusionLinux.pdl,
	linux-scsi

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The MTRR code in this driver is currently compiled out with #if 0.
If and when it gets enabled, the driver should use ioremap_wc() on
the same area. That could require a bit of work, as it would mean
splitting the mapping into two ioremap'd areas. Annotate this.
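
To make the split into two ioremap'd areas concrete, here is a minimal
userspace sketch of the address arithmetic only. It assumes the
write-combined window runs from a page-aligned offset to the end of the
region, which may not match this hardware; struct io_area and
split_region() are hypothetical names, and the ioremap calls appear
only in a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one mapping; not a kernel structure. */
struct io_area {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a region of len bytes at start into an uncached part and a
 * write-combined part that begins at wc_off and runs to the end.
 * Returns false if the offset is not page aligned or would leave
 * either part empty.
 *
 * A driver would then do something like:
 *	regs = ioremap_nocache(uc->start, uc->len);
 *	bufs = ioremap_wc(wc->start, wc->len);
 */
static bool split_region(uint64_t start, uint64_t len, uint64_t wc_off,
			 uint64_t page_size, struct io_area *uc,
			 struct io_area *wc)
{
	if (!page_size || !wc_off || wc_off >= len || wc_off % page_size)
		return false;

	uc->start = start;
	uc->len = wc_off;
	wc->start = start + wc_off;
	wc->len = len - wc_off;
	return true;
}
```

The point of the split is that each area gets its own caching
attribute from the mapping itself, so no MTRR is needed to cover the
write-combined part.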

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>
Cc: Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>
Cc: Sreekanth Reddy <sreekanth.reddy@avagotech.com>
Cc: Abhijit Mahajan <abhijit.mahajan@avagotech.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: MPT-FusionLinux.pdl@avagotech.com
Cc: linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/message/fusion/mptbase.c | 19 ++++---------------
 drivers/message/fusion/mptbase.h |  2 +-
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 187f836..c7b1a55 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -59,10 +59,6 @@
 #include <linux/delay.h>
 #include <linux/interrupt.h>		/* needed for in_interrupt() proto */
 #include <linux/dma-mapping.h>
-#include <asm/io.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #include <linux/kthread.h>
 #include <scsi/scsi_host.h>
 
@@ -2820,11 +2816,8 @@ mpt_adapter_dispose(MPT_ADAPTER *ioc)
 	pci_disable_device(ioc->pcidev);
 	pci_release_selected_regions(ioc->pcidev, ioc->bars);
 
-#if defined(CONFIG_MTRR) && 0
-	if (ioc->mtrr_reg > 0) {
-		mtrr_del(ioc->mtrr_reg, 0, 0);
-		dprintk(ioc, printk(MYIOC_s_INFO_FMT "MTRR region de-registered\n", ioc->name));
-	}
+#if 0
+	arch_phys_wc_del(ioc->wc_cookie);
 #endif
 
 	/*  Zap the adapter lookup ptr!  */
@@ -4512,17 +4505,13 @@ PrimeIocFifos(MPT_ADAPTER *ioc)
 
 		ioc->req_frames_low_dma = (u32) (alloc_dma & 0xFFFFFFFF);
 
-#if defined(CONFIG_MTRR) && 0
+#if 0
 		/*
 		 *  Enable Write Combining MTRR for IOC's memory region.
 		 *  (at least as much as we can; "size and base must be
 		 *  multiples of 4 kiB"
 		 */
-		ioc->mtrr_reg = mtrr_add(ioc->req_frames_dma,
-					 sz,
-					 MTRR_TYPE_WRCOMB, 1);
-		dprintk(ioc, printk(MYIOC_s_DEBUG_FMT "MTRR region registered (base:size=%08x:%x)\n",
-				ioc->name, ioc->req_frames_dma, sz));
+		ioc->wc_cookie = __arch_phys_wc_add(ioc->req_frames_dma, sz);
 #endif
 
 		for (i = 0; i < ioc->req_depth; i++) {
diff --git a/drivers/message/fusion/mptbase.h b/drivers/message/fusion/mptbase.h
index 8f14090..f0bff11 100644
--- a/drivers/message/fusion/mptbase.h
+++ b/drivers/message/fusion/mptbase.h
@@ -671,7 +671,7 @@ typedef struct _MPT_ADAPTER
 	u8			*HostPageBuffer; /* SAS - host page buffer support */
 	u32			HostPageBuffer_sz;
 	dma_addr_t		HostPageBuffer_dma;
-	int			 mtrr_reg;
+	int			wc_cookie;
 	struct pci_dev		*pcidev;	/* struct pci_dev pointer */
 	int			bars;		/* bitmask of BAR's that must be configured */
 	int			msi_enable;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

No other video driver uses any MTRR type other than MTRR_TYPE_WRCOMB.
The other MTRR types were implemented and supported here, but for
no good reason. The ioremap() APIs are architecture agnostic, and
at least on x86 PAT is a newer design that extends MTRRs and
can replace them in a much cleaner way: so long as the proper
ioremap_wc() or variant API is used, the right thing will be
done behind the scenes. This is the only driver left using the
other MTRR types, and since there is no good reason for it now,
rip them out.
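
The MTRR path that this patch keeps for mtrr=3 retries with ever
smaller power-of-two sizes. A minimal userspace sketch of that loop
follows, assuming a 4 KiB page and a stand-in fake_mtrr_add() that only
checks alignment. All names here are hypothetical, and the real
mtrr_add() has more failure modes than this models.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Stand-in for mtrr_add(): accept only a base aligned to a power-of-two size. */
static int fake_mtrr_add(uint64_t base, uint64_t size)
{
	if (!size || (size & (size - 1)) || (base & (size - 1)))
		return -EINVAL;
	return 0;	/* the real call returns a register number on success */
}

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t round_up_pow2(uint32_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Try the rounded-up size first and halve it after every -EINVAL, as the
 * do/while loop in vesafb_probe() does. Returns the size that was accepted,
 * or 0 if nothing down to one page worked.
 */
static uint64_t wc_try_sizes(uint64_t base, uint32_t size_total)
{
	uint64_t size = round_up_pow2(size_total);

	do {
		if (fake_mtrr_add(base, size) == 0)
			return size;
		size >>= 1;
	} while (size >= SKETCH_PAGE_SIZE);

	return 0;
}
```

Note that a smaller accepted size covers only the start of the
framebuffer, which is one more reason to prefer ioremap_wc() with PAT
over MTRRs.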

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 62 ++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index d79a0ac..191156b 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -404,60 +404,30 @@ static int vesafb_probe(struct platform_device *dev)
 	 * region already (FIXME) */
 	request_region(0x3c0, 32, "vesafb");
 
+	if (mtrr == 3) {
 #ifdef CONFIG_MTRR
-	if (mtrr) {
 		unsigned int temp_size = size_total;
-		unsigned int type = 0;
+		int rc;
 
-		switch (mtrr) {
-		case 1:
-			type = MTRR_TYPE_UNCACHABLE;
-			break;
-		case 2:
-			type = MTRR_TYPE_WRBACK;
-			break;
-		case 3:
-			type = MTRR_TYPE_WRCOMB;
-			break;
-		case 4:
-			type = MTRR_TYPE_WRTHROUGH;
-			break;
-		default:
-			type = 0;
-			break;
-		}
-
-		if (type) {
-			int rc;
-
-			/* Find the largest power-of-two */
-			temp_size = roundup_pow_of_two(temp_size);
+		/* Find the largest power-of-two */
+		temp_size = roundup_pow_of_two(temp_size);
 
-			/* Try and find a power of two to add */
-			do {
-				rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-					      type, 1);
-				temp_size >>= 1;
-			} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
-		}
-	}
+		/* Try and find a power of two to add */
+		do {
+			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
+				      MTRR_TYPE_WRCOMB, 1);
+			temp_size >>= 1;
+		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
 #endif
-	
-	switch (mtrr) {
-	case 1: /* uncachable */
-		info->screen_base = ioremap_nocache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 2: /* write-back */
-		info->screen_base = ioremap_cache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 3: /* write-combining */
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 4: /* write-through */
-	default:
+	} else {
+#ifdef CONFIG_MTRR
+		if (mtrr && mtrr != 3)
+			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) makes sense\n");
+#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
 	}
+
 	if (!info->screen_base) {
 		printk(KERN_ERR
 		       "vesafb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

No other video driver uses MTRR types except for MTRR_TYPE_WRCOMB,
the other MTRR types were implemented and supported here but with
no real good reason. The ioremap() APIs are architecture agnostic and
at least on x86 PAT is a new design that extends MTRRs and
can replace it in a much cleaner way, where so long as the
proper ioremap_wc() or variant API is used the right thing will
be done behind the scenes. This is the only driver left using the
other MTRR types -- and since there is no good reason for it now
rip them out.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 62 ++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index d79a0ac..191156b 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -404,60 +404,30 @@ static int vesafb_probe(struct platform_device *dev)
 	 * region already (FIXME) */
 	request_region(0x3c0, 32, "vesafb");
 
+	if (mtrr = 3) {
 #ifdef CONFIG_MTRR
-	if (mtrr) {
 		unsigned int temp_size = size_total;
-		unsigned int type = 0;
+		int rc;
 
-		switch (mtrr) {
-		case 1:
-			type = MTRR_TYPE_UNCACHABLE;
-			break;
-		case 2:
-			type = MTRR_TYPE_WRBACK;
-			break;
-		case 3:
-			type = MTRR_TYPE_WRCOMB;
-			break;
-		case 4:
-			type = MTRR_TYPE_WRTHROUGH;
-			break;
-		default:
-			type = 0;
-			break;
-		}
-
-		if (type) {
-			int rc;
-
-			/* Find the largest power-of-two */
-			temp_size = roundup_pow_of_two(temp_size);
+		/* Find the largest power-of-two */
+		temp_size = roundup_pow_of_two(temp_size);
 
-			/* Try and find a power of two to add */
-			do {
-				rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-					      type, 1);
-				temp_size >>= 1;
-			} while (temp_size >= PAGE_SIZE && rc = -EINVAL);
-		}
-	}
+		/* Try and find a power of two to add */
+		do {
+			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
+				      MTRR_TYPE_WRCOMB, 1);
+			temp_size >>= 1;
+		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
 #endif
-	
-	switch (mtrr) {
-	case 1: /* uncachable */
-		info->screen_base = ioremap_nocache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 2: /* write-back */
-		info->screen_base = ioremap_cache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 3: /* write-combining */
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 4: /* write-through */
-	default:
+	} else {
+#ifdef CONFIG_MTRR
+		if (mtrr && mtrr != 3)
+			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
+#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
 	}
+
 	if (!info->screen_base) {
 		printk(KERN_ERR
 		       "vesafb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (26 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: Jean-Christophe Plagniol-Villard, linux-fbdev, Antonino Daplas,
	Daniel Vetter, Luis R. Rodriguez, x86, linux-kernel,
	Tomi Valkeinen, xen-devel, Ingo Molnar, Linus Torvalds

From: "Luis R. Rodriguez" <mcgrof@suse.com>

No other video driver uses any MTRR type other than MTRR_TYPE_WRCOMB;
the other MTRR types were implemented and supported here for no
good reason. The ioremap() APIs are architecture agnostic, and
at least on x86 PAT is a newer design that extends MTRRs and
can replace them in a much cleaner way: so long as the
proper ioremap_wc() or variant API is used, the right thing will
be done behind the scenes. This is the only driver left using the
other MTRR types, and since there is no good reason for that,
rip them out.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 62 ++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 46 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index d79a0ac..191156b 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -404,60 +404,30 @@ static int vesafb_probe(struct platform_device *dev)
 	 * region already (FIXME) */
 	request_region(0x3c0, 32, "vesafb");
 
+	if (mtrr == 3) {
 #ifdef CONFIG_MTRR
-	if (mtrr) {
 		unsigned int temp_size = size_total;
-		unsigned int type = 0;
+		int rc;
 
-		switch (mtrr) {
-		case 1:
-			type = MTRR_TYPE_UNCACHABLE;
-			break;
-		case 2:
-			type = MTRR_TYPE_WRBACK;
-			break;
-		case 3:
-			type = MTRR_TYPE_WRCOMB;
-			break;
-		case 4:
-			type = MTRR_TYPE_WRTHROUGH;
-			break;
-		default:
-			type = 0;
-			break;
-		}
-
-		if (type) {
-			int rc;
-
-			/* Find the largest power-of-two */
-			temp_size = roundup_pow_of_two(temp_size);
+		/* Find the largest power-of-two */
+		temp_size = roundup_pow_of_two(temp_size);
 
-			/* Try and find a power of two to add */
-			do {
-				rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-					      type, 1);
-				temp_size >>= 1;
-			} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
-		}
-	}
+		/* Try and find a power of two to add */
+		do {
+			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
+				      MTRR_TYPE_WRCOMB, 1);
+			temp_size >>= 1;
+		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
 #endif
-	
-	switch (mtrr) {
-	case 1: /* uncachable */
-		info->screen_base = ioremap_nocache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 2: /* write-back */
-		info->screen_base = ioremap_cache(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 3: /* write-combining */
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
-	case 4: /* write-through */
-	default:
+	} else {
+#ifdef CONFIG_MTRR
+		if (mtrr && mtrr != 3)
+			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
+#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
-		break;
 	}
+
 	if (!info->screen_base) {
 		printk(KERN_ERR
 		       "vesafb: abort, cannot ioremap video memory 0x%x @ 0x%lx\n",
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The MTRR added was never being deleted.
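
As a rough userspace sketch of the pattern (not the driver code), the
idea is to remember the value returned by the add call and hand it back
on teardown. Everything prefixed fake_ below is a hypothetical stand-in
invented for this example, including the register count, the error
numbers and the base/size values.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_NUM_REGS 8		/* hypothetical number of range registers */
#define FAKE_EINVAL   22
#define FAKE_ENOSPC   28

static bool fake_reg_used[FAKE_NUM_REGS];

/* Stand-in for mtrr_add(): returns a register index or a negative error. */
static int fake_mtrr_add(unsigned long base, unsigned long size)
{
	(void)base;
	if (size == 0)
		return -FAKE_EINVAL;
	for (int i = 0; i < FAKE_NUM_REGS; i++) {
		if (!fake_reg_used[i]) {
			fake_reg_used[i] = true;
			return i;
		}
	}
	return -FAKE_ENOSPC;
}

/* Stand-in for mtrr_del(): releases the register named by the cookie. */
static void fake_mtrr_del(int cookie)
{
	assert(cookie >= 0 && cookie < FAKE_NUM_REGS);
	fake_reg_used[cookie] = false;
}

/* Count how many fake registers are currently allocated. */
static int fake_regs_in_use(void)
{
	int n = 0;

	for (int i = 0; i < FAKE_NUM_REGS; i++)
		n += fake_reg_used[i] ? 1 : 0;
	return n;
}

/* Per-device state: the cookie has to outlive probe to be deletable. */
struct fake_par {
	int wc_cookie;
};

static void fake_probe(struct fake_par *par)
{
	par->wc_cookie = fake_mtrr_add(0xe0000000UL, 0x400000UL);
}

static void fake_destroy(struct fake_par *par)
{
	if (par->wc_cookie >= 0)
		fake_mtrr_del(par->wc_cookie);
	par->wc_cookie = -1;
}
```

Without the delete in fake_destroy(), every probe/destroy cycle in this
model would leave one more register marked as used.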

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index 191156b..a2261d0 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -29,6 +29,10 @@
 
 /* --------------------------------------------------------------------- */
 
+struct vesafb_par {
+	int wc_cookie;
+};
+
 static struct fb_var_screeninfo vesafb_defined = {
 	.activate	= FB_ACTIVATE_NOW,
 	.height		= -1,
@@ -175,7 +179,16 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
+#ifdef CONFIG_MTRR
+	struct vesafb_par *par = info->par;
+#endif
+
 	fb_dealloc_cmap(&info->cmap);
+
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -228,6 +241,7 @@ static int vesafb_setup(char *options)
 static int vesafb_probe(struct platform_device *dev)
 {
 	struct fb_info *info;
+	struct vesafb_par *par;
 	int i, err;
 	unsigned int size_vmode;
 	unsigned int size_remap;
@@ -297,8 +311,8 @@ static int vesafb_probe(struct platform_device *dev)
 		return -ENOMEM;
 	}
 	platform_set_drvdata(dev, info);
-	info->pseudo_palette = info->par;
-	info->par = NULL;
+	info->pseudo_palette = NULL;
+	par = info->par;
 
 	/* set vesafb aperture size for generic probing */
 	info->apertures = alloc_apertures(1);
@@ -407,17 +421,17 @@ static int vesafb_probe(struct platform_device *dev)
 	if (mtrr == 3) {
 #ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
-		int rc;
 
 		/* Find the largest power-of-two */
 		temp_size = roundup_pow_of_two(temp_size);
 
 		/* Try and find a power of two to add */
 		do {
-			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-				      MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
+						  temp_size,
+						  MTRR_TYPE_WRCOMB, 1);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
 #endif
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
@@ -462,6 +476,10 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The MTRR added was never being deleted.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index 191156b..a2261d0 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -29,6 +29,10 @@
 
 /* --------------------------------------------------------------------- */
 
+struct vesafb_par {
+	int wc_cookie;
+};
+
 static struct fb_var_screeninfo vesafb_defined = {
 	.activate	= FB_ACTIVATE_NOW,
 	.height		= -1,
@@ -175,7 +179,16 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
+#ifdef CONFIG_MTRR
+	struct vesafb_par *par = info->par;
+#endif
+
 	fb_dealloc_cmap(&info->cmap);
+
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -228,6 +241,7 @@ static int vesafb_setup(char *options)
 static int vesafb_probe(struct platform_device *dev)
 {
 	struct fb_info *info;
+	struct vesafb_par *par;
 	int i, err;
 	unsigned int size_vmode;
 	unsigned int size_remap;
@@ -297,8 +311,8 @@ static int vesafb_probe(struct platform_device *dev)
 		return -ENOMEM;
 	}
 	platform_set_drvdata(dev, info);
-	info->pseudo_palette = info->par;
-	info->par = NULL;
+	info->pseudo_palette = NULL;
+	par = info->par;
 
 	/* set vesafb aperture size for generic probing */
 	info->apertures = alloc_apertures(1);
@@ -407,17 +421,17 @@ static int vesafb_probe(struct platform_device *dev)
 	if (mtrr == 3) {
 #ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
-		int rc;
 
 		/* Find the largest power-of-two */
 		temp_size = roundup_pow_of_two(temp_size);
 
 		/* Try and find a power of two to add */
 		do {
-			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-				      MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
+						  temp_size,
+						  MTRR_TYPE_WRCOMB, 1);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
 #endif
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
@@ -462,6 +476,10 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (28 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The MTRR added was never being deleted.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index 191156b..a2261d0 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -29,6 +29,10 @@
 
 /* --------------------------------------------------------------------- */
 
+struct vesafb_par {
+	int wc_cookie;
+};
+
 static struct fb_var_screeninfo vesafb_defined = {
 	.activate	= FB_ACTIVATE_NOW,
 	.height		= -1,
@@ -175,7 +179,16 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
+#ifdef CONFIG_MTRR
+	struct vesafb_par *par = info->par;
+#endif
+
 	fb_dealloc_cmap(&info->cmap);
+
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -228,6 +241,7 @@ static int vesafb_setup(char *options)
 static int vesafb_probe(struct platform_device *dev)
 {
 	struct fb_info *info;
+	struct vesafb_par *par;
 	int i, err;
 	unsigned int size_vmode;
 	unsigned int size_remap;
@@ -297,8 +311,8 @@ static int vesafb_probe(struct platform_device *dev)
 		return -ENOMEM;
 	}
 	platform_set_drvdata(dev, info);
-	info->pseudo_palette = info->par;
-	info->par = NULL;
+	info->pseudo_palette = NULL;
+	par = info->par;
 
 	/* set vesafb aperture size for generic probing */
 	info->apertures = alloc_apertures(1);
@@ -407,17 +421,17 @@ static int vesafb_probe(struct platform_device *dev)
 	if (mtrr == 3) {
 #ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
-		int rc;
 
 		/* Find the largest power-of-two */
 		temp_size = roundup_pow_of_two(temp_size);
 
 		/* Try and find a power of two to add */
 		do {
-			rc = mtrr_add(vesafb_fix.smem_start, temp_size,
-				      MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
+						  temp_size,
+						  MTRR_TYPE_WRCOMB, 1);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && rc == -EINVAL);
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
 #endif
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
@@ -462,6 +476,10 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
+#ifdef CONFIG_MTRR
+	if (par->wc_cookie >= 0)
+		mtrr_del(par->wc_cookie, 0, 0);
+#endif
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap_wc(); if
anything, it just uses a smaller size in case MTRR reservation fails.
The ioremap_wc() API is already used to take advantage of architecture
write-combining when available.
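
To make the "smaller size" fallback concrete, here is a minimal
userspace model of the retry loop, using the "< 0" failure check this
patch ends up with. The fake_ helpers, the page size constant and the
fake_max_ok limit are assumptions made up for the sketch; only the
shape of the loop is taken from the driver.

```c
#include <assert.h>

#define FAKE_PAGE_SIZE 4096UL
#define FAKE_EINVAL    22

/* Userspace stand-in for roundup_pow_of_two(); n must be non-zero. */
static unsigned long fake_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical limit: the largest size the fake add call accepts. */
static unsigned long fake_max_ok = 0x400000UL;

/* Stand-in for the add call: a positive cookie or a negative error. */
static int fake_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	return size <= fake_max_ok ? 1 : -FAKE_EINVAL;
}

/*
 * Start at the next power of two above the framebuffer size and halve
 * the request until the add succeeds or the size drops below a page.
 * Returns the size that was accepted, or 0 if every attempt failed.
 */
static unsigned long try_wc(unsigned long base, unsigned long size_total,
			    int *cookie)
{
	unsigned long temp_size = fake_roundup_pow_of_two(size_total);
	unsigned long tried;

	do {
		tried = temp_size;
		*cookie = fake_wc_add(base, temp_size);
		temp_size >>= 1;
	} while (temp_size >= FAKE_PAGE_SIZE && *cookie < 0);

	return *cookie < 0 ? 0 : tried;
}
```

Note that temp_size is halved after each attempt, so the size actually
accepted is the value from before the final shift; the sketch tracks it
separately in "tried".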

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available.
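
A simplified userspace model of that behaviour, as I understand it, is
sketched below. This is not the x86 implementation: the model_ and
fake_ names, the register limit and the cookie offset are assumptions
chosen for the example. The point is only that a cookie of 0 means
"nothing was needed" and that the delete side quietly ignores both 0
and error values, which is why the callers no longer need their own
checks.

```c
#include <assert.h>
#include <stdbool.h>

#define FAKE_WC_OFFSET 1000	/* assumed gap between cookies and raw indexes */
#define FAKE_ENOSPC    28

static bool fake_pat_enabled;	/* toggled by the caller of this sketch */
static int fake_mtrrs_in_use;	/* how many fake MTRRs are allocated */

/* Model of the add side: 0 means "nothing to do", > 0 is a cookie. */
static int model_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (fake_pat_enabled)
		return 0;		/* WC comes from the ioremap_wc() mapping */
	if (fake_mtrrs_in_use >= 8)
		return -FAKE_ENOSPC;	/* no free register in this model */
	return fake_mtrrs_in_use++ + FAKE_WC_OFFSET;
}

/* Model of the delete side: silently ignores 0 and negative cookies. */
static void model_wc_del(int cookie)
{
	if (cookie < 1)
		return;
	assert(cookie >= FAKE_WC_OFFSET);
	fake_mtrrs_in_use--;
}
```

In this model a driver can call model_wc_del() unconditionally on
whatever model_wc_add() returned, which matches how the error and
destroy paths are simplified in the diff below.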

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove things that
arch_phys_wc_add() already handles, such as the verbose message
when adding an MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index a2261d0..5bc94d3 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -19,10 +19,9 @@
 #include <linux/init.h>
 #include <linux/platform_device.h>
 #include <linux/screen_info.h>
+#include <linux/io.h>
 
 #include <video/vga.h>
-#include <asm/io.h>
-#include <asm/mtrr.h>
 
 #define dac_reg	(0x3c8)
 #define dac_val	(0x3c9)
@@ -179,16 +178,10 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
-#ifdef CONFIG_MTRR
 	struct vesafb_par *par = info->par;
-#endif
 
 	fb_dealloc_cmap(&info->cmap);
-
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -419,7 +412,6 @@ static int vesafb_probe(struct platform_device *dev)
 	request_region(0x3c0, 32, "vesafb");
 
 	if (mtrr == 3) {
-#ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
 
 		/* Find the largest power-of-two */
@@ -427,18 +419,16 @@ static int vesafb_probe(struct platform_device *dev)
 
 		/* Try and find a power of two to add */
 		do {
-			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
-						  temp_size,
-						  MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie =
+				arch_phys_wc_add(vesafb_fix.smem_start,
+						 temp_size);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
-#endif
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie < 0);
+
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
-#ifdef CONFIG_MTRR
 		if (mtrr && mtrr != 3)
 			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
-#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	}
 
@@ -476,10 +466,7 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap_wc(); if
anything, it just uses a smaller size in case MTRR reservation fails.
The ioremap_wc() API is already used to take advantage of architecture
write-combining when available.

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove things that
arch_phys_wc_add() already handles, such as the verbose message
when adding an MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index a2261d0..5bc94d3 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -19,10 +19,9 @@
 #include <linux/init.h>
 #include <linux/platform_device.h>
 #include <linux/screen_info.h>
+#include <linux/io.h>
 
 #include <video/vga.h>
-#include <asm/io.h>
-#include <asm/mtrr.h>
 
 #define dac_reg	(0x3c8)
 #define dac_val	(0x3c9)
@@ -179,16 +178,10 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
-#ifdef CONFIG_MTRR
 	struct vesafb_par *par = info->par;
-#endif
 
 	fb_dealloc_cmap(&info->cmap);
-
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -419,7 +412,6 @@ static int vesafb_probe(struct platform_device *dev)
 	request_region(0x3c0, 32, "vesafb");
 
 	if (mtrr == 3) {
-#ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
 
 		/* Find the largest power-of-two */
@@ -427,18 +419,16 @@ static int vesafb_probe(struct platform_device *dev)
 
 		/* Try and find a power of two to add */
 		do {
-			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
-						  temp_size,
-						  MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie =
+				arch_phys_wc_add(vesafb_fix.smem_start,
+						 temp_size);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
-#endif
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie < 0);
+
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
-#ifdef CONFIG_MTRR
 		if (mtrr && mtrr != 3)
 			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
-#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	}
 
@@ -476,10 +466,7 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (31 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap_wc(); if
anything, it just uses a smaller size in case MTRR reservation fails.
The ioremap_wc() API is already used to take advantage of architecture
write-combining when available.

Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove things that
arch_phys_wc_add() already handles, such as the verbose message
when adding an MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/vesafb.c | 29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/drivers/video/fbdev/vesafb.c b/drivers/video/fbdev/vesafb.c
index a2261d0..5bc94d3 100644
--- a/drivers/video/fbdev/vesafb.c
+++ b/drivers/video/fbdev/vesafb.c
@@ -19,10 +19,9 @@
 #include <linux/init.h>
 #include <linux/platform_device.h>
 #include <linux/screen_info.h>
+#include <linux/io.h>
 
 #include <video/vga.h>
-#include <asm/io.h>
-#include <asm/mtrr.h>
 
 #define dac_reg	(0x3c8)
 #define dac_val	(0x3c9)
@@ -179,16 +178,10 @@ static int vesafb_setcolreg(unsigned regno, unsigned red, unsigned green,
 
 static void vesafb_destroy(struct fb_info *info)
 {
-#ifdef CONFIG_MTRR
 	struct vesafb_par *par = info->par;
-#endif
 
 	fb_dealloc_cmap(&info->cmap);
-
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	release_mem_region(info->apertures->ranges[0].base, info->apertures->ranges[0].size);
@@ -419,7 +412,6 @@ static int vesafb_probe(struct platform_device *dev)
 	request_region(0x3c0, 32, "vesafb");
 
 	if (mtrr == 3) {
-#ifdef CONFIG_MTRR
 		unsigned int temp_size = size_total;
 
 		/* Find the largest power-of-two */
@@ -427,18 +419,16 @@ static int vesafb_probe(struct platform_device *dev)
 
 		/* Try and find a power of two to add */
 		do {
-			par->wc_cookie = mtrr_add(vesafb_fix.smem_start,
-						  temp_size,
-						  MTRR_TYPE_WRCOMB, 1);
+			par->wc_cookie =
+				arch_phys_wc_add(vesafb_fix.smem_start,
+						 temp_size);
 			temp_size >>= 1;
-		} while (temp_size >= PAGE_SIZE && par->wc_cookie == -EINVAL);
-#endif
+		} while (temp_size >= PAGE_SIZE && par->wc_cookie < 0);
+
 		info->screen_base = ioremap_wc(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	} else {
-#ifdef CONFIG_MTRR
 		if (mtrr && mtrr != 3)
 			WARN_ONCE(1, "Only MTRR_TYPE_WRCOMB (3) make sense\n");
-#endif
 		info->screen_base = ioremap(vesafb_fix.smem_start, vesafb_fix.smem_len);
 	}
 
@@ -476,10 +466,7 @@ static int vesafb_probe(struct platform_device *dev)
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	return 0;
 err:
-#ifdef CONFIG_MTRR
-	if (par->wc_cookie >= 0)
-		mtrr_del(par->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	if (info->screen_base)
 		iounmap(info->screen_base);
 	framebuffer_release(info);
-- 
2.3.2.209.gd67f9d5.dirty

* [PATCH v1 20/47] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There is only one user, but since we are about to bury MTRR
out of reach of drivers, expose this last piece of API to
drivers in a generic fashion so that they only need io.h
to access the helper.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/io.h       |  2 ++
 arch/x86/include/asm/mtrr.h     |  5 -----
 arch/x86/kernel/cpu/mtrr/main.c |  6 +++---
 drivers/gpu/drm/drm_ioctl.c     | 14 +-------------
 include/linux/io.h              |  6 ++++++
 5 files changed, 12 insertions(+), 21 deletions(-)
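
Reviewer note, not part of the commit: a minimal userspace sketch of the
override pattern used here. An architecture defines the function and a
macro of the same name, and generic code supplies a fallback only when
the macro is absent. SKETCH_ARCH_HAS_WC_INDEX and sketch_getmap_mtrr()
are hypothetical, and the offset value of 1000 is an assumption recalled
from arch/x86/kernel/cpu/mtrr/main.c rather than something this patch
shows.

```c
/*
 * Assumption: x86 offsets MTRR register numbers by 1000 to form cookies
 * (MTRR_TO_PHYS_WC_OFFSET in arch/x86/kernel/cpu/mtrr/main.c).
 */
#define MTRR_TO_PHYS_WC_OFFSET 1000

/* Hypothetical switch standing in for "this arch provides the helper". */
#define SKETCH_ARCH_HAS_WC_INDEX 1

#if SKETCH_ARCH_HAS_WC_INDEX
/* Arch version, same logic as the mtrr/main.c hunk */
static int arch_phys_wc_index(int handle)
{
	if (handle < MTRR_TO_PHYS_WC_OFFSET)
		return -1;
	else
		return handle - MTRR_TO_PHYS_WC_OFFSET;
}
/* Defining the name as a macro is what lets generic code detect it */
#define arch_phys_wc_index arch_phys_wc_index
#endif

/* Generic fallback, same shape as the include/linux/io.h hunk */
#ifndef arch_phys_wc_index
static inline int arch_phys_wc_index(int handle)
{
	return -1;
}
#define arch_phys_wc_index arch_phys_wc_index
#endif

/* What drm_getmap() can now do without any CONFIG_X86 check */
static int sketch_getmap_mtrr(int stored_cookie)
{
	return arch_phys_wc_index(stored_cookie);
}
```

Setting SKETCH_ARCH_HAS_WC_INDEX to 0 makes every call return -1, which
matches what the old #else branch in drm_getmap() did by hand.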

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index a144d05..5e3f1f2 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -346,6 +346,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
 #define arch_phys_wc_add arch_phys_wc_add
+extern int arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
 #endif
 
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index cade917..380bb4b 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -49,7 +49,6 @@ extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
 #  else
 static const int mtrr_enabled;
 static inline u8 mtrr_type_lookup(u64 addr, u64 end)
@@ -86,10 +85,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-static inline int phys_wc_to_mtrr_index(int handle)
-{
-	return -1;
-}
 
 #define mtrr_ap_init() do {} while (0)
 #define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 5ae830b..b68b671 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -607,7 +607,7 @@ void arch_phys_wc_del(int handle)
 EXPORT_SYMBOL(arch_phys_wc_del);
 
 /*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
  * @handle: Return value from arch_phys_wc_add
  *
  * This will turn the return value from arch_phys_wc_add into an mtrr
@@ -617,14 +617,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
  * in printk line.  Alas there is an illegitimate use in some ancient
  * drm ioctls.
  */
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
 {
 	if (handle < MTRR_TO_PHYS_WC_OFFSET)
 		return -1;
 	else
 		return handle - MTRR_TO_PHYS_WC_OFFSET;
 }
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
 
 /*
  * HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index a6d773a..e597cdd 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
 
 #include <linux/pci.h>
 #include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 
 static int drm_version(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
 	map->type = r_list->map->type;
 	map->flags = r_list->map->flags;
 	map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
-	/*
-	 * There appears to be exactly one user of the mtrr index: dritest.
-	 * It's easy enough to keep it working on non-PAT systems.
-	 */
-	map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
-	map->mtrr = -1;
-#endif
+	map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/include/linux/io.h b/include/linux/io.h
index ecc51c3..1676437 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -115,6 +115,12 @@ static inline void arch_phys_wc_del(int handle)
 #define __arch_phys_wc_add arch_phys_wc_add
 #endif
 
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+	return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
 #endif
 
 #endif /* _LINUX_IO_H */
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Hyong-Youb Kim, netdev,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver already uses ioremap_wc() on the same range,
so when PAT-based write-combining is available that mapping
will be used instead of an MTRR.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hyong-Youb Kim <hykim@myri.com>
Cc: netdev@vger.kernel.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 36 ++++++------------------
 1 file changed, 8 insertions(+), 28 deletions(-)
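
Reviewer note, not part of the commit: a minimal sketch of how the
reworked probe message reads the cookie. sketch_mtrr_state() is a
hypothetical helper (the patch open-codes the test inside dev_info()),
and the cookie meanings in the comment are an assumption about
arch_phys_wc_add() rather than something shown in this diff.

```c
/* Hypothetical helper; the patch open-codes this test in dev_info(). */
static const char *sketch_mtrr_state(int wc_cookie)
{
	/*
	 * Assumed cookie meanings:
	 *   > 0  an MTRR was set up for the range
	 *   == 0 nothing was needed (for example PAT is in use)
	 *   < 0  adding the MTRR failed
	 */
	return wc_cookie > 0 ? "Enabled" : "Disabled";
}
```

Only a positive cookie reports "MTRR Enabled"; the "WC Enabled" part of
the message is now unconditional because the range is always mapped with
ioremap_wc().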

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 1412f5a..01e4069 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -69,11 +69,7 @@
 #include <net/ip.h>
 #include <net/tcp.h>
 #include <asm/byteorder.h>
-#include <asm/io.h>
 #include <asm/processor.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #include <net/busy_poll.h>
 
 #include "myri10ge_mcp.h"
@@ -242,8 +238,7 @@ struct myri10ge_priv {
 	unsigned int rdma_tags_available;
 	int intr_coal_delay;
 	__be32 __iomem *intr_coal_delay_ptr;
-	int mtrr;
-	int wc_enabled;
+	int wc_cookie;
 	int down_cnt;
 	wait_queue_head_t down_wq;
 	struct work_struct watchdog_work;
@@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
 		data[i] = ((u64 *)&link_stats)[i];
 
 	data[i++] = (unsigned int)mgp->tx_boundary;
-	data[i++] = (unsigned int)mgp->wc_enabled;
 	data[i++] = (unsigned int)mgp->pdev->irq;
 	data[i++] = (unsigned int)mgp->msi_enabled;
 	data[i++] = (unsigned int)mgp->msix_enabled;
@@ -4040,14 +4034,7 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	mgp->board_span = pci_resource_len(pdev, 0);
 	mgp->iomem_base = pci_resource_start(pdev, 0);
-	mgp->mtrr = -1;
-	mgp->wc_enabled = 0;
-#ifdef CONFIG_MTRR
-	mgp->mtrr = mtrr_add(mgp->iomem_base, mgp->board_span,
-			     MTRR_TYPE_WRCOMB, 1);
-	if (mgp->mtrr >= 0)
-		mgp->wc_enabled = 1;
-#endif
+	mgp->wc_cookie = arch_phys_wc_add(mgp->iomem_base, mgp->board_span);
 	mgp->sram = ioremap_wc(mgp->iomem_base, mgp->board_span);
 	if (mgp->sram == NULL) {
 		dev_err(&pdev->dev, "ioremap failed for %ld bytes at 0x%lx\n",
@@ -4146,14 +4133,14 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto abort_with_state;
 	}
 	if (mgp->msix_enabled)
-		dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, WC %s\n",
+		dev_info(dev, "%d MSI-X IRQs, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
 			 mgp->num_slices, mgp->tx_boundary, mgp->fw_name,
-			 (mgp->wc_enabled ? "Enabled" : "Disabled"));
+			 (mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
 	else
-		dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, WC %s\n",
+		dev_info(dev, "%s IRQ %d, tx bndry %d, fw %s, MTRR %s, WC Enabled\n",
 			 mgp->msi_enabled ? "MSI" : "xPIC",
 			 pdev->irq, mgp->tx_boundary, mgp->fw_name,
-			 (mgp->wc_enabled ? "Enabled" : "Disabled"));
+			 (mgp->wc_cookie > 0 ? "Enabled" : "Disabled"));
 
 	board_number++;
 	return 0;
@@ -4175,10 +4162,7 @@ abort_with_ioremap:
 	iounmap(mgp->sram);
 
 abort_with_mtrr:
-#ifdef CONFIG_MTRR
-	if (mgp->mtrr >= 0)
-		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+	arch_phys_wc_del(mgp->wc_cookie);
 	dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
 			  mgp->cmd, mgp->cmd_bus);
 
@@ -4220,11 +4204,7 @@ static void myri10ge_remove(struct pci_dev *pdev)
 	pci_restore_state(pdev);
 
 	iounmap(mgp->sram);
-
-#ifdef CONFIG_MTRR
-	if (mgp->mtrr >= 0)
-		mtrr_del(mgp->mtrr, mgp->iomem_base, mgp->board_span);
-#endif
+	arch_phys_wc_del(mgp->wc_cookie);
 	myri10ge_free_slices(mgp);
 	kfree(mgp->msix_vectors);
 	dma_free_coherent(&pdev->dev, sizeof(*mgp->cmd),
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 22/47] staging: sm750fb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The same area used for ioremap() is used for the MTRR area.
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.
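
As a rough illustration of the contract the driver now relies on,
here is a small userspace model of the cookie semantics. It is a
sketch, not the kernel implementation: the model_* names and the
two boolean knobs are hypothetical, and the error value is only an
assumption about what a full MTRR table might report.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical knobs standing in for the real runtime state. */
static bool model_pat_enabled;	/* assumed: PAT is usable */
static bool model_mtrr_full;	/* assumed: no free MTRR left */
static int model_mtrr_in_use;	/* MTRRs handed out by the model */

/*
 * Model of the add side: 0 means "nothing to do", a positive value
 * is a cookie for an MTRR that was added, a negative value is an
 * error the caller is free to ignore.
 */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;

	if (model_pat_enabled)
		return 0;	/* ioremap_wc() alone provides WC */
	if (model_mtrr_full)
		return -ENOSPC;	/* not fatal for the driver */
	return ++model_mtrr_in_use;
}

/* Model of the del side: only positive cookies refer to an MTRR. */
static void model_phys_wc_del(int cookie)
{
	if (cookie <= 0)
		return;
	model_mtrr_in_use--;
}
```

Because the del side ignores anything that is not a positive
cookie, a driver can call it unconditionally on teardown, which is
why a separate "added" flag is no longer needed.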

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
all the #ifdeffery and other things arch_phys_wc_add() makes
redundant, such as the verbose messages printed when adding an
MTRR fails and the checks that skip removal when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/staging/sm750fb/sm750.c    | 34 ++++------------------------------
 drivers/staging/sm750fb/sm750.h    |  3 ---
 drivers/staging/sm750fb/sm750_hw.c |  3 +--
 3 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/drivers/staging/sm750fb/sm750.c b/drivers/staging/sm750fb/sm750.c
index aa0888c..ea59471 100644
--- a/drivers/staging/sm750fb/sm750.c
+++ b/drivers/staging/sm750fb/sm750.c
@@ -16,9 +16,6 @@
 #include<linux/vmalloc.h>
 #include<linux/pagemap.h>
 #include <linux/console.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #include <asm/fb.h>
 #include "sm750.h"
 #include "sm750_hw.h"
@@ -47,9 +44,7 @@ typedef int (*PROC_SPEC_INITHW)(struct lynx_share*,struct pci_dev*);
 /* common var for all device */
 static int g_hwcursor = 1;
 static int g_noaccel = 0;
-#ifdef CONFIG_MTRR
 static int g_nomtrr  = 0;
-#endif
 static const char * g_fbmode[] = {NULL,NULL};
 static const char * g_def_fbmode = "800x600-16@60";
 static char * g_settings = NULL;
@@ -1102,11 +1097,8 @@ static int lynxfb_pci_probe(struct pci_dev * pdev,
 
 	pr_info("share->revid = %02x\n",share->revid);
 	share->pdev = pdev;
-#ifdef CONFIG_MTRR
 	share->mtrr_off = g_nomtrr;
 	share->mtrr.vram = 0;
-	share->mtrr.vram_added = 0;
-#endif
 	share->accel_off = g_noaccel;
 	share->dual = g_dualview;
 	spin_lock_init(&share->slock);
@@ -1134,22 +1126,9 @@ static int lynxfb_pci_probe(struct pci_dev * pdev,
 		goto err_map;
 	}
 
-#ifdef CONFIG_MTRR
-	if(!share->mtrr_off){
-		pr_info("enable mtrr\n");
-		share->mtrr.vram = mtrr_add(share->vidmem_start,
-				share->vidmem_size,
-				MTRR_TYPE_WRCOMB,1);
-
-		if(share->mtrr.vram < 0){
-			/* don't block driver with the failure of MTRR */
-			pr_err("Unable to setup MTRR.\n");
-		}else{
-			share->mtrr.vram_added = 1;
-			pr_info("MTRR added succesfully\n");
-		}
-	}
-#endif
+	if (!share->mtrr_off)
+		share->mtrr.vram = arch_phys_wc_add(share->vidmem_start,
+						    share->vidmem_size);
 
 	memset(share->pvMem,0,share->vidmem_size);
 
@@ -1250,10 +1229,7 @@ static void __exit lynxfb_pci_remove(struct pci_dev * pdev)
 		/* release frame buffer*/
 		framebuffer_release(info);
 	}
-#ifdef CONFIG_MTRR
-	if(share->mtrr.vram_added)
-		mtrr_del(share->mtrr.vram,share->vidmem_start,share->vidmem_size);
-#endif
+	arch_phys_wc_del(share->mtrr.vram);
 	//	pci_release_regions(pdev);
 
 	iounmap(share->pvReg);
@@ -1297,10 +1273,8 @@ static int __init lynxfb_setup(char * options)
 		/* options that mean for any lynx chips are configured here */
 		if(!strncmp(opt,"noaccel",strlen("noaccel")))
 			g_noaccel = 1;
-#ifdef CONFIG_MTRR
 		else if(!strncmp(opt,"nomtrr",strlen("nomtrr")))
 			g_nomtrr = 1;
-#endif
 		else if(!strncmp(opt,"dual",strlen("dual")))
 			g_dualview = 1;
 		else
diff --git a/drivers/staging/sm750fb/sm750.h b/drivers/staging/sm750fb/sm750.h
index 0847d2b..5528912 100644
--- a/drivers/staging/sm750fb/sm750.h
+++ b/drivers/staging/sm750fb/sm750.h
@@ -51,13 +51,10 @@ struct lynx_share{
 	struct lynx_accel accel;
 	int accel_off;
 	int dual;
-#ifdef CONFIG_MTRR
 		int mtrr_off;
 		struct{
 			int vram;
-			int vram_added;
 		}mtrr;
-#endif
 	/* all smi graphic adaptor got below attributes */
 	unsigned long vidmem_start;
 	unsigned long vidreg_start;
diff --git a/drivers/staging/sm750fb/sm750_hw.c b/drivers/staging/sm750fb/sm750_hw.c
index c44a50b..203a0a1 100644
--- a/drivers/staging/sm750fb/sm750_hw.c
+++ b/drivers/staging/sm750fb/sm750_hw.c
@@ -85,8 +85,7 @@ int hw_sm750_map(struct lynx_share* share,struct pci_dev* pdev)
 	}
 #endif
 
-	share->pvMem = ioremap(share->vidmem_start,
-							share->vidmem_size);
+	share->pvMem = ioremap_wc(share->vidmem_start, share->vidmem_size);
 
 	if(!share->pvMem){
 		pr_err("Map video memory failed\n");
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 23/47] staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The same area used for ioremap() is used for the MTRR area.
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.
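
For this driver the resulting probe path maps the video memory
write-combined, maps the MMIO registers with a plain ioremap(),
requests WC for the video range, and unwinds in reverse order if a
later step fails. The userspace sketch below only models that call
order. Every model_* function is a hypothetical stub that records
its name; none of it is kernel code.

```c
#include <string.h>

static char model_log[128];	/* call order, space separated */

static void model_note(const char *name)
{
	strcat(model_log, name);
	strcat(model_log, " ");
}

/* Hypothetical stubs standing in for the kernel calls. */
static void model_ioremap_wc(void) { model_note("ioremap_wc"); }
static void model_ioremap(void) { model_note("ioremap"); }
static int model_wc_add(void) { model_note("wc_add"); return 0; }
static void model_wc_del(int cookie) { (void)cookie; model_note("wc_del"); }
static void model_iounmap(void) { model_note("iounmap"); }

/* register_fails simulates a failing register_framebuffer(). */
static int model_probe(int register_fails)
{
	int cookie;

	model_log[0] = '\0';
	model_ioremap_wc();		/* video memory, WC */
	model_ioremap();		/* MMIO registers, not WC */
	cookie = model_wc_add();	/* same range as the WC mapping */

	if (register_fails)
		goto error_mtrr;
	return 0;

error_mtrr:
	model_wc_del(cookie);		/* safe for any cookie value */
	model_iounmap();		/* MMIO */
	model_iounmap();		/* video memory */
	return -1;
}
```

Only the video memory mapping is requested as write-combining; the
register window keeps its plain ioremap() mapping.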

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
all the #ifdeffery and other things arch_phys_wc_add() makes
redundant, such as the verbose messages printed when adding an
MTRR fails and the checks that skip removal when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/staging/xgifb/XGI_main_26.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/xgifb/XGI_main_26.c b/drivers/staging/xgifb/XGI_main_26.c
index 74e8820..943d463 100644
--- a/drivers/staging/xgifb/XGI_main_26.c
+++ b/drivers/staging/xgifb/XGI_main_26.c
@@ -8,10 +8,7 @@
 
 #include <linux/sizes.h>
 #include <linux/module.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
+#include <linux/pci.h>
 
 #include "XGI_main.h"
 #include "vb_init.h"
@@ -1770,7 +1767,7 @@ static int xgifb_probe(struct pci_dev *pdev,
 	}
 
 	xgifb_info->video_vbase = hw_info->pjVideoMemoryAddress =
-	ioremap(xgifb_info->video_base, xgifb_info->video_size);
+		ioremap_wc(xgifb_info->video_base, xgifb_info->video_size);
 	xgifb_info->mmio_vbase = ioremap(xgifb_info->mmio_base,
 					    xgifb_info->mmio_size);
 
@@ -2014,12 +2011,8 @@ static int xgifb_probe(struct pci_dev *pdev,
 
 	fb_alloc_cmap(&fb_info->cmap, 256, 0);
 
-#ifdef CONFIG_MTRR
-	xgifb_info->mtrr = mtrr_add(xgifb_info->video_base,
-		xgifb_info->video_size, MTRR_TYPE_WRCOMB, 1);
-	if (xgifb_info->mtrr >= 0)
-		dev_info(&pdev->dev, "Added MTRR\n");
-#endif
+	xgifb_info->mtrr = arch_phys_wc_add(xgifb_info->video_base,
+					    xgifb_info->video_size);
 
 	if (register_framebuffer(fb_info) < 0) {
 		ret = -EINVAL;
@@ -2031,11 +2024,7 @@ static int xgifb_probe(struct pci_dev *pdev,
 	return 0;
 
 error_mtrr:
-#ifdef CONFIG_MTRR
-	if (xgifb_info->mtrr >= 0)
-		mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
-			xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(xgifb_info->mtrr);
 error_1:
 	iounmap(xgifb_info->mmio_vbase);
 	iounmap(xgifb_info->video_vbase);
@@ -2059,11 +2048,7 @@ static void xgifb_remove(struct pci_dev *pdev)
 	struct fb_info *fb_info = xgifb_info->fb_info;
 
 	unregister_framebuffer(fb_info);
-#ifdef CONFIG_MTRR
-	if (xgifb_info->mtrr >= 0)
-		mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
-			xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(xgifb_info->mtrr);
 	iounmap(xgifb_info->mmio_vbase);
 	iounmap(xgifb_info->video_vbase);
 	release_mem_region(xgifb_info->mmio_base, xgifb_info->mmio_size);
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 23/47] staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The same area used for ioremap() is used for the MTRR area.
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available, in order to
take advantage of that also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away, MTRR is architecture specific and on
   x86 its replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch, it additionally required manual intervention to
address all the #ifdery and removal of redundant things which
arch_phys_wc_add() already addresses such as verbose message
about when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/staging/xgifb/XGI_main_26.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/staging/xgifb/XGI_main_26.c b/drivers/staging/xgifb/XGI_main_26.c
index 74e8820..943d463 100644
--- a/drivers/staging/xgifb/XGI_main_26.c
+++ b/drivers/staging/xgifb/XGI_main_26.c
@@ -8,10 +8,7 @@
 
 #include <linux/sizes.h>
 #include <linux/module.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
+#include <linux/pci.h>
 
 #include "XGI_main.h"
 #include "vb_init.h"
@@ -1770,7 +1767,7 @@ static int xgifb_probe(struct pci_dev *pdev,
 	}
 
 	xgifb_info->video_vbase = hw_info->pjVideoMemoryAddress -	ioremap(xgifb_info->video_base, xgifb_info->video_size);
+		ioremap_wc(xgifb_info->video_base, xgifb_info->video_size);
 	xgifb_info->mmio_vbase = ioremap(xgifb_info->mmio_base,
 					    xgifb_info->mmio_size);
 
@@ -2014,12 +2011,8 @@ static int xgifb_probe(struct pci_dev *pdev,
 
 	fb_alloc_cmap(&fb_info->cmap, 256, 0);
 
-#ifdef CONFIG_MTRR
-	xgifb_info->mtrr = mtrr_add(xgifb_info->video_base,
-		xgifb_info->video_size, MTRR_TYPE_WRCOMB, 1);
-	if (xgifb_info->mtrr >= 0)
-		dev_info(&pdev->dev, "Added MTRR\n");
-#endif
+	xgifb_info->mtrr = arch_phys_wc_add(xgifb_info->video_base,
+					    xgifb_info->video_size);
 
 	if (register_framebuffer(fb_info) < 0) {
 		ret = -EINVAL;
@@ -2031,11 +2024,7 @@ static int xgifb_probe(struct pci_dev *pdev,
 	return 0;
 
 error_mtrr:
-#ifdef CONFIG_MTRR
-	if (xgifb_info->mtrr >= 0)
-		mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
-			xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(xgifb_info->mtrr);
 error_1:
 	iounmap(xgifb_info->mmio_vbase);
 	iounmap(xgifb_info->video_vbase);
@@ -2059,11 +2048,7 @@ static void xgifb_remove(struct pci_dev *pdev)
 	struct fb_info *fb_info = xgifb_info->fb_info;
 
 	unregister_framebuffer(fb_info);
-#ifdef CONFIG_MTRR
-	if (xgifb_info->mtrr >= 0)
-		mtrr_del(xgifb_info->mtrr, xgifb_info->video_base,
-			xgifb_info->video_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(xgifb_info->mtrr);
 	iounmap(xgifb_info->mmio_vbase);
 	iounmap(xgifb_info->video_vbase);
 	release_mem_region(xgifb_info->mmio_base, xgifb_info->mmio_size);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 24/47] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
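
A rough userspace sketch of the cookie convention may help here. It is
a model under stated assumptions, not the kernel implementation: every
model_* name below is hypothetical, and the behaviour assumed is that
the add helper returns 0 when PAT makes an MTRR unnecessary, a positive
cookie when an MTRR was reserved, and a negative errno on failure,
while the del helper ignores anything that is not a positive cookie.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_WC_OFFSET 1000	/* keeps valid cookies strictly positive */
#define MODEL_NUM_MTRR  8	/* pretend number of MTRR registers */

static bool model_pat_enabled;			/* pretend PAT state */
static bool model_mtrr_used[MODEL_NUM_MTRR];	/* pretend MTRR registers */

/* Stand-in for mtrr_add(): returns a free register index or -ENOSPC. */
static int model_mtrr_add(void)
{
	for (int i = 0; i < MODEL_NUM_MTRR; i++) {
		if (!model_mtrr_used[i]) {
			model_mtrr_used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

/* Models arch_phys_wc_add(): 0 is "nothing to do", > 0 is a cookie. */
static int model_wc_add(void)
{
	int reg;

	if (model_pat_enabled)
		return 0;	/* a WC mapping alone is enough; no MTRR used */

	reg = model_mtrr_add();
	if (reg < 0)
		return reg;	/* assumed: the real helper warns for us here */
	return reg + MODEL_WC_OFFSET;
}

/* Models arch_phys_wc_del(): zero and negative cookies are ignored. */
static void model_wc_del(int cookie)
{
	if (cookie <= 0)
		return;
	assert(cookie >= MODEL_WC_OFFSET &&
	       cookie < MODEL_WC_OFFSET + MODEL_NUM_MTRR);
	model_mtrr_used[cookie - MODEL_WC_OFFSET] = false;
}

/* Helper for inspection: how many pretend MTRRs are currently taken. */
static int model_mtrrs_in_use(void)
{
	int n = 0;

	for (int i = 0; i < MODEL_NUM_MTRR; i++)
		n += model_mtrr_used[i] ? 1 : 0;
	return n;
}
```

Under this convention a driver can store whatever the add helper
returns and hand it to the del helper unconditionally, which is why
the per-driver ">= 0" checks and CONFIG_MTRR guards can go away.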

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
all the #ifdeffery and the redundant things which
arch_phys_wc_add() and arch_phys_wc_del() already address,
such as the verbose message when adding an MTRR fails and
doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/arkfb.c | 36 +++++-------------------------------
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/drivers/video/fbdev/arkfb.c b/drivers/video/fbdev/arkfb.c
index b305a1e..6a317de 100644
--- a/drivers/video/fbdev/arkfb.c
+++ b/drivers/video/fbdev/arkfb.c
@@ -26,13 +26,9 @@
 #include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 struct arkfb_info {
 	int mclk_freq;
-	int mtrr_reg;
+	int wc_cookie;
 
 	struct dac_info *dac;
 	struct vgastate state;
@@ -102,10 +98,6 @@ static const struct svga_timing_regs ark_timing_regs     = {
 
 static char *mode_option = "640x480-8@60";
 
-#ifdef CONFIG_MTRR
-static int mtrr = 1;
-#endif
-
 MODULE_AUTHOR("(c) 2007 Ondrej Zajicek <santiago@crfreenet.org>");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("fbdev driver for ARK 2000PV");
@@ -115,11 +107,6 @@ MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0444);
 MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
 
-#ifdef CONFIG_MTRR
-module_param(mtrr, int, 0444);
-MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-
 static int threshold = 4;
 
 module_param(threshold, int, 0644);
@@ -1002,7 +989,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.smem_len = pci_resource_len(dev, 0);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1057,14 +1044,8 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
-
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
-
+	par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+					  info->fix.smem_len);
 	return 0;
 
 	/* Error handling */
@@ -1092,14 +1073,7 @@ static void ark_pci_remove(struct pci_dev *dev)
 
 	if (info) {
 		struct arkfb_info *par = info->par;
-
-#ifdef CONFIG_MTRR
-		if (par->mtrr_reg >= 0) {
-			mtrr_del(par->mtrr_reg, 0, 0);
-			par->mtrr_reg = -1;
-		}
-#endif
-
+		arch_phys_wc_del(par->wc_cookie);
 		dac_release(par->dac);
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
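
For this driver the nomtrr option is kept, so the cookie is only
assigned when the option is not set. A minimal userspace sketch of why
the unconditional release on the remove path is still safe follows. It
assumes the driver's private data starts out zeroed and that the del
helper ignores a zero cookie; all model_* names are hypothetical
stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private data. */
struct model_rinfo {
	int wc_cookie;	/* 0 means "no MTRR to release" */
};

static int model_del_calls;	/* counts releases that actually did work */

/* Pretend an MTRR was reserved and hand back a positive cookie. */
static int model_wc_add(void)
{
	return 1000;
}

/* Like arch_phys_wc_del() is assumed to behave: only positive cookies count. */
static void model_wc_del(int cookie)
{
	if (cookie > 0)
		model_del_calls++;
}

/* Probe path: the cookie is only assigned when nomtrr is not set. */
static void model_register(struct model_rinfo *rinfo, bool nomtrr)
{
	assert(rinfo != NULL);
	memset(rinfo, 0, sizeof(*rinfo));	/* assumed zeroed allocation */
	if (!nomtrr)
		rinfo->wc_cookie = model_wc_add();
}

/* Remove path: unconditional, as in the converted driver. */
static void model_unregister(struct model_rinfo *rinfo)
{
	assert(rinfo != NULL);
	model_wc_del(rinfo->wc_cookie);
}
```

With nomtrr set the cookie stays 0 and the release is a no-op, so the
remove path needs no extra guard.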

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
all the #ifdeffery and the redundant things which
arch_phys_wc_add() and arch_phys_wc_del() already address,
such as the verbose message when adding an MTRR fails and
doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/radeon_base.c | 29 ++++++-----------------------
 drivers/video/fbdev/aty/radeonfb.h    |  2 +-
 2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..922e8fc 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -85,10 +85,6 @@
 
 #endif /* CONFIG_PPC_OF */
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/radeon.h>
 #include <linux/radeonfb.h>
 
@@ -271,9 +267,7 @@ static bool mirror = 0;
 static int panel_yres = 0;
 static bool force_dfp = 0;
 static bool force_measure_pll = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 static bool force_sleep;
 static bool ignore_devlist;
 #ifdef CONFIG_PMAC_BACKLIGHT
@@ -2260,8 +2254,8 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 	rinfo->mapped_vram = min_t(unsigned long, MAX_MAPPED_VRAM, rinfo->video_ram);
 
 	do {
-		rinfo->fb_base = ioremap (rinfo->fb_base_phys,
-					  rinfo->mapped_vram);
+		rinfo->fb_base = ioremap_wc(rinfo->fb_base_phys,
+					    rinfo->mapped_vram);
 	} while (rinfo->fb_base == NULL &&
 		 ((rinfo->mapped_vram /= 2) >= MIN_MAPPED_VRAM));
 
@@ -2359,11 +2353,9 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 		goto err_unmap_fb;
 	}
 
-#ifdef CONFIG_MTRR
-	rinfo->mtrr_hdl = nomtrr ? -1 : mtrr_add(rinfo->fb_base_phys,
-						 rinfo->video_ram,
-						 MTRR_TYPE_WRCOMB, 1);
-#endif
+	if (!nomtrr)
+		rinfo->wc_cookie = arch_phys_wc_add(rinfo->fb_base_phys,
+						    rinfo->video_ram);
 
 	if (backlight)
 		radeonfb_bl_init(rinfo);
@@ -2428,12 +2420,7 @@ static void radeonfb_pci_unregister(struct pci_dev *pdev)
  #endif
 
 	del_timer_sync(&rinfo->lvds_timer);
-
-#ifdef CONFIG_MTRR
-	if (rinfo->mtrr_hdl >= 0)
-		mtrr_del(rinfo->mtrr_hdl, 0, 0);
-#endif
-
+	arch_phys_wc_del(rinfo->wc_cookie);
         unregister_framebuffer(info);
 
         radeonfb_bl_exit(rinfo);
@@ -2489,10 +2476,8 @@ static int __init radeonfb_setup (char *options)
 			panel_yres = simple_strtoul((this_opt+11), NULL, 0);
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "nomodeset", 9)) {
 			nomodeset = 1;
 		} else if (!strncmp(this_opt, "force_measure_pll", 17)) {
@@ -2552,10 +2537,8 @@ module_param(monitor_layout, charp, 0);
 MODULE_PARM_DESC(monitor_layout, "Specify monitor mapping (like XFree86)");
 module_param(force_measure_pll, bool, 0);
 MODULE_PARM_DESC(force_measure_pll, "Force measurement of PLL (debug)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
 module_param(panel_yres, int, 0);
 MODULE_PARM_DESC(panel_yres, "int: set panel yres");
 module_param(mode_option, charp, 0);
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..61812db 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -340,7 +340,7 @@ struct radeonfb_info {
 
 	struct pll_info		pll;
 
-	int			mtrr_hdl;
+	int			wc_cookie;
 
 	u32			save_regs[100];
 	int			asleep;
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
all the #ifdeffery and the redundant things which
arch_phys_wc_add() and arch_phys_wc_del() already address,
such as the verbose message when adding an MTRR fails and
doing nothing when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/radeon_base.c | 29 ++++++-----------------------
 drivers/video/fbdev/aty/radeonfb.h    |  2 +-
 2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..922e8fc 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -85,10 +85,6 @@
 
 #endif /* CONFIG_PPC_OF */
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/radeon.h>
 #include <linux/radeonfb.h>
 
@@ -271,9 +267,7 @@ static bool mirror = 0;
 static int panel_yres = 0;
 static bool force_dfp = 0;
 static bool force_measure_pll = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 static bool force_sleep;
 static bool ignore_devlist;
 #ifdef CONFIG_PMAC_BACKLIGHT
@@ -2260,8 +2254,8 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 	rinfo->mapped_vram = min_t(unsigned long, MAX_MAPPED_VRAM, rinfo->video_ram);
 
 	do {
-		rinfo->fb_base = ioremap (rinfo->fb_base_phys,
-					  rinfo->mapped_vram);
+		rinfo->fb_base = ioremap_wc(rinfo->fb_base_phys,
+					    rinfo->mapped_vram);
 	} while (rinfo->fb_base == NULL &&
 		 ((rinfo->mapped_vram /= 2) >= MIN_MAPPED_VRAM));
 
@@ -2359,11 +2353,9 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 		goto err_unmap_fb;
 	}
 
-#ifdef CONFIG_MTRR
-	rinfo->mtrr_hdl = nomtrr ? -1 : mtrr_add(rinfo->fb_base_phys,
-						 rinfo->video_ram,
-						 MTRR_TYPE_WRCOMB, 1);
-#endif
+	if (!nomtrr)
+		rinfo->wc_cookie = arch_phys_wc_add(rinfo->fb_base_phys,
+						    rinfo->video_ram);
 
 	if (backlight)
 		radeonfb_bl_init(rinfo);
@@ -2428,12 +2420,7 @@ static void radeonfb_pci_unregister(struct pci_dev *pdev)
  #endif
 
 	del_timer_sync(&rinfo->lvds_timer);
-
-#ifdef CONFIG_MTRR
-	if (rinfo->mtrr_hdl >= 0)
-		mtrr_del(rinfo->mtrr_hdl, 0, 0);
-#endif
-
+	arch_phys_wc_del(rinfo->wc_cookie);
         unregister_framebuffer(info);
 
         radeonfb_bl_exit(rinfo);
@@ -2489,10 +2476,8 @@ static int __init radeonfb_setup (char *options)
 			panel_yres = simple_strtoul((this_opt+11), NULL, 0);
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "nomodeset", 9)) {
 			nomodeset = 1;
 		} else if (!strncmp(this_opt, "force_measure_pll", 17)) {
@@ -2552,10 +2537,8 @@ module_param(monitor_layout, charp, 0);
 MODULE_PARM_DESC(monitor_layout, "Specify monitor mapping (like XFree86)");
 module_param(force_measure_pll, bool, 0);
 MODULE_PARM_DESC(force_measure_pll, "Force measurement of PLL (debug)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
 module_param(panel_yres, int, 0);
 MODULE_PARM_DESC(panel_yres, "int: set panel yres");
 module_param(mode_option, charp, 0);
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..61812db 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -340,7 +340,7 @@ struct radeonfb_info {
 
 	struct pll_info		pll;
 
-	int			mtrr_hdl;
+	int			wc_cookie;
 
 	u32			save_regs[100];
 	int			asleep;
-- 
2.3.2.209.gd67f9d5.dirty
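
The mapping loop touched by the radeon_base.c hunk above retries with half
the size each time the mapping fails, until the size would fall below a
floor. The sketch below shows that control flow in isolation. It is a
userspace illustration: demo_map(), demo_limit and DEMO_MIN_MAPPED are
hypothetical stand-ins for ioremap_wc(), the available mapping space and
MIN_MAPPED_VRAM.

```c
#include <stddef.h>

/* Hypothetical floor below which we stop retrying (1 MiB here). */
#define DEMO_MIN_MAPPED (1UL << 20)

/* Hypothetical mapper: succeeds only for sizes up to demo_limit. */
static unsigned long demo_limit;
static char demo_backing;

void *demo_map(unsigned long size)
{
	return size <= demo_limit ? (void *)&demo_backing : NULL;
}

/*
 * Try to map *size bytes; on failure halve *size and retry while the
 * halved size is still at least DEMO_MIN_MAPPED. The comparison must be
 * "==": with "=" the pointer would be overwritten with NULL and the
 * loop would stop after a single pass.
 */
void *demo_map_with_fallback(unsigned long *size)
{
	void *p;

	do {
		p = demo_map(*size);
	} while (p == NULL && ((*size /= 2) >= DEMO_MIN_MAPPED));

	return p;
}
```

On return the caller still has to check the pointer, since the loop also
ends when the floor is reached without a successful mapping.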



* [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (43 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture-specific, and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/radeon_base.c | 29 ++++++-----------------------
 drivers/video/fbdev/aty/radeonfb.h    |  2 +-
 2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/drivers/video/fbdev/aty/radeon_base.c b/drivers/video/fbdev/aty/radeon_base.c
index 26d80a4..922e8fc 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -85,10 +85,6 @@
 
 #endif /* CONFIG_PPC_OF */
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/radeon.h>
 #include <linux/radeonfb.h>
 
@@ -271,9 +267,7 @@ static bool mirror = 0;
 static int panel_yres = 0;
 static bool force_dfp = 0;
 static bool force_measure_pll = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 static bool force_sleep;
 static bool ignore_devlist;
 #ifdef CONFIG_PMAC_BACKLIGHT
@@ -2260,8 +2254,8 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 	rinfo->mapped_vram = min_t(unsigned long, MAX_MAPPED_VRAM, rinfo->video_ram);
 
 	do {
-		rinfo->fb_base = ioremap (rinfo->fb_base_phys,
-					  rinfo->mapped_vram);
+		rinfo->fb_base = ioremap_wc(rinfo->fb_base_phys,
+					    rinfo->mapped_vram);
 	} while (rinfo->fb_base == NULL &&
 		 ((rinfo->mapped_vram /= 2) >= MIN_MAPPED_VRAM));
 
@@ -2359,11 +2353,9 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
 		goto err_unmap_fb;
 	}
 
-#ifdef CONFIG_MTRR
-	rinfo->mtrr_hdl = nomtrr ? -1 : mtrr_add(rinfo->fb_base_phys,
-						 rinfo->video_ram,
-						 MTRR_TYPE_WRCOMB, 1);
-#endif
+	if (!nomtrr)
+		rinfo->wc_cookie = arch_phys_wc_add(rinfo->fb_base_phys,
+						    rinfo->video_ram);
 
 	if (backlight)
 		radeonfb_bl_init(rinfo);
@@ -2428,12 +2420,7 @@ static void radeonfb_pci_unregister(struct pci_dev *pdev)
  #endif
 
 	del_timer_sync(&rinfo->lvds_timer);
-
-#ifdef CONFIG_MTRR
-	if (rinfo->mtrr_hdl >= 0)
-		mtrr_del(rinfo->mtrr_hdl, 0, 0);
-#endif
-
+	arch_phys_wc_del(rinfo->wc_cookie);
         unregister_framebuffer(info);
 
         radeonfb_bl_exit(rinfo);
@@ -2489,10 +2476,8 @@ static int __init radeonfb_setup (char *options)
 			panel_yres = simple_strtoul((this_opt+11), NULL, 0);
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "nomodeset", 9)) {
 			nomodeset = 1;
 		} else if (!strncmp(this_opt, "force_measure_pll", 17)) {
@@ -2552,10 +2537,8 @@ module_param(monitor_layout, charp, 0);
 MODULE_PARM_DESC(monitor_layout, "Specify monitor mapping (like XFree86)");
 module_param(force_measure_pll, bool, 0);
 MODULE_PARM_DESC(force_measure_pll, "Force measurement of PLL (debug)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
 module_param(panel_yres, int, 0);
 MODULE_PARM_DESC(panel_yres, "int: set panel yres");
 module_param(mode_option, charp, 0);
diff --git a/drivers/video/fbdev/aty/radeonfb.h b/drivers/video/fbdev/aty/radeonfb.h
index cb84604..61812db 100644
--- a/drivers/video/fbdev/aty/radeonfb.h
+++ b/drivers/video/fbdev/aty/radeonfb.h
@@ -340,7 +340,7 @@ struct radeonfb_info {
 
 	struct pll_info		pll;
 
-	int			mtrr_hdl;
+	int			wc_cookie;
 
 	u32			save_regs[100];
 	int			asleep;
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver never removed the MTRRs. Fix that.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gbefb.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index 6d9ef39..f48ea7e 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -38,6 +38,7 @@ static struct sgi_gbe *gbe;
 struct gbefb_par {
 	struct fb_var_screeninfo var;
 	struct gbe_timing_info timing;
+	int wc_cookie;
 	int valid;
 };
 
@@ -1199,7 +1200,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 	}
 
 #ifdef CONFIG_X86
-	mtrr_add(gbe_mem_phys, gbe_mem_size, MTRR_TYPE_WRCOMB, 1);
+	info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
+				   MTRR_TYPE_WRCOMB, 1);
 #endif
 
 	/* map framebuffer memory into tiles table */
@@ -1240,6 +1242,10 @@ static int gbefb_probe(struct platform_device *p_dev)
 	return 0;
 
 out_gbe_unmap:
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 out_tiles_free:
@@ -1259,6 +1265,10 @@ static int gbefb_remove(struct platform_device* p_dev)
 
 	unregister_framebuffer(info);
 	gbe_turn_off();
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 	dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-- 
2.3.2.209.gd67f9d5.dirty
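
The fix above amounts to keeping the handle returned at probe time so that
the error path and the remove path can give it back. A minimal userspace
sketch of that pairing follows; it is not driver code, and the demo_* names
and the pool size are hypothetical.

```c
#include <stdbool.h>

#define DEMO_SLOTS 8	/* hypothetical size of a scarce resource pool */

static bool demo_slot_used[DEMO_SLOTS];

struct demo_par {
	int wc_cookie;	/* handle saved at probe time, negative if none */
};

/* Claim a slot; returns its index, or -1 when the pool is exhausted. */
int demo_slot_get(void)
{
	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (!demo_slot_used[i]) {
			demo_slot_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a slot; negative or out-of-range handles are ignored. */
void demo_slot_put(int idx)
{
	if (idx >= 0 && idx < DEMO_SLOTS)
		demo_slot_used[idx] = false;
}

int demo_slots_in_use(void)
{
	int n = 0;

	for (int i = 0; i < DEMO_SLOTS; i++)
		n += demo_slot_used[i];
	return n;
}

/* Old behaviour: the handle is discarded, so nothing can release it. */
void demo_probe_leaky(void)
{
	(void)demo_slot_get();
}

/* Patched behaviour: remember the handle in the per-device data. */
void demo_probe(struct demo_par *par)
{
	par->wc_cookie = demo_slot_get();
}

/* Teardown (and error unwinding) releases whatever probe obtained. */
void demo_remove(struct demo_par *par)
{
	demo_slot_put(par->wc_cookie);
	par->wc_cookie = -1;
}
```

With the leaky variant every probe consumes a slot for good, which matters
for a resource as limited as variable-range MTRRs.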



* [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver never removed the MTRRs. Fix that.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gbefb.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index 6d9ef39..f48ea7e 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -38,6 +38,7 @@ static struct sgi_gbe *gbe;
 struct gbefb_par {
 	struct fb_var_screeninfo var;
 	struct gbe_timing_info timing;
+	int wc_cookie;
 	int valid;
 };
 
@@ -1199,7 +1200,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 	}
 
 #ifdef CONFIG_X86
-	mtrr_add(gbe_mem_phys, gbe_mem_size, MTRR_TYPE_WRCOMB, 1);
+	info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
+				   MTRR_TYPE_WRCOMB, 1);
 #endif
 
 	/* map framebuffer memory into tiles table */
@@ -1240,6 +1242,10 @@ static int gbefb_probe(struct platform_device *p_dev)
 	return 0;
 
 out_gbe_unmap:
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 out_tiles_free:
@@ -1259,6 +1265,10 @@ static int gbefb_remove(struct platform_device* p_dev)
 
 	unregister_framebuffer(info);
 	gbe_turn_off();
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 	dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (45 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver never removed the MTRRs. Fix that.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gbefb.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index 6d9ef39..f48ea7e 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -38,6 +38,7 @@ static struct sgi_gbe *gbe;
 struct gbefb_par {
 	struct fb_var_screeninfo var;
 	struct gbe_timing_info timing;
+	int wc_cookie;
 	int valid;
 };
 
@@ -1199,7 +1200,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 	}
 
 #ifdef CONFIG_X86
-	mtrr_add(gbe_mem_phys, gbe_mem_size, MTRR_TYPE_WRCOMB, 1);
+	info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
+				   MTRR_TYPE_WRCOMB, 1);
 #endif
 
 	/* map framebuffer memory into tiles table */
@@ -1240,6 +1242,10 @@ static int gbefb_probe(struct platform_device *p_dev)
 	return 0;
 
 out_gbe_unmap:
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 out_tiles_free:
@@ -1259,6 +1265,10 @@ static int gbefb_remove(struct platform_device* p_dev)
 
 	unregister_framebuffer(info);
 	gbe_turn_off();
+#ifdef CONFIG_MTRR
+	if (info->wc_cookie >= 0)
+		mtrr_del(info->wc_cookie, 0, 0);
+#endif
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 	dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture-specific, and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gbefb.c | 26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index f48ea7e..ef81215 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -22,9 +22,6 @@
 #include <linux/module.h>
 #include <linux/io.h>
 
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_MIPS
 #include <asm/addrspace.h>
 #endif
@@ -1176,8 +1173,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 
 	if (gbe_mem_phys) {
 		/* memory was allocated at boot time */
-		gbe_mem = devm_ioremap_nocache(&p_dev->dev, gbe_mem_phys,
-					       gbe_mem_size);
+		gbe_mem = devm_ioremap_wc(&p_dev->dev, gbe_mem_phys,
+					  gbe_mem_size);
 		if (!gbe_mem) {
 			printk(KERN_ERR "gbefb: couldn't map framebuffer\n");
 			ret = -ENOMEM;
@@ -1188,8 +1185,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 	} else {
 		/* try to allocate memory with the classical allocator
 		 * this has high chance to fail on low memory machines */
-		gbe_mem = dma_alloc_coherent(NULL, gbe_mem_size, &gbe_dma_addr,
-					     GFP_KERNEL);
+		gbe_mem = dma_alloc_writecombine(NULL, gbe_mem_size,
+						 &gbe_dma_addr, GFP_KERNEL);
 		if (!gbe_mem) {
 			printk(KERN_ERR "gbefb: couldn't allocate framebuffer memory\n");
 			ret = -ENOMEM;
@@ -1199,10 +1196,7 @@ static int gbefb_probe(struct platform_device *p_dev)
 		gbe_mem_phys = (unsigned long) gbe_dma_addr;
 	}
 
-#ifdef CONFIG_X86
-	info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
-				   MTRR_TYPE_WRCOMB, 1);
-#endif
+	info->wc_cookie = arch_phys_wc_add(gbe_mem_phys, gbe_mem_size);
 
 	/* map framebuffer memory into tiles table */
 	for (i = 0; i < (gbe_mem_size >> TILE_SHIFT); i++)
@@ -1242,10 +1236,7 @@ static int gbefb_probe(struct platform_device *p_dev)
 	return 0;
 
 out_gbe_unmap:
-#ifdef CONFIG_MTRR
-	if (info->wc_cookie >= 0)
-		mtrr_del(info->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(info->wc_cookie);
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 out_tiles_free:
@@ -1265,10 +1256,7 @@ static int gbefb_remove(struct platform_device* p_dev)
 
 	unregister_framebuffer(info);
 	gbe_turn_off();
-#ifdef CONFIG_MTRR
-	if (info->wc_cookie >= 0)
-		mtrr_del(info->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(info->wc_cookie);
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 	dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-- 
2.3.2.209.gd67f9d5.dirty
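
The reason the #ifdef blocks can simply disappear from the driver in the
patch above is that the conditional moves into one place: an architecture
either supplies the helpers or gets no-op fallbacks. The sketch below shows
that header pattern in plain C. It is an illustration under assumptions:
DEMO_HAVE_WC_HW and all demo_* names are hypothetical, and the kernel's
generic header is only believed to provide comparable fallbacks.

```c
/*
 * Hypothetical names throughout. DEMO_HAVE_WC_HW stands in for an
 * architecture that provides real implementations elsewhere.
 */
#ifdef DEMO_HAVE_WC_HW
int demo_wc_add(unsigned long base, unsigned long size);
void demo_wc_del(int cookie);
#else
/* Fallbacks: report "nothing to do" and make deletion a no-op. */
static inline int demo_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	return 0;
}

static inline void demo_wc_del(int cookie)
{
	(void)cookie;
}
#endif

struct demo_dev {
	int wc_cookie;
};

/* Caller code is identical on every configuration: no #ifdef needed. */
int demo_probe(struct demo_dev *dev, unsigned long base, unsigned long size)
{
	dev->wc_cookie = demo_wc_add(base, size);
	return 0;
}

void demo_remove(struct demo_dev *dev)
{
	demo_wc_del(dev->wc_cookie);
}
```

Built without DEMO_HAVE_WC_HW, the probe path stores 0 and the remove path
does nothing, which mirrors what a non-x86 build of the driver would see.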



* [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture-specific, and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose message
printed when MTRR fails and doing nothing when we didn't get
an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gbefb.c | 26 +++++++-------------------
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index f48ea7e..ef81215 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -22,9 +22,6 @@
 #include <linux/module.h>
 #include <linux/io.h>
 
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_MIPS
 #include <asm/addrspace.h>
 #endif
@@ -1176,8 +1173,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 
 	if (gbe_mem_phys) {
 		/* memory was allocated at boot time */
-		gbe_mem = devm_ioremap_nocache(&p_dev->dev, gbe_mem_phys,
-					       gbe_mem_size);
+		gbe_mem = devm_ioremap_wc(&p_dev->dev, gbe_mem_phys,
+					  gbe_mem_size);
 		if (!gbe_mem) {
 			printk(KERN_ERR "gbefb: couldn't map framebuffer\n");
 			ret = -ENOMEM;
@@ -1188,8 +1185,8 @@ static int gbefb_probe(struct platform_device *p_dev)
 	} else {
 		/* try to allocate memory with the classical allocator
 		 * this has high chance to fail on low memory machines */
-		gbe_mem = dma_alloc_coherent(NULL, gbe_mem_size, &gbe_dma_addr,
-					     GFP_KERNEL);
+		gbe_mem = dma_alloc_writecombine(NULL, gbe_mem_size,
+						 &gbe_dma_addr, GFP_KERNEL);
 		if (!gbe_mem) {
 			printk(KERN_ERR "gbefb: couldn't allocate framebuffer memory\n");
 			ret = -ENOMEM;
@@ -1199,10 +1196,7 @@ static int gbefb_probe(struct platform_device *p_dev)
 		gbe_mem_phys = (unsigned long) gbe_dma_addr;
 	}
 
-#ifdef CONFIG_X86
-	info->wc_cookie = mtrr_add(gbe_mem_phys, gbe_mem_size,
-				   MTRR_TYPE_WRCOMB, 1);
-#endif
+	info->wc_cookie = arch_phys_wc_add(gbe_mem_phys, gbe_mem_size);
 
 	/* map framebuffer memory into tiles table */
 	for (i = 0; i < (gbe_mem_size >> TILE_SHIFT); i++)
@@ -1242,10 +1236,7 @@ static int gbefb_probe(struct platform_device *p_dev)
 	return 0;
 
 out_gbe_unmap:
-#ifdef CONFIG_MTRR
-	if (info->wc_cookie >= 0)
-		mtrr_del(info->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(info->wc_cookie);
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 out_tiles_free:
@@ -1265,10 +1256,7 @@ static int gbefb_remove(struct platform_device* p_dev)
 
 	unregister_framebuffer(info);
 	gbe_turn_off();
-#ifdef CONFIG_MTRR
-	if (info->wc_cookie >= 0)
-		mtrr_del(info->wc_cookie, 0, 0);
-#endif
+	arch_phys_wc_del(info->wc_cookie);
 	if (gbe_dma_addr)
 		dma_free_coherent(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
 	dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 28/47] video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Although this driver gives the framebuffer layer a different
size for the framebuffer, it uses the entire aperture PCI BAR
size for the MTRR. Since the framebuffer is included in that
range and the MTRR covered the entire PCI BAR, WC will have
been preferred on the framebuffer as well. This patch keeps the
WC preference on the same entire PCI BAR.
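
As a rough illustration of the range argument above, the following
userspace sketch computes the length the driver hands to its ioremap
call and checks that it fits inside the WC range. The helper names are
hypothetical; only the arithmetic (page offsets shifted by 12, plus the
framebuffer size) is taken from the expression in intelfb_pci_register().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: length passed to the ioremap call, mirroring
 * ((offset + fb.offset) << 12) + fb.size from the driver. The two
 * offsets are counted in 4 KiB pages, the framebuffer size in bytes.
 */
uint64_t mapped_len(uint64_t offset_pages, uint64_t fb_offset_pages,
		    uint64_t fb_size)
{
	return ((offset_pages + fb_offset_pages) << 12) + fb_size;
}

/*
 * Both ranges start at the aperture base, so the mapping is covered by
 * the WC range exactly when it is not longer than the aperture.
 */
bool covered_by_wc(uint64_t len, uint64_t aperture_size)
{
	return len <= aperture_size;
}
```

With a mapping that ends before the end of the aperture, covered_by_wc()
returns true, which is the situation the paragraph above describes.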

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when the MTRR request fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
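
The reason the has_mtrr flag and the "cookie >= 0" guards can be dropped
may be easier to see with a small userspace model. This is only a sketch
under assumptions: every model_* name and the offset constant are
hypothetical stand-ins, and the convention modelled (0 when nothing was
needed, a negative value on failure, a positive cookie on success, and a
delete that ignores anything below 1) is a simplification of how
arch_phys_wc_add() and arch_phys_wc_del() are understood to behave.

```c
#include <stdbool.h>

/* Hypothetical offset that keeps successful cookies strictly positive. */
#define MODEL_COOKIE_OFFSET 1000

bool model_pat_enabled;		/* pretend PAT state */
int model_mtrr_slots_free = 1;	/* pretend number of free MTRR registers */
int model_mtrr_dels;		/* MTRR deletions actually performed */

/* Model of the add side: only touches an "MTRR" when PAT is off. */
int model_wc_add(void)
{
	if (model_pat_enabled)
		return 0;	/* success, nothing to undo later */
	if (model_mtrr_slots_free <= 0)
		return -1;	/* no register left, caller just carries on */
	model_mtrr_slots_free--;
	return 0 + MODEL_COOKIE_OFFSET;	/* register index 0, offset applied */
}

/* Model of the delete side: safe to call with any cookie value. */
void model_wc_del(int cookie)
{
	if (cookie < 1)
		return;		/* 0 or error: there is nothing to delete */
	model_mtrr_slots_free++;
	model_mtrr_dels++;
}
```

Because the delete side filters out 0 and negative cookies itself, a
driver can store whatever the add side returned and pass it back
unconditionally in its cleanup path.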

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/intelfb/intelfb.h    |  4 +---
 drivers/video/fbdev/intelfb/intelfbdrv.c | 38 ++++----------------------------
 2 files changed, 5 insertions(+), 37 deletions(-)

diff --git a/drivers/video/fbdev/intelfb/intelfb.h b/drivers/video/fbdev/intelfb/intelfb.h
index 6b51175..37f8339 100644
--- a/drivers/video/fbdev/intelfb/intelfb.h
+++ b/drivers/video/fbdev/intelfb/intelfb.h
@@ -285,9 +285,7 @@ struct intelfb_info {
 	/* use a gart reserved fb mem */
 	u8 fbmem_gart;
 
-	/* mtrr support */
-	int mtrr_reg;
-	u32 has_mtrr;
+	int wc_cookie;
 
 	/* heap data */
 	struct intelfb_heap_data aperture;
diff --git a/drivers/video/fbdev/intelfb/intelfbdrv.c b/drivers/video/fbdev/intelfb/intelfbdrv.c
index b847d53..bbec737 100644
--- a/drivers/video/fbdev/intelfb/intelfbdrv.c
+++ b/drivers/video/fbdev/intelfb/intelfbdrv.c
@@ -124,10 +124,6 @@
 
 #include <asm/io.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "intelfb.h"
 #include "intelfbhw.h"
 #include "../edid.h"
@@ -411,33 +407,6 @@ module_init(intelfb_init);
 module_exit(intelfb_exit);
 
 /***************************************************************
- *                     mtrr support functions                  *
- ***************************************************************/
-
-#ifdef CONFIG_MTRR
-static inline void set_mtrr(struct intelfb_info *dinfo)
-{
-	dinfo->mtrr_reg = mtrr_add(dinfo->aperture.physical,
-				   dinfo->aperture.size, MTRR_TYPE_WRCOMB, 1);
-	if (dinfo->mtrr_reg < 0) {
-		ERR_MSG("unable to set MTRR\n");
-		return;
-	}
-	dinfo->has_mtrr = 1;
-}
-static inline void unset_mtrr(struct intelfb_info *dinfo)
-{
-	if (dinfo->has_mtrr)
-		mtrr_del(dinfo->mtrr_reg, dinfo->aperture.physical,
-			 dinfo->aperture.size);
-}
-#else
-#define set_mtrr(x) WRN_MSG("MTRR is disabled in the kernel\n")
-
-#define unset_mtrr(x) do { } while (0)
-#endif /* CONFIG_MTRR */
-
-/***************************************************************
  *                        driver init / cleanup                *
  ***************************************************************/
 
@@ -456,7 +425,7 @@ static void cleanup(struct intelfb_info *dinfo)
 	if (dinfo->registered)
 		unregister_framebuffer(dinfo->info);
 
-	unset_mtrr(dinfo);
+	arch_phys_wc_del(dinfo->wc_cookie);
 
 	if (dinfo->fbmem_gart && dinfo->gtt_fb_mem) {
 		agp_unbind_memory(dinfo->gtt_fb_mem);
@@ -675,7 +644,7 @@ static int intelfb_pci_register(struct pci_dev *pdev,
 	/* Allocate memories (which aren't stolen) */
 	/* Map the fb and MMIO regions */
 	/* ioremap only up to the end of used aperture */
-	dinfo->aperture.virtual = (u8 __iomem *)ioremap_nocache
+	dinfo->aperture.virtual = (u8 __iomem *)ioremap_wc
 		(dinfo->aperture.physical, ((offset + dinfo->fb.offset) << 12)
 		 + dinfo->fb.size);
 	if (!dinfo->aperture.virtual) {
@@ -772,7 +741,8 @@ static int intelfb_pci_register(struct pci_dev *pdev,
 	agp_backend_release(bridge);
 
 	if (mtrr)
-		set_mtrr(dinfo);
+		dinfo->wc_cookie = arch_phys_wc_add(dinfo->aperture.physical,
+						    dinfo->aperture.size);
 
 	DBG_MSG("fb: 0x%x(+ 0x%x)/0x%x (0x%p)\n",
 		dinfo->fb.physical, dinfo->fb.offset, dinfo->fb.size,
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 29/47] video: fbdev: matrox: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same ioremap()'d area for the MTRR.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.
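
The resulting probe and remove ordering can be sketched in userspace as
follows. This is not the driver code: the sketch_* functions and
struct sketch_fb are hypothetical stand-ins for ioremap_wc(), iounmap(),
arch_phys_wc_add() and arch_phys_wc_del(), and malloc() merely plays the
role of a successful mapping.

```c
#include <stddef.h>
#include <stdlib.h>

struct sketch_fb {
	void *vaddr;		/* stands in for the ioremap_wc() result */
	int wc_cookie;		/* stands in for the arch_phys_wc_add() result */
};

/* Stand-in for ioremap_wc(): fails for an empty range. */
void *sketch_ioremap_wc(unsigned long base, unsigned long len)
{
	(void)base;
	return len ? malloc(1) : NULL;
}

/* Stand-in for iounmap(). */
void sketch_iounmap(void *vaddr)
{
	free(vaddr);
}

/* Stand-in for arch_phys_wc_add(): models the PAT case, returns 0. */
int sketch_wc_add(unsigned long base, unsigned long len)
{
	(void)base;
	(void)len;
	return 0;
}

/* Stand-in for arch_phys_wc_del(): nothing to do for cookies below 1. */
void sketch_wc_del(int cookie)
{
	(void)cookie;
}

/* Map the video memory as WC first, then ask for WC on the same range. */
int sketch_probe(struct sketch_fb *fb, unsigned long base,
		 unsigned long len, int want_wc)
{
	fb->wc_cookie = 0;
	fb->vaddr = sketch_ioremap_wc(base, len);
	if (!fb->vaddr)
		return -1;
	if (want_wc)
		fb->wc_cookie = sketch_wc_add(base, len);
	return 0;
}

/* Tear down in reverse: drop the WC cookie, then unmap. */
void sketch_remove(struct sketch_fb *fb)
{
	sketch_wc_del(fb->wc_cookie);
	sketch_iounmap(fb->vaddr);
	fb->vaddr = NULL;
	fb->wc_cookie = 0;
}
```

The point of the sketch is only the pairing: the mapping and the WC
request cover the same range, and the remove path needs no #ifdef or
validity flag.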

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when the MTRR request fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/matrox/matroxfb_base.c | 36 +++++++++++-------------------
 drivers/video/fbdev/matrox/matroxfb_base.h | 27 +---------------------
 2 files changed, 14 insertions(+), 49 deletions(-)

diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
index 62539ca..2f70365 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.c
+++ b/drivers/video/fbdev/matrox/matroxfb_base.c
@@ -370,12 +370,9 @@ static void matroxfb_remove(struct matrox_fb_info *minfo, int dummy)
 	matroxfb_unregister_device(minfo);
 	unregister_framebuffer(&minfo->fbcon);
 	matroxfb_g450_shutdown(minfo);
-#ifdef CONFIG_MTRR
-	if (minfo->mtrr.vram_valid)
-		mtrr_del(minfo->mtrr.vram, minfo->video.base, minfo->video.len);
-#endif
-	mga_iounmap(minfo->mmio.vbase);
-	mga_iounmap(minfo->video.vbase);
+	arch_phys_wc_del(minfo->wc_cookie);
+	iounmap(minfo->mmio.vbase.vaddr);
+	iounmap(minfo->video.vbase.vaddr);
 	release_mem_region(minfo->video.base, minfo->video.len_maximum);
 	release_mem_region(minfo->mmio.base, 16384);
 	kfree(minfo);
@@ -1256,9 +1253,7 @@ static int nobios;			/* "matroxfb:nobios" */
 static int noinit = 1;			/* "matroxfb:init" */
 static int inverse;			/* "matroxfb:inverse" */
 static int sgram;			/* "matroxfb:sgram" */
-#ifdef CONFIG_MTRR
 static int mtrr = 1;			/* "matroxfb:nomtrr" */
-#endif
 static int grayscale;			/* "matroxfb:grayscale" */
 static int dev = -1;			/* "matroxfb:dev:xxxxx" */
 static unsigned int vesa = ~0;		/* "matroxfb:vesa:xxxxx" */
@@ -1717,14 +1712,17 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	if (mem && (mem < memsize))
 		memsize = mem;
 	err = -ENOMEM;
-	if (mga_ioremap(ctrlptr_phys, 16384, MGA_IOREMAP_MMIO, &minfo->mmio.vbase)) {
+
+	minfo->mmio.vbase.vaddr = ioremap_nocache(ctrlptr_phys, 16384);
+	if (!minfo->mmio.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, 16384), matroxfb disabled\n", ctrlptr_phys);
 		goto failVideoMR;
 	}
 	minfo->mmio.base = ctrlptr_phys;
 	minfo->mmio.len = 16384;
 	minfo->video.base = video_base_phys;
-	if (mga_ioremap(video_base_phys, memsize, MGA_IOREMAP_FB, &minfo->video.vbase)) {
+	minfo->video.vbase.vaddr = ioremap_wc(video_base_phys, memsize);
+	if (!minfo->video.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, %d), matroxfb disabled\n",
 			video_base_phys, memsize);
 		goto failCtrlIO;
@@ -1772,13 +1770,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	minfo->video.len_usable = minfo->video.len;
 	if (minfo->video.len_usable > b->base->maxdisplayable)
 		minfo->video.len_usable = b->base->maxdisplayable;
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		minfo->mtrr.vram = mtrr_add(video_base_phys, minfo->video.len, MTRR_TYPE_WRCOMB, 1);
-		minfo->mtrr.vram_valid = 1;
-		printk(KERN_INFO "matroxfb: MTRR's turned on\n");
-	}
-#endif	/* CONFIG_MTRR */
+	if (mtrr)
+		minfo->wc_cookie = arch_phys_wc_add(video_base_phys,
+						    minfo->video.len);
 
 	if (!minfo->devflags.novga)
 		request_region(0x3C0, 32, "matrox");
@@ -1947,9 +1941,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	return 0;
 failVideoIO:;
 	matroxfb_g450_shutdown(minfo);
-	mga_iounmap(minfo->video.vbase);
+	iounmap(minfo->video.vbase.vaddr);
 failCtrlIO:;
-	mga_iounmap(minfo->mmio.vbase);
+	iounmap(minfo->mmio.vbase.vaddr);
 failVideoMR:;
 	release_mem_region(video_base_phys, minfo->video.len_maximum);
 failCtrlMR:;
@@ -2443,10 +2437,8 @@ static int __init matroxfb_setup(char *options) {
 				nobios = !value;
 			else if (!strcmp(this_opt, "init"))
 				noinit = !value;
-#ifdef CONFIG_MTRR
 			else if (!strcmp(this_opt, "mtrr"))
 				mtrr = value;
-#endif
 			else if (!strcmp(this_opt, "inv24"))
 				inv24 = value;
 			else if (!strcmp(this_opt, "cross4MB"))
@@ -2515,10 +2507,8 @@ module_param(noinit, int, 0);
 MODULE_PARM_DESC(noinit, "Disables W/SG/SD-RAM and bus interface initialization (0 or 1=do not initialize) (default=0)");
 module_param(memtype, int, 0);
 MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.txt for explanation) (default=3 for G200, 0 for G400)");
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0);
 MODULE_PARM_DESC(mtrr, "This speeds up video memory accesses (0=disabled or 1) (default=1)");
-#endif
 module_param(sgram, int, 0);
 MODULE_PARM_DESC(sgram, "Indicates that G100/G200/G400 has SGRAM memory (0=SDRAM, 1=SGRAM) (default=0)");
 module_param(inv24, int, 0);
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.h b/drivers/video/fbdev/matrox/matroxfb_base.h
index 89a8a89a..09b02cd 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.h
+++ b/drivers/video/fbdev/matrox/matroxfb_base.h
@@ -44,9 +44,6 @@
 
 #include <asm/io.h>
 #include <asm/unaligned.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #if defined(CONFIG_PPC_PMAC)
 #include <asm/prom.h>
@@ -187,23 +184,6 @@ static inline void __iomem* vaddr_va(vaddr_t va) {
 	return va.vaddr;
 }
 
-#define MGA_IOREMAP_NORMAL	0
-#define MGA_IOREMAP_NOCACHE	1
-
-#define MGA_IOREMAP_FB		MGA_IOREMAP_NOCACHE
-#define MGA_IOREMAP_MMIO	MGA_IOREMAP_NOCACHE
-static inline int mga_ioremap(unsigned long phys, unsigned long size, int flags, vaddr_t* virt) {
-	if (flags & MGA_IOREMAP_NOCACHE)
-		virt->vaddr = ioremap_nocache(phys, size);
-	else
-		virt->vaddr = ioremap(phys, size);
-	return (virt->vaddr == NULL); /* 0, !0... 0, error_code in future */
-}
-
-static inline void mga_iounmap(vaddr_t va) {
-	iounmap(va.vaddr);
-}
-
 struct my_timming {
 	unsigned int pixclock;
 	int mnp;
@@ -449,12 +429,7 @@ struct matrox_fb_info {
 		int		plnwt;
 		int		srcorg;
 			      } capable;
-#ifdef CONFIG_MTRR
-	struct {
-		int		vram;
-		int		vram_valid;
-			      } mtrr;
-#endif
+	int			wc_cookie;
 	struct {
 		int		precise_width;
 		int		mga_24bpp_fix;
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 29/47] video: fbdev: matrox: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same ioremap()'d area for the MTRR.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available via PAT. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
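
For readers following along, the "doing nothing when we didn't get
an MTRR" point above can be illustrated with a small userspace model
of the cookie convention. This is only a sketch, not kernel code: every
model_* name, the model_pat_enabled switch and the two-register limit
are hypothetical, and the offset of 1000 is assumed to mirror x86's
MTRR_TO_PHYS_WC_OFFSET.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these model_* names exist in the kernel. */
#define MODEL_WC_OFFSET   1000	/* assumed to mirror MTRR_TO_PHYS_WC_OFFSET */
#define MODEL_NUM_MTRRS   2	/* hypothetical number of variable MTRRs */

static bool model_pat_enabled;			/* stand-in for "PAT is usable" */
static bool model_mtrr_used[MODEL_NUM_MTRRS];	/* which registers are taken */

/* Count the registers currently claimed in the model. */
static int model_mtrrs_in_use(void)
{
	int i, n = 0;

	for (i = 0; i < MODEL_NUM_MTRRS; i++)
		n += model_mtrr_used[i];
	return n;
}

/* Stand-in for mtrr_add(): returns a register index, or -1 if none is free. */
static int model_mtrr_add(void)
{
	int i;

	for (i = 0; i < MODEL_NUM_MTRRS; i++) {
		if (!model_mtrr_used[i]) {
			model_mtrr_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Models arch_phys_wc_add(): 0 with PAT, negative on failure, else a cookie. */
static int model_arch_phys_wc_add(unsigned long base, unsigned long size)
{
	int reg;

	(void)base;
	(void)size;
	if (model_pat_enabled)
		return 0;	/* ioremap_wc() alone provides write-combining */
	reg = model_mtrr_add();
	if (reg < 0)
		return reg;	/* caller may ignore this; del() copes with it */
	return reg + MODEL_WC_OFFSET;
}

/* Models arch_phys_wc_del(): cookies below 1 mean there is nothing to undo. */
static void model_arch_phys_wc_del(int cookie)
{
	if (cookie < 1)
		return;
	assert(cookie >= MODEL_WC_OFFSET &&
	       cookie < MODEL_WC_OFFSET + MODEL_NUM_MTRRS);
	model_mtrr_used[cookie - MODEL_WC_OFFSET] = false;
}
```

Under this model a driver can store whatever the add call returns and
pass it to the del call on teardown without a separate "valid" flag.
In the matrox conversion the same reasoning covers the mtrr=0 case,
assuming minfo is zero-initialised so that wc_cookie starts out as 0.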

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/matrox/matroxfb_base.c | 36 +++++++++++-------------------
 drivers/video/fbdev/matrox/matroxfb_base.h | 27 +---------------------
 2 files changed, 14 insertions(+), 49 deletions(-)

diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
index 62539ca..2f70365 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.c
+++ b/drivers/video/fbdev/matrox/matroxfb_base.c
@@ -370,12 +370,9 @@ static void matroxfb_remove(struct matrox_fb_info *minfo, int dummy)
 	matroxfb_unregister_device(minfo);
 	unregister_framebuffer(&minfo->fbcon);
 	matroxfb_g450_shutdown(minfo);
-#ifdef CONFIG_MTRR
-	if (minfo->mtrr.vram_valid)
-		mtrr_del(minfo->mtrr.vram, minfo->video.base, minfo->video.len);
-#endif
-	mga_iounmap(minfo->mmio.vbase);
-	mga_iounmap(minfo->video.vbase);
+	arch_phys_wc_del(minfo->wc_cookie);
+	iounmap(minfo->mmio.vbase.vaddr);
+	iounmap(minfo->video.vbase.vaddr);
 	release_mem_region(minfo->video.base, minfo->video.len_maximum);
 	release_mem_region(minfo->mmio.base, 16384);
 	kfree(minfo);
@@ -1256,9 +1253,7 @@ static int nobios;			/* "matroxfb:nobios" */
 static int noinit = 1;			/* "matroxfb:init" */
 static int inverse;			/* "matroxfb:inverse" */
 static int sgram;			/* "matroxfb:sgram" */
-#ifdef CONFIG_MTRR
 static int mtrr = 1;			/* "matroxfb:nomtrr" */
-#endif
 static int grayscale;			/* "matroxfb:grayscale" */
 static int dev = -1;			/* "matroxfb:dev:xxxxx" */
 static unsigned int vesa = ~0;		/* "matroxfb:vesa:xxxxx" */
@@ -1717,14 +1712,17 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	if (mem && (mem < memsize))
 		memsize = mem;
 	err = -ENOMEM;
-	if (mga_ioremap(ctrlptr_phys, 16384, MGA_IOREMAP_MMIO, &minfo->mmio.vbase)) {
+
+	minfo->mmio.vbase.vaddr = ioremap_nocache(ctrlptr_phys, 16384);
+	if (!minfo->mmio.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, 16384), matroxfb disabled\n", ctrlptr_phys);
 		goto failVideoMR;
 	}
 	minfo->mmio.base = ctrlptr_phys;
 	minfo->mmio.len = 16384;
 	minfo->video.base = video_base_phys;
-	if (mga_ioremap(video_base_phys, memsize, MGA_IOREMAP_FB, &minfo->video.vbase)) {
+	minfo->video.vbase.vaddr = ioremap_wc(video_base_phys, memsize);
+	if (!minfo->video.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, %d), matroxfb disabled\n",
 			video_base_phys, memsize);
 		goto failCtrlIO;
@@ -1772,13 +1770,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	minfo->video.len_usable = minfo->video.len;
 	if (minfo->video.len_usable > b->base->maxdisplayable)
 		minfo->video.len_usable = b->base->maxdisplayable;
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		minfo->mtrr.vram = mtrr_add(video_base_phys, minfo->video.len, MTRR_TYPE_WRCOMB, 1);
-		minfo->mtrr.vram_valid = 1;
-		printk(KERN_INFO "matroxfb: MTRR's turned on\n");
-	}
-#endif	/* CONFIG_MTRR */
+	if (mtrr)
+		minfo->wc_cookie = arch_phys_wc_add(video_base_phys,
+						    minfo->video.len);
 
 	if (!minfo->devflags.novga)
 		request_region(0x3C0, 32, "matrox");
@@ -1947,9 +1941,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	return 0;
 failVideoIO:;
 	matroxfb_g450_shutdown(minfo);
-	mga_iounmap(minfo->video.vbase);
+	iounmap(minfo->video.vbase.vaddr);
 failCtrlIO:;
-	mga_iounmap(minfo->mmio.vbase);
+	iounmap(minfo->mmio.vbase.vaddr);
 failVideoMR:;
 	release_mem_region(video_base_phys, minfo->video.len_maximum);
 failCtrlMR:;
@@ -2443,10 +2437,8 @@ static int __init matroxfb_setup(char *options) {
 				nobios = !value;
 			else if (!strcmp(this_opt, "init"))
 				noinit = !value;
-#ifdef CONFIG_MTRR
 			else if (!strcmp(this_opt, "mtrr"))
 				mtrr = value;
-#endif
 			else if (!strcmp(this_opt, "inv24"))
 				inv24 = value;
 			else if (!strcmp(this_opt, "cross4MB"))
@@ -2515,10 +2507,8 @@ module_param(noinit, int, 0);
 MODULE_PARM_DESC(noinit, "Disables W/SG/SD-RAM and bus interface initialization (0 or 1=do not initialize) (default=0)");
 module_param(memtype, int, 0);
 MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.txt for explanation) (default=3 for G200, 0 for G400)");
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0);
 MODULE_PARM_DESC(mtrr, "This speeds up video memory accesses (0=disabled or 1) (default=1)");
-#endif
 module_param(sgram, int, 0);
 MODULE_PARM_DESC(sgram, "Indicates that G100/G200/G400 has SGRAM memory (0=SDRAM, 1=SGRAM) (default=0)");
 module_param(inv24, int, 0);
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.h b/drivers/video/fbdev/matrox/matroxfb_base.h
index 89a8a89a..09b02cd 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.h
+++ b/drivers/video/fbdev/matrox/matroxfb_base.h
@@ -44,9 +44,6 @@
 
 #include <asm/io.h>
 #include <asm/unaligned.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #if defined(CONFIG_PPC_PMAC)
 #include <asm/prom.h>
@@ -187,23 +184,6 @@ static inline void __iomem* vaddr_va(vaddr_t va) {
 	return va.vaddr;
 }
 
-#define MGA_IOREMAP_NORMAL	0
-#define MGA_IOREMAP_NOCACHE	1
-
-#define MGA_IOREMAP_FB		MGA_IOREMAP_NOCACHE
-#define MGA_IOREMAP_MMIO	MGA_IOREMAP_NOCACHE
-static inline int mga_ioremap(unsigned long phys, unsigned long size, int flags, vaddr_t* virt) {
-	if (flags & MGA_IOREMAP_NOCACHE)
-		virt->vaddr = ioremap_nocache(phys, size);
-	else
-		virt->vaddr = ioremap(phys, size);
-	return (virt->vaddr == NULL); /* 0, !0... 0, error_code in future */
-}
-
-static inline void mga_iounmap(vaddr_t va) {
-	iounmap(va.vaddr);
-}
-
 struct my_timming {
 	unsigned int pixclock;
 	int mnp;
@@ -449,12 +429,7 @@ struct matrox_fb_info {
 		int		plnwt;
 		int		srcorg;
 			      } capable;
-#ifdef CONFIG_MTRR
-	struct {
-		int		vram;
-		int		vram_valid;
-			      } mtrr;
-#endif
+	int			wc_cookie;
 	struct {
 		int		precise_width;
 		int		mga_24bpp_fix;
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 29/47] video: fbdev: matrox: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (50 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same ioremap()'d area for the MTRR.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available via PAT. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/matrox/matroxfb_base.c | 36 +++++++++++-------------------
 drivers/video/fbdev/matrox/matroxfb_base.h | 27 +---------------------
 2 files changed, 14 insertions(+), 49 deletions(-)

diff --git a/drivers/video/fbdev/matrox/matroxfb_base.c b/drivers/video/fbdev/matrox/matroxfb_base.c
index 62539ca..2f70365 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.c
+++ b/drivers/video/fbdev/matrox/matroxfb_base.c
@@ -370,12 +370,9 @@ static void matroxfb_remove(struct matrox_fb_info *minfo, int dummy)
 	matroxfb_unregister_device(minfo);
 	unregister_framebuffer(&minfo->fbcon);
 	matroxfb_g450_shutdown(minfo);
-#ifdef CONFIG_MTRR
-	if (minfo->mtrr.vram_valid)
-		mtrr_del(minfo->mtrr.vram, minfo->video.base, minfo->video.len);
-#endif
-	mga_iounmap(minfo->mmio.vbase);
-	mga_iounmap(minfo->video.vbase);
+	arch_phys_wc_del(minfo->wc_cookie);
+	iounmap(minfo->mmio.vbase.vaddr);
+	iounmap(minfo->video.vbase.vaddr);
 	release_mem_region(minfo->video.base, minfo->video.len_maximum);
 	release_mem_region(minfo->mmio.base, 16384);
 	kfree(minfo);
@@ -1256,9 +1253,7 @@ static int nobios;			/* "matroxfb:nobios" */
 static int noinit = 1;			/* "matroxfb:init" */
 static int inverse;			/* "matroxfb:inverse" */
 static int sgram;			/* "matroxfb:sgram" */
-#ifdef CONFIG_MTRR
 static int mtrr = 1;			/* "matroxfb:nomtrr" */
-#endif
 static int grayscale;			/* "matroxfb:grayscale" */
 static int dev = -1;			/* "matroxfb:dev:xxxxx" */
 static unsigned int vesa = ~0;		/* "matroxfb:vesa:xxxxx" */
@@ -1717,14 +1712,17 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	if (mem && (mem < memsize))
 		memsize = mem;
 	err = -ENOMEM;
-	if (mga_ioremap(ctrlptr_phys, 16384, MGA_IOREMAP_MMIO, &minfo->mmio.vbase)) {
+
+	minfo->mmio.vbase.vaddr = ioremap_nocache(ctrlptr_phys, 16384);
+	if (!minfo->mmio.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, 16384), matroxfb disabled\n", ctrlptr_phys);
 		goto failVideoMR;
 	}
 	minfo->mmio.base = ctrlptr_phys;
 	minfo->mmio.len = 16384;
 	minfo->video.base = video_base_phys;
-	if (mga_ioremap(video_base_phys, memsize, MGA_IOREMAP_FB, &minfo->video.vbase)) {
+	minfo->video.vbase.vaddr = ioremap_wc(video_base_phys, memsize);
+	if (!minfo->video.vbase.vaddr) {
 		printk(KERN_ERR "matroxfb: cannot ioremap(%lX, %d), matroxfb disabled\n",
 			video_base_phys, memsize);
 		goto failCtrlIO;
@@ -1772,13 +1770,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	minfo->video.len_usable = minfo->video.len;
 	if (minfo->video.len_usable > b->base->maxdisplayable)
 		minfo->video.len_usable = b->base->maxdisplayable;
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		minfo->mtrr.vram = mtrr_add(video_base_phys, minfo->video.len, MTRR_TYPE_WRCOMB, 1);
-		minfo->mtrr.vram_valid = 1;
-		printk(KERN_INFO "matroxfb: MTRR's turned on\n");
-	}
-#endif	/* CONFIG_MTRR */
+	if (mtrr)
+		minfo->wc_cookie = arch_phys_wc_add(video_base_phys,
+						    minfo->video.len);
 
 	if (!minfo->devflags.novga)
 		request_region(0x3C0, 32, "matrox");
@@ -1947,9 +1941,9 @@ static int initMatrox2(struct matrox_fb_info *minfo, struct board *b)
 	return 0;
 failVideoIO:;
 	matroxfb_g450_shutdown(minfo);
-	mga_iounmap(minfo->video.vbase);
+	iounmap(minfo->video.vbase.vaddr);
 failCtrlIO:;
-	mga_iounmap(minfo->mmio.vbase);
+	iounmap(minfo->mmio.vbase.vaddr);
 failVideoMR:;
 	release_mem_region(video_base_phys, minfo->video.len_maximum);
 failCtrlMR:;
@@ -2443,10 +2437,8 @@ static int __init matroxfb_setup(char *options) {
 				nobios = !value;
 			else if (!strcmp(this_opt, "init"))
 				noinit = !value;
-#ifdef CONFIG_MTRR
 			else if (!strcmp(this_opt, "mtrr"))
 				mtrr = value;
-#endif
 			else if (!strcmp(this_opt, "inv24"))
 				inv24 = value;
 			else if (!strcmp(this_opt, "cross4MB"))
@@ -2515,10 +2507,8 @@ module_param(noinit, int, 0);
 MODULE_PARM_DESC(noinit, "Disables W/SG/SD-RAM and bus interface initialization (0 or 1=do not initialize) (default=0)");
 module_param(memtype, int, 0);
 MODULE_PARM_DESC(memtype, "Memory type for G200/G400 (see Documentation/fb/matroxfb.txt for explanation) (default=3 for G200, 0 for G400)");
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0);
 MODULE_PARM_DESC(mtrr, "This speeds up video memory accesses (0=disabled or 1) (default=1)");
-#endif
 module_param(sgram, int, 0);
 MODULE_PARM_DESC(sgram, "Indicates that G100/G200/G400 has SGRAM memory (0=SDRAM, 1=SGRAM) (default=0)");
 module_param(inv24, int, 0);
diff --git a/drivers/video/fbdev/matrox/matroxfb_base.h b/drivers/video/fbdev/matrox/matroxfb_base.h
index 89a8a89a..09b02cd 100644
--- a/drivers/video/fbdev/matrox/matroxfb_base.h
+++ b/drivers/video/fbdev/matrox/matroxfb_base.h
@@ -44,9 +44,6 @@
 
 #include <asm/io.h>
 #include <asm/unaligned.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #if defined(CONFIG_PPC_PMAC)
 #include <asm/prom.h>
@@ -187,23 +184,6 @@ static inline void __iomem* vaddr_va(vaddr_t va) {
 	return va.vaddr;
 }
 
-#define MGA_IOREMAP_NORMAL	0
-#define MGA_IOREMAP_NOCACHE	1
-
-#define MGA_IOREMAP_FB		MGA_IOREMAP_NOCACHE
-#define MGA_IOREMAP_MMIO	MGA_IOREMAP_NOCACHE
-static inline int mga_ioremap(unsigned long phys, unsigned long size, int flags, vaddr_t* virt) {
-	if (flags & MGA_IOREMAP_NOCACHE)
-		virt->vaddr = ioremap_nocache(phys, size);
-	else
-		virt->vaddr = ioremap(phys, size);
-	return (virt->vaddr == NULL); /* 0, !0... 0, error_code in future */
-}
-
-static inline void mga_iounmap(vaddr_t va) {
-	iounmap(va.vaddr);
-}
-
 struct my_timming {
 	unsigned int pixclock;
 	int mnp;
@@ -449,12 +429,7 @@ struct matrox_fb_info {
 		int		plnwt;
 		int		srcorg;
 			      } capable;
-#ifdef CONFIG_MTRR
-	struct {
-		int		vram;
-		int		vram_valid;
-			      } mtrr;
-#endif
+	int			wc_cookie;
 	struct {
 		int		precise_width;
 		int		mga_24bpp_fix;
-- 
2.3.2.209.gd67f9d5.dirty

* [PATCH v1 30/47] video: fbdev: neofb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available via PAT. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/neofb.c | 26 +++++++-------------------
 include/video/neomagic.h    |  5 +----
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/neofb.c b/drivers/video/fbdev/neofb.c
index 44f99a6..db023a9 100644
--- a/drivers/video/fbdev/neofb.c
+++ b/drivers/video/fbdev/neofb.c
@@ -71,11 +71,6 @@
 #include <asm/io.h>
 #include <asm/irq.h>
 #include <asm/pgtable.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/vga.h>
 #include <video/neomagic.h>
 
@@ -1710,6 +1705,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 			 int video_len)
 {
 	//unsigned long addr;
+	struct neofb_par *par = info->par;
 
 	DBG("neo_map_video");
 
@@ -1723,7 +1719,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 	}
 
 	info->screen_base =
-	    ioremap(info->fix.smem_start, info->fix.smem_len);
+	    ioremap_wc(info->fix.smem_start, info->fix.smem_len);
 	if (!info->screen_base) {
 		printk("neofb: unable to map screen memory\n");
 		release_mem_region(info->fix.smem_start,
@@ -1733,11 +1729,8 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 		printk(KERN_INFO "neofb: mapped framebuffer at %p\n",
 		       info->screen_base);
 
-#ifdef CONFIG_MTRR
-	((struct neofb_par *)(info->par))->mtrr =
-		mtrr_add(info->fix.smem_start, pci_resource_len(dev, 0),
-				MTRR_TYPE_WRCOMB, 1);
-#endif
+	par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+					  pci_resource_len(dev, 0));
 
 	/* Clear framebuffer, it's all white in memory after boot */
 	memset_io(info->screen_base, 0, info->fix.smem_len);
@@ -1754,16 +1747,11 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 
 static void neo_unmap_video(struct fb_info *info)
 {
-	DBG("neo_unmap_video");
+	struct neofb_par *par = info->par;
 
-#ifdef CONFIG_MTRR
-	{
-		struct neofb_par *par = info->par;
+	DBG("neo_unmap_video");
 
-		mtrr_del(par->mtrr, info->fix.smem_start,
-			 info->fix.smem_len);
-	}
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	info->screen_base = NULL;
 
diff --git a/include/video/neomagic.h b/include/video/neomagic.h
index bc5013e..91e225a 100644
--- a/include/video/neomagic.h
+++ b/include/video/neomagic.h
@@ -159,10 +159,7 @@ struct neofb_par {
 	unsigned char VCLK3NumeratorHigh;
 	unsigned char VCLK3Denominator;
 	unsigned char VerticalExt;
-
-#ifdef CONFIG_MTRR
-	int mtrr;
-#endif
+	int wc_cookie;
 	u8 __iomem *mmio_vbase;
 	u8 cursorOff;
 	u8 *cursorPad;		/* Must die !! */
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 30/47] video: fbdev: neofb: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available via PAT. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/neofb.c | 26 +++++++-------------------
 include/video/neomagic.h    |  5 +----
 2 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/neofb.c b/drivers/video/fbdev/neofb.c
index 44f99a6..db023a9 100644
--- a/drivers/video/fbdev/neofb.c
+++ b/drivers/video/fbdev/neofb.c
@@ -71,11 +71,6 @@
 #include <asm/io.h>
 #include <asm/irq.h>
 #include <asm/pgtable.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/vga.h>
 #include <video/neomagic.h>
 
@@ -1710,6 +1705,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 			 int video_len)
 {
 	//unsigned long addr;
+	struct neofb_par *par = info->par;
 
 	DBG("neo_map_video");
 
@@ -1723,7 +1719,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 	}
 
 	info->screen_base =
-	    ioremap(info->fix.smem_start, info->fix.smem_len);
+	    ioremap_wc(info->fix.smem_start, info->fix.smem_len);
 	if (!info->screen_base) {
 		printk("neofb: unable to map screen memory\n");
 		release_mem_region(info->fix.smem_start,
@@ -1733,11 +1729,8 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 		printk(KERN_INFO "neofb: mapped framebuffer at %p\n",
 		       info->screen_base);
 
-#ifdef CONFIG_MTRR
-	((struct neofb_par *)(info->par))->mtrr =
-		mtrr_add(info->fix.smem_start, pci_resource_len(dev, 0),
-				MTRR_TYPE_WRCOMB, 1);
-#endif
+	par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+					  pci_resource_len(dev, 0));
 
 	/* Clear framebuffer, it's all white in memory after boot */
 	memset_io(info->screen_base, 0, info->fix.smem_len);
@@ -1754,16 +1747,11 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 
 static void neo_unmap_video(struct fb_info *info)
 {
-	DBG("neo_unmap_video");
+	struct neofb_par *par = info->par;
 
-#ifdef CONFIG_MTRR
-	{
-		struct neofb_par *par = info->par;
+	DBG("neo_unmap_video");
 
-		mtrr_del(par->mtrr, info->fix.smem_start,
-			 info->fix.smem_len);
-	}
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	info->screen_base = NULL;
 
diff --git a/include/video/neomagic.h b/include/video/neomagic.h
index bc5013e..91e225a 100644
--- a/include/video/neomagic.h
+++ b/include/video/neomagic.h
@@ -159,10 +159,7 @@ struct neofb_par {
 	unsigned char VCLK3NumeratorHigh;
 	unsigned char VCLK3Denominator;
 	unsigned char VerticalExt;
-
-#ifdef CONFIG_MTRR
-	int mtrr;
-#endif
+	int wc_cookie;
 	u8 __iomem *mmio_vbase;
 	u8 cursorOff;
 	u8 *cursorPad;		/* Must die !! */
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 30/47] video: fbdev: neofb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (53 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when PAT is available. To take advantage of that,
also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef CONFIG_MTRR blocks and to remove redundant things
that arch_phys_wc_add() already handles, such as printing a
verbose message when adding an MTRR fails and doing nothing when
no MTRR was obtained.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/neofb.c | 26 +++++++-------------------
 include/video/neomagic.h    |  5 +----
 2 files changed, 8 insertions(+), 23 deletions(-)
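
Not part of the commit: a minimal userspace sketch of the cookie
contract this conversion relies on. It assumes arch_phys_wc_add()
returns 0 when PAT makes an MTRR unnecessary and a cookie of at least 1
when an MTRR was set up, and that arch_phys_wc_del() ignores cookies
below 1. model_wc_add(), model_wc_del(), pat_enabled_model and
WC_COOKIE_OFFSET are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of this is the kernel's implementation. */
#define WC_COOKIE_OFFSET 1000	/* assumed offset that keeps MTRR cookies >= 1 */

bool pat_enabled_model;		/* models whether PAT is in use */
int mtrr_next_reg;		/* next free modelled MTRR register */
int mtrr_in_use;		/* modelled MTRRs currently held */

/* Model of the add helper: 0 under PAT, a positive cookie when an MTRR is used. */
int model_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (pat_enabled_model)
		return 0;	/* ioremap_wc() alone provides write-combining */
	mtrr_in_use++;
	return mtrr_next_reg++ + WC_COOKIE_OFFSET;
}

/* Model of the delete helper: cookies below 1 are silently ignored. */
void model_wc_del(int cookie)
{
	if (cookie >= 1) {
		assert(mtrr_in_use > 0);
		mtrr_in_use--;
	}
}
```

Under this model neo_unmap_video() can call the delete helper
unconditionally, which is why the #ifdef block and its nested scope
can be dropped.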

diff --git a/drivers/video/fbdev/neofb.c b/drivers/video/fbdev/neofb.c
index 44f99a6..db023a9 100644
--- a/drivers/video/fbdev/neofb.c
+++ b/drivers/video/fbdev/neofb.c
@@ -71,11 +71,6 @@
 #include <asm/io.h>
 #include <asm/irq.h>
 #include <asm/pgtable.h>
-
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/vga.h>
 #include <video/neomagic.h>
 
@@ -1710,6 +1705,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 			 int video_len)
 {
 	//unsigned long addr;
+	struct neofb_par *par = info->par;
 
 	DBG("neo_map_video");
 
@@ -1723,7 +1719,7 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 	}
 
 	info->screen_base =
-	    ioremap(info->fix.smem_start, info->fix.smem_len);
+	    ioremap_wc(info->fix.smem_start, info->fix.smem_len);
 	if (!info->screen_base) {
 		printk("neofb: unable to map screen memory\n");
 		release_mem_region(info->fix.smem_start,
@@ -1733,11 +1729,8 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 		printk(KERN_INFO "neofb: mapped framebuffer at %p\n",
 		       info->screen_base);
 
-#ifdef CONFIG_MTRR
-	((struct neofb_par *)(info->par))->mtrr =
-		mtrr_add(info->fix.smem_start, pci_resource_len(dev, 0),
-				MTRR_TYPE_WRCOMB, 1);
-#endif
+	par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+					  pci_resource_len(dev, 0));
 
 	/* Clear framebuffer, it's all white in memory after boot */
 	memset_io(info->screen_base, 0, info->fix.smem_len);
@@ -1754,16 +1747,11 @@ static int neo_map_video(struct fb_info *info, struct pci_dev *dev,
 
 static void neo_unmap_video(struct fb_info *info)
 {
-	DBG("neo_unmap_video");
+	struct neofb_par *par = info->par;
 
-#ifdef CONFIG_MTRR
-	{
-		struct neofb_par *par = info->par;
+	DBG("neo_unmap_video");
 
-		mtrr_del(par->mtrr, info->fix.smem_start,
-			 info->fix.smem_len);
-	}
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	info->screen_base = NULL;
 
diff --git a/include/video/neomagic.h b/include/video/neomagic.h
index bc5013e..91e225a 100644
--- a/include/video/neomagic.h
+++ b/include/video/neomagic.h
@@ -159,10 +159,7 @@ struct neofb_par {
 	unsigned char VCLK3NumeratorHigh;
 	unsigned char VCLK3Denominator;
 	unsigned char VerticalExt;
-
-#ifdef CONFIG_MTRR
-	int mtrr;
-#endif
+	int wc_cookie;
 	u8 __iomem *mmio_vbase;
 	u8 cursorOff;
 	u8 *cursorPad;		/* Must die !! */
-- 
2.3.2.209.gd67f9d5.dirty

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 31/47] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when PAT is available. To take advantage of that,
also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef CONFIG_MTRR blocks and to remove redundant things
that arch_phys_wc_add() already handles, such as printing a
verbose message when adding an MTRR fails and doing nothing when
no MTRR was obtained.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/s3fb.c | 35 ++++++-----------------------------
 1 file changed, 6 insertions(+), 29 deletions(-)
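
Not part of the commit: with the #ifdef CONFIG_MTRR guards gone, the
"mtrr:" option is parsed on every build. The following is a minimal
userspace sketch of that parsing, assuming a comma-separated option
string. parse_options() is a hypothetical name, strtok() and strtoul()
stand in for the kernel helpers the driver uses, and unknown options
are simply ignored here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the driver's module-level options. */
int mtrr = 1;
int fasttext = 1;

/*
 * Parse a comma-separated option string such as "mtrr:0,fasttext:1".
 * The buffer is modified in place; empty tokens are skipped by strtok().
 */
void parse_options(char *options)
{
	char *opt;

	assert(options != NULL);
	for (opt = strtok(options, ","); opt; opt = strtok(NULL, ",")) {
		if (!strncmp(opt, "mtrr:", 5))
			mtrr = (int)strtoul(opt + 5, NULL, 0);
		else if (!strncmp(opt, "fasttext:", 9))
			fasttext = (int)strtoul(opt + 9, NULL, 0);
	}
}
```

Passing "mtrr:0" leaves the write-combining request to ioremap only,
since the probe path then skips arch_phys_wc_add() altogether.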

diff --git a/drivers/video/fbdev/s3fb.c b/drivers/video/fbdev/s3fb.c
index f0ae61a..13b1090 100644
--- a/drivers/video/fbdev/s3fb.c
+++ b/drivers/video/fbdev/s3fb.c
@@ -28,13 +28,9 @@
 #include <linux/i2c.h>
 #include <linux/i2c-algo-bit.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 struct s3fb_info {
 	int chip, rev, mclk_freq;
-	int mtrr_reg;
+	int wc_cookie;
 	struct vgastate state;
 	struct mutex open_lock;
 	unsigned int ref_count;
@@ -154,11 +150,7 @@ static const struct svga_timing_regs s3_timing_regs     = {
 
 
 static char *mode_option;
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
-
 static int fasttext = 1;
 
 
@@ -170,11 +162,8 @@ module_param(mode_option, charp, 0444);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0444);
 MODULE_PARM_DESC(mode, "Default video mode ('640x480-8@60', etc) (deprecated)");
-
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
 
 module_param(fasttext, int, 0644);
 MODULE_PARM_DESC(fasttext, "Enable S3 fast text mode (1=enable, 0=disable, default=1)");
@@ -1168,7 +1157,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.smem_len = pci_resource_len(dev, 0);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -1365,12 +1354,9 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 
 	return 0;
 
@@ -1405,14 +1391,7 @@ static void s3_pci_remove(struct pci_dev *dev)
 
 	if (info) {
 		par = info->par;
-
-#ifdef CONFIG_MTRR
-		if (par->mtrr_reg >= 0) {
-			mtrr_del(par->mtrr_reg, 0, 0);
-			par->mtrr_reg = -1;
-		}
-#endif
-
+		arch_phys_wc_del(par->wc_cookie);
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
 
@@ -1551,10 +1530,8 @@ static int  __init s3fb_setup(char *options)
 
 		if (!*opt)
 			continue;
-#ifdef CONFIG_MTRR
 		else if (!strncmp(opt, "mtrr:", 5))
 			mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
 		else if (!strncmp(opt, "fasttext:", 9))
 			fasttext = simple_strtoul(opt + 9, NULL, 0);
 		else
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR and ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when PAT is available. To take advantage of that,
also ensure the ioremap'd area is requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef CONFIG_MTRR blocks and to remove redundant things
that arch_phys_wc_add() already handles, such as printing a
verbose message when adding an MTRR fails and doing nothing when
no MTRR was obtained.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/nvidia/nv_type.h |  7 +------
 drivers/video/fbdev/nvidia/nvidia.c  | 37 ++++++------------------------------
 2 files changed, 7 insertions(+), 37 deletions(-)
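
Not part of the commit: a small userspace sketch of why the
vram/vram_valid pair can collapse into a single wc_cookie. It records
the same add outcome in both representations and compares the teardown
decision. The struct and function names are hypothetical, the offset of
1000 is an assumption about how cookies are kept at or above 1, and par
is assumed to start out zero-initialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the driver keeps; not the driver's code. */
struct old_state { int vram; int vram_valid; };	/* pre-patch: register + flag */
struct new_state { int wc_cookie; };		/* post-patch: one cookie */

/* Pre-patch teardown: delete only when the flag was set. */
bool old_needs_del(const struct old_state *s)
{
	return s->vram_valid != 0;
}

/* Post-patch teardown: the delete helper is assumed to act only for cookie >= 1. */
bool new_needs_del(const struct new_state *s)
{
	return s->wc_cookie >= 1;
}

/*
 * Record one add attempt in both representations.
 * reg:       modelled MTRR register number, negative on failure
 * attempted: false when the user passed nomtrr
 */
void record(int reg, bool attempted, struct old_state *o, struct new_state *n)
{
	assert(o && n);
	o->vram = 0;
	o->vram_valid = 0;
	n->wc_cookie = 0;		/* assumed zero-initialised par */
	if (!attempted)
		return;
	o->vram = reg;
	o->vram_valid = reg >= 0;
	n->wc_cookie = reg >= 0 ? reg + 1000 : reg;	/* 1000: assumed offset */
}
```

In this model the two teardown decisions agree for success, failure and
the nomtrr opt-out, so the separate validity flag carries no extra
information.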

diff --git a/drivers/video/fbdev/nvidia/nv_type.h b/drivers/video/fbdev/nvidia/nv_type.h
index c03f7f5..6ff321a 100644
--- a/drivers/video/fbdev/nvidia/nv_type.h
+++ b/drivers/video/fbdev/nvidia/nv_type.h
@@ -148,12 +148,7 @@ struct nvidia_par {
 	u32 forceCRTC;
 	u32 open_count;
 	u8 DDCBase;
-#ifdef CONFIG_MTRR
-	struct {
-		int vram;
-		int vram_valid;
-	} mtrr;
-#endif
+	int wc_cookie;
 	struct nvidia_i2c_chan chan[3];
 
 	volatile u32 __iomem *REGS;
diff --git a/drivers/video/fbdev/nvidia/nvidia.c b/drivers/video/fbdev/nvidia/nvidia.c
index def0412..781f5e7 100644
--- a/drivers/video/fbdev/nvidia/nvidia.c
+++ b/drivers/video/fbdev/nvidia/nvidia.c
@@ -21,9 +21,6 @@
 #include <linux/pci.h>
 #include <linux/console.h>
 #include <linux/backlight.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_PPC_OF
 #include <asm/prom.h>
 #include <asm/pci-bridge.h>
@@ -80,9 +77,7 @@ static int paneltweak = 0;
 static int vram = 0;
 static int bpp = 8;
 static int reverse_i2c;
-#ifdef CONFIG_MTRR
 static bool nomtrr = false;
-#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight = 1;
 #else
@@ -1365,7 +1360,8 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 	par->ScratchBufferStart = par->FbUsableSize - par->ScratchBufferSize;
 	par->CursorStart = par->FbUsableSize + (32 * 1024);
 
-	info->screen_base = ioremap(nvidiafb_fix.smem_start, par->FbMapSize);
+	info->screen_base = ioremap_wc(nvidiafb_fix.smem_start,
+				       par->FbMapSize);
 	info->screen_size = par->FbUsableSize;
 	nvidiafb_fix.smem_len = par->RamAmountKBytes * 1024;
 
@@ -1376,20 +1372,9 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 
 	par->FbStart = info->screen_base;
 
-#ifdef CONFIG_MTRR
-	if (!nomtrr) {
-		par->mtrr.vram = mtrr_add(nvidiafb_fix.smem_start,
-					  par->RamAmountKBytes * 1024,
-					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr.vram < 0) {
-			printk(KERN_ERR PFX "unable to setup MTRR\n");
-		} else {
-			par->mtrr.vram_valid = 1;
-			/* let there be speed */
-			printk(KERN_INFO PFX "MTRR set to ON\n");
-		}
-	}
-#endif				/* CONFIG_MTRR */
+	if (!nomtrr)
+		par->wc_cookie = arch_phys_wc_add(nvidiafb_fix.smem_start,
+						  par->RamAmountKBytes * 1024);
 
 	info->fbops = &nvidia_fb_ops;
 	info->fix = nvidiafb_fix;
@@ -1447,13 +1432,7 @@ static void nvidiafb_remove(struct pci_dev *pd)
 	unregister_framebuffer(info);
 
 	nvidia_bl_exit(par);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif				/* CONFIG_MTRR */
-
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	fb_destroy_modedb(info->monspecs.modedb);
 	nvidia_delete_i2c_busses(par);
@@ -1505,10 +1484,8 @@ static int nvidiafb_setup(char *options)
 			vram = simple_strtoul(this_opt+5, NULL, 0);
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = true;
-#endif
 		} else if (!strncmp(this_opt, "fpdither:", 9)) {
 			fpdither = simple_strtol(this_opt+9, NULL, 0);
 		} else if (!strncmp(this_opt, "bpp:", 4)) {
@@ -1596,11 +1573,9 @@ MODULE_PARM_DESC(bpp, "pixel width in bits"
 		 "(default=8)");
 module_param(reverse_i2c, int, 0);
 MODULE_PARM_DESC(reverse_i2c, "reverse port assignment of the i2c bus");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, false);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) "
 		 "(default=0)");
-#endif
 
 MODULE_AUTHOR("Antonino Daplas");
 MODULE_DESCRIPTION("Framebuffer driver for nVidia graphics chipset");
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR and ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available, in order to
take advantage of that also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdeffery and to remove things that arch_phys_wc_add()
already makes redundant, such as the verbose messages printed
when MTRR setup fails and the checks that do nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
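Note on the cookie semantics this conversion relies on. The block
below is a userspace model, not kernel code: every model_* name is
hypothetical, and the offset of 1000 is an assumption taken from my
reading of the x86 implementation. It only illustrates why the old
vram_valid flag and the driver's own error printk become redundant:
the add helper returns 0 when PAT is in use, a positive cookie when
an MTRR was taken, and a negative value on failure, while the del
helper ignores anything below 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed offset between an MTRR register number and the driver cookie. */
#define MODEL_WC_COOKIE_OFFSET 1000

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
static bool model_pat_enabled;
static int model_free_mtrrs = 2;
static int model_mtrr_dels;	/* counts simulated mtrr_del() calls */

/* Simulated mtrr_add(): a register number, or -1 when none is left. */
static int model_mtrr_add(unsigned long base, unsigned long size)
{
	(void)base;
	assert(size > 0);
	if (model_free_mtrrs == 0)
		return -1;
	return --model_free_mtrrs;
}

/* Model of arch_phys_wc_add(): 0 with PAT, > 0 with an MTRR, < 0 on failure. */
static int model_wc_add(unsigned long base, unsigned long size)
{
	int reg;

	if (model_pat_enabled)
		return 0;	/* ioremap_wc() alone gives write-combining */

	reg = model_mtrr_add(base, size);
	if (reg < 0)
		return reg;	/* the real helper is expected to warn here */
	return reg + MODEL_WC_COOKIE_OFFSET;
}

/* Model of arch_phys_wc_del(): cookies below 1 are silently ignored. */
static void model_wc_del(int cookie)
{
	if (cookie >= 1)
		model_mtrr_dels++;
}
```

With this shape a driver can store whatever model_wc_add() returned
and pass it to model_wc_del() unconditionally on teardown, which is
what nvidiafb_remove() does with par->wc_cookie in the diff below.
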
 drivers/video/fbdev/nvidia/nv_type.h |  7 +------
 drivers/video/fbdev/nvidia/nvidia.c  | 37 ++++++------------------------------
 2 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/drivers/video/fbdev/nvidia/nv_type.h b/drivers/video/fbdev/nvidia/nv_type.h
index c03f7f5..6ff321a 100644
--- a/drivers/video/fbdev/nvidia/nv_type.h
+++ b/drivers/video/fbdev/nvidia/nv_type.h
@@ -148,12 +148,7 @@ struct nvidia_par {
 	u32 forceCRTC;
 	u32 open_count;
 	u8 DDCBase;
-#ifdef CONFIG_MTRR
-	struct {
-		int vram;
-		int vram_valid;
-	} mtrr;
-#endif
+	int wc_cookie;
 	struct nvidia_i2c_chan chan[3];
 
 	volatile u32 __iomem *REGS;
diff --git a/drivers/video/fbdev/nvidia/nvidia.c b/drivers/video/fbdev/nvidia/nvidia.c
index def0412..781f5e7 100644
--- a/drivers/video/fbdev/nvidia/nvidia.c
+++ b/drivers/video/fbdev/nvidia/nvidia.c
@@ -21,9 +21,6 @@
 #include <linux/pci.h>
 #include <linux/console.h>
 #include <linux/backlight.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_PPC_OF
 #include <asm/prom.h>
 #include <asm/pci-bridge.h>
@@ -80,9 +77,7 @@ static int paneltweak = 0;
 static int vram = 0;
 static int bpp = 8;
 static int reverse_i2c;
-#ifdef CONFIG_MTRR
 static bool nomtrr = false;
-#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight = 1;
 #else
@@ -1365,7 +1360,8 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 	par->ScratchBufferStart = par->FbUsableSize - par->ScratchBufferSize;
 	par->CursorStart = par->FbUsableSize + (32 * 1024);
 
-	info->screen_base = ioremap(nvidiafb_fix.smem_start, par->FbMapSize);
+	info->screen_base = ioremap_wc(nvidiafb_fix.smem_start,
+				       par->FbMapSize);
 	info->screen_size = par->FbUsableSize;
 	nvidiafb_fix.smem_len = par->RamAmountKBytes * 1024;
 
@@ -1376,20 +1372,9 @@ static int nvidiafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 
 	par->FbStart = info->screen_base;
 
-#ifdef CONFIG_MTRR
-	if (!nomtrr) {
-		par->mtrr.vram = mtrr_add(nvidiafb_fix.smem_start,
-					  par->RamAmountKBytes * 1024,
-					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr.vram < 0) {
-			printk(KERN_ERR PFX "unable to setup MTRR\n");
-		} else {
-			par->mtrr.vram_valid = 1;
-			/* let there be speed */
-			printk(KERN_INFO PFX "MTRR set to ON\n");
-		}
-	}
-#endif				/* CONFIG_MTRR */
+	if (!nomtrr)
+		par->wc_cookie = arch_phys_wc_add(nvidiafb_fix.smem_start,
+						  par->RamAmountKBytes * 1024);
 
 	info->fbops = &nvidia_fb_ops;
 	info->fix = nvidiafb_fix;
@@ -1447,13 +1432,7 @@ static void nvidiafb_remove(struct pci_dev *pd)
 	unregister_framebuffer(info);
 
 	nvidia_bl_exit(par);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif				/* CONFIG_MTRR */
-
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	fb_destroy_modedb(info->monspecs.modedb);
 	nvidia_delete_i2c_busses(par);
@@ -1505,10 +1484,8 @@ static int nvidiafb_setup(char *options)
 			vram = simple_strtoul(this_opt+5, NULL, 0);
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = true;
-#endif
 		} else if (!strncmp(this_opt, "fpdither:", 9)) {
 			fpdither = simple_strtol(this_opt+9, NULL, 0);
 		} else if (!strncmp(this_opt, "bpp:", 4)) {
@@ -1596,11 +1573,9 @@ MODULE_PARM_DESC(bpp, "pixel width in bits"
 		 "(default=8)");
 module_param(reverse_i2c, int, 0);
 MODULE_PARM_DESC(reverse_i2c, "reverse port assignment of the i2c bus");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, false);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) "
 		 "(default=0)");
-#endif
 
 MODULE_AUTHOR("Antonino Daplas");
 MODULE_DESCRIPTION("Framebuffer driver for nVidia graphics chipset");
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 33/47] video: fbdev: savagefb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdeffery and to remove things that arch_phys_wc_add()
already makes redundant, such as the verbose messages printed
when MTRR setup fails and the checks that do nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
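Note on the map/unmap pairing. The block below is a userspace
sketch under stated assumptions, not driver code: all sketch_*
names are hypothetical, malloc()/free() stand in for ioremap_wc()
and iounmap(), and the add helper simply pretends PAT is active.
It mirrors the order used by savage_map_video() and
savage_unmap_video() in the diff below: map first, then request
write-combining; on teardown release the cookie, then unmap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical private state, loosely shaped like the "video" member. */
struct sketch_video {
	void *vbase;		/* stand-in for the ioremap_wc() mapping */
	unsigned long pbase;
	unsigned long len;
	int wc_cookie;		/* result of the write-combining request */
};

static int sketch_wc_dels;	/* calls that would have touched an MTRR */

/* Stand-in for arch_phys_wc_add(); assumes PAT is active, so returns 0. */
static int sketch_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	return 0;
}

/* Stand-in for arch_phys_wc_del(); only cookies >= 1 do any work. */
static void sketch_wc_del(int cookie)
{
	if (cookie >= 1)
		sketch_wc_dels++;
}

/* Map the region, then ask for write-combining on the same range. */
static int sketch_map_video(struct sketch_video *v, unsigned long pbase,
			    unsigned long len)
{
	assert(len > 0);
	v->pbase = pbase;
	v->len = len;
	v->vbase = malloc(len);	/* a driver would call ioremap_wc() here */
	if (!v->vbase)
		return -1;
	v->wc_cookie = sketch_wc_add(pbase, len);
	return 0;
}

/* Tear down in reverse order; safe to call when nothing is mapped. */
static void sketch_unmap_video(struct sketch_video *v)
{
	if (!v->vbase)
		return;
	sketch_wc_del(v->wc_cookie);
	free(v->vbase);		/* a driver would call iounmap() here */
	v->vbase = NULL;
}
```

The point of the sketch is that the driver keeps one int and needs
no #ifdef CONFIG_MTRR around either path.
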
 drivers/video/fbdev/savage/savagefb.h        |  4 +---
 drivers/video/fbdev/savage/savagefb_driver.c | 17 +++--------------
 2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/drivers/video/fbdev/savage/savagefb.h b/drivers/video/fbdev/savage/savagefb.h
index 8ff4ab1..aba04af 100644
--- a/drivers/video/fbdev/savage/savagefb.h
+++ b/drivers/video/fbdev/savage/savagefb.h
@@ -213,9 +213,7 @@ struct savagefb_par {
 		void   __iomem *vbase;
 		u32    pbase;
 		u32    len;
-#ifdef CONFIG_MTRR
-		int    mtrr;
-#endif
+		int    wc_cookie;
 	} video;
 
 	struct {
diff --git a/drivers/video/fbdev/savage/savagefb_driver.c b/drivers/video/fbdev/savage/savagefb_driver.c
index 4dbf45f..6c77ab0 100644
--- a/drivers/video/fbdev/savage/savagefb_driver.c
+++ b/drivers/video/fbdev/savage/savagefb_driver.c
@@ -57,10 +57,6 @@
 #include <asm/irq.h>
 #include <asm/pgtable.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "savagefb.h"
 
 
@@ -1775,7 +1771,7 @@ static int savage_map_video(struct fb_info *info, int video_len)
 
 	par->video.pbase = pci_resource_start(par->pcidev, resource);
 	par->video.len   = video_len;
-	par->video.vbase = ioremap(par->video.pbase, par->video.len);
+	par->video.vbase = ioremap_wc(par->video.pbase, par->video.len);
 
 	if (!par->video.vbase) {
 		printk("savagefb: unable to map screen memory\n");
@@ -1787,11 +1783,7 @@ static int savage_map_video(struct fb_info *info, int video_len)
 	info->fix.smem_start = par->video.pbase;
 	info->fix.smem_len   = par->video.len - par->cob_size;
 	info->screen_base    = par->video.vbase;
-
-#ifdef CONFIG_MTRR
-	par->video.mtrr = mtrr_add(par->video.pbase, video_len,
-				   MTRR_TYPE_WRCOMB, 1);
-#endif
+	par->video.wc_cookie = arch_phys_wc_add(par->video.pbase, video_len);
 
 	/* Clear framebuffer, it's all white in memory after boot */
 	memset_io(par->video.vbase, 0, par->video.len);
@@ -1806,10 +1798,7 @@ static void savage_unmap_video(struct fb_info *info)
 	DBG("savage_unmap_video");
 
 	if (par->video.vbase) {
-#ifdef CONFIG_MTRR
-		mtrr_del(par->video.mtrr, par->video.pbase, par->video.len);
-#endif
-
+		arch_phys_wc_del(par->video.wc_cookie);
 		iounmap(par->video.vbase);
 		par->video.vbase = NULL;
 		info->screen_base = NULL;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 34/47] video: fbdev: sisfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdeffery and to remove things that arch_phys_wc_add()
already makes redundant, such as the verbose messages printed
when MTRR setup fails and the checks that do nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/sis/sis.h      |  2 +-
 drivers/video/fbdev/sis/sis_main.c | 27 ++++++---------------------
 2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/video/fbdev/sis/sis.h b/drivers/video/fbdev/sis/sis.h
index 1987f1b7..ea1d1c9 100644
--- a/drivers/video/fbdev/sis/sis.h
+++ b/drivers/video/fbdev/sis/sis.h
@@ -458,7 +458,7 @@ struct sis_video_info {
 
 	unsigned char	*bios_abase;
 
-	int		mtrr;
+	int		wc_cookie;
 
 	u32		sisfb_mem;
 
diff --git a/drivers/video/fbdev/sis/sis_main.c b/drivers/video/fbdev/sis/sis_main.c
index fcf610e..e923038 100644
--- a/drivers/video/fbdev/sis/sis_main.c
+++ b/drivers/video/fbdev/sis/sis_main.c
@@ -53,9 +53,6 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <asm/io.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #include "sis.h"
 #include "sis_main.h"
@@ -4130,13 +4127,13 @@ static void sisfb_post_map_vram(struct sis_video_info *ivideo,
 	if (*mapsize < (min << 20))
 		return;
 
-	ivideo->video_vbase = ioremap(ivideo->video_base, (*mapsize));
+	ivideo->video_vbase = ioremap_wc(ivideo->video_base, (*mapsize));
 
 	if(!ivideo->video_vbase) {
 		printk(KERN_ERR
 			"sisfb: Unable to map maximum video RAM for size detection\n");
 		(*mapsize) >>= 1;
-		while((!(ivideo->video_vbase = ioremap(ivideo->video_base, (*mapsize))))) {
+		while((!(ivideo->video_vbase = ioremap_wc(ivideo->video_base, (*mapsize))))) {
 			(*mapsize) >>= 1;
 			if((*mapsize) < (min << 20))
 				break;
@@ -6186,7 +6183,7 @@ static int sisfb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto error_2;
 	}
 
-	ivideo->video_vbase = ioremap(ivideo->video_base, ivideo->video_size);
+	ivideo->video_vbase = ioremap_wc(ivideo->video_base, ivideo->video_size);
 	ivideo->SiS_Pr.VideoMemoryAddress = ivideo->video_vbase;
 	if(!ivideo->video_vbase) {
 		printk(KERN_ERR "sisfb: Fatal error: Unable to map framebuffer memory\n");
@@ -6254,8 +6251,6 @@ error_3:	vfree(ivideo->bios_abase);
 	ivideo->SiS_Pr.VideoMemoryAddress += ivideo->video_offset;
 	ivideo->SiS_Pr.VideoMemorySize = ivideo->sisfb_mem;
 
-	ivideo->mtrr = -1;
-
 	ivideo->vbflags = 0;
 	ivideo->lcddefmodeidx = DEFAULT_LCDMODE;
 	ivideo->tvdefmodeidx  = DEFAULT_TVMODE;
@@ -6443,14 +6438,8 @@ error_3:	vfree(ivideo->bios_abase);
 
 		printk(KERN_DEBUG "sisfb: Initial vbflags 0x%x\n", (int)ivideo->vbflags);
 
-#ifdef CONFIG_MTRR
-		ivideo->mtrr = mtrr_add(ivideo->video_base, ivideo->video_size,
-					MTRR_TYPE_WRCOMB, 1);
-		if(ivideo->mtrr < 0) {
-			printk(KERN_DEBUG "sisfb: Failed to add MTRRs\n");
-		}
-#endif
-
+		ivideo->wc_cookie = arch_phys_wc_add(ivideo->video_base,
+						     ivideo->video_size);
 		if(register_framebuffer(sis_fb_info) < 0) {
 			printk(KERN_ERR "sisfb: Fatal error: Failed to register framebuffer\n");
 			ret = -EINVAL;
@@ -6507,11 +6496,7 @@ static void sisfb_remove(struct pci_dev *pdev)
 
 	pci_dev_put(ivideo->nbridge);
 
-#ifdef CONFIG_MTRR
-	/* Release MTRR region */
-	if(ivideo->mtrr >= 0)
-		mtrr_del(ivideo->mtrr, ivideo->video_base, ivideo->video_size);
-#endif
+	arch_phys_wc_del(ivideo->wc_cookie);
 
 	/* If device was disabled when starting, disable
 	 * it when quitting.
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 35/47] video: fbdev: aty: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
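
In this driver the write-combining request is optional (the "nomtrr"
option), yet the remove path in the diff below calls the delete helper
unconditionally. The sketch below models why that can be safe, assuming
the driver's private data starts out zeroed so an unset cookie is 0.
All names here (sketch_par, sketch_wc_add(), sketch_wc_del(),
sketch_probe(), sketch_remove()) are hypothetical and exist only for
illustration; this is not the driver's code.

```c
#include <stdbool.h>
#include <string.h>

struct sketch_par {
	int wc_cookie;	/* 0 means "nothing to undo" in this sketch */
};

/* Hypothetical stand-in: pretend one MTRR was claimed, return cookie > 0. */
static int sketch_wc_add(int *claimed)
{
	(*claimed)++;
	return 1000;
}

/* Cookies below 1 are ignored, so an unset or failed cookie is harmless. */
static void sketch_wc_del(int cookie, int *claimed)
{
	if (cookie >= 1)
		(*claimed)--;
}

/* Probe: start from zeroed state, request WC only when the user wants it. */
static void sketch_probe(struct sketch_par *par, bool want_wc, int *claimed)
{
	memset(par, 0, sizeof(*par));
	if (want_wc)
		par->wc_cookie = sketch_wc_add(claimed);
}

/* Remove: no flag check needed, the cookie value itself says what to do. */
static void sketch_remove(struct sketch_par *par, int *claimed)
{
	sketch_wc_del(par->wc_cookie, claimed);
}
```

The separate "valid" flag that the old code kept next to the MTRR index
has no counterpart here, since a zero cookie already encodes "not set".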

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef'ery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/aty128fb.c | 36 ++++++------------------------------
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/video/fbdev/aty/aty128fb.c b/drivers/video/fbdev/aty/aty128fb.c
index aedf2fb..f41955b 100644
--- a/drivers/video/fbdev/aty/aty128fb.c
+++ b/drivers/video/fbdev/aty/aty128fb.c
@@ -80,10 +80,6 @@
 #include <asm/btext.h>
 #endif /* CONFIG_BOOTX_TEXT */
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/aty128.h>
 
 /* Debug flag */
@@ -399,10 +395,7 @@ static int default_cmode = CMODE_8;
 
 static int default_crt_on = 0;
 static int default_lcd_on = 1;
-
-#ifdef CONFIG_MTRR
 static bool mtrr = true;
-#endif
 
 #ifdef CONFIG_FB_ATY128_BACKLIGHT
 #ifdef CONFIG_PMAC_BACKLIGHT
@@ -456,9 +449,7 @@ struct aty128fb_par {
 	u32 vram_size;                      /* onboard video ram   */
 	int chip_gen;
 	const struct aty128_meminfo *mem;   /* onboard mem info    */
-#ifdef CONFIG_MTRR
-	struct { int vram; int vram_valid; } mtrr;
-#endif
+	int wc_cookie;
 	int blitter_may_be_busy;
 	int fifo_slots;                 /* free slots in FIFO (64 max) */
 
@@ -1725,12 +1716,10 @@ static int aty128fb_setup(char *options)
 #endif
 			continue;
 		}
-#ifdef CONFIG_MTRR
 		if(!strncmp(this_opt, "nomtrr", 6)) {
 			mtrr = 0;
 			continue;
 		}
-#endif
 #ifdef CONFIG_PPC_PMAC
 		/* vmode and cmode deprecated */
 		if (!strncmp(this_opt, "vmode:", 6)) {
@@ -2133,7 +2122,7 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	par->vram_size = aty_ld_le32(CNFG_MEMSIZE) & 0x03FFFFFF;
 
 	/* Virtualize the framebuffer */
-	info->screen_base = ioremap(fb_addr, par->vram_size);
+	info->screen_base = ioremap_wc(fb_addr, par->vram_size);
 	if (!info->screen_base)
 		goto err_unmap_out;
 
@@ -2170,15 +2159,9 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!aty128_init(pdev, ent))
 		goto err_out;
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr.vram = mtrr_add(info->fix.smem_start,
-				par->vram_size, MTRR_TYPE_WRCOMB, 1);
-		par->mtrr.vram_valid = 1;
-		/* let there be speed */
-		printk(KERN_INFO "aty128fb: Rage128 MTRR set to ON\n");
-	}
-#endif /* CONFIG_MTRR */
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  par->vram_size);
 	return 0;
 
 err_out:
@@ -2212,11 +2195,7 @@ static void aty128_remove(struct pci_dev *pdev)
 	aty128_bl_exit(info->bl_dev);
 #endif
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 par->vram_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->regbase);
 	iounmap(info->screen_base);
 
@@ -2625,8 +2604,5 @@ MODULE_DESCRIPTION("FBDev driver for ATI Rage128 / Pro cards");
 MODULE_LICENSE("GPL");
 module_param(mode_option, charp, 0);
 MODULE_PARM_DESC(mode_option, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
 module_param_named(nomtrr, mtrr, invbool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
-
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 35/47] video: fbdev: aty: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (62 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef'ery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
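
As a rough illustration of why the old "did we get an MTRR" bookkeeping
can be dropped, the sketch below models the cookie convention in plain
userspace C. It is not the kernel implementation. model_pat_enabled,
model_mtrr_in_use, MODEL_MTRR_MAX, MODEL_COOKIE_OFFSET and both model_*
functions are hypothetical stand-ins. The assumption is that
arch_phys_wc_add() returns 0 when there is nothing to do, a positive
cookie when an MTRR was taken, and a negative error on failure, and that
arch_phys_wc_del() ignores any value below 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these are real kernel symbols. */
static bool model_pat_enabled;   /* true: PAT provides write-combining */
static int model_mtrr_in_use;    /* how many modelled MTRRs are taken */

#define MODEL_MTRR_MAX      8    /* assumed number of variable MTRRs */
#define MODEL_COOKIE_OFFSET 1000 /* keeps valid cookies strictly positive */

/* Returns 0 (nothing needed), a positive cookie, or a negative error. */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;

	if (model_pat_enabled)
		return 0;               /* a WC ioremap alone is enough */
	if (model_mtrr_in_use >= MODEL_MTRR_MAX)
		return -28;             /* modelled -ENOSPC */
	return MODEL_COOKIE_OFFSET + model_mtrr_in_use++;
}

/* Safe to call with any value returned by model_phys_wc_add(). */
static void model_phys_wc_del(int cookie)
{
	if (cookie < 1)
		return;                 /* 0 or error: nothing was taken */
	assert(cookie >= MODEL_COOKIE_OFFSET);
	model_mtrr_in_use--;
}
```

Under this convention a driver can keep the return value in a
zero-initialised wc_cookie field and call the delete helper
unconditionally on teardown, which is the shape of the aty128_remove()
hunk in the diff that follows.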

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/aty128fb.c | 36 ++++++------------------------------
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/video/fbdev/aty/aty128fb.c b/drivers/video/fbdev/aty/aty128fb.c
index aedf2fb..f41955b 100644
--- a/drivers/video/fbdev/aty/aty128fb.c
+++ b/drivers/video/fbdev/aty/aty128fb.c
@@ -80,10 +80,6 @@
 #include <asm/btext.h>
 #endif /* CONFIG_BOOTX_TEXT */
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/aty128.h>
 
 /* Debug flag */
@@ -399,10 +395,7 @@ static int default_cmode = CMODE_8;
 
 static int default_crt_on = 0;
 static int default_lcd_on = 1;
-
-#ifdef CONFIG_MTRR
 static bool mtrr = true;
-#endif
 
 #ifdef CONFIG_FB_ATY128_BACKLIGHT
 #ifdef CONFIG_PMAC_BACKLIGHT
@@ -456,9 +449,7 @@ struct aty128fb_par {
 	u32 vram_size;                      /* onboard video ram   */
 	int chip_gen;
 	const struct aty128_meminfo *mem;   /* onboard mem info    */
-#ifdef CONFIG_MTRR
-	struct { int vram; int vram_valid; } mtrr;
-#endif
+	int wc_cookie;
 	int blitter_may_be_busy;
 	int fifo_slots;                 /* free slots in FIFO (64 max) */
 
@@ -1725,12 +1716,10 @@ static int aty128fb_setup(char *options)
 #endif
 			continue;
 		}
-#ifdef CONFIG_MTRR
 		if(!strncmp(this_opt, "nomtrr", 6)) {
 			mtrr = 0;
 			continue;
 		}
-#endif
 #ifdef CONFIG_PPC_PMAC
 		/* vmode and cmode deprecated */
 		if (!strncmp(this_opt, "vmode:", 6)) {
@@ -2133,7 +2122,7 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	par->vram_size = aty_ld_le32(CNFG_MEMSIZE) & 0x03FFFFFF;
 
 	/* Virtualize the framebuffer */
-	info->screen_base = ioremap(fb_addr, par->vram_size);
+	info->screen_base = ioremap_wc(fb_addr, par->vram_size);
 	if (!info->screen_base)
 		goto err_unmap_out;
 
@@ -2170,15 +2159,9 @@ static int aty128_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!aty128_init(pdev, ent))
 		goto err_out;
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr.vram = mtrr_add(info->fix.smem_start,
-				par->vram_size, MTRR_TYPE_WRCOMB, 1);
-		par->mtrr.vram_valid = 1;
-		/* let there be speed */
-		printk(KERN_INFO "aty128fb: Rage128 MTRR set to ON\n");
-	}
-#endif /* CONFIG_MTRR */
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  par->vram_size);
 	return 0;
 
 err_out:
@@ -2212,11 +2195,7 @@ static void aty128_remove(struct pci_dev *pdev)
 	aty128_bl_exit(info->bl_dev);
 #endif
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 par->vram_size);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->regbase);
 	iounmap(info->screen_base);
 
@@ -2625,8 +2604,5 @@ MODULE_DESCRIPTION("FBDev driver for ATI Rage128 / Pro cards");
 MODULE_LICENSE("GPL");
 module_param(mode_option, charp, 0);
 MODULE_PARM_DESC(mode_option, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
 module_param_named(nomtrr, mtrr, invbool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
-
-- 
2.3.2.209.gd67f9d5.dirty

* [PATCH v1 36/47] video: fbdev: i810: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The same area used for MTRR is used for the ioremap() area.
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose
message printed when MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
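
The opening statement of this message, that the MTRR area and the
ioremap() area are the same, is the precondition that makes this
conversion safe. A minimal userspace sketch of such a check follows.
struct model_range, model_range_covers() and model_same_area() are
hypothetical names used only for illustration; they are not kernel
symbols, and the ranges are assumed not to wrap the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a physical range: [base, base + size). */
struct model_range {
	uint64_t base;
	uint64_t size;
};

/*
 * True when 'outer' fully contains 'inner'. The comparison uses
 * subtraction so that base + size is never computed and cannot overflow.
 */
static bool model_range_covers(struct model_range outer,
			       struct model_range inner)
{
	/* Stated assumption: neither range wraps around. */
	assert(outer.size <= UINT64_MAX - outer.base);
	assert(inner.size <= UINT64_MAX - inner.base);

	if (inner.size == 0 || inner.size > outer.size)
		return false;
	if (inner.base < outer.base)
		return false;
	return inner.base - outer.base <= outer.size - inner.size;
}

/* The simple case: the WC range and the mapped range coincide. */
static bool model_same_area(struct model_range wc, struct model_range map)
{
	return model_range_covers(wc, map) && model_range_covers(map, wc);
}
```

When the two ranges coincide, as stated for this driver, switching the
mapping to ioremap_wc() affects exactly the region that previously
relied on the MTRR. A mapping larger than the write-combined range would
instead call for splitting the ioremap first.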

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/i810/i810.h      |  3 +--
 drivers/video/fbdev/i810/i810_main.c | 11 +++++++----
 drivers/video/fbdev/i810/i810_main.h | 26 --------------------------
 3 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/drivers/video/fbdev/i810/i810.h b/drivers/video/fbdev/i810/i810.h
index 1414b73..7b1c002 100644
--- a/drivers/video/fbdev/i810/i810.h
+++ b/drivers/video/fbdev/i810/i810.h
@@ -199,7 +199,6 @@
 #define HAS_FONTCACHE               8 
 
 /* driver flags */
-#define HAS_MTRR                    1
 #define HAS_ACCELERATION            2
 #define ALWAYS_SYNC                 4
 #define LOCKUP                      8
@@ -281,7 +280,7 @@ struct i810fb_par {
 	u32 ovract;
 	u32 cur_state;
 	u32 ddc_num;
-	int mtrr_reg;
+	int wc_cookie;
 	u16 bltcntl;
 	u8 interlace;
 };
diff --git a/drivers/video/fbdev/i810/i810_main.c b/drivers/video/fbdev/i810/i810_main.c
index bb674e4..025b882 100644
--- a/drivers/video/fbdev/i810/i810_main.c
+++ b/drivers/video/fbdev/i810/i810_main.c
@@ -41,6 +41,7 @@
 #include <linux/resource.h>
 #include <linux/unistd.h>
 #include <linux/console.h>
+#include <linux/io.h>
 
 #include <asm/io.h>
 #include <asm/div64.h>
@@ -1816,7 +1817,9 @@ static void i810_init_device(struct i810fb_par *par)
 	u8 reg;
 	u8 __iomem *mmio = par->mmio_start_virtual;
 
-	if (mtrr) set_mtrr(par);
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add((u32) par->aperture.physical,
+						  par->aperture.size);
 
 	i810_init_cursor(par);
 
@@ -1865,8 +1868,8 @@ static int i810_allocate_pci_resource(struct i810fb_par *par,
 	}
 	par->res_flags |= FRAMEBUFFER_REQ;
 
-	par->aperture.virtual = ioremap_nocache(par->aperture.physical, 
-					par->aperture.size);
+	par->aperture.virtual = ioremap_wc(par->aperture.physical,
+					   par->aperture.size);
 	if (!par->aperture.virtual) {
 		printk("i810fb_init: cannot remap framebuffer region\n");
 		return -ENODEV;
@@ -2096,7 +2099,7 @@ static void i810fb_release_resource(struct fb_info *info,
 				    struct i810fb_par *par)
 {
 	struct gtt_data *gtt = &par->i810_gtt;
-	unset_mtrr(par);
+	arch_phys_wc_del(par->wc_cookie);
 
 	i810_delete_i2c_busses(par);
 
diff --git a/drivers/video/fbdev/i810/i810_main.h b/drivers/video/fbdev/i810/i810_main.h
index a25afaa..7bfaaad 100644
--- a/drivers/video/fbdev/i810/i810_main.h
+++ b/drivers/video/fbdev/i810/i810_main.h
@@ -60,32 +60,6 @@ static inline void flush_cache(void)
 #define flush_cache() do { } while(0)
 #endif 
 
-#ifdef CONFIG_MTRR
-
-#include <asm/mtrr.h>
-
-static inline void set_mtrr(struct i810fb_par *par)
-{
-	par->mtrr_reg = mtrr_add((u32) par->aperture.physical, 
-		 par->aperture.size, MTRR_TYPE_WRCOMB, 1);
-	if (par->mtrr_reg < 0) {
-		printk(KERN_ERR "set_mtrr: unable to set MTRR\n");
-		return;
-	}
-	par->dev_flags |= HAS_MTRR;
-}
-static inline void unset_mtrr(struct i810fb_par *par)
-{
-  	if (par->dev_flags & HAS_MTRR) 
-  		mtrr_del(par->mtrr_reg, (u32) par->aperture.physical, 
-			 par->aperture.size); 
-}
-#else
-#define set_mtrr(x) printk("set_mtrr: MTRR is disabled in the kernel\n")
-
-#define unset_mtrr(x) do { } while (0)
-#endif /* CONFIG_MTRR */
-
 #ifdef CONFIG_FB_I810_GTF
 #define IS_DVT (0)
 #else
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose
message printed when MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
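
This driver maps its framebuffer through a PCI BAR helper rather than a
bare ioremap(), so the ioremap_replace_* rules above do not match it and
the call is instead switched to pci_ioremap_wc_bar(). The sketch below
models, in userspace C, what such a helper is assumed to do: reject a
BAR that is not a memory resource, then request a write-combining
mapping over that BAR's start and length. struct model_bar,
struct model_mapping, MODEL_RES_IO, MODEL_RES_MEM and
model_ioremap_wc_bar() are hypothetical stand-ins, not kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RES_IO  0x1u  /* hypothetical flag: port I/O BAR */
#define MODEL_RES_MEM 0x2u  /* hypothetical flag: memory BAR */

/* Hypothetical view of one PCI BAR. */
struct model_bar {
	uint64_t start;
	uint64_t len;
	unsigned int flags;
};

/* Describes the mapping that would be requested. */
struct model_mapping {
	bool valid;            /* false models a NULL return */
	bool write_combining;
	uint64_t base;
	uint64_t size;
};

static struct model_mapping model_ioremap_wc_bar(const struct model_bar *bars,
						 int nbars, int bar)
{
	struct model_mapping m = { false, false, 0, 0 };

	assert(bar >= 0 && bar < nbars);
	if (!(bars[bar].flags & MODEL_RES_MEM))
		return m;       /* only memory BARs can be remapped */

	m.valid = true;
	m.write_combining = true;
	m.base = bars[bar].start;
	m.size = bars[bar].len;
	return m;
}
```

The model returns a description of the mapping rather than a pointer so
that the requested range and caching type can be inspected without
touching real hardware.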

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/i740fb.c | 35 ++++++-----------------------------
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c
index a2b4204..452e116 100644
--- a/drivers/video/fbdev/i740fb.c
+++ b/drivers/video/fbdev/i740fb.c
@@ -27,24 +27,15 @@
 #include <linux/console.h>
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "i740_reg.h"
 
 static char *mode_option;
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
 
 struct i740fb_par {
 	unsigned char __iomem *regs;
 	bool has_sgram;
-#ifdef CONFIG_MTRR
-	int mtrr_reg;
-#endif
+	int wc_cookie;
 	bool ddc_registered;
 	struct i2c_adapter ddc_adapter;
 	struct i2c_algo_bit_data ddc_algo;
@@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 		goto err_request_regions;
 	}
 
-	info->screen_base = pci_ioremap_bar(dev, 0);
+	info->screen_base = pci_ioremap_wc_bar(dev, 0);
 	if (!info->screen_base) {
 		dev_err(info->device, "error remapping base\n");
 		ret = -ENOMEM;
@@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start,
-				info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 	return 0;
 
 err_reg_framebuffer:
@@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev)
 
 	if (info) {
 		struct i740fb_par *par = info->par;
-
-#ifdef CONFIG_MTRR
-		if (par->mtrr_reg >= 0) {
-			mtrr_del(par->mtrr_reg, 0, 0);
-			par->mtrr_reg = -1;
-		}
-#endif
+		arch_phys_wc_del(par->wc_cookie);
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
 		if (par->ddc_registered)
@@ -1287,10 +1268,8 @@ static int  __init i740fb_setup(char *options)
 	while ((opt = strsep(&options, ",")) != NULL) {
 		if (!*opt)
 			continue;
-#ifdef CONFIG_MTRR
 		else if (!strncmp(opt, "mtrr:", 5))
 			mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
 		else
 			mode_option = opt;
 	}
@@ -1327,7 +1306,5 @@ MODULE_DESCRIPTION("fbdev driver for Intel740");
 module_param(mode_option, charp, 0444);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-- 
2.3.2.209.gd67f9d5.dirty


* [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is
requested as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefery and to remove redundant things which
arch_phys_wc_add() already addresses, such as the verbose
message printed when MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
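
To illustrate why the #ifdef guards and the ">= 0" check can go
away, here is a simplified userspace model of the cookie
convention. It is a sketch, not the kernel implementation: every
name in it (model_cpu, model_wc_add(), model_wc_del(),
MODEL_WC_OFFSET) is hypothetical, and the exact cookie values are
an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
#define MODEL_WC_OFFSET 1000	/* keeps claimed cookies away from 0 */

struct model_cpu {
	bool pat_enabled;	/* PAT can provide write-combining */
	int free_mtrrs;		/* MTRR slots still available */
	int used_mtrrs;		/* MTRR slots claimed through the model */
};

/*
 * Returns 0 when no MTRR is needed, a positive cookie when a slot
 * was claimed, or a negative value when no slot was available.
 */
static int model_wc_add(struct model_cpu *cpu)
{
	if (cpu->pat_enabled)
		return 0;
	if (cpu->free_mtrrs == 0)
		return -1;
	cpu->free_mtrrs--;
	return MODEL_WC_OFFSET + cpu->used_mtrrs++;
}

/* Safe to call with any value model_wc_add() returned, or with 0. */
static void model_wc_del(struct model_cpu *cpu, int cookie)
{
	if (cookie < 1)
		return;		/* nothing was claimed, nothing to undo */
	assert(cookie >= MODEL_WC_OFFSET);
	cpu->used_mtrrs--;
	cpu->free_mtrrs++;
}
```

Under this convention, and assuming the driver's private data
starts out zeroed, the remove path can call the delete helper
unconditionally, which is the shape the diff below takes with
arch_phys_wc_del(par->wc_cookie).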

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/i740fb.c | 35 ++++++-----------------------------
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/drivers/video/fbdev/i740fb.c b/drivers/video/fbdev/i740fb.c
index a2b4204..452e116 100644
--- a/drivers/video/fbdev/i740fb.c
+++ b/drivers/video/fbdev/i740fb.c
@@ -27,24 +27,15 @@
 #include <linux/console.h>
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "i740_reg.h"
 
 static char *mode_option;
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
 
 struct i740fb_par {
 	unsigned char __iomem *regs;
 	bool has_sgram;
-#ifdef CONFIG_MTRR
-	int mtrr_reg;
-#endif
+	int wc_cookie;
 	bool ddc_registered;
 	struct i2c_adapter ddc_adapter;
 	struct i2c_algo_bit_data ddc_algo;
@@ -1040,7 +1031,7 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 		goto err_request_regions;
 	}
 
-	info->screen_base = pci_ioremap_bar(dev, 0);
+	info->screen_base = pci_ioremap_wc_bar(dev, 0);
 	if (!info->screen_base) {
 		dev_err(info->device, "error remapping base\n");
 		ret = -ENOMEM;
@@ -1144,13 +1135,9 @@ static int i740fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 
 	fb_info(info, "%s frame buffer device\n", info->fix.id);
 	pci_set_drvdata(dev, info);
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start,
-				info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 	return 0;
 
 err_reg_framebuffer:
@@ -1177,13 +1164,7 @@ static void i740fb_remove(struct pci_dev *dev)
 
 	if (info) {
 		struct i740fb_par *par = info->par;
-
-#ifdef CONFIG_MTRR
-		if (par->mtrr_reg >= 0) {
-			mtrr_del(par->mtrr_reg, 0, 0);
-			par->mtrr_reg = -1;
-		}
-#endif
+		arch_phys_wc_del(par->wc_cookie);
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
 		if (par->ddc_registered)
@@ -1287,10 +1268,8 @@ static int  __init i740fb_setup(char *options)
 	while ((opt = strsep(&options, ",")) != NULL) {
 		if (!*opt)
 			continue;
-#ifdef CONFIG_MTRR
 		else if (!strncmp(opt, "mtrr:", 5))
 			mtrr = simple_strtoul(opt + 5, NULL, 0);
-#endif
 		else
 			mode_option = opt;
 	}
@@ -1327,7 +1306,5 @@ MODULE_DESCRIPTION("fbdev driver for Intel740");
 module_param(mode_option, charp, 0444);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 38/47] video: fbdev: kyrofb: use arch_phys_wc_add() and pci_ioremap_wc_bar()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when PAT can provide write-combining. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
the #ifdef CONFIG_MTRR guards and the code made redundant by
arch_phys_wc_add(), such as the verbose message printed when
adding an MTRR fails and the handling of the case where we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
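
Besides the SmPL conversion, the diff below adds NULL checks for
both mappings and splits the unwind labels so that each failure
undoes only what was already set up. A minimal userspace sketch
of that ordering follows. It is an illustration only: model_dev
and model_probe() are hypothetical, malloc()/free() stand in for
the map and unmap calls, and the fail_at parameter exists purely
to drive the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two mappings set up in probe. */
struct model_dev {
	void *regs;	/* plays the role of the register mapping */
	void *screen;	/* plays the role of the framebuffer mapping */
};

/*
 * fail_at: 0 = no failure, 1 = register mapping fails,
 * 2 = framebuffer mapping fails, 3 = a later step fails.
 */
static int model_probe(struct model_dev *d, int fail_at)
{
	assert(d != NULL);

	d->regs = (fail_at == 1) ? NULL : malloc(16);
	if (!d->regs)
		goto out_free_fb;

	d->screen = (fail_at == 2) ? NULL : malloc(16);
	if (!d->screen)
		goto out_unmap_regs;

	if (fail_at == 3)
		goto out_unmap;

	return 0;

	/* Labels run in reverse order of setup, so a jump to a later
	 * label skips the cleanup of resources never acquired. */
out_unmap:
	free(d->screen);
	d->screen = NULL;
out_unmap_regs:
	free(d->regs);
	d->regs = NULL;
out_free_fb:
	return -1;
}
```

Each label releases one resource and falls through to the next,
so the order of the labels mirrors the order of acquisition in
reverse.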

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/kyro/fbdev.c | 33 +++++++++++----------------------
 include/video/kyro.h             |  4 +---
 2 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/kyro/fbdev.c b/drivers/video/fbdev/kyro/fbdev.c
index 65041e1..5bb0153 100644
--- a/drivers/video/fbdev/kyro/fbdev.c
+++ b/drivers/video/fbdev/kyro/fbdev.c
@@ -22,9 +22,6 @@
 #include <linux/pci.h>
 #include <asm/io.h>
 #include <linux/uaccess.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #include <video/kyro.h>
 
@@ -84,9 +81,7 @@ static device_info_t deviceInfo;
 static char *mode_option = NULL;
 static int nopan = 0;
 static int nowrap = 1;
-#ifdef CONFIG_MTRR
 static int nomtrr = 0;
-#endif
 
 /* PCI driver prototypes */
 static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -570,10 +565,8 @@ static int __init kyrofb_setup(char *options)
 			nopan = 1;
 		} else if (strcmp(this_opt, "nowrap") == 0) {
 			nowrap = 1;
-#ifdef CONFIG_MTRR
 		} else if (strcmp(this_opt, "nomtrr") == 0) {
 			nomtrr = 1;
-#endif
 		} else {
 			mode_option = this_opt;
 		}
@@ -691,17 +684,16 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	currentpar->regbase = deviceInfo.pSTGReg =
 		ioremap_nocache(kyro_fix.mmio_start, kyro_fix.mmio_len);
+	if (!currentpar->regbase)
+		goto out_free_fb;
 
-	info->screen_base = ioremap_nocache(kyro_fix.smem_start,
-					    kyro_fix.smem_len);
+	info->screen_base = pci_ioremap_wc_bar(pdev, 0);
+	if (!info->screen_base)
+		goto out_unmap_regs;
 
-#ifdef CONFIG_MTRR
 	if (!nomtrr)
-		currentpar->mtrr_handle =
-			mtrr_add(kyro_fix.smem_start,
-				 kyro_fix.smem_len,
-				 MTRR_TYPE_WRCOMB, 1);
-#endif
+		currentpar->wc_cookie = arch_phys_wc_add(kyro_fix.smem_start,
+							 kyro_fix.smem_len);
 
 	kyro_fix.ypanstep	= nopan ? 0 : 1;
 	kyro_fix.ywrapstep	= nowrap ? 0 : 1;
@@ -745,8 +737,10 @@ static int kyrofb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	return 0;
 
 out_unmap:
-	iounmap(currentpar->regbase);
 	iounmap(info->screen_base);
+out_unmap_regs:
+	iounmap(currentpar->regbase);
+out_free_fb:
 	framebuffer_release(info);
 
 	return -EINVAL;
@@ -770,12 +764,7 @@ static void kyrofb_remove(struct pci_dev *pdev)
 	iounmap(info->screen_base);
 	iounmap(par->regbase);
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_handle)
-		mtrr_del(par->mtrr_handle,
-			 info->fix.smem_start,
-			 info->fix.smem_len);
-#endif
+	arch_phys_wc_del(par->wc_cookie);
 
 	unregister_framebuffer(info);
 	framebuffer_release(info);
diff --git a/include/video/kyro.h b/include/video/kyro.h
index c563968..b958c2e 100644
--- a/include/video/kyro.h
+++ b/include/video/kyro.h
@@ -35,9 +35,7 @@ struct kyrofb_info {
 	/* Useful to hold depth here for Linux */
 	u8 PIXDEPTH;
 
-#ifdef CONFIG_MTRR
-	int mtrr_handle;
-#endif
+	int wc_cookie;
 };
 
 extern int kyro_dev_init(void);
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 39/47] video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
avoids MTRR when PAT can provide write-combining. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to remove
the #ifdef CONFIG_MTRR guards and the code made redundant by
arch_phys_wc_add(), such as the verbose message printed when
adding an MTRR fails and the handling of the case where we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
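
The precondition stated at the top of this message, that the
MTRR area and the ioremap() area are the same, can be written
down as a check. A minimal userspace sketch, assuming a range is
fully described by a base and a size; model_range and
model_same_area() are hypothetical names and not part of any
kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a physical address range. */
struct model_range {
	uint64_t base;
	uint64_t size;
};

/*
 * True when the write-combining request covers exactly the area
 * that was mapped, which is the case where swapping the mapping
 * call for its write-combining variant is a one-to-one change.
 */
static bool model_same_area(struct model_range wc, struct model_range map)
{
	assert(wc.size > 0 && map.size > 0);
	return wc.base == map.base && wc.size == map.size;
}
```

When the two ranges differ, for example when one mapping also
covers registers, the driver first needs its ioremap'd areas
split before a conversion like this one applies.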

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/pm2fb.c | 31 +++++--------------------------
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/video/fbdev/pm2fb.c b/drivers/video/fbdev/pm2fb.c
index 3b85b64..aa8d288 100644
--- a/drivers/video/fbdev/pm2fb.c
+++ b/drivers/video/fbdev/pm2fb.c
@@ -38,10 +38,6 @@
 #include <linux/fb.h>
 #include <linux/init.h>
 #include <linux/pci.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include <video/permedia2.h>
 #include <video/cvisionppc.h>
 
@@ -81,10 +77,7 @@ static char *mode_option;
 static bool lowhsync;
 static bool lowvsync;
 static bool noaccel;
-/* mtrr option */
-#ifdef CONFIG_MTRR
 static bool nomtrr;
-#endif
 
 /*
  * The hardware state of the graphics card that isn't part of the
@@ -100,7 +93,7 @@ struct pm2fb_par
 	u32		mem_control;	/* MemControl reg at probe */
 	u32		boot_address;	/* BootAddress reg at probe */
 	u32		palette[16];
-	int		mtrr_handle;
+	int		wc_cookie;
 };
 
 /*
@@ -1637,21 +1630,16 @@ static int pm2fb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_exit_mmio;
 	}
 	info->screen_base =
-		ioremap_nocache(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
+		ioremap_wc(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_WARNING "pm2fb: Can't ioremap smem area.\n");
 		release_mem_region(pm2fb_fix.smem_start, pm2fb_fix.smem_len);
 		goto err_exit_mmio;
 	}
 
-#ifdef CONFIG_MTRR
-	default_par->mtrr_handle = -1;
 	if (!nomtrr)
-		default_par->mtrr_handle =
-			mtrr_add(pm2fb_fix.smem_start,
-				 pm2fb_fix.smem_len,
-				 MTRR_TYPE_WRCOMB, 1);
-#endif
+		default_par->wc_cookie = arch_phys_wc_add(pm2fb_fix.smem_start,
+							  pm2fb_fix.smem_len);
 
 	info->fbops		= &pm2fb_ops;
 	info->fix		= pm2fb_fix;
@@ -1733,12 +1721,7 @@ static void pm2fb_remove(struct pci_dev *pdev)
 	struct pm2fb_par *par = info->par;
 
 	unregister_framebuffer(info);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr_handle >= 0)
-		mtrr_del(par->mtrr_handle, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif /* CONFIG_MTRR */
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(info->screen_base);
 	release_mem_region(fix->smem_start, fix->smem_len);
 	iounmap(par->v_regs);
@@ -1791,10 +1774,8 @@ static int __init pm2fb_setup(char *options)
 			lowvsync = 1;
 		else if (!strncmp(this_opt, "hwcursor=", 9))
 			hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
 		else if (!strncmp(this_opt, "nomtrr", 6))
 			nomtrr = 1;
-#endif
 		else if (!strncmp(this_opt, "noaccel", 7))
 			noaccel = 1;
 		else
@@ -1847,10 +1828,8 @@ MODULE_PARM_DESC(noaccel, "Disable acceleration");
 module_param(hwcursor, int, 0644);
 MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
 			"(1=enable, 0=disable, default=1)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
 
 MODULE_AUTHOR("Jim Hague <jim.hague@acm.org>");
 MODULE_DESCRIPTION("Permedia2 framebuffer device driver");
-- 
2.3.2.209.gd67f9d5.dirty
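
The pm2fb hunks above drop both the "mtrr_handle = -1" initialisation and
the ">= 0" check in pm2fb_remove(). That relies on the cookie convention
of arch_phys_wc_add() / arch_phys_wc_del(). What follows is a minimal
userspace sketch of that convention, not the kernel implementation. It
assumes a cookie of 0 means "nothing to undo" (PAT in use), a negative
value is an error, and only cookies >= 1 refer to an MTRR. All model_*
names, the 8-slot limit and the offset value are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
#define WC_COOKIE_OFFSET 1000	/* assumed: keeps every valid cookie >= 1 */

static bool model_pat_enabled;	/* true: PAT supplies WC, MTRR is skipped */
static int model_mtrr_in_use;	/* number of MTRR slots currently held */

/* Model of mtrr_add(): returns a slot index >= 0, or a negative error. */
static int model_mtrr_add(void)
{
	if (model_mtrr_in_use >= 8)	/* pretend there are 8 variable MTRRs */
		return -28;		/* -ENOSPC */
	return model_mtrr_in_use++;
}

/* Returns 0 when no MTRR is needed, < 0 on failure, >= 1 as a cookie. */
static int model_arch_phys_wc_add(unsigned long base, unsigned long size)
{
	int slot;

	(void)base;
	(void)size;

	if (model_pat_enabled)
		return 0;		/* success, nothing to undo later */

	slot = model_mtrr_add();
	if (slot < 0) {
		/* the helper warns, so the driver does not have to */
		fprintf(stderr, "model: no WC MTRR; performance may suffer\n");
		return slot;
	}
	return slot + WC_COOKIE_OFFSET;
}

/* Safe to call with 0 or a negative value: only real cookies are undone. */
static void model_arch_phys_wc_del(int cookie)
{
	if (cookie >= 1)
		model_mtrr_in_use--;
}
```

Under this convention a driver can store the return value as-is and pass
it to arch_phys_wc_del() in its remove path without a guard. That holds
even when nomtrr skipped the add, assuming the par structure was
zero-initialised when it was allocated.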



* [PATCH v1 40/47] video: fbdev: pm3fb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86 specific MTRR code to
the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
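
A minimal sketch of why motivation (c) is tied to this conversion. It
assumes the memory type combining rules of the Intel SDM for the three
page-level types the ioremap variants can request: UC- lets an
overlapping WC MTRR win, strong UC does not, and a WC page attribute
yields WC on its own. The enum and function names are hypothetical, and
only three MTRR types are modelled.

```c
/* Memory type programmed into the MTRR covering the range. */
enum mtrr_type { MTRR_UC, MTRR_WC, MTRR_WB };

/* Page-level (PAT) type requested by the mapping. */
enum pat_type { PAT_UC, PAT_UC_MINUS, PAT_WC };

/* Resulting behaviour, reduced to "write-combining or not". */
enum eff_type { EFF_UC, EFF_WC };

static enum eff_type effective_type(enum mtrr_type mtrr, enum pat_type pat)
{
	switch (pat) {
	case PAT_WC:
		/* WC page attribute: WC regardless of the MTRR */
		return EFF_WC;
	case PAT_UC_MINUS:
		/* UC-: a WC MTRR may override, anything else stays UC */
		return mtrr == MTRR_WC ? EFF_WC : EFF_UC;
	case PAT_UC:
	default:
		/* strong UC: no MTRR can override it */
		return EFF_UC;
	}
}
```

So a driver that pairs ioremap_nocache() with mtrr_add() only gets
write-combining while ioremap_nocache() maps UC-. Once the driver uses
ioremap_wc() with PAT enabled, the mapping is WC regardless of the MTRR.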

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdefs and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/pm3fb.c | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/drivers/video/fbdev/pm3fb.c b/drivers/video/fbdev/pm3fb.c
index 77b99ed..6ff5077 100644
--- a/drivers/video/fbdev/pm3fb.c
+++ b/drivers/video/fbdev/pm3fb.c
@@ -32,9 +32,6 @@
 #include <linux/fb.h>
 #include <linux/init.h>
 #include <linux/pci.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 #include <video/pm3fb.h>
 
@@ -58,11 +55,7 @@
 static int hwcursor = 1;
 static char *mode_option;
 static bool noaccel;
-
-/* mtrr option */
-#ifdef CONFIG_MTRR
 static bool nomtrr;
-#endif
 
 /*
  * This structure defines the hardware state of the graphics card. Normally
@@ -76,7 +69,7 @@ struct pm3_par {
 	u32		video;		/* video flags before blanking */
 	u32		base;		/* screen base in 128 bits unit */
 	u32		palette[16];
-	int		mtrr_handle;
+	int		wc_cookie;
 };
 
 /*
@@ -1374,8 +1367,8 @@ static int pm3fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 		printk(KERN_WARNING "pm3fb: Can't reserve smem.\n");
 		goto err_exit_mmio;
 	}
-	info->screen_base =
-		ioremap_nocache(pm3fb_fix.smem_start, pm3fb_fix.smem_len);
+	info->screen_base = ioremap_wc(pm3fb_fix.smem_start,
+				       pm3fb_fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_WARNING "pm3fb: Can't ioremap smem area.\n");
 		release_mem_region(pm3fb_fix.smem_start, pm3fb_fix.smem_len);
@@ -1383,12 +1376,9 @@ static int pm3fb_probe(struct pci_dev *dev, const struct pci_device_id *ent)
 	}
 	info->screen_size = pm3fb_fix.smem_len;
 
-#ifdef CONFIG_MTRR
 	if (!nomtrr)
-		par->mtrr_handle = mtrr_add(pm3fb_fix.smem_start,
-						pm3fb_fix.smem_len,
-						MTRR_TYPE_WRCOMB, 1);
-#endif
+		par->wc_cookie = arch_phys_wc_add(pm3fb_fix.smem_start,
+						  pm3fb_fix.smem_len);
 	info->fbops = &pm3fb_ops;
 
 	par->video = PM3_READ_REG(par, PM3VideoControl);
@@ -1478,11 +1468,7 @@ static void pm3fb_remove(struct pci_dev *dev)
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
 
-#ifdef CONFIG_MTRR
-		if (par->mtrr_handle >= 0)
-			mtrr_del(par->mtrr_handle, info->fix.smem_start,
-				 info->fix.smem_len);
-#endif /* CONFIG_MTRR */
+		arch_phys_wc_del(par->wc_cookie);
 		iounmap(info->screen_base);
 		release_mem_region(fix->smem_start, fix->smem_len);
 		iounmap(par->v_regs);
@@ -1533,10 +1519,8 @@ static int __init pm3fb_setup(char *options)
 			noaccel = 1;
 		else if (!strncmp(this_opt, "hwcursor=", 9))
 			hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
 		else if (!strncmp(this_opt, "nomtrr", 6))
 			nomtrr = 1;
-#endif
 		else
 			mode_option = this_opt;
 	}
@@ -1577,10 +1561,8 @@ MODULE_PARM_DESC(noaccel, "Disable acceleration");
 module_param(hwcursor, int, 0644);
 MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
 			"(1=enable, 0=disable, default=1)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disable MTRR support (0 or 1=disabled) (default=0)");
-#endif
 
 MODULE_DESCRIPTION("Permedia3 framebuffer device driver");
 MODULE_LICENSE("GPL");
-- 
2.3.2.209.gd67f9d5.dirty
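
With the #ifdef CONFIG_MTRR guards gone, the nomtrr option is parsed on
every build. Below is a small userspace sketch of the option loop touched
in pm3fb_setup(). It assumes strtoul() as a stand-in for
simple_strtoul() and strtok() for whatever tokenizer the driver uses.
The struct fb_opts type and the parse_fb_options() name are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for what the driver keeps in static variables. */
struct fb_opts {
	bool noaccel;
	bool nomtrr;
	unsigned long hwcursor;
	const char *mode_option;
};

/* Parse a comma-separated option string in place; 'options' is modified. */
static void parse_fb_options(char *options, struct fb_opts *o)
{
	char *this_opt;

	for (this_opt = strtok(options, ","); this_opt;
	     this_opt = strtok(NULL, ",")) {
		if (!strncmp(this_opt, "noaccel", 7))
			o->noaccel = true;
		else if (!strncmp(this_opt, "hwcursor=", 9))
			o->hwcursor = strtoul(this_opt + 9, NULL, 0);
		else if (!strncmp(this_opt, "nomtrr", 6))
			o->nomtrr = true;	/* no longer behind CONFIG_MTRR */
		else
			o->mode_option = this_opt; /* anything else is a mode */
	}
}
```

Before the change, a kernel built without CONFIG_MTRR would not match
nomtrr, and the string would fall through to mode_option.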



* [PATCH v1 41/47] video: fbdev: rivafb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.
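
As an illustration of the pairing this conversion relies on, here is a
minimal user-space sketch. Everything in it is hypothetical: the
model_* and fb_model_* names, the base/size values and the cookie
numbering are stand-ins chosen for the example, not the kernel
implementation. It only assumes that the add helper returns 0 when no
MTRR is needed and a positive cookie otherwise, and that the del helper
ignores anything that is not a real cookie.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state: whether PAT is active, how many MTRRs are held. */
struct wc_model {
	bool pat_enabled;
	int mtrrs_in_use;
};

/* Stand-in for arch_phys_wc_add(): 0 means "no MTRR needed", >0 is a cookie. */
int model_wc_add(struct wc_model *m, unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (m->pat_enabled)
		return 0;
	m->mtrrs_in_use++;
	return 1000 + m->mtrrs_in_use;	/* made-up cookie numbering */
}

/* Stand-in for arch_phys_wc_del(): ignores anything that is not a cookie. */
void model_wc_del(struct wc_model *m, int cookie)
{
	if (cookie >= 1)
		m->mtrrs_in_use--;
}

/* Shape of a converted driver: probe stores the cookie, remove returns it. */
struct fb_model {
	int wc_cookie;
};

void fb_model_probe(struct wc_model *m, struct fb_model *fb, bool nomtrr)
{
	fb->wc_cookie = 0;
	if (!nomtrr)
		fb->wc_cookie = model_wc_add(m, 0xd0000000UL, 0x1000000UL);
}

void fb_model_remove(struct wc_model *m, struct fb_model *fb)
{
	model_wc_del(m, fb->wc_cookie);	/* no guard around the call */
}
```

With pat_enabled set, probe leaves wc_cookie at 0 and no MTRR is
consumed; with it cleared, probe takes one MTRR and remove gives it
back. In both cases the driver-side code is identical.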

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef blocks and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
printed when MTRR setup fails and doing nothing when we didn't
get an MTRR.
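
To make the "redundant things" point concrete, here is a small
user-space sketch of why a separate "valid" flag or a ">= 0" check is
not needed around the delete call. The names sketch_wc_add,
sketch_wc_del_releases, the enum and the offset of 1000 are
assumptions made for this example; the sketch only models the idea
that real cookies are kept apart from both 0 and negative error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes a write-combining request can have. */
enum wc_outcome { WC_VIA_PAT, WC_VIA_MTRR, WC_MTRR_FAILED };

/* Assumed offset that keeps real cookies apart from 0 and from errnos. */
#define SKETCH_COOKIE_OFFSET 1000

/* Stand-in for arch_phys_wc_add(); 'slot' is the MTRR register granted. */
int sketch_wc_add(enum wc_outcome outcome, int slot)
{
	switch (outcome) {
	case WC_VIA_PAT:
		return 0;				/* nothing reserved */
	case WC_VIA_MTRR:
		return slot + SKETCH_COOKIE_OFFSET;	/* real cookie */
	default:
		return -ENOSPC;				/* no free MTRR */
	}
}

/* True only when a delete call would have something to release. */
bool sketch_wc_del_releases(int cookie)
{
	return cookie >= 1;
}
```

In this model a single int carries all three outcomes, so the driver
can store the return value and pass it back on removal without
remembering separately whether the request succeeded.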

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/riva/fbdev.c  | 39 +++++++--------------------------------
 drivers/video/fbdev/riva/rivafb.h |  4 +---
 2 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/drivers/video/fbdev/riva/fbdev.c b/drivers/video/fbdev/riva/fbdev.c
index be73727..854b86d 100644
--- a/drivers/video/fbdev/riva/fbdev.c
+++ b/drivers/video/fbdev/riva/fbdev.c
@@ -41,9 +41,6 @@
 #include <linux/pci.h>
 #include <linux/backlight.h>
 #include <linux/bitrev.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_PPC_OF
 #include <asm/prom.h>
 #include <asm/pci-bridge.h>
@@ -208,9 +205,7 @@ MODULE_DEVICE_TABLE(pci, rivafb_pci_tbl);
 static int flatpanel = -1; /* Autodetect later */
 static int forceCRTC = -1;
 static bool noaccel  = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight = 1;
 #else
@@ -2013,28 +2008,18 @@ static int rivafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 
 	rivafb_fix.smem_len = riva_get_memlen(default_par) * 1024;
 	default_par->dclk_max = riva_get_maxdclk(default_par) * 1000;
-	info->screen_base = ioremap(rivafb_fix.smem_start,
-				    rivafb_fix.smem_len);
+	info->screen_base = ioremap_wc(rivafb_fix.smem_start,
+				       rivafb_fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_ERR PFX "cannot ioremap FB base\n");
 		ret = -EIO;
 		goto err_iounmap_pramin;
 	}
 
-#ifdef CONFIG_MTRR
-	if (!nomtrr) {
-		default_par->mtrr.vram = mtrr_add(rivafb_fix.smem_start,
-					   	  rivafb_fix.smem_len,
-					    	  MTRR_TYPE_WRCOMB, 1);
-		if (default_par->mtrr.vram < 0) {
-			printk(KERN_ERR PFX "unable to setup MTRR\n");
-		} else {
-			default_par->mtrr.vram_valid = 1;
-			/* let there be speed */
-			printk(KERN_INFO PFX "RIVA MTRR set to ON\n");
-		}
-	}
-#endif /* CONFIG_MTRR */
+	if (!nomtrr)
+		default_par->wc_cookie =
+			arch_phys_wc_add(rivafb_fix.smem_start,
+					 rivafb_fix.smem_len);
 
 	info->fbops = &riva_fb_ops;
 	info->fix = rivafb_fix;
@@ -2108,13 +2093,7 @@ static void rivafb_remove(struct pci_dev *pd)
 	unregister_framebuffer(info);
 
 	riva_bl_exit(info);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif /* CONFIG_MTRR */
-
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->ctrl_base);
 	iounmap(info->screen_base);
 	if (par->riva.Architecture == NV_ARCH_03)
@@ -2153,10 +2132,8 @@ static int rivafb_setup(char *options)
 			flatpanel = 1;
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "strictmode", 10)) {
 			strictmode = 1;
 		} else if (!strncmp(this_opt, "noaccel", 7)) {
@@ -2212,10 +2189,8 @@ module_param(flatpanel, int, 0);
 MODULE_PARM_DESC(flatpanel, "Enables experimental flat panel support for some chipsets. (0 or 1=enabled) (default=0)");
 module_param(forceCRTC, int, 0);
 MODULE_PARM_DESC(forceCRTC, "Forces usage of a particular CRTC in case autodetection fails. (0 or 1) (default=autodetect)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) (default=0)");
-#endif
 module_param(strictmode, bool, 0);
 MODULE_PARM_DESC(strictmode, "Only use video modes from EDID");
 
diff --git a/drivers/video/fbdev/riva/rivafb.h b/drivers/video/fbdev/riva/rivafb.h
index d9f107b..61fd37c 100644
--- a/drivers/video/fbdev/riva/rivafb.h
+++ b/drivers/video/fbdev/riva/rivafb.h
@@ -61,9 +61,7 @@ struct riva_par {
 	int FlatPanel;
 	struct pci_dev *pdev;
 	int cursor_reset;
-#ifdef CONFIG_MTRR
-	struct { int vram; int vram_valid; } mtrr;
-#endif
+	int wc_cookie;
 	struct riva_i2c_chan chan[3];
 };
 
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 41/47] video: fbdev: rivafb: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef blocks and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
printed when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/riva/fbdev.c  | 39 +++++++--------------------------------
 drivers/video/fbdev/riva/rivafb.h |  4 +---
 2 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/drivers/video/fbdev/riva/fbdev.c b/drivers/video/fbdev/riva/fbdev.c
index be73727..854b86d 100644
--- a/drivers/video/fbdev/riva/fbdev.c
+++ b/drivers/video/fbdev/riva/fbdev.c
@@ -41,9 +41,6 @@
 #include <linux/pci.h>
 #include <linux/backlight.h>
 #include <linux/bitrev.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_PPC_OF
 #include <asm/prom.h>
 #include <asm/pci-bridge.h>
@@ -208,9 +205,7 @@ MODULE_DEVICE_TABLE(pci, rivafb_pci_tbl);
 static int flatpanel = -1; /* Autodetect later */
 static int forceCRTC = -1;
 static bool noaccel  = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight = 1;
 #else
@@ -2013,28 +2008,18 @@ static int rivafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 
 	rivafb_fix.smem_len = riva_get_memlen(default_par) * 1024;
 	default_par->dclk_max = riva_get_maxdclk(default_par) * 1000;
-	info->screen_base = ioremap(rivafb_fix.smem_start,
-				    rivafb_fix.smem_len);
+	info->screen_base = ioremap_wc(rivafb_fix.smem_start,
+				       rivafb_fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_ERR PFX "cannot ioremap FB base\n");
 		ret = -EIO;
 		goto err_iounmap_pramin;
 	}
 
-#ifdef CONFIG_MTRR
-	if (!nomtrr) {
-		default_par->mtrr.vram = mtrr_add(rivafb_fix.smem_start,
-					   	  rivafb_fix.smem_len,
-					    	  MTRR_TYPE_WRCOMB, 1);
-		if (default_par->mtrr.vram < 0) {
-			printk(KERN_ERR PFX "unable to setup MTRR\n");
-		} else {
-			default_par->mtrr.vram_valid = 1;
-			/* let there be speed */
-			printk(KERN_INFO PFX "RIVA MTRR set to ON\n");
-		}
-	}
-#endif /* CONFIG_MTRR */
+	if (!nomtrr)
+		default_par->wc_cookie =
+			arch_phys_wc_add(rivafb_fix.smem_start,
+					 rivafb_fix.smem_len);
 
 	info->fbops = &riva_fb_ops;
 	info->fix = rivafb_fix;
@@ -2108,13 +2093,7 @@ static void rivafb_remove(struct pci_dev *pd)
 	unregister_framebuffer(info);
 
 	riva_bl_exit(info);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif /* CONFIG_MTRR */
-
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->ctrl_base);
 	iounmap(info->screen_base);
 	if (par->riva.Architecture == NV_ARCH_03)
@@ -2153,10 +2132,8 @@ static int rivafb_setup(char *options)
 			flatpanel = 1;
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "strictmode", 10)) {
 			strictmode = 1;
 		} else if (!strncmp(this_opt, "noaccel", 7)) {
@@ -2212,10 +2189,8 @@ module_param(flatpanel, int, 0);
 MODULE_PARM_DESC(flatpanel, "Enables experimental flat panel support for some chipsets. (0 or 1=enabled) (default=0)");
 module_param(forceCRTC, int, 0);
 MODULE_PARM_DESC(forceCRTC, "Forces usage of a particular CRTC in case autodetection fails. (0 or 1) (default=autodetect)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) (default=0)");
-#endif
 module_param(strictmode, bool, 0);
 MODULE_PARM_DESC(strictmode, "Only use video modes from EDID");
 
diff --git a/drivers/video/fbdev/riva/rivafb.h b/drivers/video/fbdev/riva/rivafb.h
index d9f107b..61fd37c 100644
--- a/drivers/video/fbdev/riva/rivafb.h
+++ b/drivers/video/fbdev/riva/rivafb.h
@@ -61,9 +61,7 @@ struct riva_par {
 	int FlatPanel;
 	struct pci_dev *pdev;
 	int cursor_reset;
-#ifdef CONFIG_MTRR
-	struct { int vram; int vram_valid; } mtrr;
-#endif
+	int wc_cookie;
 	struct riva_i2c_chan chan[3];
 };
 
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 41/47] video: fbdev: rivafb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
                   ` (74 preceding siblings ...)
  (?)
@ 2015-03-20 23:18 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-fbdev, Antonino Daplas, Daniel Vetter, Luis R. Rodriguez,
	x86, linux-kernel, Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef blocks and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
printed when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/riva/fbdev.c  | 39 +++++++--------------------------------
 drivers/video/fbdev/riva/rivafb.h |  4 +---
 2 files changed, 8 insertions(+), 35 deletions(-)

diff --git a/drivers/video/fbdev/riva/fbdev.c b/drivers/video/fbdev/riva/fbdev.c
index be73727..854b86d 100644
--- a/drivers/video/fbdev/riva/fbdev.c
+++ b/drivers/video/fbdev/riva/fbdev.c
@@ -41,9 +41,6 @@
 #include <linux/pci.h>
 #include <linux/backlight.h>
 #include <linux/bitrev.h>
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 #ifdef CONFIG_PPC_OF
 #include <asm/prom.h>
 #include <asm/pci-bridge.h>
@@ -208,9 +205,7 @@ MODULE_DEVICE_TABLE(pci, rivafb_pci_tbl);
 static int flatpanel = -1; /* Autodetect later */
 static int forceCRTC = -1;
 static bool noaccel  = 0;
-#ifdef CONFIG_MTRR
 static bool nomtrr = 0;
-#endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight = 1;
 #else
@@ -2013,28 +2008,18 @@ static int rivafb_probe(struct pci_dev *pd, const struct pci_device_id *ent)
 
 	rivafb_fix.smem_len = riva_get_memlen(default_par) * 1024;
 	default_par->dclk_max = riva_get_maxdclk(default_par) * 1000;
-	info->screen_base = ioremap(rivafb_fix.smem_start,
-				    rivafb_fix.smem_len);
+	info->screen_base = ioremap_wc(rivafb_fix.smem_start,
+				       rivafb_fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_ERR PFX "cannot ioremap FB base\n");
 		ret = -EIO;
 		goto err_iounmap_pramin;
 	}
 
-#ifdef CONFIG_MTRR
-	if (!nomtrr) {
-		default_par->mtrr.vram = mtrr_add(rivafb_fix.smem_start,
-					   	  rivafb_fix.smem_len,
-					    	  MTRR_TYPE_WRCOMB, 1);
-		if (default_par->mtrr.vram < 0) {
-			printk(KERN_ERR PFX "unable to setup MTRR\n");
-		} else {
-			default_par->mtrr.vram_valid = 1;
-			/* let there be speed */
-			printk(KERN_INFO PFX "RIVA MTRR set to ON\n");
-		}
-	}
-#endif /* CONFIG_MTRR */
+	if (!nomtrr)
+		default_par->wc_cookie =
+			arch_phys_wc_add(rivafb_fix.smem_start,
+					 rivafb_fix.smem_len);
 
 	info->fbops = &riva_fb_ops;
 	info->fix = rivafb_fix;
@@ -2108,13 +2093,7 @@ static void rivafb_remove(struct pci_dev *pd)
 	unregister_framebuffer(info);
 
 	riva_bl_exit(info);
-
-#ifdef CONFIG_MTRR
-	if (par->mtrr.vram_valid)
-		mtrr_del(par->mtrr.vram, info->fix.smem_start,
-			 info->fix.smem_len);
-#endif /* CONFIG_MTRR */
-
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->ctrl_base);
 	iounmap(info->screen_base);
 	if (par->riva.Architecture == NV_ARCH_03)
@@ -2153,10 +2132,8 @@ static int rivafb_setup(char *options)
 			flatpanel = 1;
 		} else if (!strncmp(this_opt, "backlight:", 10)) {
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "strictmode", 10)) {
 			strictmode = 1;
 		} else if (!strncmp(this_opt, "noaccel", 7)) {
@@ -2212,10 +2189,8 @@ module_param(flatpanel, int, 0);
 MODULE_PARM_DESC(flatpanel, "Enables experimental flat panel support for some chipsets. (0 or 1=enabled) (default=0)");
 module_param(forceCRTC, int, 0);
 MODULE_PARM_DESC(forceCRTC, "Forces usage of a particular CRTC in case autodetection fails. (0 or 1) (default=autodetect)");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) (default=0)");
-#endif
 module_param(strictmode, bool, 0);
 MODULE_PARM_DESC(strictmode, "Only use video modes from EDID");
 
diff --git a/drivers/video/fbdev/riva/rivafb.h b/drivers/video/fbdev/riva/rivafb.h
index d9f107b..61fd37c 100644
--- a/drivers/video/fbdev/riva/rivafb.h
+++ b/drivers/video/fbdev/riva/rivafb.h
@@ -61,9 +61,7 @@ struct riva_par {
 	int FlatPanel;
 	struct pci_dev *pdev;
 	int cursor_reset;
-#ifdef CONFIG_MTRR
-	struct { int vram; int vram_valid; } mtrr;
-#endif
+	int wc_cookie;
 	struct riva_i2c_chan chan[3];
 };
 
-- 
2.3.2.209.gd67f9d5.dirty

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 42/47] video: fbdev: tdfxfb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef blocks and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
printed when MTRR setup fails and doing nothing when we didn't
get an MTRR.
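
For this driver the manual part includes dropping the driver-local
mtrr_add()/mtrr_del() stubs that were compiled when CONFIG_MTRR was
off. The sketch below shows, under stated assumptions, how a shared
header can supply such fallbacks once for every driver. The names
model_phys_wc_add, model_phys_wc_del and model_probe_cookie, and the
base/size values, are hypothetical; this is a user-space illustration
of the "#ifndef name ... #define name name" pattern, not a copy of any
kernel header.

```c
#include <assert.h>

/*
 * Hypothetical generic header: supply no-op fallbacks unless the
 * architecture has already defined its own model_phys_wc_add.
 */
#ifndef model_phys_wc_add
static inline int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	return 0;	/* it worked, i.e. did nothing */
}

static inline void model_phys_wc_del(int handle)
{
	(void)handle;
}

/* Mark the helper as provided so a later include does not redefine it. */
#define model_phys_wc_add model_phys_wc_add
#endif

/* Driver code calls the helpers unconditionally, with no #ifdef. */
int model_probe_cookie(void)
{
	return model_phys_wc_add(0xe0000000UL, 0x800000UL);
}
```

With the fallback living in one shared place, a driver needs neither
its own stubs nor a CONFIG_MTRR guard around the option parsing.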

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/tdfxfb.c | 41 ++++++-----------------------------------
 include/video/tdfx.h         |  2 +-
 2 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/video/fbdev/tdfxfb.c b/drivers/video/fbdev/tdfxfb.c
index f761fe3..621fa44 100644
--- a/drivers/video/fbdev/tdfxfb.c
+++ b/drivers/video/fbdev/tdfxfb.c
@@ -78,24 +78,6 @@
 
 #define DPRINTK(a, b...) pr_debug("fb: %s: " a, __func__ , ## b)
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#else
-/* duplicate asm/mtrr.h defines to work on archs without mtrr */
-#define MTRR_TYPE_WRCOMB     1
-
-static inline int mtrr_add(unsigned long base, unsigned long size,
-				unsigned int type, char increment)
-{
-    return -ENODEV;
-}
-static inline int mtrr_del(int reg, unsigned long base,
-				unsigned long size)
-{
-    return -ENODEV;
-}
-#endif
-
 #define BANSHEE_MAX_PIXCLOCK 270000
 #define VOODOO3_MAX_PIXCLOCK 300000
 #define VOODOO5_MAX_PIXCLOCK 350000
@@ -167,7 +149,6 @@ static int nopan;
 static int nowrap = 1;      /* not implemented (yet) */
 static int hwcursor = 1;
 static char *mode_option;
-/* mtrr option */
 static bool nomtrr;
 
 /* -------------------------------------------------------------------------
@@ -1454,8 +1435,8 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto out_err_regbase;
 	}
 
-	info->screen_base = ioremap_nocache(info->fix.smem_start,
-					    info->fix.smem_len);
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_ERR "fb: Can't remap %s framebuffer.\n",
 				info->fix.id);
@@ -1473,11 +1454,9 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	printk(KERN_INFO "fb: %s memory = %dK\n", info->fix.id,
 			info->fix.smem_len >> 10);
 
-	default_par->mtrr_handle = -1;
 	if (!nomtrr)
-		default_par->mtrr_handle =
-			mtrr_add(info->fix.smem_start, info->fix.smem_len,
-				 MTRR_TYPE_WRCOMB, 1);
+		default_par->wc_cookie= arch_phys_wc_add(info->fix.smem_start,
+							 info->fix.smem_len);
 
 	info->fix.ypanstep	= nopan ? 0 : 1;
 	info->fix.ywrapstep	= nowrap ? 0 : 1;
@@ -1566,9 +1545,7 @@ out_err_iobase:
 #ifdef CONFIG_FB_3DFX_I2C
 	tdfxfb_delete_i2c_busses(default_par);
 #endif
-	if (default_par->mtrr_handle >= 0)
-		mtrr_del(default_par->mtrr_handle, info->fix.smem_start,
-			 info->fix.smem_len);
+	arch_phys_wc_del(default_par->wc_cookie);
 	release_region(pci_resource_start(pdev, 2),
 		       pci_resource_len(pdev, 2));
 out_err_screenbase:
@@ -1604,10 +1581,8 @@ static void __init tdfxfb_setup(char *options)
 			nowrap = 1;
 		} else if (!strncmp(this_opt, "hwcursor=", 9)) {
 			hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else {
 			mode_option = this_opt;
 		}
@@ -1633,9 +1608,7 @@ static void tdfxfb_remove(struct pci_dev *pdev)
 #ifdef CONFIG_FB_3DFX_I2C
 	tdfxfb_delete_i2c_busses(par);
 #endif
-	if (par->mtrr_handle >= 0)
-		mtrr_del(par->mtrr_handle, info->fix.smem_start,
-			 info->fix.smem_len);
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->regbase_virt);
 	iounmap(info->screen_base);
 
@@ -1677,10 +1650,8 @@ MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
 			"(1=enable, 0=disable, default=1)");
 module_param(mode_option, charp, 0);
 MODULE_PARM_DESC(mode_option, "Initial video mode e.g. '648x480-8@60'");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disable MTRR support (default: enabled)");
-#endif
 
 module_init(tdfxfb_init);
 module_exit(tdfxfb_exit);
diff --git a/include/video/tdfx.h b/include/video/tdfx.h
index befbaf0..69674b9 100644
--- a/include/video/tdfx.h
+++ b/include/video/tdfx.h
@@ -196,7 +196,7 @@ struct tdfx_par {
 	u32 palette[16];
 	void __iomem *regbase_virt;
 	unsigned long iobase;
-	int mtrr_handle;
+	int wc_cookie;
 #ifdef CONFIG_FB_3DFX_I2C
 	struct tdfxfb_i2c_chan chan[2];
 #endif
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 42/47] video: fbdev: tdfxfb: use arch_phys_wc_add() and ioremap_wc()
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available. To take
advantage of that, also ensure the ioremap'd area is requested
as write-combining.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion is expressed by the following Coccinelle SmPL
patch. It additionally required manual intervention to address
all the #ifdef blocks and to remove redundant things that
arch_phys_wc_add() already handles, such as the verbose message
printed when MTRR setup fails and doing nothing when we didn't
get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
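The cookie convention that makes the unconditional arch_phys_wc_del()
calls below safe can be sketched in userspace. This is only a mock
under stated assumptions, not the kernel implementation: the mock_*
names, the WC_COOKIE_OFFSET value and the global state are all
hypothetical. It assumes arch_phys_wc_add() returns 0 when PAT makes
an MTRR unnecessary, a positive cookie when an MTRR was programmed,
and a negative errno on failure, and that arch_phys_wc_del() ignores
anything that is not a positive cookie.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative offset added to an MTRR register index to form a cookie
 * (assumption: the real x86 code uses a constant in this spirit). */
#define WC_COOKIE_OFFSET 1000

/* Hypothetical mock platform state, for illustration only. */
static bool mock_pat_enabled;
static int mock_next_mtrr_reg = -ENOSPC; /* result of a mocked mtrr_add() */
static int mock_mtrr_del_calls;          /* how often an MTRR was released */

/* Mock of arch_phys_wc_add(): 0 = nothing to undo, > 0 = cookie, < 0 = error. */
static int mock_arch_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;

	if (mock_pat_enabled)
		return 0;			/* PAT covers WC, no MTRR used */
	if (mock_next_mtrr_reg < 0)
		return mock_next_mtrr_reg;	/* propagate the failure */
	return mock_next_mtrr_reg + WC_COOKIE_OFFSET;
}

/* Mock of arch_phys_wc_del(): only positive cookies refer to an MTRR. */
static void mock_arch_phys_wc_del(int cookie)
{
	if (cookie >= 1)
		mock_mtrr_del_calls++;
}
```

With these semantics a driver can store whatever the add call returned
and pass it to the del call on every teardown path; the "-1" sentinel
and the ">= 0" checks removed below are then redundant.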
 drivers/video/fbdev/tdfxfb.c | 41 ++++++-----------------------------------
 include/video/tdfx.h         |  2 +-
 2 files changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/video/fbdev/tdfxfb.c b/drivers/video/fbdev/tdfxfb.c
index f761fe3..621fa44 100644
--- a/drivers/video/fbdev/tdfxfb.c
+++ b/drivers/video/fbdev/tdfxfb.c
@@ -78,24 +78,6 @@
 
 #define DPRINTK(a, b...) pr_debug("fb: %s: " a, __func__ , ## b)
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#else
-/* duplicate asm/mtrr.h defines to work on archs without mtrr */
-#define MTRR_TYPE_WRCOMB     1
-
-static inline int mtrr_add(unsigned long base, unsigned long size,
-				unsigned int type, char increment)
-{
-    return -ENODEV;
-}
-static inline int mtrr_del(int reg, unsigned long base,
-				unsigned long size)
-{
-    return -ENODEV;
-}
-#endif
-
 #define BANSHEE_MAX_PIXCLOCK 270000
 #define VOODOO3_MAX_PIXCLOCK 300000
 #define VOODOO5_MAX_PIXCLOCK 350000
@@ -167,7 +149,6 @@ static int nopan;
 static int nowrap = 1;      /* not implemented (yet) */
 static int hwcursor = 1;
 static char *mode_option;
-/* mtrr option */
 static bool nomtrr;
 
 /* -------------------------------------------------------------------------
@@ -1454,8 +1435,8 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto out_err_regbase;
 	}
 
-	info->screen_base = ioremap_nocache(info->fix.smem_start,
-					    info->fix.smem_len);
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (!info->screen_base) {
 		printk(KERN_ERR "fb: Can't remap %s framebuffer.\n",
 				info->fix.id);
@@ -1473,11 +1454,9 @@ static int tdfxfb_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	printk(KERN_INFO "fb: %s memory = %dK\n", info->fix.id,
 			info->fix.smem_len >> 10);
 
-	default_par->mtrr_handle = -1;
 	if (!nomtrr)
-		default_par->mtrr_handle =
-			mtrr_add(info->fix.smem_start, info->fix.smem_len,
-				 MTRR_TYPE_WRCOMB, 1);
+		default_par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+							  info->fix.smem_len);
 
 	info->fix.ypanstep	= nopan ? 0 : 1;
 	info->fix.ywrapstep	= nowrap ? 0 : 1;
@@ -1566,9 +1545,7 @@ out_err_iobase:
 #ifdef CONFIG_FB_3DFX_I2C
 	tdfxfb_delete_i2c_busses(default_par);
 #endif
-	if (default_par->mtrr_handle >= 0)
-		mtrr_del(default_par->mtrr_handle, info->fix.smem_start,
-			 info->fix.smem_len);
+	arch_phys_wc_del(default_par->wc_cookie);
 	release_region(pci_resource_start(pdev, 2),
 		       pci_resource_len(pdev, 2));
 out_err_screenbase:
@@ -1604,10 +1581,8 @@ static void __init tdfxfb_setup(char *options)
 			nowrap = 1;
 		} else if (!strncmp(this_opt, "hwcursor=", 9)) {
 			hwcursor = simple_strtoul(this_opt + 9, NULL, 0);
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else {
 			mode_option = this_opt;
 		}
@@ -1633,9 +1608,7 @@ static void tdfxfb_remove(struct pci_dev *pdev)
 #ifdef CONFIG_FB_3DFX_I2C
 	tdfxfb_delete_i2c_busses(par);
 #endif
-	if (par->mtrr_handle >= 0)
-		mtrr_del(par->mtrr_handle, info->fix.smem_start,
-			 info->fix.smem_len);
+	arch_phys_wc_del(par->wc_cookie);
 	iounmap(par->regbase_virt);
 	iounmap(info->screen_base);
 
@@ -1677,10 +1650,8 @@ MODULE_PARM_DESC(hwcursor, "Enable hardware cursor "
 			"(1=enable, 0=disable, default=1)");
 module_param(mode_option, charp, 0);
 MODULE_PARM_DESC(mode_option, "Initial video mode e.g. '648x480-8@60'");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "Disable MTRR support (default: enabled)");
-#endif
 
 module_init(tdfxfb_init);
 module_exit(tdfxfb_exit);
diff --git a/include/video/tdfx.h b/include/video/tdfx.h
index befbaf0..69674b9 100644
--- a/include/video/tdfx.h
+++ b/include/video/tdfx.h
@@ -196,7 +196,7 @@ struct tdfx_par {
 	u32 palette[16];
 	void __iomem *regbase_virt;
 	unsigned long iobase;
-	int mtrr_handle;
+	int wc_cookie;
 #ifdef CONFIG_FB_3DFX_I2C
 	struct tdfxfb_i2c_chan chan[2];
 #endif
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 43/47] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses the same area for MTRR as for the ioremap().
Convert the driver from using the x86-specific MTRR code to
the architecture-agnostic arch_phys_wc_add(). arch_phys_wc_add()
will avoid MTRR if write-combining is available through PAT. To
take advantage of that, also ensure the mapped area is requested
as write-combining, here by using pci_iomap_wc().

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away: MTRR is architecture-specific and on
   x86 it is replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery, to switch pci_iomap() over to
pci_iomap_wc(), and to remove redundant things which
arch_phys_wc_add() already handles, such as doing nothing on
removal when we didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);

Generated-by: Coccinelle SmPL
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
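A small userspace model may help show why dropping the "-1" sentinel
is harmless when the "mtrr" module parameter is 0. It is a sketch
under assumptions, not driver code: struct demo_par and the helper
names are hypothetical, the private data is assumed to come from a
zero-filled allocation, and arch_phys_wc_del() is assumed to act only
on cookies >= 1.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the driver's private data. */
struct demo_par {
	int mtrr_reg;	/* field used before the conversion */
	int wc_cookie;	/* field used after the conversion */
};

/* Probe with mtrr=0: neither the "-1" init nor the add call runs, so
 * both fields keep the value from the zero-filled allocation. */
static void demo_probe_mtrr_disabled(struct demo_par *par)
{
	memset(par, 0, sizeof(*par));
}

/* Old teardown condition from the removed lines: mtrr_reg >= 0. */
static bool old_remove_calls_mtrr_del(const struct demo_par *par)
{
	return par->mtrr_reg >= 0;
}

/* New teardown: the del helper is assumed to ignore cookies below 1. */
static bool new_remove_touches_mtrr(const struct demo_par *par)
{
	return par->wc_cookie >= 1;
}
```

In this model the old condition is true for a zeroed structure, so the
old remove path would have asked to delete MTRR register 0 even though
nothing was added, while the new path leaves the MTRRs alone.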
 drivers/video/fbdev/vt8623fb.c | 31 ++++++-------------------------
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/vt8623fb.c b/drivers/video/fbdev/vt8623fb.c
index ea7f056..60f24828 100644
--- a/drivers/video/fbdev/vt8623fb.c
+++ b/drivers/video/fbdev/vt8623fb.c
@@ -26,13 +26,9 @@
 #include <linux/console.h> /* Why should fb driver call console functions? because console_lock() */
 #include <video/vga.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 struct vt8623fb_info {
 	char __iomem *mmio_base;
-	int mtrr_reg;
+	int wc_cookie;
 	struct vgastate state;
 	struct mutex open_lock;
 	unsigned int ref_count;
@@ -99,10 +95,7 @@ static struct svga_timing_regs vt8623_timing_regs     = {
 /* Module parameters */
 
 static char *mode_option = "640x480-8@60";
-
-#ifdef CONFIG_MTRR
 static int mtrr = 1;
-#endif
 
 MODULE_AUTHOR("(c) 2006 Ondrej Zajicek <santiago@crfreenet.org>");
 MODULE_LICENSE("GPL");
@@ -112,11 +105,8 @@ module_param(mode_option, charp, 0644);
 MODULE_PARM_DESC(mode_option, "Default video mode ('640x480-8@60', etc)");
 module_param_named(mode, mode_option, charp, 0);
 MODULE_PARM_DESC(mode, "Default video mode e.g. '648x480-8@60' (deprecated)");
-
-#ifdef CONFIG_MTRR
 module_param(mtrr, int, 0444);
 MODULE_PARM_DESC(mtrr, "Enable write-combining with MTRR (1=enable, 0=disable, default=1)");
-#endif
 
 
 /* ------------------------------------------------------------------------- */
@@ -710,7 +700,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	info->fix.mmio_len = pci_resource_len(dev, 1);
 
 	/* Map physical IO memory address into kernel space */
-	info->screen_base = pci_iomap(dev, 0, 0);
+	info->screen_base = pci_iomap_wc(dev, 0, 0);
 	if (! info->screen_base) {
 		rc = -ENOMEM;
 		dev_err(info->device, "iomap for framebuffer failed\n");
@@ -781,12 +771,9 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 	/* Record a reference to the driver data */
 	pci_set_drvdata(dev, info);
 
-#ifdef CONFIG_MTRR
-	if (mtrr) {
-		par->mtrr_reg = -1;
-		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+	if (mtrr)
+		par->wc_cookie = arch_phys_wc_add(info->fix.smem_start,
+						  info->fix.smem_len);
 
 	return 0;
 
@@ -816,13 +803,7 @@ static void vt8623_pci_remove(struct pci_dev *dev)
 	if (info) {
 		struct vt8623fb_info *par = info->par;
 
-#ifdef CONFIG_MTRR
-		if (par->mtrr_reg >= 0) {
-			mtrr_del(par->mtrr_reg, 0, 0);
-			par->mtrr_reg = -1;
-		}
-#endif
-
+		arch_phys_wc_del(par->wc_cookie);
 		unregister_framebuffer(info);
 		fb_dealloc_cmap(&info->cmap);
 
-- 
2.3.2.209.gd67f9d5.dirty



* [PATCH v1 44/47] video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since we know the framebuffer is already isolated in its own
ioremap() we can use ioremap_wc() to take advantage of
write-combining for performance where possible.

In this case there are a few motivations for this:

a) Take advantage of PAT when available

b) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/atmel_lcdfb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/atmel_lcdfb.c b/drivers/video/fbdev/atmel_lcdfb.c
index 94a8d04..abadc49 100644
--- a/drivers/video/fbdev/atmel_lcdfb.c
+++ b/drivers/video/fbdev/atmel_lcdfb.c
@@ -1266,7 +1266,8 @@ static int __init atmel_lcdfb_probe(struct platform_device *pdev)
 			goto stop_clk;
 		}
 
-		info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+		info->screen_base = ioremap_wc(info->fix.smem_start,
+					       info->fix.smem_len);
 		if (!info->screen_base) {
 			ret = -ENOMEM;
 			goto release_intmem;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread
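
Why a write-combined framebuffer mapping can be faster is easiest to see
with a toy count of bus transactions. The sketch below is an assumption
laden model, not a description of any real CPU: it assumes a single
64-byte combining buffer, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LINE_SIZE 64u	/* assumed combining-buffer size in bytes */

/* Uncached mapping: every store goes out as its own transaction. */
static size_t toy_uc_transactions(const uint64_t *addrs, size_t n)
{
	assert(addrs != NULL || n == 0);
	return n;
}

/*
 * Write-combining mapping with one buffer: stores that land in the
 * currently open line are merged, and a store to a different line
 * flushes the open buffer first.
 */
static size_t toy_wc_transactions(const uint64_t *addrs, size_t n)
{
	size_t flushes = 0;
	uint64_t open_line = 0;
	int have_open = 0;

	assert(addrs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t line = addrs[i] / TOY_LINE_SIZE;

		if (have_open && line != open_line)
			flushes++;
		open_line = line;
		have_open = 1;
	}
	if (have_open)
		flushes++;	/* final flush of the open buffer */
	return flushes;
}
```

For a linear fill, such as 32 consecutive 4-byte pixel stores covering
two lines, the model counts 32 transactions uncached and 2 with write
combining. Scattered stores gain little, which is why the conversion
only targets the framebuffer and not register windows.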

* [PATCH v1 45/47] video: fbdev: geode gxfb: use ioremap_wc() for framebuffer
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since the framebuffer is already isolated on its own ioremap()
we can take advantage of write combining for performance where
possible.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/geode/gxfb_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/geode/gxfb_core.c b/drivers/video/fbdev/geode/gxfb_core.c
index 124d7c7..ec9fc9a 100644
--- a/drivers/video/fbdev/geode/gxfb_core.c
+++ b/drivers/video/fbdev/geode/gxfb_core.c
@@ -263,7 +263,8 @@ static int gxfb_map_video_memory(struct fb_info *info, struct pci_dev *dev)
 
 	info->fix.smem_start = pci_resource_start(dev, 0);
 	info->fix.smem_len = vram ? vram : gx_frame_buffer_size();
-	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (!info->screen_base)
 		return -ENOMEM;
 
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 46/47] video: fbdev: gxt4500: use pci_ioremap_wc_bar() for framebuffer
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The driver doesn't use mtrr_add() or arch_phys_wc_add(), but
since the framebuffer is already isolated on its own
pci_ioremap_bar() mapping we can take advantage of write
combining for performance where possible.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/gxt4500.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/gxt4500.c b/drivers/video/fbdev/gxt4500.c
index 135d78a..f19133a 100644
--- a/drivers/video/fbdev/gxt4500.c
+++ b/drivers/video/fbdev/gxt4500.c
@@ -662,7 +662,7 @@ static int gxt4500_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	info->fix.smem_start = fb_phys;
 	info->fix.smem_len = pci_resource_len(pdev, 1);
-	info->screen_base = pci_ioremap_bar(pdev, 1);
+	info->screen_base = pci_ioremap_wc_bar(pdev, 1);
 	if (!info->screen_base) {
 		dev_err(&pdev->dev, "gxt4500: cannot map framebuffer\n");
 		goto err_unmap_regs;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v1 47/47] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del()
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-20 23:18   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-20 23:18 UTC (permalink / raw)
  To: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied
  Cc: linux-kernel, linux-fbdev, x86, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The crusade to replace mtrr_add() with the architecture-agnostic
arch_phys_wc_add() is complete. This ensures that write-combining
implementations (PAT on x86) are taken advantage of instead of
using MTRR directly. With the crusade done, hide direct MTRR
access from drivers.

Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/kernel/cpu/mtrr/main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index b68b671..f0e19db 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -446,7 +446,6 @@ int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
 	return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
 			     increment);
 }
-EXPORT_SYMBOL(mtrr_add);
 
 /**
  * mtrr_del_page - delete a memory type region
@@ -535,7 +534,6 @@ int mtrr_del(int reg, unsigned long base, unsigned long size)
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
 }
-EXPORT_SYMBOL(mtrr_del);
 
 /**
  * __arch_phys_wc_add - add a WC MTRR even if PAT is available
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-20 23:48     ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:48 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata, another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.

What's the exact layout that this helps?  I'm sceptical that this can
ever be correct.

Is there some awful driver that has a large ioremap that's supposed to
contain multiple different memtypes?  If so, can we ioremap +
set_page_xyz instead?

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread
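
The quoted description talks about splitting one ioremap'd PCI BAR into
two mappings, one for MMIO registers and one for the write-combined
area. A minimal sketch of the address arithmetic involved is shown
below. It assumes 4 KiB pages and that the split point must be page
aligned because memory types are applied per page; the sketch_* names
are hypothetical and no real driver layout is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a BAR at wc_off into a register window [0, wc_off) and a
 * write-combined window [wc_off, bar_len). Fails when the split point
 * lies outside the BAR or is not page aligned.
 */
static bool sketch_split_bar(uint64_t bar_start, uint64_t bar_len,
			     uint64_t wc_off,
			     struct sketch_range *regs,
			     struct sketch_range *wc)
{
	assert(regs != NULL && wc != NULL);

	if (wc_off == 0 || wc_off >= bar_len)
		return false;
	if ((bar_start + wc_off) % SKETCH_PAGE_SIZE != 0)
		return false;

	/* In a driver this range would get the uncached register mapping. */
	regs->start = bar_start;
	regs->len = wc_off;

	/* And this one would get the ioremap_wc() mapping. */
	wc->start = bar_start + wc_off;
	wc->len = bar_len - wc_off;
	return true;
}
```

A BAR whose registers and write-combined area share a page fails the
alignment check here, which is roughly the kind of layout the question
above is asking about.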

* Re: [PATCH v1 03/47] devres: add devm_ioremap_wc()
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-20 23:49     ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:49 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> We have devm_ioremap_nocache() but no devm_ioremap_wc()
> so add that. This will be used later.
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>

Looks good to me.

> ---
>  Documentation/driver-model/devres.txt |  1 +
>  include/linux/io.h                    |  2 ++
>  lib/devres.c                          | 29 +++++++++++++++++++++++++++++
>  3 files changed, 32 insertions(+)
>
> diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
> index e1e2bbd..831a536 100644
> --- a/Documentation/driver-model/devres.txt
> +++ b/Documentation/driver-model/devres.txt
> @@ -276,6 +276,7 @@ IOMAP
>    devm_ioport_unmap()
>    devm_ioremap()
>    devm_ioremap_nocache()
> +  devm_ioremap_wc()
>    devm_ioremap_resource() : checks resource, requests memory region, ioremaps
>    devm_iounmap()
>    pcim_iomap()
> diff --git a/include/linux/io.h b/include/linux/io.h
> index 4cc299c..91101a1 100644
> --- a/include/linux/io.h
> +++ b/include/linux/io.h
> @@ -72,6 +72,8 @@ void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
>                            resource_size_t size);
>  void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
>                                    resource_size_t size);
> +void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
> +                             resource_size_t size);
>  void devm_iounmap(struct device *dev, void __iomem *addr);
>  int check_signature(const volatile void __iomem *io_addr,
>                         const unsigned char *signature, int length);
> diff --git a/lib/devres.c b/lib/devres.c
> index 0f1dd2e..2eb2bfe 100644
> --- a/lib/devres.c
> +++ b/lib/devres.c
> @@ -72,6 +72,35 @@ void __iomem *devm_ioremap_nocache(struct device *dev, resource_size_t offset,
>  EXPORT_SYMBOL(devm_ioremap_nocache);
>
>  /**
> + * devm_ioremap_wc - Managed ioremap_wc()
> + * @dev: Generic device to remap IO address for
> + * @offset: BUS offset to map
> + * @size: Size of map
> + *
> + * Managed ioremap_wc().  Map is automatically unmapped on driver
> + * detach.
> + */
> +void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
> +                             resource_size_t size)
> +{
> +       void __iomem **ptr, *addr;
> +
> +       ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
> +       if (!ptr)
> +               return NULL;
> +
> +       addr = ioremap_wc(offset, size);
> +       if (addr) {
> +               *ptr = addr;
> +               devres_add(dev, ptr);
> +       } else
> +               devres_free(ptr);
> +
> +       return addr;
> +}
> +EXPORT_SYMBOL_GPL(devm_ioremap_wc);
> +
> +/**
>   * devm_iounmap - Managed iounmap()
>   * @dev: Generic device to unmap for
>   * @addr: Address to unmap
> --
> 2.3.2.209.gd67f9d5.dirty
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-20 23:50     ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:50 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This lets drivers take advantage of PAT when available. This
> should help with the transition of converting video drivers over
> to ioremap_wc() to help with the goal of eventually using
> _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> ioremap_nocache() (de33c442e)
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  drivers/pci/pci.c   | 14 ++++++++++++++
>  include/linux/pci.h |  1 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 81f06e8..6afd507 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
>                                      pci_resource_len(pdev, bar));
>  }
>  EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> +
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> +{
> +       /*
> +        * Make sure the BAR is actually a memory resource, not an IO resource
> +        */
> +       if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> +               WARN_ON(1);
> +               return NULL;
> +       }

if (WARN_ON(...))?
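
Spelled out, the suggestion relies on WARN_ON() evaluating to the
truth value of its condition, so the test and the warning collapse
into a single statement. Below is a userspace sketch of that shape.
The WARN_ON() here is a stand-in written for this example (the kernel
macro does much more), IORESOURCE_MEM is redefined locally, and
map_bar_checked() is a hypothetical name, not the function from the
patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Local stand-in for the kernel flag; defined here for the sketch only. */
#define IORESOURCE_MEM 0x00000200UL

/* Stand-in for WARN_ON(): report once, then hand the condition back. */
static int warn_on(int cond, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: at %s:%d\n", file, line);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), __FILE__, __LINE__)

/*
 * Hypothetical helper: refuse anything that is not a memory resource.
 * "mapping" stands in for whatever the real remap call would return.
 */
static void *map_bar_checked(unsigned long flags, void *mapping)
{
	/* Make sure the BAR is actually a memory resource, not an IO resource */
	if (WARN_ON(!(flags & IORESOURCE_MEM)))
		return NULL;

	return mapping;
}
```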

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-20 23:52     ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Linus Torvalds,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> The atyfb driver uses an MTRR work around since some
> cards use the same PCI BAR for the framebuffer and MMIO.
> In such cards the last page is used for MMIO, the rest for
> the framebuffer, so on those cards we ioremap() the MMIO
> page alone, then again ioremap() the full framebuffer
> including the MMIO space *and* ___then___ use an MTRR with
> MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a "hole"
> with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.
>
> This is a terrible fucking work around, and should by no means
> be necessary however evidence through a large series of conversion
> of drivers to ioremap_wc() for the framebuffer shows that around
> the time MTRR started becoming popular devices did not have things
> lined up for easily separating the framebuffer and MMIO register
> access. In some cases a driver requires significant intrusive
> changes in order to make the split for an ioremap() for MMIO registers
> and another ioremap_wc() for the framebuffer, at other times a
> bit of careful study of the driver suffices. This example driver
> falls into the latter category.
>
> We can replace the MTRR MTRR_TYPE_UNCACHABLE
> work around by using ioremap_nocache(), the length of the
> MMIO space should already be correct. The other part we
> need to correct is ensuring we ioremap() for the framebuffer
> only the required size. Since the ioremap() happens early
> on probe for PCI devices before aty_init() where we typically
> adjust the length and know how to do it, we can fix this by
> pegging the bus type as PCI on PCI probe, and finally fudging
> the framebuffer length just as we do on aty_init().
>
> The last thing we must do to remain sane is ensure we
> use the info->fix.smem_start and info->fix.smem_len for
> the framebuffer MTRR as we know that is always well adjusted.
> The *one* concern here would be if the MTRR is not in units
> of 4K __but__ we already know that in the PCI case this cannot
> happen, in the shared space setting the MTRR would be up to
> 0x7ff000 and assuming a 4K page:
>
> ; 0x7ff000 / 0x1000
>         2047
>
> Also, internally when MTRR is used mtrr_add() will use mtrr_check()
> and that should splat a warning when the MTRR base and size are
> not compatible with what is expected for MTRR usage.
>
> This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  drivers/video/fbdev/aty/atyfb.h      |  1 -
>  drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
>  2 files changed, 6 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
> index 1f39a62..89ec439 100644
> --- a/drivers/video/fbdev/aty/atyfb.h
> +++ b/drivers/video/fbdev/aty/atyfb.h
> @@ -184,7 +184,6 @@ struct atyfb_par {
>         spinlock_t int_lock;
>  #ifdef CONFIG_MTRR
>         int mtrr_aper;
> -       int mtrr_reg;
>  #endif
>         u32 mem_cntl;
>         struct crtc saved_crtc;
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 8025624..8875e56 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>
>  #ifdef CONFIG_MTRR
>         par->mtrr_aper = -1;
> -       par->mtrr_reg = -1;
>         if (!nomtrr) {
> -               /* Cover the whole resource. */
> -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
> +                                         info->fix.smem_len,
>                                           MTRR_TYPE_WRCOMB, 1);
> -               if (par->mtrr_aper >= 0 && !par->aux_start) {
> -                       /* Make a hole for mmio. */
> -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> -                                                GUI_RESERVE, GUI_RESERVE,
> -                                                MTRR_TYPE_UNCACHABLE, 1);
> -                       if (par->mtrr_reg < 0) {
> -                               mtrr_del(par->mtrr_aper, 0, 0);
> -                               par->mtrr_aper = -1;
> -                       }
> -               }
>         }
>  #endif
>
> @@ -2776,10 +2765,6 @@ aty_init_exit:
>         par->pll_ops->set_pll(info, &par->saved_pll);
>
>  #ifdef CONFIG_MTRR
> -       if (par->mtrr_reg >= 0) {
> -               mtrr_del(par->mtrr_reg, 0, 0);
> -               par->mtrr_reg = -1;
> -       }
>         if (par->mtrr_aper >= 0) {
>                 mtrr_del(par->mtrr_aper, 0, 0);
>                 par->mtrr_aper = -1;
> @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>         }
>
>         info->fix.mmio_start = raddr;
> -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);

Double-check me, but I think that ioremap_nocache + WC MTRR = WC.  I
think we might need ioremap_nocache_me_harder (or maybe ioremap_x86_uc
if you prefer that bikeshed color) for this.
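
The concern can be sketched as a small lookup. This assumes the x86
rules for combining a page-table (PAT) type with the MTRR type
covering the same range, as given in the Intel SDM's effective memory
type table, and it assumes ioremap_nocache() yields UC- rather than
strong UC, which is what the _PAGE_CACHE_UC_MINUS remark elsewhere in
this series implies. Only the three PAT types relevant here are
modelled; the enum and function names are made up for the example.

```c
/* Memory types; MT_UC_MINUS exists only as a PAT type, never as an MTRR type. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

/*
 * Effective type for a given PAT type under a given MTRR type.
 * Returns -1 for PAT types this sketch does not model and for an
 * MTRR type of UC-, which the hardware does not define.
 */
static int effective_memtype(enum memtype pat, enum memtype mtrr)
{
	if (mtrr == MT_UC_MINUS)
		return -1;

	switch (pat) {
	case MT_UC:
		/* Strong UC wins over every MTRR type. */
		return MT_UC;
	case MT_WC:
		/* PAT WC wins over every MTRR type. */
		return MT_WC;
	case MT_UC_MINUS:
		/* UC- behaves as UC unless a WC MTRR overrides it. */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	default:
		return -1;
	}
}
```

Under these assumptions effective_memtype(MT_UC_MINUS, MT_WC) gives
MT_WC, so a UC- mapping of the register page underneath a WC MTRR
would not stay uncached, while a strong UC mapping would.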

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-03-20 23:52     ` Andy Lutomirski
  0 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-20 23:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Linus Torvalds,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> The atyfb driver uses an MTRR work around since some
> cards use the same PCI BAR for the framebuffer and MMIO.
> In such cards the last page is used for MMIO, the rest for
> the framebuffer, so on those cards we ioremap() the MMIO
> page alone, then again ioremap() the full framebuffer
> including the MMIO space *and* ___then___ use an MTRR with
> MTRR_TYPE_WRCOMB on the full PCI BAR... and finally punch a "hole"
> with an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.
>
> This is a terrible fucking work around, and should by no means
> be necessary however evidence through a large series of conversion
> of drivers to ioremap_wc() for the framebuffer shows that around
> the time MTRR started becoming popular devices did not have things
> lined up for easily separating the framebuffer and MMIO register
> access. In some cases a driver requires significant intrusive
> changes in order to make the split for an ioremap() for MMIO registers
> and another ioremap_wc() for the framebuffer, at other times a
> bit of careful study of the driver suffices. This example driver
> falls into the latter category.
>
> We can replace the MTRR MTRR_TYPE_UNCACHABLE
> work around by using ioremap_nocache(), the length of the
> MMIO space should already be correct. The other part we
> need to correct is ensuring we ioremap() for the framebuffer
> only the required size. Since the ioremap() happens early
> on probe for PCI devices before aty_init() where we typically
> adjust the length and know how to do it, we can fix this by
> pegging the bus type as PCI on PCI probe, and finally fudging
> the framebuffer length just as we do on aty_init().
>
> The last thing we must do to remain sane is ensure we
> use the info->fix.smem_start and info->fix.smem_len for
> the framebuffer MTRR as we know that is always well adjusted.
> The *one* concern here would be if the MTRR is not in units
> of 4K __but__ we already know that in the PCI case this cannot
> happen, in the shared space setting the MTRR would be up to
> 0x7ff000 and assuming a 4K page:
>
> ; 0x7ff000 / 0x1000
>         2047
>
> Also, internally when MTRR is used mtrr_add() will use mtrr_check()
> and that should splat a warning when the MTRR base and size are
> not compatible with what is expected for MTRR usage.
>
> This fix lets us nuke the MTRR_TYPE_UNCACHABLE MTRR "hole".
>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  drivers/video/fbdev/aty/atyfb.h      |  1 -
>  drivers/video/fbdev/aty/atyfb_base.c | 28 ++++++----------------------
>  2 files changed, 6 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
> index 1f39a62..89ec439 100644
> --- a/drivers/video/fbdev/aty/atyfb.h
> +++ b/drivers/video/fbdev/aty/atyfb.h
> @@ -184,7 +184,6 @@ struct atyfb_par {
>         spinlock_t int_lock;
>  #ifdef CONFIG_MTRR
>         int mtrr_aper;
> -       int mtrr_reg;
>  #endif
>         u32 mem_cntl;
>         struct crtc saved_crtc;
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 8025624..8875e56 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>
>  #ifdef CONFIG_MTRR
>         par->mtrr_aper = -1;
> -       par->mtrr_reg = -1;
>         if (!nomtrr) {
> -               /* Cover the whole resource. */
> -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
> +                                         info->fix.smem_len,
>                                           MTRR_TYPE_WRCOMB, 1);
> -               if (par->mtrr_aper >= 0 && !par->aux_start) {
> -                       /* Make a hole for mmio. */
> -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> -                                                GUI_RESERVE, GUI_RESERVE,
> -                                                MTRR_TYPE_UNCACHABLE, 1);
> -                       if (par->mtrr_reg < 0) {
> -                               mtrr_del(par->mtrr_aper, 0, 0);
> -                               par->mtrr_aper = -1;
> -                       }
> -               }
>         }
>  #endif
>
> @@ -2776,10 +2765,6 @@ aty_init_exit:
>         par->pll_ops->set_pll(info, &par->saved_pll);
>
>  #ifdef CONFIG_MTRR
> -       if (par->mtrr_reg >= 0) {
> -               mtrr_del(par->mtrr_reg, 0, 0);
> -               par->mtrr_reg = -1;
> -       }
>         if (par->mtrr_aper >= 0) {
>                 mtrr_del(par->mtrr_aper, 0, 0);
>                 par->mtrr_aper = -1;
> @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>         }
>
>         info->fix.mmio_start = raddr;
> -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);

Double-check me, but I think that ioremap_nocache + WC MTRR = WC.  I
think we might need ioremap_nocache_me_harder (or maybe ioremap_x86_uc
if you prefer that bikeshed color) for this.
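
The concern above can be sketched in standalone C. This is only an
illustration, not kernel code: the enum and function names are hypothetical,
and the combination rule is an assumption based on the usual reading of the
x86 PAT/MTRR effective-type table (UC- in the page tables gives way to a WC
MTRR, strong UC does not). Only the cases relevant to this thread are modeled.

```c
/* Hypothetical names, for illustration only; none of this is kernel API. */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/*
 * Effective memory type when a page-table (PAT) type meets the MTRR type
 * covering the same physical range. Assumed rule, to be checked against
 * the architecture manual before relying on it.
 */
static enum mem_type effective_type(enum mem_type pat, enum mem_type mtrr)
{
	switch (pat) {
	case MT_UC:
		/* Strong UC: the MTRR cannot override it. */
		return MT_UC;
	case MT_UC_MINUS:
		/* UC-: a WC MTRR wins, anything else stays uncached. */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WC:
		/* PAT WC yields WC whatever the MTRR says. */
		return MT_WC;
	default:
		/* PAT WB defers to the MTRR type. */
		return mtrr;
	}
}
```

With this model, effective_type(MT_UC_MINUS, MT_WC) evaluates to MT_WC, which
is the case being flagged for the MMIO page; effective_type(MT_UC, MT_WC)
stays MT_UC, which is what a stronger ioremap variant would have to provide.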

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR
  2015-03-20 23:17 ` Luis R. Rodriguez
@ 2015-03-21  1:08   ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-21  1:08 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Luis R. Rodriguez

On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> When a system has PAT support enabled you don't need to be
> using MTRRs. Andy had added arch_phys_wc_add() long ago to
> help with this but not all drivers were converted over. We
> have to take care to only convert drivers where we know that
> the proper ioremap_wc() API has been used. Doing this requires
> a bit of work on verifying the driver split out the ioremap'd
> areas -- and if not doing that ourselves. Verifying a driver
> uses the same areas can be hard but with a bit of love Coccinelle
> can help with that.
>
> We're motivated to change drivers for a few reasons:
>
> 1) Take advantage of PAT when available
>
> 2) Help with the goal of eventually using _PAGE_CACHE_UC over
>    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)

Nice!

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
  2015-03-20 23:18   ` Luis R. Rodriguez
@ 2015-03-21  7:08     ` Hyong-Youb Kim
  -1 siblings, 0 replies; 710+ messages in thread
From: Hyong-Youb Kim @ 2015-03-21  7:08 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Hyong-Youb Kim, netdev, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:18:11PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> This driver already uses ioremap_wc() on the same range
> so when write-combining is available that will be used
> instead.
> 
[...]
> --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
[...]
> @@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
>  		data[i] = ((u64 *)&link_stats)[i];
>  
>  	data[i++] = (unsigned int)mgp->tx_boundary;
> -	data[i++] = (unsigned int)mgp->wc_enabled;
>  	data[i++] = (unsigned int)mgp->pdev->irq;
>  	data[i++] = (unsigned int)mgp->msi_enabled;
>  	data[i++] = (unsigned int)mgp->msix_enabled;

You would have to delete "WC" from myri10ge_gstrings_main_stats too.
Something like below.  Thanks.

@@ -1905,7 +1905,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
 	"tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
 	"tx_heartbeat_errors", "tx_window_errors",
 	/* device-specific stats */
-	"tx_boundary", "WC", "irq", "MSI", "MSIX",
+	"tx_boundary", "irq", "MSI", "MSIX",
 	"read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
 	"serial_number", "watchdog_resets",
 #ifdef CONFIG_MYRI10GE_DCA
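
The reason both places must change together can be sketched as follows. This
is a trimmed-down, standalone illustration with hypothetical names
(stat_names, fake_priv, fill_stats), not the driver's actual tables: the
strings and the values are matched purely by position, so the count written
must equal the number of names.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the device-specific part of the string table. */
static const char *const stat_names[] = { "tx_boundary", "irq", "MSI", "MSIX" };

/* Hypothetical stand-in for the fields the driver reports. */
struct fake_priv {
	unsigned int tx_boundary, irq, msi_enabled, msix_enabled;
};

/* Fills data[] in the same order as stat_names[]; returns entries written. */
static size_t fill_stats(const struct fake_priv *p, unsigned long long *data)
{
	size_t i = 0;

	data[i++] = p->tx_boundary;
	data[i++] = p->irq;
	data[i++] = p->msi_enabled;
	data[i++] = p->msix_enabled;
	return i;
}
```

If a value is dropped from fill_stats() without dropping its name, every
later label shifts onto the wrong value, so a caller can compare the return
value against ARRAY_SIZE(stat_names) as a sanity check.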

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-21  9:15     ` Ville Syrjälä
  -1 siblings, 0 replies; 710+ messages in thread
From: Ville Syrjälä @ 2015-03-21  9:15 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Linus Torvalds,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> index 8025624..8875e56 100644
> --- a/drivers/video/fbdev/aty/atyfb_base.c
> +++ b/drivers/video/fbdev/aty/atyfb_base.c
> @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>  
>  #ifdef CONFIG_MTRR
>  	par->mtrr_aper = -1;
> -	par->mtrr_reg = -1;
>  	if (!nomtrr) {
> -		/* Cover the whole resource. */
> -		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> +		par->mtrr_aper = mtrr_add(info->fix.smem_start,
> +					  info->fix.smem_len,
>  					  MTRR_TYPE_WRCOMB, 1);

MTRRs need power of two size, so how is this supposed to work?
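
A minimal sketch of the constraint in question, as standalone C. The helper
name is hypothetical and this is not the kernel's own validation code; it
assumes the conventional rule for a single variable-range MTRR, namely a
power-of-two size with the base aligned to that size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	/* The size must be a non-zero power of two... */
	if (size == 0 || (size & (size - 1)) != 0)
		return false;
	/* ...and the base must be aligned to that size. */
	return (base & (size - 1)) == 0;
}
```

Under that assumption an 8 MiB aperture (0x800000) at an 8 MiB-aligned base
passes, while the 0x7ff000 length from the commit message (the aperture minus
one 4K page) does not, because 0x7ff000 is 2047 pages rather than a power of
two.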

> -		if (par->mtrr_aper >= 0 && !par->aux_start) {
> -			/* Make a hole for mmio. */
> -			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> -						 GUI_RESERVE, GUI_RESERVE,
> -						 MTRR_TYPE_UNCACHABLE, 1);
> -			if (par->mtrr_reg < 0) {
> -				mtrr_del(par->mtrr_aper, 0, 0);
> -				par->mtrr_aper = -1;
> -			}
> -		}
>  	}
>  #endif
>  
> @@ -2776,10 +2765,6 @@ aty_init_exit:
>  	par->pll_ops->set_pll(info, &par->saved_pll);
>  
>  #ifdef CONFIG_MTRR
> -	if (par->mtrr_reg >= 0) {
> -		mtrr_del(par->mtrr_reg, 0, 0);
> -		par->mtrr_reg = -1;
> -	}
>  	if (par->mtrr_aper >= 0) {
>  		mtrr_del(par->mtrr_aper, 0, 0);
>  		par->mtrr_aper = -1;
> @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>  	}
>  
>  	info->fix.mmio_start = raddr;
> -	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> +	par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
>  	if (par->ati_regbase == NULL)
>  		return -ENOMEM;
>  
> @@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>  	info->fix.smem_start = addr;
>  	info->fix.smem_len = 0x800000;
>  
> +	aty_fudge_framebuffer_len(info);
> +
>  	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
>  	if (info->screen_base == NULL) {
>  		ret = -ENOMEM;
> @@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
>  		return -ENOMEM;
>  	}
>  	par = info->par;
> +	par->bus_type = PCI;
>  	info->fix = atyfb_fix;
>  	info->device = &pdev->dev;
>  	par->pci_id = pdev->device;
> @@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
>  #endif
>  
>  #ifdef CONFIG_MTRR
> -	if (par->mtrr_reg >= 0) {
> -		mtrr_del(par->mtrr_reg, 0, 0);
> -		par->mtrr_reg = -1;
> -	}
>  	if (par->mtrr_aper >= 0) {
>  		mtrr_del(par->mtrr_aper, 0, 0);
>  		par->mtrr_aper = -1;
> -- 
> 2.3.2.209.gd67f9d5.dirty

-- 
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-20 23:17   ` Luis R. Rodriguez
  (?)
@ 2015-03-23 17:20     ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-03-23 17:20 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	jgross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

Hi Luis,

This seems OK to me, but I'm curious about a few things.

On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This allows drivers to take advantage of write-combining
> when possible. Ideally we'd have pci_read_bases() just
> peg an IORESOURCE_WC flag for us

We do set IORESOURCE_PREFETCH.  Do you mean something different?

>  but where exactly
> video devices memory lie varies *largely* and at times things
> are mixed with MMIO registers, sometimes we can address
> the changes in drivers, other times the change requires
> intrusive changes.

What does a video device address have to do with this?  I do see that
if a BAR maps only a frame buffer, the device might be able to mark it
prefetchable, while if the BAR mapped both a frame buffer and some
registers, it might not be able to make it prefetchable.  But that
doesn't seem like it depends on the *address*.

pci_iomap_range() already makes a cacheable mapping if
IORESOURCE_CACHEABLE is set; I'm guessing that you would like it to
automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,

  if (flags & IORESOURCE_CACHEABLE)
    return ioremap(start, len);
  if (flags & IORESOURCE_PREFETCH)
    return ioremap_wc(start, len);
  return ioremap_nocache(start, len);

Is there a reason not to do that?

> Although there is also arch_phys_wc_add() that makes use of
> architecture specific write-combining alternatives (MTRR on
> x86 when a system does not have PAT) we avoid polluting
> pci_iomap() space with it and force drivers and subsystems
> that want to use it to be explicit.
>
> There are a few motivations for this:
>
> a) Take advantage of PAT when available
>
> b) Help bury MTRR code away, MTRR is architecture specific and on
>    x86 it's replaced by PAT
>
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
>    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> ...

> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> +                                int bar,
> +                                unsigned long offset,
> +                                unsigned long maxlen)
> +{
> +       resource_size_t start = pci_resource_start(dev, bar);
> +       resource_size_t len = pci_resource_len(dev, bar);
> +       unsigned long flags = pci_resource_flags(dev, bar);
> +
> +       if (len <= offset || !start)
> +               return NULL;
> +       len -= offset;
> +       start += offset;
> +       if (maxlen && len > maxlen)
> +               len = maxlen;
> +       if (flags & IORESOURCE_IO)
> +               return __pci_ioport_map(dev, start, len);
> +       if (flags & IORESOURCE_MEM)

Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
I know the driver might know it's safe even if the device didn't mark
the BAR as prefetchable, but it does seem like an easy way for a
driver to shoot itself in the foot.

> +               return ioremap_wc(start, len);
> +       /* What? */
> +       return NULL;
> +}
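
For illustration only, the two ideas above (picking the mapping type
from the BAR flags, and noting a WC request on a non-prefetchable BAR)
can be sketched as a userspace model.  This is not kernel code: the
flag values are stand-ins for the ones in include/linux/ioport.h, and
map_kind, pick_mapping() and wc_request_needs_note() are hypothetical
names that appear nowhere in the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in flag bits for the sketch; the kernel has its own definitions. */
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_CACHEABLE	0x00008000UL

/* Hypothetical labels for the mapping call that would be made. */
enum map_kind { MAP_NONE, MAP_IOPORT, MAP_DEFAULT, MAP_WC, MAP_UC };

/* Flag-driven choice outlined for pci_iomap_range(). */
static enum map_kind pick_mapping(unsigned long flags)
{
	if (flags & IORESOURCE_IO)
		return MAP_IOPORT;	/* __pci_ioport_map() */
	if (!(flags & IORESOURCE_MEM))
		return MAP_NONE;	/* neither I/O nor memory */
	if (flags & IORESOURCE_CACHEABLE)
		return MAP_DEFAULT;	/* ioremap() */
	if (flags & IORESOURCE_PREFETCH)
		return MAP_WC;		/* ioremap_wc() */
	return MAP_UC;			/* ioremap_nocache() */
}

/*
 * Explicit WC request on a memory BAR: returns 1 and logs a note when
 * the BAR is not marked prefetchable, 0 otherwise.
 */
static int wc_request_needs_note(unsigned long flags, int bar)
{
	assert(bar >= 0);
	if ((flags & IORESOURCE_MEM) && !(flags & IORESOURCE_PREFETCH)) {
		fprintf(stderr, "BAR %d: WC mapping of non-prefetchable BAR\n",
			bar);
		return 1;
	}
	return 0;
}
```

In this model a driver that asks for WC explicitly still gets it; the
note only makes the mismatch with the BAR's own flags visible.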

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
@ 2015-03-24 22:08 ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle

This patchset enhances MTRR checks for the kernel huge I/O mapping,
which was enabled by the patchset below:
  https://lkml.org/lkml/2015/3/3/589

The following functional changes are made in patch 7/7.
 - Allow pud_set_huge() and pmd_set_huge() to create a huge page
   mapping to a range covered by a single MTRR entry of any memory
   type.
 - Log a pr_warn() message when a specified PMD map range spans more
   than a single MTRR entry.  Drivers should make a mapping request
   aligned to a single MTRR entry when the range is covered by MTRRs.
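
As a rough illustration of the "single MTRR entry" rule above, here is
a userspace model.  It is a sketch under stated assumptions, not the
code in patch 7/7: struct mtrr_entry, range_is_uniform() and
model_pmd_set_huge() are hypothetical names, the entries are assumed
not to overlap each other, and refusing the mapping after the warning
is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of one variable MTRR entry. */
struct mtrr_entry {
	uint64_t base;	/* first byte covered */
	uint64_t size;	/* length in bytes */
	uint8_t type;	/* memory type, opaque to this sketch */
};

/*
 * Return 1 when [start, end) is either untouched by all entries or lies
 * entirely inside exactly one entry, 0 when it crosses an entry boundary.
 */
static int range_is_uniform(const struct mtrr_entry *e, int n,
			    uint64_t start, uint64_t end)
{
	int i, overlaps = 0, covered = 0;

	assert(start < end);
	for (i = 0; i < n; i++) {
		uint64_t e_end = e[i].base + e[i].size;

		if (end <= e[i].base || start >= e_end)
			continue;	/* no overlap with this entry */
		overlaps++;
		if (start >= e[i].base && end <= e_end)
			covered = 1;	/* entry spans the whole request */
	}
	return overlaps == 0 || (overlaps == 1 && covered);
}

/* Model of the PMD case: warn and refuse when the range is not uniform. */
static int model_pmd_set_huge(const struct mtrr_entry *e, int n,
			      uint64_t addr)
{
	const uint64_t pmd_size = 2ULL << 20;	/* 2MB PMD mapping */

	if (!range_is_uniform(e, n, addr, addr + pmd_size)) {
		fprintf(stderr, "PMD range spans more than one MTRR entry\n");
		return 0;
	}
	return 1;
}
```

With one 256MB entry at 0xC0000000, a 2MB request at 0xC0000000 is
accepted regardless of the entry's type, while a request at 0xBFF00000
crosses the entry's start and is refused.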

Patch 1/7 addresses other review comments on the mapping funcs for
better code readability.  Patches 2/7 - 6/7 are bug fixes and cleanups
to mtrr_type_lookup().

The patchset is based on the -mm tree.
---
v4:
 - Update the change logs of patchset. (Ingo Molnar)
 - Add patch 3/7 to make the wrong address fix a separate patch.
   (Ingo Molnar)
 - Add patch 5/7 to define MTRR_TYPE_INVALID. (Ingo Molnar)
 - Update patch 6/7 to document MTRR fixed ranges. (Ingo Molnar)

v3:
 - Add patch 3/5 to fix a bug in MTRR state checks.
 - Update patch 4/5 to create separate functions for the fixed and
   variable entries. (Ingo Molnar)

v2:
 - Update change logs and comments per review comments.
   (Ingo Molnar)
 - Add patch 3/4 to clean up mtrr_type_lookup(). (Ingo Molnar)

---
Toshi Kani (7):
 1/7 mm, x86: Document return values of mapping funcs
 2/7 mtrr, x86: Fix MTRR lookup to handle inclusive entry
 3/7 mtrr, x86: Remove a wrong address check in __mtrr_type_lookup()
 4/7 mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
 5/7 mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup()
 6/7 mtrr, x86: Clean up mtrr_type_lookup()
 7/7 mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping

---
 arch/x86/Kconfig                   |   2 +-
 arch/x86/include/asm/mtrr.h        |   7 +-
 arch/x86/include/uapi/asm/mtrr.h   |  12 ++-
 arch/x86/kernel/cpu/mtrr/generic.c | 192 ++++++++++++++++++++++++-------------
 arch/x86/mm/pat.c                  |   4 +-
 arch/x86/mm/pgtable.c              |  53 +++++++---
 6 files changed, 181 insertions(+), 89 deletions(-)


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v4 1/7] mm, x86: Document return values of mapping funcs
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

Document the return values of KVA mapping functions,
pud_set_huge(), pmd_set_huge(), pud_clear_huge() and
pmd_clear_huge().
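
The return convention being documented (1 when the huge mapping was
set, 0 when it was not) can be sketched in userspace as follows.  This
is only a model of the check visible in the diff: MTRR_DISABLED,
model_set_huge() and ptes_needed() are hypothetical names, and the
value 6 for write-back is restated here for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_WRBACK	6	/* WB encoding, restated for the sketch */
#define MTRR_DISABLED		0xFF	/* lookup result when MTRRs are off */

/* Returns 1 if a huge mapping may be set, 0 if the caller must fall back. */
static int model_set_huge(uint8_t mtrr)
{
	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_DISABLED))
		return 0;
	return 1;
}

/*
 * Hypothetical caller: number of 4KB PTEs needed for a 2MB range when
 * the huge mapping is refused (0 when the huge mapping is used).
 */
static unsigned int ptes_needed(uint8_t mtrr)
{
	return model_set_huge(mtrr) ? 0 : 512;
}
```

A return of 0 is not an error in this model; it only tells the caller
to map the range with smaller pages.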

Simplify the conditions to select HAVE_ARCH_HUGE_VMAP
in the Kconfig, since X86_PAE depends on X86_32.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/Kconfig      |    2 +-
 arch/x86/mm/pgtable.c |   36 ++++++++++++++++++++++++++++--------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cb23206..2ea27da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -99,7 +99,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c..4891fa1 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,14 +563,19 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 }
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KB granularity.  Therefore,
+ * it does not set up a huge page when the range is covered by a non-WB
+ * type of MTRR.  0xFF indicates that MTRR are disabled.
+ *
+ * Return 1 on success, and 0 when no PUD was set.
+ */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
 	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
 		return 0;
@@ -584,14 +589,19 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KB granularity.  Therefore,
+ * it does not set up a huge page when the range is covered by a non-WB
+ * type of MTRR.  0xFF indicates that MTRR are disabled.
+ *
+ * Return 1 on success, and 0 when no PMD was set.
+ */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
 	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
 		return 0;
@@ -605,6 +615,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Return 1 on success, and 0 when no PUD map was found.
+ */
 int pud_clear_huge(pud_t *pud)
 {
 	if (pud_large(*pud)) {
@@ -615,6 +630,11 @@ int pud_clear_huge(pud_t *pud)
 	return 0;
 }
 
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Return 1 on success, and 0 when no PMD map was found.
+ */
 int pmd_clear_huge(pmd_t *pmd)
 {
 	if (pmd_large(*pmd)) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

When an MTRR entry is inclusive to a requested range, i.e.
the start and end of the request are not within the MTRR
entry range but the range contains the MTRR entry entirely,
__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.

This bug can cause the following issues:
1) reserve_memtype() tracks an effective memory type in case
   a request type is WB (e.g. /dev/mem blindly uses WB). Failing
   to track the effective type causes a subsequent request
   to map the same range with the effective type to fail.
2) pud_set_huge() and pmd_set_huge() check if a requested range
   has any overlap with MTRRs. Failing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as (!start_state && end_state).  With this fix,
__mtrr_type_lookup() handles the inclusive case properly.
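
The three conditions can be tried out in userspace with the sketch
below.  The expressions for start_state, end_state and inclusive are
taken from the diff; struct overlap, classify() and needs_split() are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

struct overlap {
	int start_state;	/* start falls inside the MTRR entry */
	int end_state;		/* end falls inside the MTRR entry */
	int inclusive;		/* request straddles the entry's base */
};

/* Evaluate the per-entry conditions for one request and one entry. */
static struct overlap classify(uint64_t start, uint64_t end,
			       uint64_t base, uint64_t mask)
{
	struct overlap o;

	assert(start <= end);
	o.start_state = ((start & mask) == (base & mask));
	o.end_state = ((end & mask) == (base & mask));
	o.inclusive = ((start < base) && (end > base));
	return o;
}

/* Condition used after the fix to decide the request must be split. */
static int needs_split(struct overlap o)
{
	return (o.start_state != o.end_state) || o.inclusive;
}
```

For a 256MB entry at 0xC0000000 and a request of 0xB0000000 to
0xDFFFFFFF, both start_state and end_state are 0, so the old test
(start_state != end_state) misses the overlap; only the inclusive flag
reports it.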

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b..a82e370 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
-		unsigned short start_state, end_state;
+		unsigned short start_state, end_state, inclusive;
 
 		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
 			continue;
@@ -166,15 +166,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 		start_state = ((start & mask) == (base & mask));
 		end_state = ((end & mask) == (base & mask));
+		inclusive = ((start < base) && (end > base));
 
-		if (start_state != end_state) {
+		if ((start_state != end_state) || inclusive) {
 			/*
 			 * We have start:end spanning across an MTRR.
-			 * We split the region into
-			 * either
-			 * (start:mtrr_end) (mtrr_end:end)
-			 * or
-			 * (start:mtrr_start) (mtrr_start:end)
+			 * We split the region into either
+			 * - start_state:1
+			 *     (start:mtrr_end) (mtrr_end:end)
+			 * - end_state:1 or inclusive:1
+			 *     (start:mtrr_start) (mtrr_start:end)
 			 * depending on kind of overlap.
 			 * Return the type for first region and a pointer to
 			 * the start of second region so that caller will
@@ -195,7 +196,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			*repeat = 1;
 		}
 
-		if ((start & mask) != (base & mask))
+		if (!start_state)
 			continue;
 
 		curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 3/7] mtrr, x86: Remove a wrong address check in __mtrr_type_lookup()
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

__mtrr_type_lookup() checks MTRR fixed ranges when
mtrr_state.have_fixed is set and start is less than
0x100000.  However, the 'else if (start < 0x1000000)'
in the code checks against a wrong address, as it has
an extra zero.  The code still runs correctly, though,
because start is already known to be less than 0x100000
at that point, which makes this check meaningless.

This patch replaces the wrong address check with 'else'
with no condition.
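
To see why the last branch needs no condition, the index computation
can be modelled in userspace as below.  The second and third branches
follow the hunk in this patch; the first branch (64KB ranges below
0x80000) is not shown in the hunk and is an assumption based on the
fixed-range layout, and fixed_range_index() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Map an address below 1MB to its index in the fixed-range table. */
static int fixed_range_index(uint64_t start)
{
	assert(start < 0x100000);	/* fixed ranges cover the first 1MB */

	if (start < 0x80000)		/* 8 ranges of 64KB (assumed) */
		return 0 * 8 + (int)(start >> 16);
	if (start < 0xC0000)		/* 16 ranges of 16KB */
		return 1 * 8 + (int)((start - 0x80000) >> 14);
	/* 64 ranges of 4KB; start < 0x100000 is already guaranteed */
	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

The highest valid address, 0xFFFFF, maps to index 87, the last of the
88 fixed-range slots in this model.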

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index a82e370..c5be327 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -137,7 +137,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			idx = 1 * 8;
 			idx += ((start - 0x80000) >> 14);
 			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0x1000000) {
+		} else {
 			idx = 3 * 8;
 			idx += ((start - 0xC0000) >> 12);
 			return mtrr_state.fixed_ranges[idx];

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:
 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no effect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match the
SDM.  Hence, this patch makes the following changes:
 - The current code treats MTRRs as disabled only when both
   the E and FE flags are clear in mtrr_state.enabled.  Fix
   it to treat MTRRs as disabled whenever the E flag is clear.
 - The current code does not check whether the FE flag is set
   in mtrr_state.enabled before looking into the fixed entries.
   Fix it to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled.  However, the default type
   is also disabled when the E flag is clear.  Remove this
   code, since the case is now handled as MTRRs disabled by
   the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() is also updated accordingly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/uapi/asm/mtrr.h   |    4 ++++
 arch/x86/kernel/cpu/mtrr/generic.c |   15 ++++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb65..66ba88d 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -88,6 +88,10 @@ struct mtrr_state_type {
 	mtrr_type def_type;
 };
 
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
+#define MTRR_STATE_MTRR_ENABLED		0x02
+
 #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
 #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
 
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c5be327..4bff6db 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	if (!mtrr_state_set)
 		return 0xFF;
 
-	if (!mtrr_state.enabled)
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
 		return 0xFF;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state.have_fixed && (start < 0x100000)) {
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state.enabled & 2))
-		return mtrr_state.def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -348,7 +347,9 @@ static void __init print_mtrr_state(void)
 		 mtrr_attrib_to_str(mtrr_state.def_type));
 	if (mtrr_state.have_fixed) {
 		pr_debug("MTRR fixed ranges %sabled:\n",
-			 mtrr_state.enabled & 1 ? "en" : "dis");
+			((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+			 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+			 "en" : "dis");
 		print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
 		for (i = 0; i < 2; ++i)
 			print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -361,7 +362,7 @@ static void __init print_mtrr_state(void)
 		print_fixed_last();
 	}
 	pr_debug("MTRR variable ranges %sabled:\n",
-		 mtrr_state.enabled & 2 ? "en" : "dis");
+		 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
 	high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
 
 	for (i = 0; i < num_var_ranges; ++i) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 5/7] mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup()
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

mtrr_type_lookup() returns 0xFF when it cannot return a valid
MTRR memory type since MTRRs are disabled.  This patch defines
MTRR_TYPE_INVALID to clarify the meaning of this value, and
documents its usage.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/mtrr.h        |    2 +-
 arch/x86/include/uapi/asm/mtrr.h   |    8 +++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |   14 +++++++-------
 arch/x86/mm/pgtable.c              |    8 ++++----
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..a174af6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
 	/*
 	 * Return no-MTRRs:
 	 */
-	return 0xff;
+	return MTRR_TYPE_INVALID;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index 66ba88d..0bc86c6 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -107,7 +107,7 @@ struct mtrr_state_type {
 #define MTRRIOC_GET_PAGE_ENTRY   _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
 #define MTRRIOC_KILL_PAGE_ENTRY  _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry)
 
-/*  These are the region types  */
+/* MTRR memory types, which are defined in SDM */
 #define MTRR_TYPE_UNCACHABLE 0
 #define MTRR_TYPE_WRCOMB     1
 /*#define MTRR_TYPE_         2*/
@@ -117,5 +117,11 @@ struct mtrr_state_type {
 #define MTRR_TYPE_WRBACK     6
 #define MTRR_NUM_TYPES       7
 
+/*
+ * Invalid MTRR memory type.  mtrr_type_lookup() returns this value when
+ * MTRRs are disabled.  Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID    0xff
 
 #endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 4bff6db..8bd1298 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 
 /*
  * Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
  *		corresponds only to [start:*partial_end].
  *		Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	*repeat = 0;
 	if (!mtrr_state_set)
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	prev_match = 0xFF;
+	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
 
@@ -199,7 +199,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			continue;
 
 		curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
-		if (prev_match == 0xFF) {
+		if (prev_match == MTRR_TYPE_INVALID) {
 			prev_match = curr_match;
 			continue;
 		}
@@ -213,7 +213,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return MTRR_TYPE_WRBACK;
 	}
 
-	if (prev_match != 0xFF)
+	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
@@ -222,7 +222,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 /*
  * Returns the effective MTRR type for the region
  * Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 4891fa1..cfca4cf 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -568,7 +568,7 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
  *
  * MTRR can override PAT memory types with 4KB granularity.  Therefore,
  * it does not set up a huge page when the range is covered by a non-WB
- * type of MTRR.  0xFF indicates that MTRR are disabled.
+ * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
  *
  * Return 1 on success, and 0 when no PUD was set.
  */
@@ -577,7 +577,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr;
 
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -594,7 +594,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  *
  * MTRR can override PAT memory types with 4KB granularity.  Therefore,
  * it does not set up a huge page when the range is covered by a non-WB
- * type of MTRR.  0xFF indicates that MTRR are disabled.
+ * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
  *
  * Return 1 on success, and 0 when no PMD was set.
  */
@@ -603,7 +603,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr;
 
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

MTRRs contain fixed and variable entries.  mtrr_type_lookup()
may repeatedly call __mtrr_type_lookup() to handle a request
that overlaps with variable entries.  However,
__mtrr_type_lookup() also handles the fixed entries, which
do not have to be repeated.  Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.

The patch also updates the function headers to clarify the
return values and output argument.  It updates comments to
clarify that the repeating is necessary to handle overlaps
with the default type, since overlaps with multiple entries
alone can be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c |  137 +++++++++++++++++++++++-------------
 1 file changed, 86 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 8bd1298..3652e2b 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,69 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * MTRR fixed entries are divided into the following ways:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched or fixed entries are disabled
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.have_fixed) ||
+	    !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+
+	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
+ */
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -179,7 +193,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * Return the type for first region and a pointer to
 			 * the start of second region so that caller will
 			 * lookup again on the second region.
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -208,21 +223,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -230,22 +242,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	type = mtrr_type_lookup_fixed(start, end);
+	if (type != MTRR_TYPE_INVALID)
+		return type;
+
+	/*
+	 * Look up the variable ranges.  Look for multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
@ 2015-03-24 22:08   ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

MTRRs contain fixed and variable entries.  mtrr_type_lookup()
may repeatedly call __mtrr_type_lookup() to handle a request
that overlaps with variable entries.  However,
__mtrr_type_lookup() also handles the fixed entries, which
do not have to be repeated.  Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.

The patch also updates the function headers to clarify the
return values and output argument.  It updates comments to
clarify that the repeating is necessary to handle overlaps
with the default type, since overlaps with multiple entries
alone can be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c |  137 +++++++++++++++++++++++-------------
 1 file changed, 86 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 8bd1298..3652e2b 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,69 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * MTRR fixed entries are divided into the following ways:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched or fixed entries are disabled
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.have_fixed) ||
+	    !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+
+	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
+ */
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -179,7 +193,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * Return the type for first region and a pointer to
 			 * the start of second region so that caller will
 			 * lookup again on the second region.
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -208,21 +223,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -230,22 +242,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	type = mtrr_type_lookup_fixed(start, end);
+	if (type != MTRR_TYPE_INVALID)
+		return type;
+
+	/*
+	 * Look up the variable ranges.  Look of multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread
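
As a reading aid for mtrr_type_lookup_fixed() in the patch above, the index
arithmetic can be sketched in isolation. This is only an illustration, not
kernel code: fixed_range_index() and NUM_FIXED_SLOTS are hypothetical names,
and the value 88 is simply the sum of the eight, sixteen and sixty-four
sub-ranges listed in the kernel-doc comment.

```c
#include <stdint.h>

/* Number of fixed-range slots: 8 (64KB) + 16 (16KB) + 64 (4KB). */
#define NUM_FIXED_SLOTS 88

/*
 * Hypothetical helper (not a kernel function): map a physical address
 * below 1MB to its slot in the fixed-range table, or return -1 when
 * the address lies outside the fixed-range window.
 */
static inline int fixed_range_index(uint64_t addr)
{
	if (addr >= 0x100000)
		return -1;
	if (addr < 0x80000)		/* eight 64KB sub-ranges */
		return (int)(addr >> 16);
	if (addr < 0xC0000)		/* sixteen 16KB sub-ranges */
		return 8 + (int)((addr - 0x80000) >> 14);
	/* sixty-four 4KB sub-ranges */
	return 24 + (int)((addr - 0xC0000) >> 12);
}
```

For example, 0xA0000 (the start of the legacy VGA window) falls in the 16KB
region and maps to slot 16, while the last byte below 1MB, 0xFFFFF, maps to
slot 87.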

* [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:08   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-24 22:08 UTC (permalink / raw)
  To: akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, Toshi Kani

This patch adds an additional output argument, 'uniform', to
mtrr_type_lookup().  It is set to 1 when the given range is
covered uniformly by MTRRs, i.e. the range is fully covered
by a single MTRR entry or the default type.

pud_set_huge() and pmd_set_huge() are changed to check the
new 'uniform' flag to see if it is safe to create a huge page
mapping to the range.  This allows them to create a huge page
mapping to a range covered by a single MTRR entry of any
memory type.  It also detects a non-optimal request properly.
They continue to check with the WB type since the WB type has
no effect even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request
so that driver writers will be aware of such a case.  Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/mtrr.h        |    5 +++--
 arch/x86/kernel/cpu/mtrr/generic.c |   35 +++++++++++++++++++++++++++--------
 arch/x86/mm/pat.c                  |    4 ++--
 arch/x86/mm/pgtable.c              |   25 +++++++++++++++----------
 4 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a174af6..da8dff1 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,11 +50,12 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
 	 */
+	*uniform = 1;
 	return MTRR_TYPE_INVALID;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 3652e2b..a83f27a 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -148,19 +148,22 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -208,6 +211,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if (!start_state)
@@ -219,6 +223,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -235,13 +240,19 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform, dummy;
 	int repeat;
 	u64 partial_end;
 
+	*uniform = 1;
+
 	if (!mtrr_state_set)
 		return MTRR_TYPE_INVALID;
 
@@ -253,14 +264,17 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 * the variable ranges.
 	 */
 	type = mtrr_type_lookup_fixed(start, end);
-	if (type != MTRR_TYPE_INVALID)
+	if (type != MTRR_TYPE_INVALID) {
+		*uniform = 0;
 		return type;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -271,16 +285,21 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
+		is_uniform = 0;
+
 		type = mtrr_type_lookup_variable(start, end, &partial_end,
-						 &repeat);
+						 &repeat, &dummy);
 
-		if (check_type_overlap(&prev_type, &type))
+		if (check_type_overlap(&prev_type, &type)) {
+			*uniform = 0;
 			return type;
+		}
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
 		return MTRR_TYPE_WRBACK;
 
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af677..372ad42 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index cfca4cf..3d6edea 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -567,17 +567,18 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
  * pud_set_huge - setup kernel PUD mapping
  *
  * MTRR can override PAT memory types with 4KB granularity.  Therefore,
- * it does not set up a huge page when the range is covered by a non-WB
- * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
+ * it only sets up a huge page when the range is mapped uniformly by MTRR
+ * (i.e. the range is fully covered by a single MTRR entry or the default
+ * type) or the MTRR memory type is WB.
  *
  * Return 1 on success, and 0 when no PUD was set.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -593,18 +594,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  * pmd_set_huge - setup kernel PMD mapping
  *
  * MTRR can override PAT memory types with 4KB granularity.  Therefore,
- * it does not set up a huge page when the range is covered by a non-WB
- * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
+ * it only sets up a huge page when the range is mapped uniformly by MTRR
+ * (i.e. the range is fully covered by a single MTRR entry or the default
+ * type) or the MTRR memory type is WB.
  *
  * Return 1 on success, and 0 when no PMD was set.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
+				addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread
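
The change to the huge-page check in pud_set_huge() and pmd_set_huge() above
can be summarised as a pair of predicates. This is a minimal sketch, assuming
the usual x86 MTRR type encodings; huge_map_allowed_old() and
huge_map_allowed_new() are hypothetical names used only for illustration and
do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings, mirroring the kernel's MTRR_TYPE_* constants. */
enum {
	TYPE_UNCACHABLE = 0,
	TYPE_WRCOMB     = 1,
	TYPE_WRBACK     = 6,
	TYPE_INVALID    = 0xff,	/* MTRRs disabled */
};

/* Before the patch: only WB or "MTRR disabled" permits a huge mapping. */
static inline bool huge_map_allowed_old(uint8_t type)
{
	return type == TYPE_WRBACK || type == TYPE_INVALID;
}

/*
 * After the patch: a uniformly covered range of any type is accepted,
 * and WB is still accepted even when the range is not uniform.
 */
static inline bool huge_map_allowed_new(uint8_t type, uint8_t uniform)
{
	return uniform || type == TYPE_WRBACK;
}
```

Under this model, a range fully covered by a single UC entry is rejected by
the old check but accepted by the new one, while a range straddling a WC and
a UC entry (uniform == 0, type != WB) is rejected by both.  The disabled case
still passes because the lookup sets uniform to 1 before returning
MTRR_TYPE_INVALID.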

* Re: [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
  2015-03-24 22:08 ` Toshi Kani
@ 2015-03-24 22:43   ` Andrew Morton
  -1 siblings, 0 replies; 710+ messages in thread
From: Andrew Morton @ 2015-03-24 22:43 UTC (permalink / raw)
  To: Toshi Kani
  Cc: hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 24 Mar 2015 16:08:34 -0600 Toshi Kani <toshi.kani@hp.com> wrote:

> This patchset enhances MTRR checks for the kernel huge I/O mapping,
> which was enabled by the patchset below:
>   https://lkml.org/lkml/2015/3/3/589
> 
> The following functional changes are made in patch 7/7.
>  - Allow pud_set_huge() and pmd_set_huge() to create a huge page
>    mapping to a range covered by a single MTRR entry of any memory
>    type.
>  - Log a pr_warn() message when a specified PMD map range spans more
>    than a single MTRR entry.  Drivers should make a mapping request
>    aligned to a single MTRR entry when the range is covered by MTRRs.
> 

OK, I grabbed these after barely looking at them, to get them a bit of
runtime testing.

I'll await guidance from the x86 maintainers regarding next steps?

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 03/47] devres: add devm_ioremap_wc()
  2015-03-20 23:49     ` Andy Lutomirski
@ 2015-03-25 19:50       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 19:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:49:51PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > We have devm_ioremap_nocache() but no devm_ioremap_wc()
> > so add that. This will be used later.
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> 
> Looks good to me.

Thanks, I'll peg a Reviewed-by.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-25 19:59     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 710+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 19:59 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
	david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
	xen-devel

On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> It is possible to enable CONFIG_MTRR and up with it
> disabled at run time and yet CONFIG_X86_PAT continues
> to kick through fully functionally. This can happen

s/fully/full/ ?


> for instance on Xen where MTRR is not supported but
> PAT is, this can happen now on Linux as of commit
> 47591df50 by Juergen introduced as of v3.19.

s/3.19/4.0/
> 
> Technically we should assume the proper CPU
> bits would be set to disable MTRR but we can't
> always rely on this. At least on the Xen Hypervisor
> for instance only X86_FEATURE_MTRR was disabled
> as of Xen 4.4 through Xen commit 586ab6a [0],
> but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
> or X86_FEATURE_CYRIX_ARR for instance.

Oh, could you send a patch for that to Xen please?
> 
> x86 mtrr code relies on quite a bit of checks for
> mtrr_if being set to check to see if MTRR did get
> set up, instead of using that lets provide a generic
> setter which when set we know MTRR is enabled. This

s/we know MTRR is enabled/will let us know that MTRR is enabled/

> also adds a few checks where they were not before
> which could potentially safeguard ourselves against
> incorrect usage of MTRR where this was not desirable.
> 
> Where possible match error codes as if MTRR was
> disabled on arch/x86/include/asm/mtrr.h.
> 
> Lastly, since disabling MTRR can happen at run time
> and we could end up with PAT enabled best record now
> on our logs when MTRR is disabled.
> 
> [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
> 4.4.0-rc1~18
> 
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: venkatesh.pallipadi@intel.com
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: konrad.wilk@oracle.com
> Cc: ville.syrjala@linux.intel.com
> Cc: david.vrabel@citrix.com
> Cc: jbeulich@suse.com
> Cc: toshi.kani@hp.com
> Cc: bhelgaas@google.com
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-devel@lists.xensource.com
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  arch/x86/include/asm/mtrr.h        |  2 ++
>  arch/x86/kernel/cpu/mtrr/cleanup.c |  2 +-
>  arch/x86/kernel/cpu/mtrr/generic.c |  5 +++--
>  arch/x86/kernel/cpu/mtrr/if.c      |  3 +++
>  arch/x86/kernel/cpu/mtrr/main.c    | 31 ++++++++++++++++++++++---------
>  5 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
> index f768f62..cade917 100644
> --- a/arch/x86/include/asm/mtrr.h
> +++ b/arch/x86/include/asm/mtrr.h
> @@ -31,6 +31,7 @@
>   * arch_phys_wc_add and arch_phys_wc_del.
>   */
>  # ifdef CONFIG_MTRR
> +extern int mtrr_enabled;
>  extern u8 mtrr_type_lookup(u64 addr, u64 end);
>  extern void mtrr_save_fixed_ranges(void *);
>  extern void mtrr_save_state(void);
> @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
>  extern int amd_special_default_mtrr(void);
>  extern int phys_wc_to_mtrr_index(int handle);
>  #  else
> +static const int mtrr_enabled;
>  static inline u8 mtrr_type_lookup(u64 addr, u64 end)
>  {
>  	/*
> diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
> index 5f90b85..784dc55 100644
> --- a/arch/x86/kernel/cpu/mtrr/cleanup.c
> +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
> @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
>  	 * Make sure we only trim uncachable memory on machines that
>  	 * support the Intel MTRR architecture:
>  	 */
> -	if (!is_cpu(INTEL) || disable_mtrr_trim)
> +	if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
>  		return 0;
>  
>  	rdmsr(MSR_MTRRdefType, def, dummy);
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 09c82de..df321b2 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  	u8 prev_match, curr_match;
>  
>  	*repeat = 0;
> -	if (!mtrr_state_set)
> +	/* generic_mtrr_ops is only set for generic_mtrr_ops */
> +	if (!mtrr_state_set || !mtrr_enabled)
>  		return 0xFF;
>  
>  	if (!mtrr_state.enabled)
> @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
>  
>  void mtrr_save_fixed_ranges(void *info)
>  {
> -	if (cpu_has_mtrr)
> +	if (mtrr_enabled && cpu_has_mtrr)
>  		get_fixed_ranges(mtrr_state.fixed_ranges);
>  }
>  
> diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
> index d76f13d..e9e001a 100644
> --- a/arch/x86/kernel/cpu/mtrr/if.c
> +++ b/arch/x86/kernel/cpu/mtrr/if.c
> @@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
>  {
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
>  
> +	if (!mtrr_enabled)
> +		return 0;
> +
>  	if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
>  	    (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
>  	    (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index ea5f363..7db9c47 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -59,6 +59,7 @@
>  #define MTRR_TO_PHYS_WC_OFFSET 1000
>  
>  u32 num_var_ranges;
> +int mtrr_enabled;
>  
>  unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>  static DEFINE_MUTEX(mtrr_mutex);
> @@ -84,6 +85,9 @@ static int have_wrcomb(void)
>  {
>  	struct pci_dev *dev;
>  
> +	if (!mtrr_enabled)
> +		return 0;
> +
>  	dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
>  	if (dev != NULL) {
>  		/*
> @@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>  	int i, replace, error;
>  	mtrr_type ltype;
>  
> -	if (!mtrr_if)
> +	if (!mtrr_enabled)
>  		return -ENXIO;
>  
>  	error = mtrr_if->validate_add_page(base, size, type);
> @@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>  
>  static int mtrr_check(unsigned long base, unsigned long size)
>  {
> +	if (!mtrr_enabled)
> +		return -ENODEV;
>  	if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
>  		pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
>  		pr_debug("mtrr: size: 0x%lx  base: 0x%lx\n", size, base);
> @@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>  	unsigned long lbase, lsize;
>  	int error = -EINVAL;
>  
> -	if (!mtrr_if)
> -		return -ENXIO;
> +	if (!mtrr_enabled)
> +		return -ENODEV;
>  
>  	max = num_var_ranges;
>  	/* No CPU hotplug when we change MTRR entries */
> @@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>   */
>  int mtrr_del(int reg, unsigned long base, unsigned long size)
>  {
> +	if (!mtrr_enabled)
> +		return -ENODEV;
>  	if (mtrr_check(base, size))
>  		return -EINVAL;
>  	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
> @@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
>  {
>  	int ret;
>  
> -	if (pat_enabled)
> +	if (pat_enabled || !mtrr_enabled)
>  		return 0;  /* Success!  (We don't need to do anything.) */
>  
>  	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
>  	}
>  
>  	if (mtrr_if) {
> +		mtrr_enabled = true;
>  		set_num_var_ranges();
>  		init_table();
>  		if (use_intel()) {
> @@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
>  				mtrr_if->set_all();
>  			}
>  		}
> -	}
> +	} else
> +		pr_info("mtrr: system does not support MTRR\n");

Should this be pr_warn() instead?
>  }
>  
>  void mtrr_ap_init(void)
>  {
> -	if (!use_intel() || mtrr_aps_delayed_init)
> +	if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
>  		return;
>  	/*
>  	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
> @@ -774,6 +784,9 @@ void mtrr_save_state(void)
>  {
>  	int first_cpu;
>  
> +	if (!mtrr_enabled)
> +		return;
> +
>  	get_online_cpus();
>  	first_cpu = cpumask_first(cpu_online_mask);
>  	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
> @@ -782,7 +795,7 @@ void mtrr_save_state(void)
>  
>  void set_mtrr_aps_delayed_init(void)
>  {
> -	if (!use_intel())
> +	if (!use_intel() || !mtrr_enabled)
>  		return;
>  
>  	mtrr_aps_delayed_init = true;
> @@ -810,7 +823,7 @@ void mtrr_aps_init(void)
>  
>  void mtrr_bp_restore(void)
>  {
> -	if (!use_intel())
> +	if (!use_intel() || !mtrr_enabled)
>  		return;
>  
>  	mtrr_if->set_all();
> @@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
>  
>  static int __init mtrr_init_finialize(void)
>  {
> -	if (!mtrr_if)
> +	if (!mtrr_enabled)
>  		return 0;
>  
>  	if (use_intel()) {
> -- 
> 2.3.2.209.gd67f9d5.dirty
> 
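
For anyone skimming the diff, the gating it introduces can be sketched in
userspace C roughly as follows. This is only an illustration under stated
assumptions: every mock_* name is hypothetical and merely stands in for the
kernel function of the similar name, and the bodies keep nothing but the
early-return checks visible in the hunks above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's mtrr_enabled flag; stays false until init sets it. */
static bool mtrr_enabled;

/* Hypothetical model of mtrr_bp_init(): have_ops plays the role of "mtrr_if != NULL". */
void mock_mtrr_bp_init(bool have_ops)
{
	if (have_ops)
		mtrr_enabled = true;
	else
		printf("mtrr: system does not support MTRR\n");
}

/* Models the early check in mtrr_add_page(), which keeps -ENXIO in the patch. */
int mock_mtrr_add_page(void)
{
	if (!mtrr_enabled)
		return -ENXIO;
	return 0; /* pretend register 0 was allocated */
}

/* Models the early check in mtrr_del_page(), which the patch changes to -ENODEV. */
int mock_mtrr_del_page(int reg)
{
	if (!mtrr_enabled)
		return -ENODEV;
	return reg; /* pretend the given register was released */
}

/* Models arch_phys_wc_add(): nothing to do when PAT is on or MTRR is off. */
int mock_arch_phys_wc_add(bool pat_enabled)
{
	if (pat_enabled || !mtrr_enabled)
		return 0;
	return 1; /* pretend a positive handle was returned */
}

/* Usage example: behaviour before and after the mock init enables MTRR. */
void mock_example(void)
{
	mock_mtrr_bp_init(false);
	assert(mock_mtrr_add_page() == -ENXIO);
	assert(mock_mtrr_del_page(3) == -ENODEV);
	assert(mock_arch_phys_wc_add(false) == 0);

	mock_mtrr_bp_init(true);
	assert(mock_mtrr_del_page(3) == 3);
	assert(mock_arch_phys_wc_add(false) == 1);
	assert(mock_arch_phys_wc_add(true) == 0);
}
```

Note that the sketch mirrors the quoted hunks as they are: mtrr_add_page()
still returns -ENXIO when MTRR is disabled, while mtrr_del_page() moves to
-ENODEV.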

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-25 20:03     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 710+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 20:03 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-fbdev, Antonino Daplas,
	Daniel Vetter, Luis R. Rodriguez, x86, linux-kernel,
	Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> This lets drivers take advanate of PAT when available. This

s/advanate/advantage/
> should help with the transition of converting video drivers over
> to ioremap_wc() to help with the goal of eventually using
> _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> ioremap_nocache() (de33c442e)

Please mention the title of the patch too:

"x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> 
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  drivers/pci/pci.c   | 14 ++++++++++++++
>  include/linux/pci.h |  1 +
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 81f06e8..6afd507 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
>  				     pci_resource_len(pdev, bar));
>  }
>  EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> +
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> +{
> +	/*
> +	 * Make sure the BAR is actually a memory resource, not an IO resource
> +	 */
> +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> +		WARN_ON(1);

Would it be better to use dev_warn()? That way you can see which BDF it is.

Though WARN will give a nice stack trace that should easily point to the
driver, so perhaps not. Either way, up to you.
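
To make the trade-off concrete, here is a rough userspace sketch of the
dev_warn()-style option. The names mock_pci_dev, mock_check_wc_bar() and the
MOCK_* flags are made up for illustration and are not kernel API; the only
point is that the message carries the device name (the BDF) and the BAR
number, which a bare WARN_ON(1) would not show.

```c
#include <assert.h>
#include <stdio.h>

#define MOCK_NUM_BARS       6
#define MOCK_IORESOURCE_IO  0x100UL /* illustrative flag values */
#define MOCK_IORESOURCE_MEM 0x200UL

/* Hypothetical, heavily reduced stand-in for struct pci_dev. */
struct mock_pci_dev {
	const char *bdf;                        /* e.g. "0000:00:02.0" */
	unsigned long bar_flags[MOCK_NUM_BARS]; /* resource flags per BAR */
};

/*
 * Returns 0 if the BAR is a memory resource. Otherwise writes a
 * dev_warn()-like message naming the device and the BAR into log
 * and returns -1.
 */
int mock_check_wc_bar(const struct mock_pci_dev *pdev, int bar,
		      char *log, size_t len)
{
	assert(bar >= 0 && bar < MOCK_NUM_BARS);
	assert(len > 0);

	if (!(pdev->bar_flags[bar] & MOCK_IORESOURCE_MEM)) {
		snprintf(log, len, "%s: BAR %d is not a memory resource",
			 pdev->bdf, bar);
		return -1;
	}
	log[0] = '\0';
	return 0;
}
```

A caller would pass a small buffer and print it on failure; in the kernel the
equivalent would simply be the log line itself rather than a buffer.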

> +		return NULL;
> +	}
> +	return ioremap_wc(pci_resource_start(pdev, bar),
> +			  pci_resource_len(pdev, bar));
> +}
> +EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar);
>  #endif
>  
>  #define PCI_FIND_CAP_TTL	48
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 211e9da..c235b09 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1667,6 +1667,7 @@ static inline void pci_mmcfg_late_init(void) { }
>  int pci_ext_cfg_avail(void);
>  
>  void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
> +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
>  
>  #ifdef CONFIG_PCI_IOV
>  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
> -- 
> 2.3.2.209.gd67f9d5.dirty
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-20 23:50     ` Andy Lutomirski
@ 2015-03-25 20:06       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 20:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:50:32PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This lets drivers take advanate of PAT when available. This
> > should help with the transition of converting video drivers over
> > to ioremap_wc() to help with the goal of eventually using
> > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> > ioremap_nocache() (de33c442e)
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> >  drivers/pci/pci.c   | 14 ++++++++++++++
> >  include/linux/pci.h |  1 +
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 81f06e8..6afd507 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> >                                      pci_resource_len(pdev, bar));
> >  }
> >  EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> > +
> > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> > +{
> > +       /*
> > +        * Make sure the BAR is actually a memory resource, not an IO resource
> > +        */
> > +       if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> > +               WARN_ON(1);
> > +               return NULL;
> > +       }
> 
> if (WARN_ON(...))?

Sure, they are equivalent. However, this follows exactly the same style as
pci_ioremap_bar(), so if we change this one we might as well change
pci_ioremap_bar() too. Let me know if there is any preference. I personally
don't mind the extra line, as it shortens the check.
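
For reference, the two styles side by side in a userspace sketch.
MOCK_WARN_ON() is a made-up stand-in that, like the kernel's WARN_ON(),
evaluates to the truth value of its condition; it is not the kernel macro,
and the check_* helpers and the flag value are hypothetical as well.

```c
#include <assert.h>
#include <stdio.h>

#define MOCK_IORESOURCE_MEM 0x200UL /* illustrative flag value */

/* Counts warnings so the two styles can be compared. */
static int mock_warn_count;

/* Reports a warning when cond is true and returns cond as 0 or 1. */
int mock_warn_on(int cond, const char *expr)
{
	if (cond) {
		mock_warn_count++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}
#define MOCK_WARN_ON(cond) mock_warn_on(!!(cond), #cond)

/* Style used in the patch and in pci_ioremap_bar(): test, then warn. */
int check_two_step(unsigned long flags)
{
	if (!(flags & MOCK_IORESOURCE_MEM)) {
		MOCK_WARN_ON(1);
		return -1;
	}
	return 0;
}

/* Suggested style: fold the test into the warning. */
int check_folded(unsigned long flags)
{
	if (MOCK_WARN_ON(!(flags & MOCK_IORESOURCE_MEM)))
		return -1;
	return 0;
}

/* Usage example: both styles agree on the result and warn once each. */
void mock_example(void)
{
	mock_warn_count = 0;
	assert(check_two_step(MOCK_IORESOURCE_MEM) == 0);
	assert(check_folded(MOCK_IORESOURCE_MEM) == 0);
	assert(mock_warn_count == 0);
	assert(check_two_step(0) == -1);
	assert(check_folded(0) == -1);
	assert(mock_warn_count == 2);
}
```

Both helpers return the same result for the same flags, so the choice is
purely one of style.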

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-20 23:50     ` Andy Lutomirski
  (?)
@ 2015-03-25 20:06     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 20:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Juergen Gross, Linux Fbdev development list, X86 ML,
	Suresh Siddha, Antonino Daplas, Luis R. Rodriguez, Daniel Vetter,
	Tomi Valkeinen, venkatesh.pallipadi, linux-kernel, xen-devel,
	Ingo Molnar, Jan Beulich, H. Peter Anvin, Dave Airlie,
	Thomas Gleixner, Borislav Petkov,
	Jean-Christophe Plagniol-Villard, Ingo Molnar

On Fri, Mar 20, 2015 at 04:50:32PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This lets drivers take advanate of PAT when available. This
> > should help with the transition of converting video drivers over
> > to ioremap_wc() to help with the goal of eventually using
> > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> > ioremap_nocache() (de33c442e)
> >
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> >  drivers/pci/pci.c   | 14 ++++++++++++++
> >  include/linux/pci.h |  1 +
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 81f06e8..6afd507 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> >                                      pci_resource_len(pdev, bar));
> >  }
> >  EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> > +
> > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> > +{
> > +       /*
> > +        * Make sure the BAR is actually a memory resource, not an IO resource
> > +        */
> > +       if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> > +               WARN_ON(1);
> > +               return NULL;
> > +       }
> 
> if (WARN_ON(...))?

Sure, they are equivalent however this follows the same exact style as
pci_ioremap_bar() so if we change this one might as well change the style of
pci_ioremap_bar() as well. Let me know if there is any preference. I personally
don't mind the extra line as it shortens the check.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-20 23:17   ` Luis R. Rodriguez
  (?)
@ 2015-03-25 20:07     ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 710+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-03-25 20:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Bjorn Helgaas, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, ville.syrjala, david.vrabel, toshi.kani,
	Roger Pau Monné,
	xen-devel

On Fri, Mar 20, 2015 at 04:17:55PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> This allows drivers to take advantage of write-combining
> when possible. Ideally we'd have pci_read_bases() just
> peg an IORESOURCE_WC flag for us but where exactly
> video devices memory lie varies *largely* and at times things
> are mixed with MMIO registers, sometimes we can address
> the changes in drivers, other times the change requires
> intrusive changes.
> 
> Although there is also arch_phys_wc_add() that makes use of
> architecture specific write-combinging alternatives (MTRR on

combinging?

> x86 when a system does not have PAT) we avoid polluting
> pci_iomap() space with it and force drivers and subsystems
> that want to use it to be explicit.
> 
> There are a few motivations for this:
> 
> a) Take advantage of PAT when available
> 
> b) Help bury MTRR code away, MTRR is architecture specific and on
>    x86 it's replaced by PAT
> 
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
>    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> 
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: venkatesh.pallipadi@intel.com
> Cc: Stefan Bader <stefan.bader@canonical.com>
> Cc: konrad.wilk@oracle.com
> Cc: ville.syrjala@linux.intel.com
> Cc: david.vrabel@citrix.com
> Cc: jbeulich@suse.com
> Cc: toshi.kani@hp.com
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: xen-devel@lists.xensource.com
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  include/asm-generic/pci_iomap.h | 14 ++++++++++
>  lib/pci_iomap.c                 | 61 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h
> index 7389c87..b1e17fc 100644
> --- a/include/asm-generic/pci_iomap.h
> +++ b/include/asm-generic/pci_iomap.h
> @@ -15,9 +15,13 @@ struct pci_dev;
>  #ifdef CONFIG_PCI
>  /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */
>  extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max);
> +extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max);
>  extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
>  				     unsigned long offset,
>  				     unsigned long maxlen);
> +extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
> +					unsigned long offset,
> +					unsigned long maxlen);
>  /* Create a virtual mapping cookie for a port on a given PCI device.
>   * Do not call this directly, it exists to make it easier for architectures
>   * to override */
> @@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned lon
>  	return NULL;
>  }
>  
> +static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long max)
> +{
> +	return NULL;
> +}
>  static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar,
>  					    unsigned long offset,
>  					    unsigned long maxlen)
>  {
>  	return NULL;
>  }
> +static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar,
> +					       unsigned long offset,
> +					       unsigned long maxlen)
> +{
> +	return NULL;
> +}
>  #endif
>  
>  #endif /* __ASM_GENERIC_IO_H */
> diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> index bcce5f1..30b65ae 100644
> --- a/lib/pci_iomap.c
> +++ b/lib/pci_iomap.c
> @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
>  EXPORT_SYMBOL(pci_iomap_range);
>  
>  /**
> + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
> + * @dev: PCI device that owns the BAR
> + * @bar: BAR number
> + * @offset: map memory at the given offset in BAR
> + * @maxlen: max length of the memory to map
> + *
> + * Using this function you will get a __iomem address to your device BAR.
> + * You can access it using ioread*() and iowrite*(). These functions hide
> + * the details if this is a MMIO or PIO address space and will just do what
> + * you expect from them in the correct way. When possible write combining
> + * is used.
> + *
> + * @maxlen specifies the maximum length to map. If you want to get access to
> + * the complete BAR from offset to the end, pass %0 here.

s/%0/0 ? Or is that some special syntax?

> + * */
> +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> +				 int bar,
> +				 unsigned long offset,
> +				 unsigned long maxlen)
> +{
> +	resource_size_t start = pci_resource_start(dev, bar);
> +	resource_size_t len = pci_resource_len(dev, bar);
> +	unsigned long flags = pci_resource_flags(dev, bar);
> +
> +	if (len <= offset || !start)
> +		return NULL;
> +	len -= offset;
> +	start += offset;
> +	if (maxlen && len > maxlen)
> +		len = maxlen;
> +	if (flags & IORESOURCE_IO)
> +		return __pci_ioport_map(dev, start, len);
> +	if (flags & IORESOURCE_MEM)
> +		return ioremap_wc(start, len);
> +	/* What? */
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(pci_iomap_wc_range);
> +
> +/**
>   * pci_iomap - create a virtual mapping cookie for a PCI BAR
>   * @dev: PCI device that owns the BAR
>   * @bar: BAR number
> @@ -70,4 +110,25 @@ void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
>  	return pci_iomap_range(dev, bar, 0, maxlen);
>  }
>  EXPORT_SYMBOL(pci_iomap);
> +
> +/**
> + * pci_iomap_wc - create a virtual WC mapping cookie for a PCI BAR
> + * @dev: PCI device that owns the BAR
> + * @bar: BAR number
> + * @maxlen: length of the memory to map
> + *
> + * Using this function you will get a __iomem address to your device BAR.
> + * You can access it using ioread*() and iowrite*(). These functions hide
> + * the details if this is a MMIO or PIO address space and will just do what
> + * you expect from them in the correct way. When possible write combining
> + * is used.
> + *
> + * @maxlen specifies the maximum length to map. If you want to get access to
> + * the complete BAR without checking for its length first, pass %0 here.
> + * */
> +void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned long maxlen)
> +{
> +	return pci_iomap_wc_range(dev, bar, 0, maxlen);
> +}
> +EXPORT_SYMBOL_GPL(pci_iomap_wc);
>  #endif /* CONFIG_PCI */
> -- 
> 2.3.2.209.gd67f9d5.dirty
> 
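
For anyone following the offset/maxlen handling in pci_iomap_wc_range()
quoted above, the window computation can be modelled in plain userspace C.
This is only a sketch of the arithmetic: struct bar_window and
compute_bar_window() are hypothetical names for this example, and the actual
mapping step (ioremap_wc() or __pci_ioport_map()) is left out:

```c
#include <stdint.h>

/* Resulting region that would be handed to the mapping routine. */
struct bar_window {
	uint64_t start;
	uint64_t len;
};

/*
 * Mirror the checks in the quoted patch:
 *  - fail if the offset lies at or past the end of the BAR, or the BAR
 *    has no start address assigned;
 *  - otherwise skip 'offset' bytes and, if maxlen is non-zero, clamp the
 *    remaining length to maxlen.
 * Returns 0 and fills *out on success, -1 when nothing can be mapped.
 */
static int compute_bar_window(uint64_t bar_start, uint64_t bar_len,
			      unsigned long offset, unsigned long maxlen,
			      struct bar_window *out)
{
	if (bar_len <= offset || !bar_start)
		return -1;
	bar_len -= offset;
	bar_start += offset;
	if (maxlen && bar_len > maxlen)
		bar_len = maxlen;
	out->start = bar_start;
	out->len = bar_len;
	return 0;
}
```

With a 0x1000-byte BAR, an offset of 0x800 and maxlen of 0 yields the
remaining 0x800 bytes, while a maxlen of 0x100 limits the window to 0x100
bytes; this matches the "pass 0 to get the rest of the BAR" wording in the
kernel-doc comment.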

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
  2015-03-25 20:03     ` Konrad Rzeszutek Wilk
@ 2015-03-25 20:39       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-25 20:39 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Arjan van de Ven, Arjan van de Ven
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-fbdev,
	Antonino Daplas, Daniel Vetter, x86, linux-kernel,
	Tomi Valkeinen, xen-devel, Ingo Molnar,
	Jean-Christophe Plagniol-Villard

On Wed, Mar 25, 2015 at 04:03:46PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> > This lets drivers take advanate of PAT when available. This
> 
> s/advanate/advantage/

Amended.

> > should help with the transition of converting video drivers over
> > to ioremap_wc() to help with the goal of eventually using
> > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on
> > ioremap_nocache() (de33c442e)
> 
> Please mention the title of the patch too:
> 
> "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"

Added.

> > 
> > Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> > Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> >  drivers/pci/pci.c   | 14 ++++++++++++++
> >  include/linux/pci.h |  1 +
> >  2 files changed, 15 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 81f06e8..6afd507 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar)
> >  				     pci_resource_len(pdev, bar));
> >  }
> >  EXPORT_SYMBOL_GPL(pci_ioremap_bar);
> > +
> > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar)
> > +{
> > +	/*
> > +	 * Make sure the BAR is actually a memory resource, not an IO resource
> > +	 */
> > +	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) {
> > +		WARN_ON(1);
> 
> Would it be better to use dev_warn ? That way you can see which BDF it is?
> 
> Though WARN will give a nice stack-trace that should easily point to the
> driver so perhaps not.. Either way - up to you.

I'm sticking to the style and usage of pci_ioremap_bar(). Whatever we pick,
both should use the same. More information is always better, and since we do
have dev_warn() it would be nice to use it. However, both pci_ioremap_wc_bar()
and pci_ioremap_bar() pass pdev to pci_resource_flags(), and I believe that
with a NULL pdev dev_warn() would hit a NULL dereference
(dev_driver_string() is used), so it seems best to stick with a simple
WARN_ON(). Arjan, any preference? Obviously if pdev is NULL your driver is
dumb, but this should be expected while folks develop drivers.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-23 17:20     ` Bjorn Helgaas
@ 2015-03-26  3:00       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-26  3:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
	Julia Lawall, Peter Senna Tschudin
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	Benjamin Poirier, linux-pci

On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> Hi Luis,
> 
> This seems OK to me, 

Great.

> but I'm curious about a few things.
> 
> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > This allows drivers to take advantage of write-combining
> > when possible. Ideally we'd have pci_read_bases() just
> > peg an IORESOURCE_WC flag for us
> 
> We do set IORESOURCE_PREFETCH.  Do you mean something different?

I did not think we had a WC IORESOURCE flag. Are you saying that we can use
IORESOURCE_PREFETCH for that purpose? If so then great. As I read it, a PCI BAR
can have PCI_BASE_ADDRESS_MEM_PREFETCH set, and when that's the case we peg
IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.

> >  but where exactly
> > video devices memory lie varies *largely* and at times things
> > are mixed with MMIO registers, sometimes we can address
> > the changes in drivers, other times the change requires
> > intrusive changes.
> 
> What does a video device address have to do with this?  I do see that
> if a BAR maps only a frame buffer, the device might be able to mark it
> prefetchable, while if the BAR mapped both a frame buffer and some
> registers, it might not be able to make it prefetchable.  But that
> doesn't seem like it depends on the *address*.

I meant the offsets for each of those, either registers or framebuffer,
and that typically they are mixed (primarily on older devices), so indeed your
summary of the problem is what I meant. Let's remember that we are trying to
take advantage of PAT here when available and avoid MTRR in that case. Do we
know that the same PCI BARs that have always historically used MTRRs had
IORESOURCE_PREFETCH set? Is that a fair assumption? I realize they are
different things, but that's precisely why I ask.

> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g.,
> 
>   if (flags & IORESOURCE_CACHEABLE)
>     return ioremap(start, len);
>   if (flags & IORESOURCE_PREFETCH)
>     return ioremap_wc(start, len);
>   return ioremap_nocache(start, len);

Indeed, that's exactly what I think we should strive towards.
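
The selection order Bjorn sketches can be modelled in userspace as below.
This is only an illustration under stated assumptions: the SK_RES_* bits are
placeholder values (not the kernel's IORESOURCE_* definitions), and the
sk_map_kind enum merely names which ioremap variant would be called. The
point is the precedence: cacheable wins, then prefetchable selects WC, and
everything else falls back to uncached.

```c
#include <assert.h>

/* Placeholder flag bits; names echo IORESOURCE_*, values are illustrative. */
#define SK_RES_PREFETCH  0x1UL
#define SK_RES_CACHEABLE 0x2UL

/* Which mapping helper the quoted snippet would end up calling. */
enum sk_map_kind {
	SK_MAP_CACHED,		/* stands in for ioremap() */
	SK_MAP_WC,		/* stands in for ioremap_wc() */
	SK_MAP_UNCACHED,	/* stands in for ioremap_nocache() */
};

enum sk_map_kind sk_pick_mapping(unsigned long flags)
{
	if (flags & SK_RES_CACHEABLE)
		return SK_MAP_CACHED;
	if (flags & SK_RES_PREFETCH)
		return SK_MAP_WC;
	return SK_MAP_UNCACHED;
}

/* Usage example covering each branch. */
void sk_pick_mapping_demo(void)
{
	assert(sk_pick_mapping(SK_RES_CACHEABLE) == SK_MAP_CACHED);
	assert(sk_pick_mapping(SK_RES_PREFETCH) == SK_MAP_WC);
	assert(sk_pick_mapping(0) == SK_MAP_UNCACHED);
}
```

Whether automatic selection like this is safe is exactly what the questions
further down try to establish.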

> Is there a reason not to do that?

This depends on the exact definition of IORESOURCE_PREFETCH and
PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
across *all devices*. This didn't look promising for starters:

include/uapi/linux/pci_regs.h:#define  PCI_BASE_ADDRESS_MEM_PREFETCH    0x08    /* prefetchable? */

PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:

1) Can we rest assured, for instance, that PCI_BASE_ADDRESS_MEM_PREFETCH will
*only* be set on a PCI BAR if the full PCI BAR does want WC? If not, checking
for it and mapping WC automatically can regress functionality. That seems
risky. It would not be risky, however, if we used another API that looked for
IORESOURCE_PREFETCH and, if set, used ioremap_wc() -- that way only drivers we
know do use the full PCI BAR would use this API. There's a bit of a problem
with this though:

2) Do we know that if a *full PCI BAR* is used for WC that
PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
the API usage would be restricted only to devices that we know *do* adhere to
this. That reduces the possible uses for older drivers and can create
regressions if used loosely without verification... but..

3) If from now on we get folks to commit to using PCI_BASE_ADDRESS_MEM_PREFETCH
for full PCI BARs that do want WC, perhaps newer devices / drivers will use
this very consistently? Can we bank on that, and is it worth it?

4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH, do we know it
must never want WC?

If we don't have certainty on any of the above I'm afraid we can't do much
right now, but perhaps we can push towards better use of
PCI_BASE_ADDRESS_MEM_PREFETCH and hope folks will set it on a full PCI BAR
only if WC is desired.

Thoughts?
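
To make the "BAR specific" point concrete, here is a minimal userspace sketch
of decoding the low bits of a raw BAR value. The 0x08 prefetchable bit is the
one quoted from pci_regs.h above; the 0x01 I/O-space indicator bit is the
standard BAR layout as I understand it. The SK_* names and
sk_bar_marked_prefetchable() are hypothetical and exist only for this
example. Note that the bit describes the BAR as a whole, so it cannot express
"only part of this BAR wants WC".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_BAR_SPACE_IO     0x01u	/* bit 0 set: I/O BAR, not memory */
#define SK_BAR_MEM_PREFETCH 0x08u	/* bit 3: prefetchable (memory BARs) */

/*
 * True only for memory BARs with the prefetchable bit set. For an I/O
 * BAR bit 3 is part of the port address, so it must not be interpreted.
 */
bool sk_bar_marked_prefetchable(uint32_t bar)
{
	if (bar & SK_BAR_SPACE_IO)
		return false;
	return (bar & SK_BAR_MEM_PREFETCH) != 0;
}

/* Usage example with made-up BAR values. */
void sk_bar_decode_demo(void)
{
	assert(sk_bar_marked_prefetchable(0xf0000008u));	/* mem, prefetch */
	assert(!sk_bar_marked_prefetchable(0xf0000000u));	/* mem, no prefetch */
	assert(!sk_bar_marked_prefetchable(0x0000e009u));	/* I/O BAR */
}
```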

> > Although there is also arch_phys_wc_add() that makes use of
> > architecture specific write-combinging alternatives (MTRR on
> > x86 when a system does not have PAT) we void polluting
> > pci_iomap() space with it and force drivers and subsystems
> > that want to use it to be explicit.
> >
> > There are a few motivations for this:
> >
> > a) Take advantage of PAT when available
> >
> > b) Help bury MTRR code away, MTRR is architecture specific and on
> >    x86 its replaced by PAT
> >
> > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> >    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > ...
> 
> > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > +                                int bar,
> > +                                unsigned long offset,
> > +                                unsigned long maxlen)
> > +{
> > +       resource_size_t start = pci_resource_start(dev, bar);
> > +       resource_size_t len = pci_resource_len(dev, bar);
> > +       unsigned long flags = pci_resource_flags(dev, bar);
> > +
> > +       if (len <= offset || !start)
> > +               return NULL;
> > +       len -= offset;
> > +       start += offset;
> > +       if (maxlen && len > maxlen)
> > +               len = maxlen;
> > +       if (flags & IORESOURCE_IO)
> > +               return __pci_ioport_map(dev, start, len);
> > +       if (flags & IORESOURCE_MEM)
> 
> Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
>  I know the driver might know it's safe even if the device didn't mark
> the BAR as prefetchable, but it does seem like an easy way for a
> driver to shoot itself in the foot.

You tell me. I would fear this may not be consistent and we'd end up
having bug reports open for something that has historically been a
non-issue. The above questions can help us gauge the risk of this.
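
For reference, the bounds handling in the quoted pci_iomap_wc_range() hunk,
together with the condition under which Bjorn's suggested dmesg note would
fire, can be modelled in userspace roughly as follows. Everything prefixed
sk_ / SK_ is hypothetical and the flag values are placeholders; the sketch
only mirrors the arithmetic in the hunk (reject an offset at or past the end
and a zero start, then clamp to maxlen when maxlen is non-zero).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits; names echo IORESOURCE_*, values are illustrative. */
#define SK_RES_MEM      0x1UL
#define SK_RES_PREFETCH 0x2UL

struct sk_window {
	uint64_t start;
	uint64_t len;
	bool valid;	/* false where the quoted hunk returns NULL */
};

struct sk_window sk_clamp_window(uint64_t start, uint64_t len,
				 uint64_t offset, uint64_t maxlen)
{
	struct sk_window w = { 0, 0, false };

	if (len <= offset || !start)
		return w;
	len -= offset;
	start += offset;
	if (maxlen && len > maxlen)
		len = maxlen;

	w.start = start;
	w.len = len;
	w.valid = true;
	return w;
}

/* True when a WC mapping is requested on a memory BAR not marked prefetch. */
bool sk_wc_wants_note(unsigned long flags)
{
	return (flags & SK_RES_MEM) && !(flags & SK_RES_PREFETCH);
}

/* Usage example: map 0x20 bytes starting 0x40 into a 0x100-byte BAR. */
void sk_window_demo(void)
{
	struct sk_window w = sk_clamp_window(0x1000, 0x100, 0x40, 0x20);

	assert(w.valid && w.start == 0x1040 && w.len == 0x20);
	assert(sk_wc_wants_note(SK_RES_MEM));
}
```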

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-25 19:59     ` Konrad Rzeszutek Wilk
@ 2015-03-26  4:38       ` Juergen Gross
  -1 siblings, 0 replies; 710+ messages in thread
From: Juergen Gross @ 2015-03-26  4:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
	david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
	xen-devel

On 03/25/2015 08:59 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> It is possible to enable CONFIG_MTRR and up with it
>> disabled at run time and yet CONFIG_X86_PAT continues
>> to kick through fully functionally. This can happen
>
> s/fully/full/ ?
>
>
>> for instance on Xen where MTRR is not supported but
>> PAT is, this can happen now on Linux as of commit
>> 47591df50 by Juergen introduced as of v3.19.
>
> s/3.19/4.0/

No, 3.19 is correct.

Juergen

>>
>> Technically we should assume the proper CPU
>> bits would be set to disable MTRR but we can't
>> always rely on this. At least on the Xen Hypervisor
>> for instance only X86_FEATURE_MTRR was disabled
>> as of Xen 4.4 through Xen commit 586ab6a [0],
>> but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
>> or X86_FEATURE_CYRIX_ARR for instance.
>
> Oh, could you send an patch for that to Xen please?
>>
>> x86 mtrr code relies on quite a bit of checks for
>> mtrr_if being set to check to see if MTRR did get
>> set up, instead of using that lets provide a generic
>> setter which when set we know MTRR is enabled. This
>
> s/we know MTRR is enabled/will let us know that MTRR is enabled/
>
>> also adds a few checks where they were not before
>> which could potentially safeguard ourselves against
>> incorrect usage of MTRR where this was not desirable.
>>
>> Where possible match error codes as if MTRR was
>> disabled on arch/x86/include/asm/mtrr.h.
>>
>> Lastly, since disabling MTRR can happen at run time
>> and we could end up with PAT enabled best record now
>> on our logs when MTRR is disabled.
>>
>> [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
>> 4.4.0-rc1~18
>>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
>> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Dave Airlie <airlied@redhat.com>
>> Cc: Antonino Daplas <adaplas@gmail.com>
>> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
>> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: venkatesh.pallipadi@intel.com
>> Cc: Stefan Bader <stefan.bader@canonical.com>
>> Cc: konrad.wilk@oracle.com
>> Cc: ville.syrjala@linux.intel.com
>> Cc: david.vrabel@citrix.com
>> Cc: jbeulich@suse.com
>> Cc: toshi.kani@hp.com
>> Cc: bhelgaas@google.com
>> Cc: Roger Pau Monné <roger.pau@citrix.com>
>> Cc: linux-fbdev@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: xen-devel@lists.xensource.com
>> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
>> ---
>>   arch/x86/include/asm/mtrr.h        |  2 ++
>>   arch/x86/kernel/cpu/mtrr/cleanup.c |  2 +-
>>   arch/x86/kernel/cpu/mtrr/generic.c |  5 +++--
>>   arch/x86/kernel/cpu/mtrr/if.c      |  3 +++
>>   arch/x86/kernel/cpu/mtrr/main.c    | 31 ++++++++++++++++++++++---------
>>   5 files changed, 31 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
>> index f768f62..cade917 100644
>> --- a/arch/x86/include/asm/mtrr.h
>> +++ b/arch/x86/include/asm/mtrr.h
>> @@ -31,6 +31,7 @@
>>    * arch_phys_wc_add and arch_phys_wc_del.
>>    */
>>   # ifdef CONFIG_MTRR
>> +extern int mtrr_enabled;
>>   extern u8 mtrr_type_lookup(u64 addr, u64 end);
>>   extern void mtrr_save_fixed_ranges(void *);
>>   extern void mtrr_save_state(void);
>> @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
>>   extern int amd_special_default_mtrr(void);
>>   extern int phys_wc_to_mtrr_index(int handle);
>>   #  else
>> +static const int mtrr_enabled;
>>   static inline u8 mtrr_type_lookup(u64 addr, u64 end)
>>   {
>>   	/*
>> diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
>> index 5f90b85..784dc55 100644
>> --- a/arch/x86/kernel/cpu/mtrr/cleanup.c
>> +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
>> @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn)
>>   	 * Make sure we only trim uncachable memory on machines that
>>   	 * support the Intel MTRR architecture:
>>   	 */
>> -	if (!is_cpu(INTEL) || disable_mtrr_trim)
>> +	if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled)
>>   		return 0;
>>
>>   	rdmsr(MSR_MTRRdefType, def, dummy);
>> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
>> index 09c82de..df321b2 100644
>> --- a/arch/x86/kernel/cpu/mtrr/generic.c
>> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
>> @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>>   	u8 prev_match, curr_match;
>>
>>   	*repeat = 0;
>> -	if (!mtrr_state_set)
>> +	/* generic_mtrr_ops is only set for generic_mtrr_ops */
>> +	if (!mtrr_state_set || !mtrr_enabled)
>>   		return 0xFF;
>>
>>   	if (!mtrr_state.enabled)
>> @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs)
>>
>>   void mtrr_save_fixed_ranges(void *info)
>>   {
>> -	if (cpu_has_mtrr)
>> +	if (mtrr_enabled && cpu_has_mtrr)
>>   		get_fixed_ranges(mtrr_state.fixed_ranges);
>>   }
>>
>> diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c
>> index d76f13d..e9e001a 100644
>> --- a/arch/x86/kernel/cpu/mtrr/if.c
>> +++ b/arch/x86/kernel/cpu/mtrr/if.c
>> @@ -436,6 +436,9 @@ static int __init mtrr_if_init(void)
>>   {
>>   	struct cpuinfo_x86 *c = &boot_cpu_data;
>>
>> +	if (!mtrr_enabled)
>> +		return 0;
>> +
>>   	if ((!cpu_has(c, X86_FEATURE_MTRR)) &&
>>   	    (!cpu_has(c, X86_FEATURE_K6_MTRR)) &&
>>   	    (!cpu_has(c, X86_FEATURE_CYRIX_ARR)) &&
>> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
>> index ea5f363..7db9c47 100644
>> --- a/arch/x86/kernel/cpu/mtrr/main.c
>> +++ b/arch/x86/kernel/cpu/mtrr/main.c
>> @@ -59,6 +59,7 @@
>>   #define MTRR_TO_PHYS_WC_OFFSET 1000
>>
>>   u32 num_var_ranges;
>> +int mtrr_enabled;
>>
>>   unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
>>   static DEFINE_MUTEX(mtrr_mutex);
>> @@ -84,6 +85,9 @@ static int have_wrcomb(void)
>>   {
>>   	struct pci_dev *dev;
>>
>> +	if (!mtrr_enabled)
>> +		return 0;
>> +
>>   	dev = pci_get_class(PCI_CLASS_BRIDGE_HOST << 8, NULL);
>>   	if (dev != NULL) {
>>   		/*
>> @@ -286,7 +290,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>>   	int i, replace, error;
>>   	mtrr_type ltype;
>>
>> -	if (!mtrr_if)
>> +	if (!mtrr_enabled)
>>   		return -ENXIO;
>>
>>   	error = mtrr_if->validate_add_page(base, size, type);
>> @@ -388,6 +392,8 @@ int mtrr_add_page(unsigned long base, unsigned long size,
>>
>>   static int mtrr_check(unsigned long base, unsigned long size)
>>   {
>> +	if (!mtrr_enabled)
>> +		return -ENODEV;
>>   	if ((base & (PAGE_SIZE - 1)) || (size & (PAGE_SIZE - 1))) {
>>   		pr_warning("mtrr: size and base must be multiples of 4 kiB\n");
>>   		pr_debug("mtrr: size: 0x%lx  base: 0x%lx\n", size, base);
>> @@ -463,8 +469,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>>   	unsigned long lbase, lsize;
>>   	int error = -EINVAL;
>>
>> -	if (!mtrr_if)
>> -		return -ENXIO;
>> +	if (!mtrr_enabled)
>> +		return -ENODEV;
>>
>>   	max = num_var_ranges;
>>   	/* No CPU hotplug when we change MTRR entries */
>> @@ -523,6 +529,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
>>    */
>>   int mtrr_del(int reg, unsigned long base, unsigned long size)
>>   {
>> +	if (!mtrr_enabled)
>> +		return -ENODEV;
>>   	if (mtrr_check(base, size))
>>   		return -EINVAL;
>>   	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
>> @@ -545,7 +553,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
>>   {
>>   	int ret;
>>
>> -	if (pat_enabled)
>> +	if (pat_enabled || !mtrr_enabled)
>>   		return 0;  /* Success!  (We don't need to do anything.) */
>>
>>   	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
>> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
>>   	}
>>
>>   	if (mtrr_if) {
>> +		mtrr_enabled = true;
>>   		set_num_var_ranges();
>>   		init_table();
>>   		if (use_intel()) {
>> @@ -744,12 +753,13 @@ void __init mtrr_bp_init(void)
>>   				mtrr_if->set_all();
>>   			}
>>   		}
>> -	}
>> +	} else
>> +		pr_info("mtrr: system does not support MTRR\n");
>
>   'pr_warn' ?
>>   }
>>
>>   void mtrr_ap_init(void)
>>   {
>> -	if (!use_intel() || mtrr_aps_delayed_init)
>> +	if (!use_intel() || mtrr_aps_delayed_init || !mtrr_enabled)
>>   		return;
>>   	/*
>>   	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
>> @@ -774,6 +784,9 @@ void mtrr_save_state(void)
>>   {
>>   	int first_cpu;
>>
>> +	if (!mtrr_enabled)
>> +		return;
>> +
>>   	get_online_cpus();
>>   	first_cpu = cpumask_first(cpu_online_mask);
>>   	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
>> @@ -782,7 +795,7 @@ void mtrr_save_state(void)
>>
>>   void set_mtrr_aps_delayed_init(void)
>>   {
>> -	if (!use_intel())
>> +	if (!use_intel() || !mtrr_enabled)
>>   		return;
>>
>>   	mtrr_aps_delayed_init = true;
>> @@ -810,7 +823,7 @@ void mtrr_aps_init(void)
>>
>>   void mtrr_bp_restore(void)
>>   {
>> -	if (!use_intel())
>> +	if (!use_intel() || !mtrr_enabled)
>>   		return;
>>
>>   	mtrr_if->set_all();
>> @@ -818,7 +831,7 @@ void mtrr_bp_restore(void)
>>
>>   static int __init mtrr_init_finialize(void)
>>   {
>> -	if (!mtrr_if)
>> +	if (!mtrr_enabled)
>>   		return 0;
>>
>>   	if (use_intel()) {
>> --
>> 2.3.2.209.gd67f9d5.dirty
>>


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-25 19:59     ` Konrad Rzeszutek Wilk
@ 2015-03-26 23:35       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-26 23:35 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, ville.syrjala,
	david.vrabel, toshi.kani, bhelgaas, Roger Pau Monné,
	xen-devel

On Wed, Mar 25, 2015 at 03:59:41PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> > It is possible to enable CONFIG_MTRR and up with it
> > disabled at run time and yet CONFIG_X86_PAT continues
> > to kick through fully functionally. This can happen
> 
> s/fully/full/ ?

I'll rephrase this to:

---
It is possible to enable CONFIG_MTRR and end up with it
disabled at run time, and yet CONFIG_X86_PAT continues
to kick through with all functionality enabled. This
can happen for instance on Xen, where MTRR is not
supported but PAT is. This can happen now on Linux as
of commit 47591df50 by Juergen, introduced in v3.19.
---

BTW, as I also mentioned in the cover letter, this is a
good time to decide whether we want to make PAT a first
class citizen and detangle it from its dependency on
MTRR. If so, I can do that later.

> > Technically we should assume the proper CPU
> > bits would be set to disable MTRR but we can't
> > always rely on this. At least on the Xen Hypervisor
> > for instance only X86_FEATURE_MTRR was disabled
> > as of Xen 4.4 through Xen commit 586ab6a [0],
> > but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR,
> > or X86_FEATURE_CYRIX_ARR for instance.
> 
> Oh, could you send an patch for that to Xen please?

Done.

> > x86 mtrr code relies on quite a bit of checks for
> > mtrr_if being set to check to see if MTRR did get
> > set up, instead of using that lets provide a generic
> > setter which when set we know MTRR is enabled. This
> 
> s/we know MTRR is enabled/will let us know that MTRR is enabled/

Amended.
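
The approach described in the quoted commit message, a single flag that is
set once MTRR setup succeeds and then checked by every entry point, can be
modelled in plain userspace C. This is only a sketch of the pattern and not
the kernel code: every name ending in _model is hypothetical, and the
non-zero return value stands in for "an MTRR would be programmed here".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct mtrr_ops; only its presence matters. */
struct mtrr_ops_model {
	int unused;
};

static const struct mtrr_ops_model *mtrr_if_model;	/* NULL: no backend */
static bool mtrr_enabled_model;
static bool pat_enabled_model;

/* Models mtrr_bp_init(): the flag is set only if a backend was selected. */
static void bp_init_model(const struct mtrr_ops_model *ops)
{
	mtrr_if_model = ops;
	mtrr_enabled_model = (mtrr_if_model != NULL);
}

/* Models the early return added to arch_phys_wc_add() in the patch. */
static int phys_wc_add_model(void)
{
	if (pat_enabled_model || !mtrr_enabled_model)
		return 0;	/* nothing to do, reported as success */
	return 1;		/* stand-in: an MTRR would be added here */
}

/* Models mtrr_del(): callers see -ENODEV when MTRR is off at run time. */
static int del_model(void)
{
	if (!mtrr_enabled_model)
		return -ENODEV;
	return 0;
}
```

In this model, a boot without a backend (bp_init_model(NULL)) makes
phys_wc_add_model() return 0 and del_model() return -ENODEV, which mirrors
the behaviour the patch aims for when MTRR is disabled at run time.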

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-21  9:15     ` Ville Syrjälä
@ 2015-03-27  8:37       ` Ville Syrjälä
  -1 siblings, 0 replies; 710+ messages in thread
From: Ville Syrjälä @ 2015-03-27  8:37 UTC (permalink / raw)
  To: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >  
> >  #ifdef CONFIG_MTRR
> >  	par->mtrr_aper = -1;
> > -	par->mtrr_reg = -1;
> >  	if (!nomtrr) {
> > -		/* Cover the whole resource. */
> > -		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > +		par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > +					  info->fix.smem_len,
> >  					  MTRR_TYPE_WRCOMB, 1);
> 
> MTRRs need power of two size, so how is this supposed to work?

Still waiting for an answer...
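
For reference, the constraint behind the question above can be sketched in
plain userspace C. This is a minimal sketch, assuming the usual rule for
variable-range MTRRs: the size must be a power of two and the base must be
aligned to that size. The helper names mtrr_range_ok() and round_up_pow2()
and the base address used below are hypothetical and are not part of atyfb
or the kernel MTRR code. 0x800000 is the smem_len from the patch; 0x7ff000
is only an illustrative shortened length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [base, base + size) is a naturally aligned power-of-two block. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return false;			/* size is not a power of two */
	return (base & (size - 1)) == 0;	/* base must be size-aligned */
}

/* Round a length up to the next power of two (hypothetical helper). */
static uint64_t round_up_pow2(uint64_t len)
{
	uint64_t p = 1;

	/* Guard against zero and against overflowing the shift below. */
	assert(len != 0 && len <= (UINT64_C(1) << 63));
	while (p < len)
		p <<= 1;
	return p;
}
```

With these helpers, mtrr_range_ok(0xfd000000, 0x800000) holds, while a
length of 0x7ff000 at the same base fails the check and would have to be
rounded up to 0x800000 before a single variable range could describe it.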

> 
> > -		if (par->mtrr_aper >= 0 && !par->aux_start) {
> > -			/* Make a hole for mmio. */
> > -			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> > -						 GUI_RESERVE, GUI_RESERVE,
> > -						 MTRR_TYPE_UNCACHABLE, 1);
> > -			if (par->mtrr_reg < 0) {
> > -				mtrr_del(par->mtrr_aper, 0, 0);
> > -				par->mtrr_aper = -1;
> > -			}
> > -		}
> >  	}
> >  #endif
> >  
> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> >  	par->pll_ops->set_pll(info, &par->saved_pll);
> >  
> >  #ifdef CONFIG_MTRR
> > -	if (par->mtrr_reg >= 0) {
> > -		mtrr_del(par->mtrr_reg, 0, 0);
> > -		par->mtrr_reg = -1;
> > -	}
> >  	if (par->mtrr_aper >= 0) {
> >  		mtrr_del(par->mtrr_aper, 0, 0);
> >  		par->mtrr_aper = -1;
> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >  	}
> >  
> >  	info->fix.mmio_start = raddr;
> > -	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> > +	par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> >  	if (par->ati_regbase == NULL)
> >  		return -ENOMEM;
> >  
> > @@ -3491,6 +3476,8 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >  	info->fix.smem_start = addr;
> >  	info->fix.smem_len = 0x800000;
> >  
> > +	aty_fudge_framebuffer_len(info);
> > +
> >  	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> >  	if (info->screen_base == NULL) {
> >  		ret = -ENOMEM;
> > @@ -3563,6 +3550,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
> >  		return -ENOMEM;
> >  	}
> >  	par = info->par;
> > +	par->bus_type = PCI;
> >  	info->fix = atyfb_fix;
> >  	info->device = &pdev->dev;
> >  	par->pci_id = pdev->device;
> > @@ -3732,10 +3720,6 @@ static void atyfb_remove(struct fb_info *info)
> >  #endif
> >  
> >  #ifdef CONFIG_MTRR
> > -	if (par->mtrr_reg >= 0) {
> > -		mtrr_del(par->mtrr_reg, 0, 0);
> > -		par->mtrr_reg = -1;
> > -	}
> >  	if (par->mtrr_aper >= 0) {
> >  		mtrr_del(par->mtrr_aper, 0, 0);
> >  		par->mtrr_aper = -1;
> > -- 
> > 2.3.2.209.gd67f9d5.dirty
> > 

-- 
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-25 20:07     ` Konrad Rzeszutek Wilk
  (?)
@ 2015-03-27 18:40       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 18:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Bjorn Helgaas, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, ville.syrjala, david.vrabel, toshi.kani,
	Roger Pau Monné,
	xen-devel

On Wed, Mar 25, 2015 at 04:07:43PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 20, 2015 at 04:17:55PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> > This allows drivers to take advantage of write-combining
> > when possible. Ideally we'd have pci_read_bases() just
> > peg an IORESOURCE_WC flag for us but where exactly
> > video devices memory lie varies *largely* and at times things
> > are mixed with MMIO registers, sometimes we can address
> > the changes in drivers, other times the change requires
> > intrusive changes.
> > 
> > Although there is also arch_phys_wc_add() that makes use of
> > architecture specific write-combinging alternatives (MTRR on
> 
> combinging?

Amended.

> > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> > index bcce5f1..30b65ae 100644
> > --- a/lib/pci_iomap.c
> > +++ b/lib/pci_iomap.c
> > @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
> >  EXPORT_SYMBOL(pci_iomap_range);
> >  
> >  /**
> > + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR
> > + * @dev: PCI device that owns the BAR
> > + * @bar: BAR number
> > + * @offset: map memory at the given offset in BAR
> > + * @maxlen: max length of the memory to map
> > + *
> > + * Using this function you will get a __iomem address to your device BAR.
> > + * You can access it using ioread*() and iowrite*(). These functions hide
> > + * the details if this is a MMIO or PIO address space and will just do what
> > + * you expect from them in the correct way. When possible write combining
> > + * is used.
> > + *
> > + * @maxlen specifies the maximum length to map. If you want to get access to
> > + * the complete BAR from offset to the end, pass %0 here.
> 
> s/%0/0 ? Or is that some special syntax?

This copies the syntax of pci_iomap_range() which also uses %0, and as per
Documentation/kernel-doc-nano-HOWTO.txt % is used for constants. See:

scripts/kernel-doc -man -function pci_iomap_range lib/pci_iomap.c | nroff -man | less

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-23 17:20     ` Bjorn Helgaas
  (?)
@ 2015-03-27 19:18       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-27 19:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Roger Pau Monné,
	xen-devel

On Mon, 2015-03-23 at 12:20 -0500, Bjorn Helgaas wrote:
 :
> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
> 
>   if (flags & IORESOURCE_CACHEABLE)
>     return ioremap(start, len);

Is this supposed to be ioremap_cache()?  ioremap() is the same as
ioremap_nocache() at least on x86 per arch/x86/include/asm/io.h.

>   if (flags & IORESOURCE_PREFETCH)
>     return ioremap_wc(start, len);
>   return ioremap_nocache(start, len);
> 
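
For illustration only, here is a user-space sketch of the selection order
quoted above, with the first branch treated as a cached mapping rather than
plain ioremap(). The SKETCH_* flags and the enum are made-up stand-ins, not
the kernel's IORESOURCE_* values, and nothing here calls a real ioremap
variant; it only models which variant each flag combination would pick.

```c
/* Hypothetical stand-ins for resource flags; the values are illustrative only. */
#define SKETCH_RES_PREFETCH	0x1u
#define SKETCH_RES_CACHEABLE	0x2u

/* The kind of mapping the quoted logic would end up requesting. */
enum sketch_map_type {
	SKETCH_MAP_UC,		/* stands for ioremap_nocache() */
	SKETCH_MAP_WC,		/* stands for ioremap_wc() */
	SKETCH_MAP_CACHED,	/* stands for ioremap_cache() */
};

/* Same test order as the quoted snippet: cacheable first, then prefetch. */
static enum sketch_map_type sketch_pick_mapping(unsigned int flags)
{
	if (flags & SKETCH_RES_CACHEABLE)
		return SKETCH_MAP_CACHED;
	if (flags & SKETCH_RES_PREFETCH)
		return SKETCH_MAP_WC;
	return SKETCH_MAP_UC;
}
```

With this ordering a BAR flagged both cacheable and prefetchable ends up
cached, since the cacheable test comes first.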

-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-21  9:15     ` Ville Syrjälä
@ 2015-03-27 19:38       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:38 UTC (permalink / raw)
  To: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Andy Lutomirski
  Cc: mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Ingo Molnar, Linus Torvalds, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >  
> >  #ifdef CONFIG_MTRR
> >  	par->mtrr_aper = -1;
> > -	par->mtrr_reg = -1;
> >  	if (!nomtrr) {
> > -		/* Cover the whole resource. */
> > -		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > +		par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > +					  info->fix.smem_len,
> >  					  MTRR_TYPE_WRCOMB, 1);
> 
> MTRRs need power of two size, so how is this supposed to work?

As per mtrr_add_page() [0] the base and size are just supposed to be in units
of 4 KiB. Although the practice in *some* drivers is to use powers of 2, this
is not standardized and by no means recorded as a requirement. Obviously
powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
will use mtrr_check() to verify the same requirement. Furthermore,
as per my commit log message:

---
    The last thing we must do to remain sane is ensure we
    use the info->fix.smem_start and info->fix.smem_len for
    the framebuffer MTRR as we know that is always well adjusted.
    The *one* concern here would be if the MTRR is not in units
    of 4K __but__ we already know that in the PCI case this cannot
    happen, in the shared space setting the MTRR would be up to
    0x7ff000 and assuming a 4K page:
    
    ; 0x7ff000 / 0x1000
        2047
    
    Also, internally when MTRR is used mtrr_add() will use mtrr_check()
    and that should splat a warning when the MTRR base and size are
    not compatible with what is expected for MTRR usage.
---

If any of this is too risky we can use the __arch_phys_wc_add() (or as
Andy suggested perhaps use set_page_* stuff, although I am still evaluating
this) but I did this change to show the effort required for a change when
the registers / framebuffer is on the same PCI BAR but at different offsets.
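
The 4 KiB arithmetic from the commit log can be reproduced with a small
user-space sketch. The SKETCH_* macros and helper names are hypothetical and
are not taken from the MTRR code; the sketch only checks page granularity and
says nothing about any other constraint a range may have to satisfy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 4 KiB page, as in the commit log. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* True when the value is a whole number of 4 KiB pages. */
static bool sketch_is_page_multiple(uint64_t v)
{
	return (v & (SKETCH_PAGE_SIZE - 1)) == 0;
}

/* Convert a byte count into a count of 4 KiB pages. */
static uint64_t sketch_to_pages(uint64_t v)
{
	return v >> SKETCH_PAGE_SHIFT;
}
```

For a length of 0x7ff000, sketch_is_page_multiple() is true and
sketch_to_pages() gives 2047, matching the figure above.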

[0] scripts/kernel-doc -man -function mtrr_add_page arch/x86/kernel/cpu/mtrr/main.c | nroff -man | less

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27  8:37       ` Ville Syrjälä
@ 2015-03-27 19:38         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:38 UTC (permalink / raw)
  To: Ville Syrjälä,
	Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Linus Torvalds,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 27, 2015 at 10:37:04AM +0200, Ville Syrjälä wrote:
> On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > index 8025624..8875e56 100644
> > > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >  
> > >  #ifdef CONFIG_MTRR
> > >  	par->mtrr_aper = -1;
> > > -	par->mtrr_reg = -1;
> > >  	if (!nomtrr) {
> > > -		/* Cover the whole resource. */
> > > -		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > +		par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > +					  info->fix.smem_len,
> > >  					  MTRR_TYPE_WRCOMB, 1);
> > 
> > MTRRs need power of two size, so how is this supposed to work?
> 
> Still waiting for an answer...

Sorry was in the desert for a bit, I'm back now.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 19:38       ` Luis R. Rodriguez
@ 2015-03-27 19:43         ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 19:43 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > index 8025624..8875e56 100644
>> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >
>> >  #ifdef CONFIG_MTRR
>> >     par->mtrr_aper = -1;
>> > -   par->mtrr_reg = -1;
>> >     if (!nomtrr) {
>> > -           /* Cover the whole resource. */
>> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > +                                     info->fix.smem_len,
>> >                                       MTRR_TYPE_WRCOMB, 1);
>>
>> MTRRs need power of two size, so how is this supposed to work?
>
> As per mtrr_add_page() [0] the base and size are just supposed to be in units
> of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> is not standardized and by no means recorded as a requirement. Obviously
> powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> will use mtrr_check() to verify the same requirement. Furthermore,
> as per my commit log message:

Whatever the code may or may not do, the x86 architecture uses
power-of-two MTRR sizes.  So I'm confused.
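
To make the concern concrete, here is a rough user-space sketch of the
constraint as I understand it for a single variable-range MTRR: the size is a
power of two and the base is aligned to that size. The helper names are
hypothetical and this is not the kernel's mtrr_check().

```c
#include <stdbool.h>
#include <stdint.h>

/* True when v is a non-zero power of two. */
static bool sketch_is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * True when [base, base + size) could be described by one base/mask pair,
 * assuming a power-of-two size and a base aligned to that size.
 */
static bool sketch_fits_one_mtrr(uint64_t base, uint64_t size)
{
	return sketch_is_pow2(size) && (base & (size - 1)) == 0;
}
```

By this check a length of 0x7ff000 is rejected while 0x800000 passes, provided
the base is 8 MiB aligned (for example a made-up base of 0xd0000000).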

--Andy

>
> ---
>     The last thing we must do to remain sane is ensure we
>     use the info->fix.smem_start and info->fix.smem_len for
>     the framebuffer MTRR as we know that is always well adjusted.
>     The *one* concern here would be if the MTRR is not in units
>     of 4K __but__ we already know that in the PCI case this cannot
>     happen, in the shared space setting the MTRR would be up to
>     0x7ff000 and assuming a 4K page:
>
>     ; 0x7ff000 / 0x1000
>         2047
>
>     Also, internally when MTRR is used mtrr_add() will use mtrr_check()
>     and that should splat a warning when the MTRR base and size are
>     not compatible with what is expected for MTRR usage.
> ---
>
> If any of this is too risky we can use the __arch_phys_wc_add() (or as
> Andy suggested perhaps use set_page_* stuff, although I am still evaluating
> this) but I did this change to show the effort required for a change when
> the registers / framebuffer is on the same PCI BAR but at different offsets.
>
> [0] scripts/kernel-doc -man -function mtrr_add_page arch/x86/kernel/cpu/mtrr/main.c | nroff -man | less
>
>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-20 23:48     ` Andy Lutomirski
@ 2015-03-27 19:53       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:53 UTC (permalink / raw)
  To: Andy Lutomirski, Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata, another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require a large
> > amount of rework in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer to the infiniband qib
> > driver. For a good example of a driver where a good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitional API -- and as such no new
> > drivers are ever expected to use this.
> 
> What's the exact layout that this helps?  I'm sceptical that this can
> ever be correct.
> 
> Is there some awful driver that has a large ioremap that's supposed to
> contain multiple different memtypes? 

Yes, I cc'd you just now on one where I made changes on a driver which uses one
PCI BAR with mixed memtypes and uses an MTRR to punch a hole in the WC range. A
transition to arch_phys_wc_add() is therefore not possible if PAT is enabled, as
it would regress those drivers by making the MTRR WC hole trick non-functional.
The changes are non-trivial, so in this series I supplied changes for one driver
only, to show the effort required. The other drivers which required this were:

Driver		File
------------------------------------------------------------
fusion		drivers/message/fusion/mptbase.c
ivtv		drivers/media/pci/ivtv/ivtvfb.c
ipath		drivers/infiniband/hw/ipath/ipath_driver.c

This series makes those drivers use __arch_phys_wc_add() as a
transitory phase, in the hope that we can later do the proper split, as the
atyfb change illustrates. For ipath the required changes have a nice template
in the qib driver, as the two share a very similar driver structure and the
qib driver *did* do the nice split.
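
As a rough illustration of the split described above, the layout arithmetic
can be modelled in plain C. This is a userspace sketch under stated
assumptions, not driver code: it assumes the framebuffer sits at the start
of the BAR with the registers behind it, and struct sketch_range and
sketch_split_bar() are hypothetical names. In a driver the two resulting
ranges would be what gets handed to ioremap_wc() and ioremap_nocache()
respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumes 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Hypothetical type: one physical range on a PCI BAR. */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Hypothetical helper: split one BAR into a framebuffer range (candidate
 * for write-combining) and a register range (uncached). Assumes the
 * framebuffer occupies the first fb_len bytes of the BAR. Returns false
 * if the split is not page aligned or leaves no room for registers.
 */
static bool sketch_split_bar(uint64_t bar_start, uint64_t bar_len,
			     uint64_t fb_len,
			     struct sketch_range *fb,
			     struct sketch_range *regs)
{
	assert(fb && regs);

	if (fb_len == 0 || fb_len >= bar_len)
		return false;
	if ((bar_start | bar_len | fb_len) & (SKETCH_PAGE_SIZE - 1))
		return false;

	fb->start = bar_start;
	fb->len = fb_len;
	regs->start = bar_start + fb_len;
	regs->len = bar_len - fb_len;
	return true;
}
```

With an 8 MiB BAR and a 0x7ff000 byte framebuffer, the register range ends
up being the last 4 KiB page, so the two mappings never share a page.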

> If so, can we ioremap + set_page_xyz instead?

I'm not sure I see which call we'd use.  Care to provide an example patch
for atyfb, as a case in point, as an alternative to the work required
to do the split?

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 19:43         ` Andy Lutomirski
@ 2015-03-27 19:57           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 19:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > index 8025624..8875e56 100644
> >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> >
> >> >  #ifdef CONFIG_MTRR
> >> >     par->mtrr_aper = -1;
> >> > -   par->mtrr_reg = -1;
> >> >     if (!nomtrr) {
> >> > -           /* Cover the whole resource. */
> >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > +                                     info->fix.smem_len,
> >> >                                       MTRR_TYPE_WRCOMB, 1);
> >>
> >> MTRRs need power of two size, so how is this supposed to work?
> >
> > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > is not standardized and by no means recorded as a requirement. Obviously
> > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > will use mtrr_check() to verify the same requirement. Furthermore,
> > as per my commit log message:
> 
> Whatever the code may or may not do, the x86 architecture uses
> power-of-two MTRR sizes.  So I'm confused.

There should be no confusion: I simply did not know that this *was* the
requirement for x86. If that is the case we should add a check for it
and perhaps generalize a helper that does the power-of-two adjustment.
The cleanest approach I found was the vesafb driver's solution.
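
A minimal sketch of what such a helper could look like, written as
userspace C so it can be compiled on its own. It rounds the length up to a
power of two and then halves it until the base is aligned to the size, down
to one 4 KiB page. The alignment test stands in for "the MTRR was accepted"
and is an assumption of this sketch. It is not a copy of the vesafb code,
and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000ULL

/* Hypothetical helper: smallest power of two that is >= len. */
static uint64_t sketch_roundup_pow2(uint64_t len)
{
	uint64_t size = 1;

	/* Larger values cannot be rounded up within 64 bits. */
	assert(len != 0 && len <= (1ULL << 63));
	while (size < len)
		size <<= 1;
	return size;
}

/*
 * Hypothetical helper: pick a power-of-two size for a range starting at
 * base. Starts from the rounded-up length and halves it until base is a
 * multiple of the size. Returns 0 if not even a single page fits.
 */
static uint64_t sketch_pick_wc_size(uint64_t base, uint64_t len)
{
	uint64_t size;

	if (len == 0 || len > (1ULL << 63))
		return 0;

	for (size = sketch_roundup_pow2(len); size >= SKETCH_PAGE_SIZE;
	     size >>= 1) {
		if ((base & (size - 1)) == 0)
			return size;
	}
	return 0;
}
```

With an illustrative base of 0xd0000000 and a length of 0x7ff000 this picks
0x800000, which covers one page more than the framebuffer. Whether covering
that extra page is acceptable is a per-driver decision.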

Thoughts?

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 19:53       ` Luis R. Rodriguez
@ 2015-03-27 19:58         ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 19:58 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >
>> > Ideally on systems using PAT we can expect a swift
>> > transition away from MTRR. There can be a few exceptions
>> > to this, one is where device drivers are known to exist
>> > on PATs with errata, another situation is observed on
>> > old device drivers where devices had combined MMIO
>> > register access with whatever area they typically
>> > later wanted to end up using MTRR for on the same
>> > PCI BAR. This situation can still be addressed by
>> > splitting up ioremap'd PCI BAR into two ioremap'd
>> > calls, one for MMIO registers, and another for whatever
>> > is desirable for write-combining -- in order to
>> > accomplish this though quite a bit of driver
>> > restructuring is required.
>> >
>> > Device drivers which are known to require a large
>> > amount of rework in order to split ioremap'd areas
>> > can use __arch_phys_wc_add() to avoid regressions
>> > when PAT is enabled.
>> >
>> > For a good example driver where things are neatly
>> > split up on a PCI BAR refer to the infiniband qib
>> > driver. For a good example of a driver where a good
>> > amount of work is required refer to the infiniband
>> > ipath driver.
>> >
>> > This is *only* a transitional API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> What's the exact layout that this helps?  I'm sceptical that this can
>> ever be correct.
>>
>> Is there some awful driver that has a large ioremap that's supposed to
>> contain multiple different memtypes?
>
> Yes, I cc'd you just now on one where I made changes on a driver which uses one
> PCI BAR with mixed memtypes and uses an MTRR to punch a hole in the WC range. A
> transition to arch_phys_wc_add() is therefore not possible if PAT is enabled, as
> it would regress those drivers by making the MTRR WC hole trick non-functional.
> The changes are non-trivial, so in this series I supplied changes for one driver
> only, to show the effort required. The other drivers which required this were:
>
> Driver          File
> ------------------------------------------------------------
> fusion          drivers/message/fusion/mptbase.c
> ivtv            drivers/media/pci/ivtv/ivtvfb.c
> ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>
> This series makes those drivers use __arch_phys_wc_add() as a
> transitory phase, in the hope that we can later do the proper split, as the
> atyfb change illustrates. For ipath the required changes have a nice template
> in the qib driver, as the two share a very similar driver structure and the
> qib driver *did* do the nice split.
>
>> If so, can we ioremap + set_page_xyz instead?
>
> I'm not sure I see which call we'd use.  Care to provide an example patch
> for atyfb, as a case in point, as an alternative to the work required
> to do the split?
>

I'm still confused.  Would it be insufficient to ioremap_nocache the
whole thing and then call set_memory_wc on parts of it?  (Sorry,
set_page_xyz was a typo.)
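
To make the question concrete, here is a small userspace sketch of the
arguments such a call would take. On x86 set_memory_wc() takes a start
address and a page count, so a sub-range of one big mapping has to be page
aligned and converted to pages first. Whether set_memory_wc() is usable on
an ioremap'd range at all is not settled in this message; the sketch only
covers the arithmetic, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumes 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000UL

/* Hypothetical type: the (addr, numpages) pair a set_memory_*() call takes. */
struct sketch_wc_args {
	unsigned long addr;
	int numpages;
};

/*
 * Hypothetical helper: turn "len bytes at offset inside a mapping that
 * starts at map_base" into an address and a page count. Returns false if
 * anything is not page aligned or the page count does not fit in an int.
 */
static bool sketch_range_to_pages(unsigned long map_base,
				  unsigned long offset,
				  unsigned long len,
				  struct sketch_wc_args *out)
{
	assert(out);

	if (len == 0)
		return false;
	if ((map_base | offset | len) & (SKETCH_PAGE_SIZE - 1))
		return false;
	if (len / SKETCH_PAGE_SIZE > (unsigned long)INT_MAX)
		return false;

	out->addr = map_base + offset;
	out->numpages = (int)(len / SKETCH_PAGE_SIZE);
	return true;
}
```

For a 0x7ff000 byte framebuffer at offset 0 of the mapping this yields 2047
pages, leaving the page that follows with whatever type the mapping
started with.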

--Andy


-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-20 23:52     ` Andy Lutomirski
@ 2015-03-27 20:12       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > index 8025624..8875e56 100644
> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >
> >  #ifdef CONFIG_MTRR
> >         par->mtrr_aper = -1;
> > -       par->mtrr_reg = -1;
> >         if (!nomtrr) {
> > -               /* Cover the whole resource. */
> > -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > +                                         info->fix.smem_len,
> >                                           MTRR_TYPE_WRCOMB, 1);
> > -               if (par->mtrr_aper >= 0 && !par->aux_start) {
> > -                       /* Make a hole for mmio. */
> > -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> > -                                                GUI_RESERVE, GUI_RESERVE,
> > -                                                MTRR_TYPE_UNCACHABLE, 1);
> > -                       if (par->mtrr_reg < 0) {
> > -                               mtrr_del(par->mtrr_aper, 0, 0);
> > -                               par->mtrr_aper = -1;
> > -                       }
> > -               }
> >         }
> >  #endif
> >
> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> >         par->pll_ops->set_pll(info, &par->saved_pll);
> >
> >  #ifdef CONFIG_MTRR
> > -       if (par->mtrr_reg >= 0) {
> > -               mtrr_del(par->mtrr_reg, 0, 0);
> > -               par->mtrr_reg = -1;
> > -       }
> >         if (par->mtrr_aper >= 0) {
> >                 mtrr_del(par->mtrr_aper, 0, 0);
> >                 par->mtrr_aper = -1;
> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >         }
> >
> >         info->fix.mmio_start = raddr;
> > -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> > +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> 
> Double-check me, but I think that ioremap_nocache + WC MTRR = WC. 

Precisely, in this case the WC hole was obtained by using MTRR WC. This
patch removes that WC hole trick, and now we can be explicit about
only wanting ioremap_nocache() on the registers; that is, WC is not
desired here and is not used. The patch does not highlight the fact
that another ioremap() call was left in place for the framebuffer:

info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);

That is the mapping a later patch in this series converts to ioremap_wc().
This patch just removes the hole solution. That's all.

  Luis
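
The "ioremap_nocache + WC MTRR = WC" interaction quoted above can be
modelled in a few lines. This is a hedged userspace sketch, not kernel
code: it assumes that ioremap_nocache() on x86 maps with UC- (which is
what goal 2 of the cover letter implies), and it encodes only the PAT
rows of the effective memory type rules that matter for this thread.
enum memtype and effective_type() are names made up for illustration.

```c
/* Memory types; MT_UC_MINUS exists only as a PAT type, never in an MTRR. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

/*
 * Effective memory type for a page whose PAT entry selects 'pat' and
 * whose physical address is covered by an MTRR of type 'mtrr'.
 * Returns -1 for combinations this sketch deliberately does not model.
 */
static int effective_type(enum memtype pat, enum memtype mtrr)
{
	if (mtrr == MT_UC_MINUS)
		return -1;		/* not a valid MTRR type */

	switch (pat) {
	case MT_UC:			/* strong UC: the MTRR cannot override */
		return MT_UC;
	case MT_WC:			/* PAT WC wins regardless of the MTRR */
		return MT_WC;
	case MT_UC_MINUS:		/* UC unless a WC MTRR overrides it */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	case MT_WB:			/* PAT WB defers to the MTRR type */
		return mtrr;
	default:			/* PAT WT and WP rows are left out */
		return -1;
	}
}
```

Under these assumptions a UC- mapping under a WC MTRR ends up WC, while
a strong UC mapping stays UC. That is why the register mapping in this
patch is only safe once the WC MTRR no longer covers the register
window.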

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 19:58         ` Andy Lutomirski
@ 2015-03-27 20:30           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >
> >> > Ideally on systems using PAT we can expect a swift
> >> > transition away from MTRR. There can be a few exceptions
> >> > to this, one is where device drivers are known to exist
> >> > on PATs with errata, another situation is observed on
> >> > old device drivers where devices had combined MMIO
> >> > register access with whatever area they typically
> >> > later wanted to end up using MTRR for on the same
> >> > PCI BAR. This situation can still be addressed by
> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> > calls, one for MMIO registers, and another for whatever
> >> > is desirable for write-combining -- in order to
> >> > accomplish this though quite a bit of driver
> >> > restructuring is required.
> >> >
> >> > Device drivers which are known to require large
> >> > amount of re-work in order to split ioremap'd areas
> >> > can use __arch_phys_wc_add() to avoid regressions
> >> > when PAT is enabled.
> >> >
> >> > For a good example driver where things are neatly
> >> > split up on a PCI BAR refer the infiniband qib
> >> > driver. For a good example of a driver where good
> >> > amount of work is required refer to the infiniband
> >> > ipath driver.
> >> >
> >> > This is *only* a transitive API -- and as such no new
> >> > drivers are ever expected to use this.
> >>
> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> ever be correct.
> >>
> >> Is there some awful driver that has a large ioremap that's supposed to
> >> contain multiple different memtypes?
> >
> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> > regress those drivers by making the MTRR WC hole trick non functional.
> > The changes are non trivial and so in this series I supplied changes on
> > one driver only to show the effort required. The other drivers which
> > required this were:
> >
> > Driver          File
> > ------------------------------------------------------------
> > fusion          drivers/message/fusion/mptbase.c
> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >
> > This series makes those drivers use __arch_phys_wc_add() more as a
> > transitory phase in hopes we can address the proper split as with the
> > atyfb illustrates. For ipath the changes required have a nice template
> > with the qib driver as they share very similar driver structure, the
> > qib driver *did* do the nice split.
> >
> >> If so, can we ioremap + set_page_xyz instead?
> >
> > I'm not sure I see which call we'd use.  Care to provide an example patch
> > alternative for the atyfb as a case in point alternative to the work required
> > to do the split?
> >
> 
> I'm still confused.  Would it be insufficient to ioremap_nocache the
> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> set_page_xyz was a typo.)

I think that would be a sexy alternative.

In atyfb's case things are a bit messy: it not only used the WC MTRR
for a hole but then also used a UC MTRR on top of it all. Since I have
already tried to address the split, and if we address the power of 2
woes, I think it would be best to remove the UC MTRR and simply avoid
set_memory_wc() in this driver's case. For the other cases (fusion,
ivtv, ipath) I think this makes sense.

Thoughts?

  Luis
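
Assuming the "power of 2 woes" above refer to the usual constraint on
a variable-range MTRR (the size must be a power of two and the base
must be aligned to that size), the check can be sketched in userspace
as follows. The helper names and the 4 KiB minimum are assumptions for
illustration and do not mirror the kernel's own validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* True if [base, base + size) can be described by ONE variable-range MTRR. */
static bool fits_single_mtrr(uint64_t base, uint64_t size)
{
	return size >= 0x1000 &&		/* assumed 4 KiB minimum */
	       is_pow2(size) &&
	       (base & (size - 1)) == 0;	/* base aligned to size */
}

/*
 * Smallest power of two >= size, or 0 if that does not fit in 64 bits.
 * Rounding up like this is what makes an MTRR spill over neighbouring
 * registers when the framebuffer length is not itself a power of two.
 */
static uint64_t round_up_pow2(uint64_t size)
{
	uint64_t p = 1;

	if (size > (1ULL << 63))
		return 0;
	while (p < size)
		p <<= 1;
	return p;
}
```

For instance, a framebuffer length of 0x7ff000 does not fit a single
MTRR, and rounding it up to 0x800000 would pull the last page, which may
well hold registers, into the WC range. With PAT and ioremap_wc() the
mapping is page granular, so this rounding problem does not arise.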

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add()
  2015-03-21  7:08     ` Hyong-Youb Kim
@ 2015-03-27 20:36       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 20:36 UTC (permalink / raw)
  To: Hyong-Youb Kim
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Hyong-Youb Kim, netdev, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 21, 2015 at 04:08:00PM +0900, Hyong-Youb Kim wrote:
> On Fri, Mar 20, 2015 at 04:18:11PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> > This driver already uses ioremap_wc() on the same range
> > so when write-combining is available that will be used
> > instead.
> > 
> [...]
> > --- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> > +++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
> [...]
> > @@ -1984,7 +1979,6 @@ myri10ge_get_ethtool_stats(struct net_device *netdev,
> >  		data[i] = ((u64 *)&link_stats)[i];
> >  
> >  	data[i++] = (unsigned int)mgp->tx_boundary;
> > -	data[i++] = (unsigned int)mgp->wc_enabled;
> >  	data[i++] = (unsigned int)mgp->pdev->irq;
> >  	data[i++] = (unsigned int)mgp->msi_enabled;
> >  	data[i++] = (unsigned int)mgp->msix_enabled;
> 
> You would have to delete "WC" from myri10ge_gstrings_main_stats too.
> Something like below.  Thanks.
> 
> @@ -1905,7 +1905,7 @@ static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
>  	"tx_aborted_errors", "tx_carrier_errors", "tx_fifo_errors",
>  	"tx_heartbeat_errors", "tx_window_errors",
>  	/* device-specific stats */
> -	"tx_boundary", "WC", "irq", "MSI", "MSIX",
> +	"tx_boundary", "irq", "MSI", "MSIX",
>  	"read_dma_bw_MBs", "write_dma_bw_MBs", "read_write_dma_bw_MBs",
>  	"serial_number", "watchdog_resets",
>  #ifdef CONFIG_MYRI10GE_DCA

OK great thanks. Amended.

  Luis
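
The review comment above comes down to keeping two parallel lists in
step: the ethtool string table and the order in which values are
written to data[]. Below is a trimmed-down userspace sketch of that
invariant. struct fake_priv, stat_names and fill_stats() are
hypothetical stand-ins and are not the myri10ge definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical name table, shown with the "WC" entry already removed. */
static const char *const stat_names[] = {
	"tx_boundary", "irq", "MSI", "MSIX",
};

/* Hypothetical private data holding the values behind those names. */
struct fake_priv {
	unsigned int tx_boundary;
	unsigned int irq;
	unsigned int msi_enabled;
	unsigned int msix_enabled;
};

/*
 * Fill data[] in exactly the order of stat_names[] and return the
 * number of entries written, so a caller can compare it with the
 * size of the name table.
 */
static size_t fill_stats(const struct fake_priv *p, uint64_t *data)
{
	size_t i = 0;

	data[i++] = p->tx_boundary;
	data[i++] = p->irq;
	data[i++] = p->msi_enabled;
	data[i++] = p->msix_enabled;
	return i;
}
```

If a value is dropped from fill_stats() without the matching string
being dropped from stat_names[], every later name is shifted against
its value. Comparing the returned count with ARRAY_SIZE(stat_names) is
one cheap way to catch that.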

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-03-27 20:40     ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-03-27 20:40 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: luto, mingo, tglx, hpa, jgross, JBeulich, bp, suresh.b.siddha,
	venkatesh.pallipadi, airlied, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
	xen-devel

On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
 :
> @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
>  	}
>  
>  	if (mtrr_if) {
> +		mtrr_enabled = true;
>  		set_num_var_ranges();
>  		init_table();
>  		if (use_intel()) {
                        get_mtrr_state();

After setting mtrr_enabled to true, get_mtrr_state() reads
MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
MTRRs are enabled or not on the system.  So, potentially, we could have
a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
to disabled when MTRRs are disabled by BIOS.
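
The mismatch described above can be sketched in user space. This is only a model under stated assumptions: `struct mtrr_model` and both helper functions are hypothetical, and the bit positions follow the documented layout of the default-type MSR (bit 10 fixed-range enable, bit 11 MTRR enable).

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MTRR_DEF_TYPE layout: bit 10 = fixed-range enable, bit 11 = MTRR enable. */
#define DEF_TYPE_FE (1u << 10)
#define DEF_TYPE_E  (1u << 11)

/* Hypothetical model of the two pieces of state discussed above. */
struct mtrr_model {
	bool have_mtrr_if;	/* an MTRR implementation was detected */
	uint32_t def_type_lo;	/* low 32 bits of the default-type MSR */
};

/* What the patch as posted would report: true whenever mtrr_if is set. */
static bool enabled_per_patch(const struct mtrr_model *m)
{
	return m->have_mtrr_if;
}

/* What the hardware state says: the E bit must also be set. */
static bool enabled_per_msr(const struct mtrr_model *m)
{
	return m->have_mtrr_if && (m->def_type_lo & DEF_TYPE_E);
}
```

With a default-type value whose E bit is clear (as a BIOS that disables MTRRs would leave it), the first helper returns true while the second returns false, which is the inconsistent case.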

Thanks,
-Toshi

ps.
I recently cleaned up this part of the MTRR code in the patch below,
which is currently available in the -mm & -next trees.
https://lkml.org/lkml/2015/3/24/1063





^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 20:12       ` Luis R. Rodriguez
@ 2015-03-27 21:21         ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 21:21 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 1:12 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > index 8025624..8875e56 100644
>> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >
>> >  #ifdef CONFIG_MTRR
>> >         par->mtrr_aper = -1;
>> > -       par->mtrr_reg = -1;
>> >         if (!nomtrr) {
>> > -               /* Cover the whole resource. */
>> > -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > +                                         info->fix.smem_len,
>> >                                           MTRR_TYPE_WRCOMB, 1);
>> > -               if (par->mtrr_aper >= 0 && !par->aux_start) {
>> > -                       /* Make a hole for mmio. */
>> > -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
>> > -                                                GUI_RESERVE, GUI_RESERVE,
>> > -                                                MTRR_TYPE_UNCACHABLE, 1);
>> > -                       if (par->mtrr_reg < 0) {
>> > -                               mtrr_del(par->mtrr_aper, 0, 0);
>> > -                               par->mtrr_aper = -1;
>> > -                       }
>> > -               }
>> >         }
>> >  #endif
>> >
>> > @@ -2776,10 +2765,6 @@ aty_init_exit:
>> >         par->pll_ops->set_pll(info, &par->saved_pll);
>> >
>> >  #ifdef CONFIG_MTRR
>> > -       if (par->mtrr_reg >= 0) {
>> > -               mtrr_del(par->mtrr_reg, 0, 0);
>> > -               par->mtrr_reg = -1;
>> > -       }
>> >         if (par->mtrr_aper >= 0) {
>> >                 mtrr_del(par->mtrr_aper, 0, 0);
>> >                 par->mtrr_aper = -1;
>> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
>> >         }
>> >
>> >         info->fix.mmio_start = raddr;
>> > -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
>> > +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
>>
>> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
>
> Precisely, in this case the WC hole was obtained by using MTRR WC. This
> patch removes that WC hole trick and now we can be explicit about
> only wanting ioremap_nocache() on the registers, that is WC is not
> desired here and is not used. The patch does not highlight the fact
> that there was left in place another ioremap() call for the framebuffer:
>
> info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
>
> That is the one that later after this patch we use ioremap_wc() for.
> This patch just removes the hole solution. That's all.
>

I don't understand.

If I read it right, there's a 2^n byte BAR.  You're requesting WC for
the whole thing using arch_phys_wc_add.  On a PAT system that has no
effect and all is well.  On a non-PAT system, it adds an MTRR.  That
means that you need to override the MTRR somehow for the mmio regs,
and UC- won't do the trick.
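
A small sketch of why UC- is not enough, assuming the page-table/MTRR combining rules as I understand them from the Intel SDM. It covers only the three page-table types relevant here (UC, UC-, WC); the enum and function names are made up for illustration.

```c
/* Memory type programmed in the MTRR covering the address. */
enum mtrr_type { MTRR_UC, MTRR_WC, MTRR_WB };

/* Memory type selected by the page tables (via PAT). */
enum pat_type { PAT_UC, PAT_UC_MINUS, PAT_WC };

/* Resulting effective type for the subset modelled here. */
enum eff_type { EFF_UC, EFF_WC };

static enum eff_type effective_type(enum pat_type pat, enum mtrr_type mtrr)
{
	switch (pat) {
	case PAT_WC:
		/* WC in the page tables wins regardless of the MTRR. */
		return EFF_WC;
	case PAT_UC_MINUS:
		/* UC- is the "weak" uncached type: a WC MTRR overrides it. */
		return mtrr == MTRR_WC ? EFF_WC : EFF_UC;
	case PAT_UC:
	default:
		/* Strong UC cannot be overridden by any MTRR. */
		return EFF_UC;
	}
}
```

So `effective_type(PAT_UC_MINUS, MTRR_WC)` yields WC, while `effective_type(PAT_UC, MTRR_WC)` stays UC.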

Or am I missing something here?

--Andy

>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 20:30           ` Luis R. Rodriguez
@ 2015-03-27 21:23             ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 21:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> <mcgrof@do-not-panic.com> wrote:
>> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >
>> >> > Ideally on systems using PAT we can expect a swift
>> >> > transition away from MTRR. There can be a few exceptions
>> >> > to this, one is where device drivers are known to exist
>> >> > on PATs with errata, another situation is observed on
>> >> > old device drivers where devices had combined MMIO
>> >> > register access with whatever area they typically
>> >> > later wanted to end up using MTRR for on the same
>> >> > PCI BAR. This situation can still be addressed by
>> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> > calls, one for MMIO registers, and another for whatever
>> >> > is desirable for write-combining -- in order to
>> >> > accomplish this though quite a bit of driver
>> >> > restructuring is required.
>> >> >
>> >> > Device drivers which are known to require large
>> >> > amount of re-work in order to split ioremap'd areas
>> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> > when PAT is enabled.
>> >> >
>> >> > For a good example driver where things are neatly
>> >> > split up on a PCI BAR refer to the infiniband qib
>> >> > driver. For a good example of a driver where good
>> >> > amount of work is required refer to the infiniband
>> >> > ipath driver.
>> >> >
>> >> > This is *only* a transitional API -- and as such no new
>> >> > drivers are ever expected to use this.
>> >>
>> >> What's the exact layout that this helps?  I'm sceptical that this can
>> >> ever be correct.
>> >>
>> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> contain multiple different memtypes?
>> >
>> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> > PCI BAR with mixed memtypes and uses MTRR to hole in WC. A transition to
>> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> > regress those drivers by making the MTRR WC hole trick non functional.
>> > The changes are non trivial and so in this series I supplied changes on
>> > one driver only to show the effort required. The other drivers which
>> > required this were:
>> >
>> > Driver          File
>> > ------------------------------------------------------------
>> > fusion          drivers/message/fusion/mptbase.c
>> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
>> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>> >
>> > This series makes those drivers use __arch_phys_wc_add() more as a
>> > transitory phase in hopes we can address the proper split as the
>> > atyfb change illustrates. For ipath the changes required have a nice template
>> > with the qib driver as they share very similar driver structure, the
>> > qib driver *did* do the nice split.
>> >
>> >> If so, can we ioremap + set_page_xyz instead?
>> >
>> > I'm not sure I see which call we'd use.  Care to provide an example patch
>> > alternative for the atyfb as a case in point alternative to the work required
>> > to do the split?
>> >
>>
>> I'm still confused.  Would it be insufficient to ioremap_nocache the
>> whole thing and then call set_memory_wc on parts of it?  (Sorry,
>> set_page_xyz was a typo.)
>
> I think that would be a sexy alternative.
>
> In this driver's case the thing is a bit messy as it not only used
> the WC MTRR for a hole but it also then used a UC MTRR on top of
> it all, so since I already tried to address the split, and if we address
> the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> and just avoid set_memory_wc() in this driver's case, but for the other cases
> (fusion, ivtv, ipath) I think this makes sense.
>
> Thoughts?

Once that WC MTRR is in place, I think you really need UC and not UC-
if you want to override it.  Otherwise I agree with all of this.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 19:57           ` Luis R. Rodriguez
@ 2015-03-27 21:56             ` Ville Syrjälä
  -1 siblings, 0 replies; 710+ messages in thread
From: Ville Syrjälä @ 2015-03-27 21:56 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > index 8025624..8875e56 100644
> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >> >
> > >> >  #ifdef CONFIG_MTRR
> > >> >     par->mtrr_aper = -1;
> > >> > -   par->mtrr_reg = -1;
> > >> >     if (!nomtrr) {
> > >> > -           /* Cover the whole resource. */
> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > >> > +                                     info->fix.smem_len,
> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > >>
> > >> MTRRs need power of two size, so how is this supposed to work?
> > >
> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > is not standardized and by no means recorded as a requirement. Obviously
> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > will use mtrr_check() to verify the same requirement. Furthermore,
> > > as per my commit log message:
> > 
> > Whatever the code may or may not do, the x86 architecture uses
> > power-of-two MTRR sizes.  So I'm confused.
> 
> There should be no confusion; I simply did not know that *was* the
> requirement for x86. If that is the case we should add a check for it
> and perhaps generalize a helper that does the power-of-two adjustment;
> the cleanest one I found was the vesafb driver solution.
> 
> Thoughts?

The vesafb solution is bad since you'll end up covering only
the first 4MB of the framebuffer instead of the almost 8MB you want.
Which in practice will mean throwing away half the VRAM since you really
don't want the massive performance hit from accessing it as UC. And that
would mean giving up decent display resolutions as well :(

And the other option of trying to cover the remainder with multiple ever
smaller MTRRs doesn't work either since you'll run out of MTRRs very
quickly.
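
To put rough numbers on that, here is a minimal user-space sketch. It
assumes a hypothetical 8 MiB aperture whose last 4 KiB page holds
registers, so the framebuffer is 0x7FF000 bytes, and a base aligned to
8 MiB. The helper names are invented for illustration and are not kernel
APIs. Rounding down to a power of two leaves 4 MiB covered, while an
exact cover built from aligned power-of-two ranges needs one range per
set bit in the size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MTRR_GRANULE 4096u	/* assumed MTRR granularity: 4 KiB */

/* Largest power of two that is <= size (size must be non-zero). */
static uint64_t round_down_pow2(uint64_t size)
{
	assert(size > 0);
	while (size & (size - 1))
		size &= size - 1;	/* clear the lowest set bit */
	return size;
}

/*
 * Number of power-of-two ranges a largest-first split needs to cover
 * exactly `size` bytes, assuming the base is aligned to the next power
 * of two above `size`. That is simply the number of set bits.
 */
static unsigned int pow2_ranges_needed(uint64_t size)
{
	unsigned int n = 0;

	assert(size % SKETCH_MTRR_GRANULE == 0);
	for (; size; size &= size - 1)
		n++;
	return n;
}
```

Under these assumptions round_down_pow2(0x7FF000) gives 0x400000 and
pow2_ranges_needed(0x7FF000) gives 11, which is more than the handful of
variable-range MTRRs a CPU typically provides. The hole method needs two
ranges: one 8 MiB WC range plus one 4 KiB UC range on top of it.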

This is precisely why I used the hole method in atyfb in the first
place.

I don't really like the idea of any new mtrr code not supporting that
use case, especially as these things tend to be present in older machines
where PAT isn't an option.

-- 
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 21:56             ` Ville Syrjälä
@ 2015-03-27 22:02               ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 22:02 UTC (permalink / raw)
  To: Ville Syrjälä,
	Luis R. Rodriguez, Andy Lutomirski, Bjorn Helgaas,
	Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > index 8025624..8875e56 100644
>> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> > >> >
>> > >> >  #ifdef CONFIG_MTRR
>> > >> >     par->mtrr_aper = -1;
>> > >> > -   par->mtrr_reg = -1;
>> > >> >     if (!nomtrr) {
>> > >> > -           /* Cover the whole resource. */
>> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > >> > +                                     info->fix.smem_len,
>> > >> >                                       MTRR_TYPE_WRCOMB, 1);
>> > >>
>> > >> MTRRs need power of two size, so how is this supposed to work?
>> > >
>> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> > > is not standardized and by no means recorded as a requirement. Obviously
>> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> > > will use mtrr_check() to verify the same requirement. Furthermore,
>> > > as per my commit log message:
>> >
>> > Whatever the code may or may not do, the x86 architecture uses
>> > power-of-two MTRR sizes.  So I'm confused.
>>
>> There should be no confusion; I simply did not know that *was* the
>> requirement for x86. If that is the case we should add a check for it
>> and perhaps generalize a helper that does the power-of-two adjustment;
>> the cleanest one I found was the vesafb driver solution.
>>
>> Thoughts?
>
> The vesafb solution is bad since you'll end up covering only
> the first 4MB of the framebuffer instead of the almost 8MB you want.
> Which in practice will mean throwing away half the VRAM since you really
> don't want the massive performance hit from accessing it as UC. And that
> would mean giving up decent display resolutions as well :(
>
> And the other option of trying to cover the remainder with multiple ever
> smaller MTRRs doesn't work either since you'll run out of MTRRs very
> quickly.
>
> This is precisely why I used the hole method in atyfb in the first
> place.
>
> I don't really like the idea of any new mtrr code not supporting that
> use case, especially as these things tend to be present in older machines
> where PAT isn't an option.

According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
an effective memory type of UC.  Hence my suggestion to add
ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
otherwise WC MTRR-covered region.

ioremap_nocache is UC- (even on non-PAT unless I misunderstood how
this stuff works), so ioremap_nocache by itself isn't good enough.
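
As a sketch of just that one table row (not kernel code; the enum and
function names are invented here): for a page that falls inside a WC
MTRR on a non-PAT CPU, only PCD = 1 together with PWT = 1 yields UC, so
a UC- mapping (PCD = 1, PWT = 0) stays WC.

```c
#include <assert.h>

/* Illustrative names only; these are not kernel or SDM identifiers. */
enum sketch_mem_type {
	SKETCH_MEM_UC,
	SKETCH_MEM_WC,
};

/*
 * Effective memory type of a page covered by a WC MTRR on a CPU
 * without PAT, as a function of the page-table PCD and PWT bits.
 * Only the WC row of the table is modelled here.
 */
static enum sketch_mem_type effective_type_in_wc_mtrr(unsigned int pcd,
						       unsigned int pwt)
{
	assert(pcd <= 1 && pwt <= 1);
	return (pcd && pwt) ? SKETCH_MEM_UC : SKETCH_MEM_WC;
}
```

So effective_type_in_wc_mtrr(1, 0), the UC- case, still comes out as WC,
which is why a stronger UC mapping is needed to punch the hole.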

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 21:23             ` Andy Lutomirski
@ 2015-03-27 23:04               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >
> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> > to this, one is where device drivers are known to exist
> >> >> > on PATs with errata, another situation is observed on
> >> >> > old device drivers where devices had combined MMIO
> >> >> > register access with whatever area they typically
> >> >> > later wanted to end up using MTRR for on the same
> >> >> > PCI BAR. This situation can still be addressed by
> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> > is desirable for write-combining -- in order to
> >> >> > accomplish this though quite a bit of driver
> >> >> > restructuring is required.
> >> >> >
> >> >> > Device drivers which are known to require large
> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> > when PAT is enabled.
> >> >> >
> >> >> > For a good example of a driver where things are neatly
> >> >> > split up on a PCI BAR refer to the infiniband qib
> >> >> > driver. For a good example of a driver where a good
> >> >> > amount of work is required refer to the infiniband
> >> >> > ipath driver.
> >> >> >
> >> >> > This is *only* a transitional API -- and as such no new
> >> >> > drivers are ever expected to use this.
> >> >>
> >> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> >> ever be correct.
> >> >>
> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> contain multiple different memtypes?
> >> >
> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> > The changes are non trivial and so in this series I supplied changes on
> >> > one driver only to show the effort required. The other drivers which
> >> > required this were:
> >> >
> >> > Driver          File
> >> > ------------------------------------------------------------
> >> > fusion          drivers/message/fusion/mptbase.c
> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >> >
> >> > This series makes those drivers use __arch_phys_wc_add() as a
> >> > transitional step, in the hope that we can later do the proper split
> >> > as the atyfb patch illustrates. For ipath the required changes have a
> >> > nice template in the qib driver, as the two share a very similar
> >> > structure and the qib driver *did* do the nice split.
> >> >
> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >
> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
> >> > alternative for the atyfb as a case in point alternative to the work required
> >> > to do the split?
> >> >
> >>
> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> >> set_page_xyz was a typo.)
> >
> > I think that would be a sexy alternative.
> >
> > In this driver's case the thing is a bit messy as it not only used
> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> > it all, so since I already tried to address the split, and if we address
> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> > and just avoid set_memory_wc() in this driver's case, but for the other cases
> > (fusion, ivtv, ipath) I think this makes sense.
> >
> > Thoughts?
> 
> Once that WC MTRR is in place, I think you really need UC and not UC-
> if you want to override it.  Otherwise I agree with all of this.

Do you mean that the UC MTRR work around that was in place might not
have really been effective? Not quite sure what you mean. I don't think
I follow.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
@ 2015-03-27 23:04               ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >
> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> > to this, one is where device drivers are known to exist
> >> >> > on PATs with errata, another situation is observed on
> >> >> > old device drivers where devices had combined MMIO
> >> >> > register access with whatever area they typically
> >> >> > later wanted to end up using MTRR for on the same
> >> >> > PCI BAR. This situation can still be addressed by
> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> > is desirable for write-combining -- in order to
> >> >> > accomplish this though quite a bit of driver
> >> >> > restructuring is required.
> >> >> >
> >> >> > Device drivers which are known to require large
> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> > when PAT is enabled.
> >> >> >
> >> >> > For a good example driver where things are neatly
> >> >> > split up on a PCI BAR refer the infiniband qib
> >> >> > driver. For a good example of a driver where good
> >> >> > amount of work is required refer to the infiniband
> >> >> > ipath driver.
> >> >> >
> >> >> > This is *only* a transitive API -- and as such no new
> >> >> > drivers are ever expected to use this.
> >> >>
> >> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> >> ever be correct.
> >> >>
> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> contain multiple different memtypes?
> >> >
> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> > The changes are non trivial and so in this series I supplied changes on
> >> > one driver only to show the effort required. The other drivers which
> >> > required this were:
> >> >
> >> > Driver          File
> >> > ------------------------------------------------------------
> >> > fusion          drivers/message/fusion/mptbase.c
> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >> >
> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> > transitory phase in hopes we can address the proper split as the
> >> > atyfb patch illustrates. For ipath the changes required have a nice template
> >> > in the qib driver as they share a very similar driver structure; the
> >> > qib driver *did* do the nice split.
> >> >
> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >
> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
> >> > alternative for the atyfb as a case in point alternative to the work required
> >> > to do the split?
> >> >
> >>
> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> >> set_page_xyz was a typo.)
> >
> > I think that would be a sexy alternative.
> >
> > In this driver's case the thing is a bit messy as it not only used
> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> > it all, so since I already tried to address the split, and if we address
> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> > and just avoid set_page_wc() in this driver's case, but for the other cases
> > (fusion, ivtv, ipath) I think this makes sense.
> >
> > Thoughts?
> 
> Once that WC MTRR is in place, I think you really need UC and not UC-
> if you want to override it.  Otherwise I agree with all of this.

Do you mean that the UC MTRR work around that was in place might not
have really been effective? Not quite sure what you mean. I don't think
I follow.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 23:04               ` Luis R. Rodriguez
@ 2015-03-27 23:10                 ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-03-27 23:10 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
>> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
>> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
>> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
>> >> >> <mcgrof@do-not-panic.com> wrote:
>> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>> >> >> >
>> >> >> > Ideally on systems using PAT we can expect a swift
>> >> >> > transition away from MTRR. There can be a few exceptions
>> >> >> > to this, one is where device drivers are known to exist
>> >> >> > on PATs with errata, another situation is observed on
>> >> >> > old device drivers where devices had combined MMIO
>> >> >> > register access with whatever area they typically
>> >> >> > later wanted to end up using MTRR for on the same
>> >> >> > PCI BAR. This situation can still be addressed by
>> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
>> >> >> > calls, one for MMIO registers, and another for whatever
>> >> >> > is desirable for write-combining -- in order to
>> >> >> > accomplish this though quite a bit of driver
>> >> >> > restructuring is required.
>> >> >> >
>> >> >> > Device drivers which are known to require large
>> >> >> > amount of re-work in order to split ioremap'd areas
>> >> >> > can use __arch_phys_wc_add() to avoid regressions
>> >> >> > when PAT is enabled.
>> >> >> >
>> >> >> > For a good example driver where things are neatly
>> >> >> > split up on a PCI BAR refer to the infiniband qib
>> >> >> > driver. For a good example of a driver where a good
>> >> >> > amount of work is required refer to the infiniband
>> >> >> > ipath driver.
>> >> >> >
>> >> >> > This is *only* a transitional API -- and as such no new
>> >> >> > drivers are ever expected to use this.
>> >> >>
>> >> >> What's the exact layout that this helps?  I'm sceptical that this can
>> >> >> ever be correct.
>> >> >>
>> >> >> Is there some awful driver that has a large ioremap that's supposed to
>> >> >> contain multiple different memtypes?
>> >> >
>> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
>> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
>> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
>> >> > regress those drivers by making the MTRR WC hole trick non functional.
>> >> > The changes are non trivial and so in this series I supplied changes on
>> >> > one driver only to show the effort required. The other drivers which
>> >> > required this were:
>> >> >
>> >> > Driver          File
>> >> > ------------------------------------------------------------
>> >> > fusion          drivers/message/fusion/mptbase.c
>> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
>> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
>> >> >
>> >> > This series makes those drivers use __arch_phys_wc_add() more as a
>> >> > transitory phase in hopes we can address the proper split as the
>> >> > atyfb patch illustrates. For ipath the changes required have a nice template
>> >> > in the qib driver as they share a very similar driver structure; the
>> >> > qib driver *did* do the nice split.
>> >> >
>> >> >> If so, can we ioremap + set_page_xyz instead?
>> >> >
>> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
>> >> > alternative for the atyfb as a case in point alternative to the work required
>> >> > to do the split?
>> >> >
>> >>
>> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
>> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
>> >> set_page_xyz was a typo.)
>> >
>> > I think that would be a sexy alternative.
>> >
>> > In this driver's case the thing is a bit messy as it not only used
>> > the WC MTRR for a hole but it also then used a UC MTRR on top of
>> > it all, so since I already tried to address the split, and if we address
>> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
>> > and just avoid set_page_wc() in this driver's case, but for the other cases
>> > (fusion, ivtv, ipath) I think this makes sense.
>> >
>> > Thoughts?
>>
>> Once that WC MTRR is in place, I think you really need UC and not UC-
>> if you want to override it.  Otherwise I agree with all of this.
>
> Do you mean that the UC MTRR work around that was in place might not
> have really been effective? Not quite sure what you mean. I don't think
> I follow.

I mean that the UC MTRR that overrides the WC MTRR was probably fine
(I hope smaller MTRRs override larger MTRRs).  But we should just
ditch UC MTRRs entirely, and setting UC in the page tables would work
on all CPUs *if we supported that*.  We'd need to add a couple trivial
helpers to do that.
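
For reference, the MTRR precedence rules in the Intel SDM do not appear
to depend on range size: when variable-range MTRRs overlap, UC wins if
any of the matching ranges is UC, WT wins over WB, and other mixed
overlaps are undefined. The sketch below models that rule for two
matching ranges. The enum and function names are made up for
illustration and are not kernel APIs.

```c
/* Hardware MTRR type encodings. */
enum mtrr_type {
	MTRR_UC = 0,
	MTRR_WC = 1,
	MTRR_WT = 4,
	MTRR_WP = 5,
	MTRR_WB = 6,
};

/* Returned when the overlap is not defined by the precedence rules. */
#define MTRR_OVERLAP_UNDEFINED (-1)

/*
 * Resolve the memory type for an address matched by two variable-range
 * MTRRs of types a and b. Range sizes play no part in the result.
 */
static int mtrr_overlap_type(enum mtrr_type a, enum mtrr_type b)
{
	if (a == b)
		return a;		/* identical types: use that type */
	if (a == MTRR_UC || b == MTRR_UC)
		return MTRR_UC;		/* UC always takes precedence */
	if ((a == MTRR_WT && b == MTRR_WB) ||
	    (a == MTRR_WB && b == MTRR_WT))
		return MTRR_WT;		/* WT wins over WB */
	return MTRR_OVERLAP_UNDEFINED;	/* e.g. WC overlapping WB */
}
```

Under this model the atyfb arrangement (a small UC MTRR inside a larger
WC MTRR) resolves to UC for the MMIO page, which is consistent with the
old work around having been functional.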

--Andy

>
>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 21:21         ` Andy Lutomirski
@ 2015-03-27 23:31           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 02:21:34PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:12 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > index 8025624..8875e56 100644
> >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> >
> >> >  #ifdef CONFIG_MTRR
> >> >         par->mtrr_aper = -1;
> >> > -       par->mtrr_reg = -1;
> >> >         if (!nomtrr) {
> >> > -               /* Cover the whole resource. */
> >> > -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > +                                         info->fix.smem_len,
> >> >                                           MTRR_TYPE_WRCOMB, 1);
> >> > -               if (par->mtrr_aper >= 0 && !par->aux_start) {
> >> > -                       /* Make a hole for mmio. */
> >> > -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> >> > -                                                GUI_RESERVE, GUI_RESERVE,
> >> > -                                                MTRR_TYPE_UNCACHABLE, 1);
> >> > -                       if (par->mtrr_reg < 0) {
> >> > -                               mtrr_del(par->mtrr_aper, 0, 0);
> >> > -                               par->mtrr_aper = -1;
> >> > -                       }
> >> > -               }
> >> >         }
> >> >  #endif
> >> >
> >> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> >> >         par->pll_ops->set_pll(info, &par->saved_pll);
> >> >
> >> >  #ifdef CONFIG_MTRR
> >> > -       if (par->mtrr_reg >= 0) {
> >> > -               mtrr_del(par->mtrr_reg, 0, 0);
> >> > -               par->mtrr_reg = -1;
> >> > -       }
> >> >         if (par->mtrr_aper >= 0) {
> >> >                 mtrr_del(par->mtrr_aper, 0, 0);
> >> >                 par->mtrr_aper = -1;
> >> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >> >         }
> >> >
> >> >         info->fix.mmio_start = raddr;
> >> > -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> >> > +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> >>
> >> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
> >
> > > Precisely, in this case the WC hole was obtained by using MTRR WC. This
> > > patch removes that WC hole trick and now we can be explicit about
> > only wanting ioremap_nocache() on the registers, that is WC is not
> > desired here and is not used. The patch does not highlight the fact
> > that there was left in place another ioremap() call for the framebuffer:
> >
> > info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> >
> > That is the one that later after this patch we use ioremap_wc() for.
> > This patch just removes the hole solution. That's all.
> >
> 
> I don't understand.
> 
> If I read it right, there's a 2^n byte BAR.  You're requesting WC for
> the whole thing using arch_phys_wc_add.

I believe there is a misunderstanding of order of changes.

Let's split when we use mtrr_add() Vs arch_phys_wc_add() to avoid
confusion as in this patch we don't yet use arch_phys_wc_add(). That
is done in the next patch.

The commit log describes best the state of affairs prior to this
patch:

    The atyfb driver uses an MTRR work around since some
    cards use the same PCI BAR for the framebuffer and MMIO.
    In such cards the last page is used for MMIO, the rest for
    the framebuffer, so on those cards we ioremap() the MMIO
    page alone, then again ioremap() the full framebuffer
    including the MMIO space *and* ___then___ use an MTRR with
    MTRR_TYPE_WRCOMB on the full PCI BAR... and finally "hole"
    in an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.

Then this patch drops the MTRR_TYPE_UNCACHABLE hole
and instead corrects the ioremap() call for the framebuffer
so that it covers the framebuffer alone. For the MMIO
area we then switch to ioremap_nocache(). The MTRR left
is now only *for the framebuffer* and it should not be touching
the MMIO area. So the MMIO area has its own ioremap_nocache()
mapping, and the framebuffer is left with an ioremap() followed
by an mtrr_add() call.

The next patch replaces the mtrr_add() with arch_phys_wc_add()
and then also uses ioremap_wc().
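
The "power of 2 woes" mentioned elsewhere in this thread come from how
variable-range MTRRs are programmed: a single MTRR describes a range
whose size is a power of two (at least 4 KiB) and whose base is aligned
to that size. A minimal sketch of that check follows. The helper names
are hypothetical, and the example addresses in the note below are
invented rather than taken from a real atyfb card.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * True if [base, base + size) can be described by one variable-range
 * MTRR: size is a power of two of at least 4 KiB and base is aligned
 * to size.
 */
static bool fits_one_mtrr(uint64_t base, uint64_t size)
{
	if (!is_pow2(size) || size < 0x1000)
		return false;
	return (base & (size - 1)) == 0;
}
```

Assuming an 8 MiB BAR at 0xfd000000 whose last 4 KiB page holds the
MMIO registers, fits_one_mtrr(0xfd000000, 0x800000) holds for the whole
BAR and fits_one_mtrr(0xfd7ff000, 0x1000) holds for the MMIO page, but
fits_one_mtrr(0xfd000000, 0x7ff000) does not: the framebuffer alone is
one page short of a power of two, so a single MTRR cannot cover exactly
the framebuffer in that layout.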

> On a PAT system that has no effect and all is well.

Yeah we're not doing arch_phys_wc_add() on the entire PCI BAR.
That was dumb, this fixes that, and on this patch mtrr_add()
is still used.

> On a non-PAT system, it adds an MTRR.  That
> means that you need to override the MTRR somehow for the mmio regs,
> and UC- won't do the trick.

We don't need to solve that problem here as the MTRR should only
be for the framebuffer.
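
The UC vs UC- point can be illustrated with the effective memory type
rules from the Intel SDM for the three page-table types relevant here.
This is only a sketch under that reading of the SDM: it models the UC,
UC- and WC page types, deliberately leaves WT/WP/WB unmodelled, and all
names in it are made up for illustration.

```c
/* Memory types as they can appear in a PAT entry or an MTRR. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

/* Returned for page-table types this sketch does not model. */
#define MT_NOT_MODELLED (-1)

/*
 * Effective type for a page whose PAT type is pat and which lies in a
 * region whose MTRR type is mtrr. UC- is not a valid MTRR type; it is
 * only meaningful as the pat argument.
 */
static int effective_memtype(enum memtype pat, enum memtype mtrr)
{
	switch (pat) {
	case MT_UC:
		return MT_UC;	/* strong UC: the MTRR cannot override it */
	case MT_WC:
		return MT_WC;	/* PAT WC: WC whatever the MTRR says */
	case MT_UC_MINUS:
		/* UC- yields to a WC MTRR, otherwise behaves as UC */
		return mtrr == MT_WC ? MT_WC : MT_UC;
	default:
		return MT_NOT_MODELLED;
	}
}
```

With ioremap_nocache() mapping UC-, effective_memtype(MT_UC_MINUS,
MT_WC) gives WC, which is the "ioremap_nocache + WC MTRR = WC" case
raised earlier. It only matters if the WC MTRR actually covers the MMIO
page; once the MTRR is restricted to the framebuffer, the MMIO page
falls under whatever MTRR type the firmware set up for that region.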

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-03-27 23:31           ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 02:21:34PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 1:12 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 20, 2015 at 04:52:18PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> <mcgrof@do-not-panic.com> wrote:
> >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > index 8025624..8875e56 100644
> >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> >
> >> >  #ifdef CONFIG_MTRR
> >> >         par->mtrr_aper = -1;
> >> > -       par->mtrr_reg = -1;
> >> >         if (!nomtrr) {
> >> > -               /* Cover the whole resource. */
> >> > -               par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > +               par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > +                                         info->fix.smem_len,
> >> >                                           MTRR_TYPE_WRCOMB, 1);
> >> > -               if (par->mtrr_aper >= 0 && !par->aux_start) {
> >> > -                       /* Make a hole for mmio. */
> >> > -                       par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
> >> > -                                                GUI_RESERVE, GUI_RESERVE,
> >> > -                                                MTRR_TYPE_UNCACHABLE, 1);
> >> > -                       if (par->mtrr_reg < 0) {
> >> > -                               mtrr_del(par->mtrr_aper, 0, 0);
> >> > -                               par->mtrr_aper = -1;
> >> > -                       }
> >> > -               }
> >> >         }
> >> >  #endif
> >> >
> >> > @@ -2776,10 +2765,6 @@ aty_init_exit:
> >> >         par->pll_ops->set_pll(info, &par->saved_pll);
> >> >
> >> >  #ifdef CONFIG_MTRR
> >> > -       if (par->mtrr_reg >= 0) {
> >> > -               mtrr_del(par->mtrr_reg, 0, 0);
> >> > -               par->mtrr_reg = -1;
> >> > -       }
> >> >         if (par->mtrr_aper >= 0) {
> >> >                 mtrr_del(par->mtrr_aper, 0, 0);
> >> >                 par->mtrr_aper = -1;
> >> > @@ -3466,7 +3451,7 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
> >> >         }
> >> >
> >> >         info->fix.mmio_start = raddr;
> >> > -       par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
> >> > +       par->ati_regbase = ioremap_nocache(info->fix.mmio_start, 0x1000);
> >>
> >> Double-check me, but I think that ioremap_nocache + WC MTRR = WC.
> >
> > Precisely, in this case the WC hole was obtained by using MTRR WC. This
> > patch removes that WC hole trick and now we can be explicit about
> > only wanting ioremap_nocache() on the registers, that is WC is not
> > desired here and is not used. The patch does not highlight the fact
> > that there was left in place another ioremap() call for the framebuffer:
> >
> > info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
> >
> > That is the one that later after this patch we use ioremap_wc() for.
> > This patch just removes the hole solution. That's all.
> >
> 
> I don't understand.
> 
> If I read it right, there's a 2^n byte BAR.  You're requesting WC for
> the whole thing using arch_phys_wc_add.

I believe there is a misunderstanding about the order of changes.

Let's keep mtrr_add() and arch_phys_wc_add() separate to avoid
confusion: this patch does not yet use arch_phys_wc_add(). That
is done in the next patch.

The commit log best describes the state of affairs prior to this
patch:

    The atyfb driver uses an MTRR work around since some
    cards use the same PCI BAR for the framebuffer and MMIO.
    In such cards the last page is used for MMIO, the rest for
    the framebuffer, so on those cards we ioremap() the MMIO
    page alone, then again ioremap() the full framebuffer
    including the MMIO space *and* ___then___ use an MTRR with
    MTRR_TYPE_WRCOMB on the full PCI BAR... and finally "hole"
    in an MTRR_TYPE_UNCACHABLE MTRR only for MMIO.

This patch then drops the MTRR_TYPE_UNCACHABLE override and
instead corrects the ioremap() call for the framebuffer so that
it covers the framebuffer alone. The MMIO area now uses
ioremap_nocache(). The remaining MTRR is now only *for the
framebuffer* and should not touch the MMIO area. So the MMIO
area has its own ioremap_nocache() mapping, and the framebuffer
is left with an ioremap() followed by an mtrr_add() call.

The next patch replaces the mtrr_add() with arch_phys_wc_add()
and then also uses ioremap_wc().
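
To make the intended end state concrete, here is a minimal user-space
sketch of the split. It is not the driver code: struct phys_range and
split_shared_bar() are hypothetical names, and it assumes, as the
commit log describes, that the MMIO page sits at the very end of the
shared BAR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a physical address range; not a kernel type. */
struct phys_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a shared BAR into a framebuffer range and a trailing MMIO
 * range of mmio_len bytes, so that each can be mapped on its own
 * (write-combining for the former, uncached for the latter).
 */
static void split_shared_bar(struct phys_range bar, uint64_t mmio_len,
			     struct phys_range *fb, struct phys_range *mmio)
{
	/* The MMIO part must be non-empty and leave room for the fb. */
	assert(mmio_len > 0 && mmio_len < bar.len);

	fb->start = bar.start;
	fb->len = bar.len - mmio_len;

	mmio->start = bar.start + fb->len;
	mmio->len = mmio_len;
}
```

With an example 8 MiB BAR at 0xd0000000 and a 4 KiB MMIO page, the
framebuffer range comes out as 0x7ff000 bytes and the MMIO range
starts at 0xd07ff000. The two ranges never overlap, so a WC request
on the framebuffer range does not cover the registers.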

> On a PAT system that has no effect and all is well.

Yeah we're not doing arch_phys_wc_add() on the entire PCI BAR.
That was dumb, this fixes that, and on this patch mtrr_add()
is still used.

> On a non-PAT system, it adds an MTRR.  That
> means that you need to override the MTRR somehow for the mmio regs,
> and UC- won't do the trick.

We don't need to solve that problem here as the MTRR should only
be for the framebuffer.
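
For reference, the interaction being discussed can be modelled as a
small lookup. This is only a sketch based on my reading of the
MTRR/PAT combination rules in the Intel SDM; the enum and function
names are made up for illustration and only the page table types
relevant here (UC, UC-, WC, WB) are covered.

```c
#include <assert.h>

/* Hypothetical names for this sketch; these are not kernel identifiers. */
enum mtrr_type { MTRR_UC, MTRR_WC, MTRR_WT, MTRR_WP, MTRR_WB };
enum pat_type { PAT_UC, PAT_UC_MINUS, PAT_WC, PAT_WB };
enum eff_type { EFF_UC, EFF_WC, EFF_WT, EFF_WP, EFF_WB };

/*
 * Effective memory type for a page, given the MTRR type covering its
 * physical address and the PAT type selected by its page table entry.
 */
static enum eff_type effective_type(enum mtrr_type mtrr, enum pat_type pat)
{
	assert(mtrr <= MTRR_WB);

	switch (pat) {
	case PAT_UC:		/* strong UC: wins over any MTRR type */
		return EFF_UC;
	case PAT_WC:		/* WC in the page tables: always WC */
		return EFF_WC;
	case PAT_UC_MINUS:	/* UC-: a WC MTRR overrides it */
		return mtrr == MTRR_WC ? EFF_WC : EFF_UC;
	case PAT_WB:		/* WB: the MTRR type decides */
		break;
	}

	switch (mtrr) {
	case MTRR_UC: return EFF_UC;
	case MTRR_WC: return EFF_WC;
	case MTRR_WT: return EFF_WT;
	case MTRR_WP: return EFF_WP;
	case MTRR_WB: break;
	}
	return EFF_WB;
}
```

The UC- row is the case raised above: a UC- mapping that falls under a
WC MTRR ends up WC, whereas a strong UC mapping stays UC. Keeping the
MTRR on the framebuffer range only means the register mapping never
hits that row.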

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-27 23:10                 ` Andy Lutomirski
@ 2015-03-27 23:33                   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Bjorn Helgaas, Ville Syrjälä,
	Mauro Carvalho Chehab, Mike Marciniszyn, Luis R. Rodriguez,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Juergen Gross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, Linux Fbdev development list, X86 ML,
	xen-devel, Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 04:10:03PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 4:04 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Mar 27, 2015 at 02:23:16PM -0700, Andy Lutomirski wrote:
> >> On Fri, Mar 27, 2015 at 1:30 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > On Fri, Mar 27, 2015 at 12:58:02PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Mar 27, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> >> > On Fri, Mar 20, 2015 at 04:48:46PM -0700, Andy Lutomirski wrote:
> >> >> >> On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez
> >> >> >> <mcgrof@do-not-panic.com> wrote:
> >> >> >> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >> >> >> >
> >> >> >> > Ideally on systems using PAT we can expect a swift
> >> >> >> > transition away from MTRR. There can be a few exceptions
> >> >> >> > to this, one is where device drivers are known to exist
> >> >> >> > on PATs with errata, another situation is observed on
> >> >> >> > old device drivers where devices had combined MMIO
> >> >> >> > register access with whatever area they typically
> >> >> >> > later wanted to end up using MTRR for on the same
> >> >> >> > PCI BAR. This situation can still be addressed by
> >> >> >> > splitting up ioremap'd PCI BAR into two ioremap'd
> >> >> >> > calls, one for MMIO registers, and another for whatever
> >> >> >> > is desirable for write-combining -- in order to
> >> >> >> > accomplish this though quite a bit of driver
> >> >> >> > restructuring is required.
> >> >> >> >
> >> >> >> > Device drivers which are known to require large
> >> >> >> > amount of re-work in order to split ioremap'd areas
> >> >> >> > can use __arch_phys_wc_add() to avoid regressions
> >> >> >> > when PAT is enabled.
> >> >> >> >
> >> >> >> > For a good example driver where things are neatly
> >> >> >> > split up on a PCI BAR refer to the infiniband qib
> >> >> >> > driver. For a good example of a driver where good
> >> >> >> > amount of work is required refer to the infiniband
> >> >> >> > ipath driver.
> >> >> >> >
> >> >> >> > This is *only* a transitional API -- and as such no new
> >> >> >> > drivers are ever expected to use this.
> >> >> >>
> >> >> >> What's the exact layout that this helps?  I'm sceptical that this can
> >> >> >> ever be correct.
> >> >> >>
> >> >> >> Is there some awful driver that has a large ioremap that's supposed to
> >> >> >> contain multiple different memtypes?
> >> >> >
> >> >> > Yes, I cc'd you just now on one where I made changes on a driver which uses one
> >> >> > PCI with mixed memtypes and uses MTRR to hole in WC. A transition to
> >> >> > arch_phys_wc_add() is therefore not possible if PAT is enabled as it would
> >> >> > regress those drivers by making the MTRR WC hole trick non functional.
> >> >> > The changes are non trivial and so in this series I supplied changes on
> >> >> > one driver only to show the effort required. The other drivers which
> >> >> > required this were:
> >> >> >
> >> >> > Driver          File
> >> >> > ------------------------------------------------------------
> >> >> > fusion          drivers/message/fusion/mptbase.c
> >> >> > ivtv            drivers/media/pci/ivtv/ivtvfb.c
> >> >> > ipath           drivers/infiniband/hw/ipath/ipath_driver.c
> >> >> >
> >> >> > This series makes those drivers use __arch_phys_wc_add() more as a
> >> >> > transitory phase in hopes we can address the proper split as the
> >> >> > atyfb change illustrates. For ipath the changes required have a nice template
> >> >> > with the qib driver as they share very similar driver structure, the
> >> >> > qib driver *did* do the nice split.
> >> >> >
> >> >> >> If so, can we ioremap + set_page_xyz instead?
> >> >> >
> >> >> > I'm not sure I see which call we'd use.  Care to provide an example patch
> >> >> > alternative for the atyfb as a case in point alternative to the work required
> >> >> > to do the split?
> >> >> >
> >> >>
> >> >> I'm still confused.  Would it be insufficient to ioremap_nocache the
> >> >> whole thing and then call set_memory_wc on parts of it?  (Sorry,
> >> >> set_page_xyz was a typo.)
> >> >
> >> > I think that would be a sexy alternative.
> >> >
> >> > In this driver's case the thing is a bit messy as it not only used
> >> > the WC MTRR for a hole but it also then used a UC MTRR on top of
> >> > it all, so since I already tried to address the split, and if we address
> >> > the power of 2 woes, I think it'd be best to try to remove the UC MTRR
> >> > and just avoid set_page_wc() in this driver's case, but for the other cases
> >> > (fusion, ivtv, ipath) I think this makes sense.
> >> >
> >> > Thoughts?
> >>
> >> Once that WC MTRR is in place, I think you really need UC and not UC-
> >> if you want to override it.  Otherwise I agree with all of this.
> >
> > Do you mean that the UC MTRR work around that was in place might not
> > have really been effective? Not quite sure what you mean. I don't think
> > I follow.
> 
> I mean that the UC MTRR that overrides the WC MTRR was probably fine
> (I hope smaller MTRRs override larger MTRRs).  But we should just
> ditch UC MTRRs entirely, 

Agreed, this series does that, and this patch addresses the last
UC MTRR ;)

> and setting UC in the page tables would work on all CPUs *if we supported
> that*.  We'd need to add a couple trivial helpers to do that.

OK, please check my latest reply and, if you do not mind, clarify what you
mean there, as I am not sure we're on the same page (no pun) here. I don't
quite follow this last statement.
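
On the earlier question of whether the UC MTRR overrides the WC one:
as far as I understand the Intel SDM, overlap between variable MTRRs
is resolved by type rather than by size. A minimal sketch of that
rule, with hypothetical names that are not kernel identifiers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; these are not kernel identifiers. */
enum range_type { RT_UC, RT_WC, RT_WT, RT_WP, RT_WB };

/*
 * Resolve the memory type where two variable MTRRs overlap.
 * Returns true and stores the result in *out when the outcome is
 * defined, false when the combination is left undefined.
 */
static bool resolve_overlap(enum range_type a, enum range_type b,
			    enum range_type *out)
{
	assert(out);

	if (a == b) {
		*out = a;
		return true;
	}
	/* UC takes precedence over any other type. */
	if (a == RT_UC || b == RT_UC) {
		*out = RT_UC;
		return true;
	}
	/* WT takes precedence over WB. */
	if ((a == RT_WT && b == RT_WB) || (a == RT_WB && b == RT_WT)) {
		*out = RT_WT;
		return true;
	}
	/* Any other mix of types is not defined. */
	return false;
}
```

So a UC range laid over a WC range yields UC regardless of which of
the two is smaller, which matches how the old atyfb hole was meant to
work.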

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-27 20:40     ` Toshi Kani
@ 2015-03-27 23:56       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-27 23:56 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
	xen-devel

On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
>  :
> > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> >  	}
> >  
> >  	if (mtrr_if) {
> > +		mtrr_enabled = true;
> >  		set_num_var_ranges();
> >  		init_table();
> >  		if (use_intel()) {
>                         get_mtrr_state();
> 
> After setting mtrr_enabled to true, get_mtrr_state() reads
> MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> MTRRs are enabled or not on the system.  So, potentially, we could have
> a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> to disabled when MTRRs are disabled by BIOS.

Thanks for the review. In that case we should then set mtrr_enabled to false.
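
Roughly, the idea would be to derive the flag from what the hardware
reports rather than from the presence of mtrr_if alone. A user-space
sketch of that logic follows; the function name and the way the MSR
value is passed in are assumptions for illustration, not the eventual
patch. The only hardware detail relied on is that bit 11 of
MSR_MTRRdefType is the global MTRR enable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 11 (E) of MSR_MTRRdefType: MTRRs globally enabled. */
#define DEF_TYPE_MTRR_ENABLE	(UINT32_C(1) << 11)

/*
 * Hypothetical helper: MTRRs count as enabled only when an MTRR
 * interface was found *and* the firmware left the enable bit set.
 * deftype_lo is the low 32 bits read from MSR_MTRRdefType.
 */
static bool mtrr_enabled_from_deftype(bool have_mtrr_if, uint32_t deftype_lo)
{
	return have_mtrr_if && (deftype_lo & DEF_TYPE_MTRR_ENABLE) != 0;
}
```

With this shape, a BIOS that clears the enable bit gives false even
though mtrr_if is set, which is the case you describe.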

> ps.
> I recently cleaned up this part of the MTRR code in the patch below,
> which is currently available in the -mm & -next trees.
> https://lkml.org/lkml/2015/3/24/1063

Great, I will rebase on top of that and try to address the
consideration you have raised.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 21:56             ` Ville Syrjälä
@ 2015-03-28  0:21               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-28  0:21 UTC (permalink / raw)
  To: Ville Syrjälä,
	Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 11:56:55PM +0200, Ville Syrjälä wrote:
> On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > index 8025624..8875e56 100644
> > > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> >
> > > >> >  #ifdef CONFIG_MTRR
> > > >> >     par->mtrr_aper = -1;
> > > >> > -   par->mtrr_reg = -1;
> > > >> >     if (!nomtrr) {
> > > >> > -           /* Cover the whole resource. */
> > > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > +                                     info->fix.smem_len,
> > > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > > >>
> > > >> MTRRs need power of two size, so how is this supposed to work?
> > > >
> > > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > > is not standardized and by no means recorded as a requirement. Obviously
> > > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > > as per my commit log message:
> > > 
> > > Whatever the code may or may not do, the x86 architecture uses
> > > power-of-two MTRR sizes.  So I'm confused.
> > 
> > There should be no confusion, I simply did not know that *was* the
> > requirement for x86, if that is the case we should add a check for that
> > and perhaps generalize a helper that does the power of two helper changes,
> > the cleanest I found was the vesafb driver solution.
> > 
> > Thoughts?
> 
> The vesafb solution is bad since you'll only end up covering only
> the first 4MB of the framebuffer instead of the almost 8MB you want.

OK, so does the power of 2 requirement mean that we *have* to use a large
MTRR that includes the MMIO region in the shared PCI case?

Andy, Ville, are we 100% certain about this power of two requirement?
Is that for the base and size or just the size?
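
For reference, my understanding of the architectural rule for
variable-range MTRRs is that it applies to both: the size must be a
power of two (at least 4 KiB) and the base must be aligned to that
size. A rough sketch of such a check follows; fits_single_mtrr() is a
hypothetical helper, not an existing kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is a nonzero power of two. */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical helper: can [base, base + size) be described by one
 * variable-range MTRR? Assumes the usual base/mask encoding, where
 * the size is a power of two >= 4 KiB and the base is size-aligned.
 */
bool fits_single_mtrr(uint64_t base, uint64_t size)
{
	if (size < 4096 || !is_pow2(size))
		return false;
	return (base & (size - 1)) == 0;
}
```

So an 8 MiB range at an 8 MiB aligned base passes, while the same base
with "8 MiB minus one page" or an 8 MiB range at a 4 MiB aligned base
does not.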

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-27 22:02               ` Andy Lutomirski
@ 2015-03-28  0:28                 ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-03-28  0:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > index 8025624..8875e56 100644
> >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > >> >
> >> > >> >  #ifdef CONFIG_MTRR
> >> > >> >     par->mtrr_aper = -1;
> >> > >> > -   par->mtrr_reg = -1;
> >> > >> >     if (!nomtrr) {
> >> > >> > -           /* Cover the whole resource. */
> >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > >> > +                                     info->fix.smem_len,
> >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> >> > >>
> >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > >
> >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> >> > > as per my commit log message:
> >> >
> >> > Whatever the code may or may not do, the x86 architecture uses
> >> > power-of-two MTRR sizes.  So I'm confused.
> >>
> >> There should be no confusion, I simply did not know that *was* the
> >> requirement for x86, if that is the case we should add a check for that
> >> and perhaps generalize a helper that does the power of two helper changes,
> >> the cleanest I found was the vesafb driver solution.
> >>
> >> Thoughts?
> >
> > The vesafb solution is bad since you'll only end up covering only
> > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > Which in practice will mean throwing away half the VRAM since you really
> > don't want the massive performance hit from accessing it as UC. And that
> > would mean giving up decent display resolutions as well :(
> >
> > And the other option of trying to cover the remainder with multiple ever
> > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > quickly.
> >
> > This is precisely why I used the hole method in atyfb in the first
> > place.
> >
> > I don't really like the idea of any new mtrr code not supporting that
> > use case, especially as these things tend to be present in older machines
> > where PAT isn't an option.
> 
> According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> an effective memory type of UC.  Hence my suggestion to add
> ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> otherwise WC MTRR-covered region.

OK I think I get it now.

And I take it this would hopefully only be used for non-PAT systems?
Would there be a use case for PAT systems? I wonder if we can wrap
this in some API to keep it clean and hide the dirty details
behind the scenes. It seems fragile and error prone, and my hope
is that we won't need more specialization in this area for
PAT systems.
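
To make sure I read the table right, here is a small standalone model
of the non-PAT case described above, restricted to a region covered by
a WC MTRR. The enum and function names are made up for illustration;
the mapping itself is only what the quoted SDM reference states (PCD = 1
and PWT = 1 gives UC), plus my assumption that every other PCD/PWT
combination leaves the MTRR's WC type in effect.

```c
#include <stdbool.h>

enum mem_type { MEM_UC, MEM_WC };

/*
 * Effective memory type for a page inside a WC MTRR on a non-PAT CPU,
 * selected by the page table PCD/PWT bits. Hypothetical model only.
 *   PCD=1, PWT=1 ("UC")  -> UC, i.e. the hole is punched
 *   PCD=1, PWT=0 ("UC-") -> WC, the MTRR wins
 *   PCD=0                -> WC
 */
enum mem_type effective_type_wc_mtrr(bool pcd, bool pwt)
{
	return (pcd && pwt) ? MEM_UC : MEM_WC;
}
```

This is also why a UC- mapping alone would not be enough to carve out
the MMIO registers: under a WC MTRR it would still end up as WC.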

> ioremap_nocache is UC- (even on non-PAT unless I misunderstood how
> this stuff works), so ioremap_nocache by itself isn't good enough.

Thanks for the clarification.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-28  0:28                 ` Luis R. Rodriguez
@ 2015-03-28 12:23                   ` Ville Syrjälä
  -1 siblings, 0 replies; 710+ messages in thread
From: Ville Syrjälä @ 2015-03-28 12:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > index 8025624..8875e56 100644
> > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >> > >> >
> > >> > >> >  #ifdef CONFIG_MTRR
> > >> > >> >     par->mtrr_aper = -1;
> > >> > >> > -   par->mtrr_reg = -1;
> > >> > >> >     if (!nomtrr) {
> > >> > >> > -           /* Cover the whole resource. */
> > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > >> > >> > +                                     info->fix.smem_len,
> > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > >> > >>
> > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > >> > >
> > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > >> > > as per my commit log message:
> > >> >
> > >> > Whatever the code may or may not do, the x86 architecture uses
> > >> > power-of-two MTRR sizes.  So I'm confused.
> > >>
> > >> There should be no confusion, I simply did not know that *was* the
> > >> requirement for x86, if that is the case we should add a check for that
> > >> and perhaps generalize a helper that does the power of two helper changes,
> > >> the cleanest I found was the vesafb driver solution.
> > >>
> > >> Thoughts?
> > >
> > > The vesafb solution is bad since you'll only end up covering only
> > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > Which in practice will mean throwing away half the VRAM since you really
> > > don't want the massive performance hit from accessing it as UC. And that
> > > would mean giving up decent display resolutions as well :(
> > >
> > > And the other option of trying to cover the remainder with multiple ever
> > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > quickly.
> > >
> > > This is precisely why I used the hole method in atyfb in the first
> > > place.
> > >
> > > I don't really like the idea of any new mtrr code not supporting that
> > > use case, especially as these things tend to be present in older machines
> > > where PAT isn't an option.
> > 
> > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > an effective memory type of UC.  Hence my suggestion to add
> > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > otherwise WC MTRR-covered region.
> 
> OK I think I get it now.
> 
> And I take it this would hopefully only be used for non-PAT systems?
> Would there be a use case for PAT systems? I wonder if we can wrap
> this under some APIs to make it clean and hide this dirty thing
> behind the scenes, it seems a fragile and error prone and my hope
> would be that we won't need more specialization in this area for
> PAT systems.

One potential complication is kernel vs. userspace mmap. MTRR applies to
the physical address, but PAT applies to the virtual address, so with
the WC MTRR you get WC for userspace "for free" as well. Also the
userspace mmap request will have a length of smem_len (at most), so
it won't be the nice power of two in that case.
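
To put a rough number on the non-power-of-two problem, the sketch below
counts how many naturally aligned power-of-two blocks are needed to
cover a range exactly, which is what covering it with plain MTRRs and
no hole would take. mtrrs_needed() is a hypothetical helper, and the
"8 MiB minus 4 KiB" length used afterwards is just an assumed stand-in
for the "almost 8MB" framebuffer.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of size-aligned power-of-two blocks
 * needed to cover [base, base + len) exactly, picking the largest
 * block that both fits in the remaining length and respects the
 * alignment of the current base.
 */
unsigned int mtrrs_needed(uint64_t base, uint64_t len)
{
	unsigned int count = 0;

	while (len) {
		/* Largest power of two dividing base (lowest set bit). */
		uint64_t align = base ? (base & (~base + 1)) : (1ULL << 63);
		/* Largest power of two not exceeding len. */
		uint64_t chunk = 1ULL << 63;

		while (chunk > len)
			chunk >>= 1;
		if (chunk > align)
			chunk = align;

		base += chunk;
		len -= chunk;
		count++;
	}
	return count;
}
```

For a well aligned base, a full 8 MiB needs one block, while 8 MiB
minus 4 KiB (0x7ff000) needs eleven (4 MiB + 2 MiB + ... + 4 KiB).
As far as I know many CPUs only have around eight variable-range MTRRs
in total, so that does not fit.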

Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
to be total crap at the moment. IIRC I have a patch to fix things a bit...

From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
From: Ville Syrjala <syrjala@sci.fi>
Date: Fri, 15 Apr 2011 04:02:43 +0300
Subject: [PATCH] fb: writecombine fb

---
 drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 0705d88..ecbde0e 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
 	unsigned long mmio_pgoff;
 	unsigned long start;
 	u32 len;
+	bool mmio = false;
 
 	if (!info)
 		return -ENODEV;
@@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
 		vma->vm_pgoff -= mmio_pgoff;
 		start = info->fix.mmio_start;
 		len = info->fix.mmio_len;
+		mmio = true;
 	}
 	mutex_unlock(&info->mm_lock);
 
+	if (!mmio) {
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+		if (!vm_iomap_memory(vma, start, len))
+			return 0;
+	}
+
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-	fb_pgprotect(file, vma, start);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	return vm_iomap_memory(vma, start, len);
 }

Perhaps it's time I tried to send that upstream properly :P

-- 
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-03-28 12:23                   ` Ville Syrjälä
  0 siblings, 0 replies; 710+ messages in thread
From: Ville Syrjälä @ 2015-03-28 12:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > index 8025624..8875e56 100644
> > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > >> > >> >
> > >> > >> >  #ifdef CONFIG_MTRR
> > >> > >> >     par->mtrr_aper = -1;
> > >> > >> > -   par->mtrr_reg = -1;
> > >> > >> >     if (!nomtrr) {
> > >> > >> > -           /* Cover the whole resource. */
> > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > >> > >> > +                                     info->fix.smem_len,
> > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > >> > >>
> > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > >> > >
> > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > >> > > as per my commit log message:
> > >> >
> > >> > Whatever the code may or may not do, the x86 architecture uses
> > >> > power-of-two MTRR sizes.  So I'm confused.
> > >>
> > >> There should be no confusion, I simply did not know that *was* the
> > >> requirement for x86, if that is the case we should add a check for that
> > >> and perhaps generalize a helper that does the power of two helper changes,
> > >> the cleanest I found was the vesafb driver solution.
> > >>
> > >> Thoughts?
> > >
> > > The vesafb solution is bad since you'll only end up covering only
> > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > Which in practice will mean throwing away half the VRAM since you really
> > > don't want the massive performance hit from accessing it as UC. And that
> > > would mean giving up decent display resolutions as well :(
> > >
> > > And the other option of trying to cover the remainder with multiple ever
> > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > quickly.
> > >
> > > This is precisely why I used the hole method in atyfb in the first
> > > place.
> > >
> > > I don't really like the idea of any new mtrr code not supporting that
> > > use case, especially as these things tend to be present in older machines
> > > where PAT isn't an option.
> > 
> > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > an effective memory type of UC.  Hence my suggestion to add
> > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > otherwise WC MTRR-covered region.
> 
> OK I think I get it now.
> 
> And I take it this would hopefully only be used for non-PAT systems?
> Would there be a use case for PAT systems? I wonder if we can wrap
> this under some APIs to make it clean and hide this dirty thing
> behind the scenes, it seems a fragile and error prone and my hope
> would be that we won't need more specialization in this area for
> PAT systems.

One potential complication is kernel vs. userspace mmap. MTRR applies to
the physical address, but PAT applies to the virtual address, so with
the WC MTRR you get WC for userspace "for free" as well. Also the
userspace mmap request will have the length of smem_len (at most), so
it won't be a nice power of two in that case.

Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
to be total crap at the moment. IIRC I have a patch to fix things a bit...

From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
From: Ville Syrjala <syrjala@sci.fi>
Date: Fri, 15 Apr 2011 04:02:43 +0300
Subject: [PATCH] fb: writecombine fb

---
 drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 0705d88..ecbde0e 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
 	unsigned long mmio_pgoff;
 	unsigned long start;
 	u32 len;
+	bool mmio = false;
 
 	if (!info)
 		return -ENODEV;
@@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
 		vma->vm_pgoff -= mmio_pgoff;
 		start = info->fix.mmio_start;
 		len = info->fix.mmio_len;
+		mmio = true;
 	}
 	mutex_unlock(&info->mm_lock);
 
+	if (!mmio) {
+		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+		if (!vm_iomap_memory(vma, start, len))
+			return 0;
+	}
+
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
-	fb_pgprotect(file, vma, start);
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 
 	return vm_iomap_memory(vma, start, len);
 }

Perhaps it's time I tried to send that upstream properly :P

-- 
Ville Syrjälä
syrjala@sci.fi
http://www.sci.fi/~syrjala/

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-28 12:23                   ` Ville Syrjälä
@ 2015-04-01 23:52                     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-01 23:52 UTC (permalink / raw)
  To: Ville Syrjälä,
	Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > index 8025624..8875e56 100644
> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> > >> >
> > > >> > >> >  #ifdef CONFIG_MTRR
> > > >> > >> >     par->mtrr_aper = -1;
> > > >> > >> > -   par->mtrr_reg = -1;
> > > >> > >> >     if (!nomtrr) {
> > > >> > >> > -           /* Cover the whole resource. */
> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > >> > +                                     info->fix.smem_len,
> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > > >> > >>
> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > > >> > >
> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > >> > > as per my commit log message:
> > > >> >
> > > >> > Whatever the code may or may not do, the x86 architecture uses
> > > >> > power-of-two MTRR sizes.  So I'm confused.
> > > >>
> > > >> There should be no confusion, I simply did not know that *was* the
> > > >> requirement for x86, if that is the case we should add a check for that
> > > >> and perhaps generalize a helper that does the power of two helper changes,
> > > >> the cleanest I found was the vesafb driver solution.
> > > >>
> > > >> Thoughts?
> > > >
> > > > The vesafb solution is bad since you'll only end up covering only
> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > > Which in practice will mean throwing away half the VRAM since you really
> > > > don't want the massive performance hit from accessing it as UC. And that
> > > > would mean giving up decent display resolutions as well :(
> > > >
> > > > And the other option of trying to cover the remainder with multiple ever
> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > > quickly.
> > > >
> > > > This is precisely why I used the hole method in atyfb in the first
> > > > place.
> > > >
> > > > I don't really like the idea of any new mtrr code not supporting that
> > > > use case, especially as these things tend to be present in older machines
> > > > where PAT isn't an option.
> > > 
> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > > an effective memory type of UC.

This is true but non-PAT systems that use just ioremap() will default to
_PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
on Linux has PCD = 1, PWT = 0. The list comes from:

uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
        [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
        [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
        [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
        [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
        [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
        [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
};

This is easier to read as a table:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC
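
To make the mapping above concrete, here is a minimal userspace sketch
that rebuilds the first four rows of the table from the PCD/PWT bits.
The enum, the PTE_* macros and the function names are local stand-ins
invented for this example; only the bit positions (PWT = bit 3, PCD =
bit 4 of a 4 KiB PTE) and the per-mode assignments are taken from
__cachemode2pte_tbl as quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the caching flags in an x86 4 KiB page-table entry. */
#define PTE_PWT (1u << 3)	/* page-level write-through */
#define PTE_PCD (1u << 4)	/* page-level cache disable */

/* Local stand-ins for the kernel's _PAGE_CACHE_MODE_* values. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_NUM };

/* Same PCD/PWT assignment as the first four rows of __cachemode2pte_tbl. */
static const uint16_t mode2pte[MODE_NUM] = {
	[MODE_WB]       = 0,
	[MODE_WC]       = PTE_PWT,
	[MODE_UC_MINUS] = PTE_PCD,
	[MODE_UC]       = PTE_PWT | PTE_PCD,
};

/* Return the two-bit PCD:PWT value, i.e. the low two columns of the table. */
unsigned int pcd_pwt_bits(enum cache_mode mode)
{
	uint16_t pte = mode2pte[mode];

	return ((pte & PTE_PCD) ? 2u : 0u) | ((pte & PTE_PWT) ? 1u : 0u);
}

/* Each mode lands on the row listed for it: WB=00, WC=01, UC-=10, UC=11. */
void cache_mode_self_check(void)
{
	assert(pcd_pwt_bits(MODE_WB) == 0);
	assert(pcd_pwt_bits(MODE_WC) == 1);
	assert(pcd_pwt_bits(MODE_UC_MINUS) == 2);
	assert(pcd_pwt_bits(MODE_UC) == 3);
}
```

The point of the sketch is only that UC- and UC differ by the PWT bit,
which is what the non-PAT discussion below hinges on.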

On x86 ioremap() defaults to ioremap_nocache() and right now that uses
_PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
to consider for non-PAT systems then:

a) Right now ioremap() and ioremap_nocache() default to
   _PAGE_CACHE_MODE_UC_MINUS on x86. In this case a page under a WC MTRR
   ends up with PWT=0, PCD=1, and table 11-6 seems to treat this
   combination on non-PAT systems as "implementation defined" and not
   encouraged.

b) When commit de33c442e "x86 PAT: fix performance drop for glx, use
   UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
   gets reverted, both ioremap() and ioremap_nocache() on x86 will
   default to _PAGE_CACHE_MODE_UC, and as you note we'll end up with
   an effective memory type of UC.

If I've understood this correctly then neither of these situations is good, and
it's just by chance that on some systems situation a) has led to proper WC.

On a PAT system the combinations give slightly different results (based on
Table 11-7):

a) Right now, with ioremap() and ioremap_nocache() defaulting to
   _PAGE_CACHE_MODE_UC_MINUS: _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC

b) When commit de33c442e gets reverted: _PAGE_CACHE_MODE_UC + MTRR WC = UC

So to be clear, right now atyfb should work fine on PAT systems
with de33c442e in place; once that commit is reverted, the driver as
it stands would end up with a UC effective memory type.

For both PAT and non-PAT systems, when commit de33c442e gets reverted
we'd end up with UC as the effective memory type for atyfb. Right
now it should work on PAT systems, and it's suspected to work on
non-PAT systems only by chance. We want to phase out MTRR though,
especially to avoid all this insane combinatorial nightmare.
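
The four cases above can be written down as a small lookup, purely as
a restatement of my reading of tables 11-6 and 11-7 in this mail and
not as an authoritative copy of the SDM. All type and function names
below are made up for the example; "implementation defined" is kept as
its own value since that is how the non-PAT UC- case was described.

```c
#include <assert.h>
#include <stdbool.h>

/* Page attribute requested by ioremap(): UC- today, UC after a revert. */
enum page_mode { PAGE_UC_MINUS, PAGE_UC };

/* Outcome for a page that sits inside a WC MTRR. */
enum eff_type { EFF_WC, EFF_UC, EFF_IMPL_DEFINED };

/*
 * Effective memory type under a WC MTRR, as summarized in this thread:
 *   UC  + MTRR WC -> UC                      (PAT and non-PAT)
 *   UC- + MTRR WC -> WC                      (PAT)
 *   UC- + MTRR WC -> implementation defined  (non-PAT, PCD=1 PWT=0)
 */
enum eff_type eff_type_under_wc_mtrr(bool has_pat, enum page_mode mode)
{
	if (mode == PAGE_UC)
		return EFF_UC;

	return has_pat ? EFF_WC : EFF_IMPL_DEFINED;
}

/* The cases walked through in the mail, one assertion per case. */
void eff_type_self_check(void)
{
	assert(eff_type_under_wc_mtrr(true, PAGE_UC_MINUS) == EFF_WC);
	assert(eff_type_under_wc_mtrr(true, PAGE_UC) == EFF_UC);
	assert(eff_type_under_wc_mtrr(false, PAGE_UC_MINUS) == EFF_IMPL_DEFINED);
	assert(eff_type_under_wc_mtrr(false, PAGE_UC) == EFF_UC);
}
```

Only one of the four combinations is guaranteed to give WC, which is
the combinatorial problem in a nutshell.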

> > > Hence my suggestion to add
> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > > otherwise WC MTRR-covered region.

To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
and after commit de33c442e gets reverted. So for instance if we had on the
atyfb driver:

ioremap_x86_uc(PCI BAR)
ioremap_wc(framebuffer)
arch_phys_wc_add(PCI BAR)

On non-PAT systems the MMIO region, with PWT=1, PCD=1, would end up as UC.
Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
MTRR that follows would mean we'd end up in another grey area (but
similar to before, as technically an effective memory type of WC).

On PAT systems the above would not use MTRRs but we'd be counting on
overlapping memory types -- it's not clear if aliasing here is a problem.

Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
describes that: "the minimum range size is 4 KiB, the base address must be on
a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
2^n and its base address must be aligned on a 2^n boundary where n is a value
equal to or greater than 12. The base-address alignment value cannot be less
than its length. For example, an 8-KiB range cannot be aligned on a
4-KiB boundary. It must be aligned on at least an 8-KiB boundary."
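
As a rough illustration of that rule, here is a standalone check for
whether a (base, size) pair could be described by a single
variable-range MTRR. This is a sketch of the rule as paraphrased
above, not the kernel's mtrr_check(); the function name and the
example addresses used in the self-check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTRR_MIN_SIZE 0x1000ull	/* 4 KiB minimum range size */

/* True if [base, base + size) satisfies the size and alignment rule. */
bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	if (size < MTRR_MIN_SIZE)
		return false;
	if (size & (size - 1))		/* length must be 2^n */
		return false;

	return (base & (size - 1)) == 0;	/* base aligned to the length */
}

/* The SDM's 8 KiB example plus a framebuffer-sized case. */
void mtrr_range_self_check(void)
{
	assert(mtrr_range_ok(0x2000, 0x2000));		/* 8 KiB on 8 KiB */
	assert(!mtrr_range_ok(0x1000, 0x2000));		/* 8 KiB on 4 KiB */
	assert(mtrr_range_ok(0xfd000000, 0x800000));	/* full 8 MiB */
	assert(!mtrr_range_ok(0xfd000000, 0x7ff000));	/* 8 MiB minus 4 KiB */
}
```

The last case is the shape of the problem raised for atyfb: a
framebuffer of "almost 8MB" is page aligned but not a power of two.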

So to answer my own question: indeed, our framebuffer base address must be
aligned on a 2^n boundary, and the size also has to be a power of 2. MTRR
supports fixed range sizes and variable range sizes; the MMIO region would
not need to abide by the power of 2 rule, as a fixed range size of 4 KiB
could be used, although upon review of our own implementation it's unclear if
that is what is used for 4 KiB sized MTRRs.
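
This also puts a number on the objection quoted earlier that covering
the remainder with ever smaller MTRRs runs out of registers. The
sketch below greedily splits a page-aligned region into the fewest
power-of-two, naturally aligned pieces. The function name and the
example base address and size are assumptions for illustration; the
thread only says "almost 8MB".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K 0x1000ull

/*
 * Count the power-of-two, naturally aligned ranges needed to cover
 * [base, base + size) exactly. Returns 0 if base or size is not a
 * multiple of 4 KiB, since such a region cannot be covered exactly.
 */
unsigned int mtrr_ranges_needed(uint64_t base, uint64_t size)
{
	unsigned int count = 0;

	if ((base | size) & (PAGE_4K - 1))
		return 0;

	while (size) {
		uint64_t chunk = PAGE_4K;

		/* Grow the chunk while it still fits and base stays aligned. */
		while (chunk * 2 <= size && (base & (chunk * 2 - 1)) == 0)
			chunk *= 2;

		base += chunk;
		size -= chunk;
		count++;
	}

	return count;
}

/* A full 8 MiB needs one range; 8 MiB minus one page needs eleven. */
void mtrr_ranges_self_check(void)
{
	assert(mtrr_ranges_needed(0xfd000000, 0x800000) == 1);
	assert(mtrr_ranges_needed(0xfd000000, 0x7ff000) == 11);
}
```

Eleven ranges (4 MiB, 2 MiB, ... down to 4 KiB) for one framebuffer is
far more than the handful of variable-range MTRRs a CPU provides.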

Hence my arch_phys_wc_add(PCI BAR) as above.

> > OK I think I get it now.
> > 
> > And I take it this would hopefully only be used for non-PAT systems?

Since we likely would also want to use ioremap_x86_uc() on PAT systems, we
could make it effective for both PAT and non-PAT systems. Later, when
we get ioremap() to default to strong UC, we could drop ioremap_x86_uc() as we'd
only need it as a transition until then -- that is, unless we want a strong
UC ioremap primitive which always maps strong UC when available, regardless
of these default transitions.

The big issue I see here is simply the combinatorial issues, so I do think
it's best to annotate these corner cases well and avoid them.

> > Would there be a use case for PAT systems? I wonder if we can wrap
> > this under some APIs to make it clean and hide this dirty thing
> > behind the scenes, it seems a fragile and error prone and my hope
> > would be that we won't need more specialization in this area for
> > PAT systems.
> 
> One potential complication is kernel vs. userspace mmap. MTRR applies to
> the physical address, but PAT applies to the virtual address, so with
> the WC MTRR you get WC for userspace "for free" as well.

What is the performance impact of having the conversion done by the
kernel? Has anyone done measurements? If significant, can't the subsystem mmap()
cache the phys address for PAT? Shouldn't the TLB take care of those considerations
for us? If this is generally desirable shouldn't we just generalize the cache
for devices for O(1) access through a generic API?

Can the difference, other than a possible performance hit, imply a
userspace-visible change?

If the performance / userspace effect is negligible then I'd expect the gains
from cleaner code / APIs to outweigh the cons. After all the goal is to
streamline PAT when possible here.

> Also the
> userspace mmaps request will have the length of smem_len (at most), so
> it won't be the nice power of two in that case.

Does that length change imply a userspace-visible change?

> Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
> to be total crap at the moment. IIRC I have a patch to fix things a bit...

Isn't that because of the lack of ioremap_wc() calls? You seem to be
doing this with pgprot_writecombine() instead; more on this strategy
below though.

> From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
> From: Ville Syrjala <syrjala@sci.fi>
> Date: Fri, 15 Apr 2011 04:02:43 +0300
> Subject: [PATCH] fb: writecombine fb
> 
> ---
>  drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index 0705d88..ecbde0e 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  	unsigned long mmio_pgoff;
>  	unsigned long start;
>  	u32 len;
> +	bool mmio = false;
>  
>  	if (!info)
>  		return -ENODEV;
> @@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  		vma->vm_pgoff -= mmio_pgoff;
>  		start = info->fix.mmio_start;
>  		len = info->fix.mmio_len;
> +		mmio = true;
>  	}
>  	mutex_unlock(&info->mm_lock);
>  
> +	if (!mmio) {
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +		if (!vm_iomap_memory(vma, start, len))
> +			return 0;
> +	}
> +
>  	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> -	fb_pgprotect(file, vma, start);
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>  
>  	return vm_iomap_memory(vma, start, len);
>  }
> 
> Perhaps it's time I tried to send that upstream properly :P

Let's assume drivers all have ioremap_wc() on the framebuffer; would the
above not be needed then (disregarding the corner cases such as atyfb)?

If your goal is to have one general place that makes the framebuffer WC,
then instead of doing that at mmap() why not do it at register_framebuffer()
time, and do it only once? I suspect all this might also be easier to do and
generalize after this series.
 
So as we can see from this series there are tons of drivers that can safely
be moved to use ioremap_wc() already, provided there are no regressions with
the simple ioremap_wc() / arch_phys_wc_add() switch. There are only a few corner
cases to address after that. Addressing both of these *first* would simplify
the code and grammatically make it a bit more consistent while trying to avoid a
generalized regression. I believe a generalized solution is definitely in order
but we also should first address the corner cases.

So how about we:

a) convert all drivers over that are safe to convert to ioremap_wc() /
   arch_phys_wc_add()
b) address all corner cases and try to avoid further combinatorial
   issues
c) after a while push for reverting de33c442e
d) generalize a solution for the framebuffer

Ideally, as I mentioned in the other thread with Bjorn, we could even
have the WC be done further below for us but it was very unclear
if we could accomplish this due to the definition of the PCI flags,
the way we'd use it and the way they could be integrated on hardware
by manufacturers. I think generalizing things under the framebuffer
code would be a good intermediate step but I think we need to phase
this in, in light of the corner cases, combinatorial issues with
PAT / non-PAT and the eventual goal of reverting commit de33c442e
in order to generalize strong UC.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-04-01 23:52                     ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-01 23:52 UTC (permalink / raw)
  To: Ville Syrjälä,
	Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > index 8025624..8875e56 100644
> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> > >> >
> > > >> > >> >  #ifdef CONFIG_MTRR
> > > >> > >> >     par->mtrr_aper = -1;
> > > >> > >> > -   par->mtrr_reg = -1;
> > > >> > >> >     if (!nomtrr) {
> > > >> > >> > -           /* Cover the whole resource. */
> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > >> > +                                     info->fix.smem_len,
> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > > >> > >>
> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > > >> > >
> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > >> > > as per my commit log message:
> > > >> >
> > > >> > Whatever the code may or may not do, the x86 architecture uses
> > > >> > power-of-two MTRR sizes.  So I'm confused.
> > > >>
> > > >> There should be no confusion, I simply did not know that *was* the
> > > >> requirement for x86, if that is the case we should add a check for that
> > > >> and perhaps generalize a helper that does the power of two helper changes,
> > > >> the cleanest I found was the vesafb driver solution.
> > > >>
> > > >> Thoughts?
> > > >
> > > > The vesafb solution is bad since you'll only end up covering only
> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > > Which in practice will mean throwing away half the VRAM since you really
> > > > don't want the massive performance hit from accessing it as UC. And that
> > > > would mean giving up decent display resolutions as well :(
> > > >
> > > > And the other option of trying to cover the remainder with multiple ever
> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > > quickly.
> > > >
> > > > This is precisely why I used the hole method in atyfb in the first
> > > > place.
> > > >
> > > > I don't really like the idea of any new mtrr code not supporting that
> > > > use case, especially as these things tend to be present in older machines
> > > > where PAT isn't an option.
> > > 
> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > > an effective memory type of UC.

This is true but non-PAT systems that use just ioremap() will default to
_PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
on Linux has PCD = 1, PWT = 0. The list comes from:

uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {                          
        [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,                
        [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,                
        [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,                
};                                                                              

This can better be read here:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC
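
As a small userspace illustration of the bit layout above (not kernel code),
the three PTE bits can be packed into an index and decoded. The enum and the
helper name decode_cache_bits() are made up for this sketch, and only the four
PAT=0 slots listed above are modelled:

```c
#include <assert.h>

/* Cache modes in the order of the table above. Names are illustrative. */
enum cache_mode {
	MODE_UNKNOWN  = -1,
	MODE_WB       = 0,	/* 000 */
	MODE_WC       = 1,	/* 001 */
	MODE_UC_MINUS = 2,	/* 010 */
	MODE_UC       = 3,	/* 011 */
};

/* Hypothetical helper: decode the (PAT, PCD, PWT) bits of a PTE. */
static enum cache_mode decode_cache_bits(unsigned pat, unsigned pcd,
					 unsigned pwt)
{
	unsigned idx;

	/* Each argument is a single bit. */
	assert(pat <= 1 && pcd <= 1 && pwt <= 1);

	idx = (pat << 2) | (pcd << 1) | pwt;

	/* Slots with PAT=1 are not covered by the table above. */
	return idx < 4 ? (enum cache_mode)idx : MODE_UNKNOWN;
}
```

With this, PCD=1, PWT=0 decodes to UC- and PCD=1, PWT=1 decodes to UC, which
is the distinction the rest of this mail depends on.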

On x86 ioremap() defaults to ioremap_nocache() and right now that uses
_PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
to consider for non-PAT systems then:

a) Right now ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
   on x86. In this case a page under a WC MTRR ends up with PWT=0, PCD=1, and
   Table 11-6 seems to classify this combination on non-PAT systems as
   "implementation defined" and not encouraged.

b) When commit de33c442e "x86 PAT: fix performance drop for glx, use
   UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
   gets reverted, both ioremap() and ioremap_nocache() on x86 will
   default to _PAGE_CACHE_MODE_UC. In this case we'll end up, as you
   note, with an effective memory type of UC.

If I've understood this correctly then neither of these situations is good, and
it's just by chance that on some systems situation a) has led to proper WC.

On a PAT system we get somewhat different combinatorial results (based on Table
11-7):

a) Right now, with ioremap() and ioremap_nocache() defaulting to
    _PAGE_CACHE_MODE_UC_MINUS: _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC

b) When commit de33c442e gets reverted: _PAGE_CACHE_MODE_UC + MTRR WC = UC

So to be clear, right now atyfb should work fine on PAT systems
with de33c442e in place; once that commit is reverted, the driver
as-is would end up with a UC effective memory type.

For both PAT and non-PAT systems, when commit de33c442e gets reverted
we'd end up with UC as the effective memory type for atyfb. Right
now it should work on PAT systems, and it's suspected to work only by
chance on non-PAT systems. We want to phase out MTRR though, especially
to avoid this whole combinatorial nightmare.
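
To keep the four cases straight, here is a minimal userspace sketch that
encodes only the combinations as read in this mail (a WC MTRR combined with a
UC- or UC page mapping). It is not a full rendering of Tables 11-6 and 11-7,
and all names in it are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Page-level modes under discussion; illustrative names only. */
enum page_mode { PAGE_UC_MINUS, PAGE_UC };

/* Effective type of a page that sits under a WC MTRR. */
enum eff_type { EFF_WC, EFF_UC, EFF_IMPL_DEFINED };

/*
 * Hypothetical helper: effective memory type under a WC MTRR, following
 * the reading of the SDM tables given in this mail.
 */
static enum eff_type eff_type_under_wc_mtrr(bool has_pat, enum page_mode mode)
{
	assert(mode == PAGE_UC_MINUS || mode == PAGE_UC);

	/* Strong UC (PWT=1, PCD=1) yields UC with or without PAT. */
	if (mode == PAGE_UC)
		return EFF_UC;

	/*
	 * UC- (PWT=0, PCD=1): WC on PAT systems, read here as a grey
	 * area ("implementation defined") on non-PAT systems.
	 */
	return has_pat ? EFF_WC : EFF_IMPL_DEFINED;
}
```

The only combination that gives a well-defined WC result is UC- on a PAT
system, which is today's default and is what goes away once de33c442e is
reverted.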

> > > Hence my suggestion to add
> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > > otherwise WC MTRR-covered region.

To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
and after commit de33c442e gets reverted. So for instance if we had on the
atyfb driver:

ioremap_x86_uc(PCI BAR)
ioremap_wc(framebuffer)
arch_phys_wc_add(PCI BAR)

On non-PAT systems the MMIO region, with PWT=1, PCD=1, would end up as UC.
Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
MTRR that follows would mean we'd end up in another grey area (but
similar to before, as technically the effective memory type is WC).

On PAT systems the above would not use MTRRs but we'd be counting on
overlapping memory types -- it's not clear if aliasing here is a problem.

Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
describes that: "the minimum range size is 4 KiB, the base address must be on
a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
2^n and its base address must be aligned on a 2^n boundary where n is a value
equal to or greater than 12. The base-address alignment value cannot be less
than its length. For example, an 8-KiB range cannot be aligned on a
4-KiB boundary. It must be aligned on at least an 8-KiB boundary"

So to answer my own question: indeed, our framebuffer base address must be
aligned on a 2^n boundary, and the size also has to be a power of 2. MTRR supports
fixed range sizes and variable range sizes; the MMIO region does
not need to abide by the power of 2 rule, as a fixed range size of 4 KiB
could be used, although upon review of our own implementation it's unclear if
that is what is used for 4 KiB sized MTRRs.

Hence my arch_phys_wc_add(PCI BAR) as above.
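
The size and alignment rule quoted above, and Ville's two objections (rounding
down loses VRAM, covering the remainder eats MTRRs), can be checked with a few
lines of userspace C. This is only a sketch: the helper names are made up, and
the counting helper assumes the base is aligned to the next power of two at or
above the size, so that descending power-of-two chunks are each naturally
aligned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTRR_MIN_SIZE 4096u	/* 2^12, the minimum from the SDM text above */

static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* One variable-range MTRR: size is 2^n with n >= 12, base aligned to size. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	return size >= MTRR_MIN_SIZE && is_pow2(size) &&
	       (base & (size - 1)) == 0;
}

/*
 * Number of MTRRs needed to cover a region exactly with descending
 * power-of-two chunks: one per set bit in the page count.
 */
static unsigned mtrr_ranges_needed(uint64_t size)
{
	unsigned n = 0;
	uint64_t pages;

	assert(size % MTRR_MIN_SIZE == 0);
	for (pages = size / MTRR_MIN_SIZE; pages; pages &= pages - 1)
		n++;
	return n;
}

/* Largest power of two <= size: what a round-down approach would cover. */
static uint64_t round_down_pow2(uint64_t size)
{
	assert(size != 0);
	while (!is_pow2(size))
		size &= size - 1;	/* clear the lowest set bit */
	return size;
}
```

Taking a hypothetical framebuffer of 8 MiB minus one 4 KiB MMIO page as an
example of "almost 8MB": it is not a valid single range, rounding down covers
only 4 MiB, and an exact cover needs 11 ranges, more than the handful of
variable-range MTRRs a CPU typically provides.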

> > OK I think I get it now.
> > 
> > And I take it this would hopefully only be used for non-PAT systems?

Since we'd likely want to use ioremap_x86_uc() on PAT systems as well, we
could make it effective for both PAT and non-PAT.  Later, when
we get ioremap() to default to strong UC, we could drop ioremap_x86_uc() as we'd
only need it as a transitory measure until then -- unless we perhaps want a strong
UC ioremap primitive which always yields strong UC when available, regardless
of these default transitions.

The big issue I see here is simply the combinatorial issues, so I do think
it's best to annotate these corner cases well and avoid them.

> > Would there be a use case for PAT systems? I wonder if we can wrap
> > this under some APIs to make it clean and hide this dirty thing
> > behind the scenes, it seems a fragile and error prone and my hope
> > would be that we won't need more specialization in this area for
> > PAT systems.
> 
> One potential complication is kernel vs. userspace mmap. MTRR applies to
> the physical address, but PAT applies to the virtual address, so with
> the WC MTRR you get WC for userspace "for free" as well.

What is the performance impact of having the conversion done by the
kernel? Has anyone done measurements? If it is significant, can't the subsystem mmap()
cache the phys address for PAT? Shouldn't the TLB take care of those considerations
for us? If this is generally desirable, shouldn't we just generalize the cache
for devices for O(1) access through a generic API?

Can the difference, other than a possible performance hit, result in a
userspace-visible change?

If the performance / userspace effect is negligible then I'd expect the gains
from cleaner code / APIs to outweigh the cons. After all, the goal is to
streamline PAT when possible here.

> Also the
> userspace mmaps request will have the length of smem_len (at most), so
> it won't be the nice power of two in that case.

Does that length change result in a userspace-visible change?

> Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
> to be total crap at the moment. IIRC I have a patch to fix things a bit...

Isn't that because of the lack of ioremap_wc() calls? You seem to be
doing this alternatively with pgprot_writecombine(); more on this strategy
below though.

> From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
> From: Ville Syrjala <syrjala@sci.fi>
> Date: Fri, 15 Apr 2011 04:02:43 +0300
> Subject: [PATCH] fb: writecombine fb
> 
> ---
>  drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index 0705d88..ecbde0e 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  	unsigned long mmio_pgoff;
>  	unsigned long start;
>  	u32 len;
> +	bool mmio = false;
>  
>  	if (!info)
>  		return -ENODEV;
> @@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  		vma->vm_pgoff -= mmio_pgoff;
>  		start = info->fix.mmio_start;
>  		len = info->fix.mmio_len;
> +		mmio = true;
>  	}
>  	mutex_unlock(&info->mm_lock);
>  
> +	if (!mmio) {
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +		if (!vm_iomap_memory(vma, start, len))
> +			return 0;
> +	}
> +
>  	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> -	fb_pgprotect(file, vma, start);
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>  
>  	return vm_iomap_memory(vma, start, len);
>  }
> 
> Perhaps it's time I tried to send that upstream properly :P

Let's assume drivers all have ioremap_wc() on the framebuffer; would
the above then not be needed (disregarding the corner cases such as atyfb)?

If your goal is to generalize a place to make framebuffer WC instead of doing
that at mmap() why not do it at register_framebuffer() time and do it
only once? I suspect all this might also be easier to do and generalize
after this series.
 
So as we can see from this series there are tons of drivers that can safely
be moved to use ioremap_wc() already, provided there are no regressions with
the simple ioremap_wc() / arch_phys_wc_add() switch. There are only a few corner
cases to address after that. Addressing both of these *first* would simplify
the code and grammatically make it a bit more consistent while trying to avoid a
generalized regression. I believe a generalized solution is definitely in order
but we also should first address the corner cases.

So how about we:

a) convert all drivers over that are safe to convert to ioremap_wc() /
   arch_phys_wc_add()
b) address all corner cases and try to avoid further combinatorial
   issues
c) after a while push for reverting de33c442e
d) generalize a solution for framebuffer

Ideally, as I mentioned in the other thread with Bjorn, we could even
have the WC be done further below for us but it was very unclear
if we could accomplish this due to the definition of the PCI flags,
the way we'd use it and the way they could be integrated on hardware
by manufacturers. I think generalizing things under the framebuffer
code would be a good intermediate step but I think we need to phase
this in in light of the corner cases, combinatorial issues with
PAT / non-PAT and the eventual goal of reverting commit de33c442e
in order to generalize strong UC.

 Luis


* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-03-28 12:23                   ` Ville Syrjälä
  (?)
@ 2015-04-01 23:52                   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-01 23:52 UTC (permalink / raw)
  To: Ville Syrjälä,
	Andy Lutomirski, Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, Linux Fbdev development list, X86 ML, xen-devel,
	Ingo Molnar, Linus Torvalds, Daniel Vetter, Antonino Daplas

On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > index 8025624..8875e56 100644
> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> > > >> > >> >
> > > >> > >> >  #ifdef CONFIG_MTRR
> > > >> > >> >     par->mtrr_aper = -1;
> > > >> > >> > -   par->mtrr_reg = -1;
> > > >> > >> >     if (!nomtrr) {
> > > >> > >> > -           /* Cover the whole resource. */
> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> > > >> > >> > +                                     info->fix.smem_len,
> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> > > >> > >>
> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> > > >> > >
> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> > > >> > > as per my commit log message:
> > > >> >
> > > >> > Whatever the code may or may not do, the x86 architecture uses
> > > >> > power-of-two MTRR sizes.  So I'm confused.
> > > >>
> > > >> There should be no confusion, I simply did not know that *was* the
> > > >> requirement for x86, if that is the case we should add a check for that
> > > >> and perhaps generalize a helper that does the power of two helper changes,
> > > >> the cleanest I found was the vesafb driver solution.
> > > >>
> > > >> Thoughts?
> > > >
> > > > The vesafb solution is bad since you'll only end up covering only
> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> > > > Which in practice will mean throwing away half the VRAM since you really
> > > > don't want the massive performance hit from accessing it as UC. And that
> > > > would mean giving up decent display resolutions as well :(
> > > >
> > > > And the other option of trying to cover the remainder with multiple ever
> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> > > > quickly.
> > > >
> > > > This is precisely why I used the hole method in atyfb in the first
> > > > place.
> > > >
> > > > I don't really like the idea of any new mtrr code not supporting that
> > > > use case, especially as these things tend to be present in older machines
> > > > where PAT isn't an option.
> > > 
> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> > > an effective memory type of UC.

This is true but non-PAT systems that use just ioremap() will default to
_PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
on Linux has PCD = 1, PWT = 0. The list comes from:

uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {                          
        [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,                
        [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,                
        [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,                
        [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,                
};                                                                              

This can better be read here:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC

On x86 ioremap() defaults to ioremap_nocache() and right now that uses
_PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
to consider for non-PAT systems then:

a) Right now ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
   on x86. In this case a page under a WC MTRR ends up with PWT=0, PCD=1, and
   Table 11-6 seems to classify this combination on non-PAT systems as
   "implementation defined" and not encouraged.

b) When commit de33c442e "x86 PAT: fix performance drop for glx, use
   UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
   gets reverted, both ioremap() and ioremap_nocache() on x86 will
   default to _PAGE_CACHE_MODE_UC. In this case we'll end up, as you
   note, with an effective memory type of UC.

If I've understood this correctly then neither of these situations is good, and
it's just by chance that on some systems situation a) has led to proper WC.

On a PAT system we get somewhat different combinatorial results (based on Table
11-7):

a) Right now, with ioremap() and ioremap_nocache() defaulting to
    _PAGE_CACHE_MODE_UC_MINUS: _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC

b) When commit de33c442e gets reverted: _PAGE_CACHE_MODE_UC + MTRR WC = UC

So to be clear, right now atyfb should work fine on PAT systems
with de33c442e in place; once that commit is reverted, the driver
as-is would end up with a UC effective memory type.

For both PAT and non-PAT systems, when commit de33c442e gets reverted
we'd end up with UC as the effective memory type for atyfb. Right
now it should work on PAT systems, and it's suspected to work only by
chance on non-PAT systems. We want to phase out MTRR though, especially
to avoid this whole combinatorial nightmare.

> > > Hence my suggestion to add
> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> > > otherwise WC MTRR-covered region.

To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
and after commit de33c442e gets reverted. So for instance if we had on the
atyfb driver:

ioremap_x86_uc(PCI BAR)
ioremap_wc(framebuffer)
arch_phys_wc_add(PCI BAR)

On non-PAT systems the MMIO region, with PWT=1, PCD=1, would end up as UC.
Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
MTRR that follows would mean we'd end up in another grey area (but
similar to before, as technically the effective memory type is WC).

On PAT systems the above would not use MTRRs but we'd be counting on
overlapping memory types -- it's not clear if aliasing here is a problem.

Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
describes that: "the minimum range size is 4 KiB, the base address must be on
a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
2^n and its base address must be aligned on a 2^n boundary where n is a value
equal to or greater than 12. The base-address alignment value cannot be less
than its length. For example, an 8-KiB range cannot be aligned on a
4-KiB boundary. It must be aligned on at least an 8-KiB boundary"

So to answer my own question: indeed, our framebuffer base address must be
aligned on a 2^n boundary, and the size also has to be a power of 2. MTRR supports
fixed range sizes and variable range sizes; the MMIO region does
not need to abide by the power of 2 rule, as a fixed range size of 4 KiB
could be used, although upon review of our own implementation it's unclear if
that is what is used for 4 KiB sized MTRRs.

Hence my arch_phys_wc_add(PCI BAR) as above.

> > OK I think I get it now.
> > 
> > And I take it this would hopefully only be used for non-PAT systems?

Since we'd likely want to use ioremap_x86_uc() on PAT systems as well, we
could make it effective for both PAT and non-PAT.  Later, when
we get ioremap() to default to strong UC, we could drop ioremap_x86_uc() as we'd
only need it as a transitory measure until then -- unless we perhaps want a strong
UC ioremap primitive which always yields strong UC when available, regardless
of these default transitions.

The big issue I see here is simply the combinatorial issues, so I do think
it's best to annotate these corner cases well and avoid them.

> > Would there be a use case for PAT systems? I wonder if we can wrap
> > this under some APIs to make it clean and hide this dirty thing
> > behind the scenes, it seems a fragile and error prone and my hope
> > would be that we won't need more specialization in this area for
> > PAT systems.
> 
> One potential complication is kernel vs. userspace mmap. MTRR applies to
> the physical address, but PAT applies to the virtual address, so with
> the WC MTRR you get WC for userspace "for free" as well.

What is the performance impact of having the conversion done by the
kernel? Has anyone done measurements? If it is significant, can't the subsystem mmap()
cache the phys address for PAT? Shouldn't the TLB take care of those considerations
for us? If this is generally desirable, shouldn't we just generalize the cache
for devices for O(1) access through a generic API?

Can the difference, other than a possible performance hit, result in a
userspace-visible change?

If the performance / userspace effect is negligible then I'd expect the gains
from cleaner code / APIs to outweigh the cons. After all, the goal is to
streamline PAT when possible here.

> Also the
> userspace mmaps request will have the length of smem_len (at most), so
> it won't be the nice power of two in that case.

Does that length change result in a userspace-visible change?

> Also on PAT systems w/o a BIOS provided WC MTRR, the fbdev mmap seems
> to be total crap at the moment. IIRC I have a patch to fix things a bit...

Isn't that because of the lack of ioremap_wc() calls? You seem to be
doing this alternatively with pgprot_writecombine(); more on this strategy
below though.

> From 4e6d70d223f35953c8a11a58cf3376a8a001fa4f Mon Sep 17 00:00:00 2001
> From: Ville Syrjala <syrjala@sci.fi>
> Date: Fri, 15 Apr 2011 04:02:43 +0300
> Subject: [PATCH] fb: writecombine fb
> 
> ---
>  drivers/video/fbdev/core/fbmem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index 0705d88..ecbde0e 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -1396,6 +1396,7 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  	unsigned long mmio_pgoff;
>  	unsigned long start;
>  	u32 len;
> +	bool mmio = false;
>  
>  	if (!info)
>  		return -ENODEV;
> @@ -1426,11 +1427,20 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
>  		vma->vm_pgoff -= mmio_pgoff;
>  		start = info->fix.mmio_start;
>  		len = info->fix.mmio_len;
> +		mmio = true;
>  	}
>  	mutex_unlock(&info->mm_lock);
>  
> +	if (!mmio) {
> +		vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +		if (!vm_iomap_memory(vma, start, len))
> +			return 0;
> +	}
> +
>  	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> -	fb_pgprotect(file, vma, start);
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>  
>  	return vm_iomap_memory(vma, start, len);
>  }
> 
> Perhaps it's time I tried to send that upstream properly :P

Let's assume drivers all have ioremap_wc() on the framebuffer; would
the above then not be needed (disregarding the corner cases such as atyfb)?

If your goal is to generalize a place to make framebuffer WC instead of doing
that at mmap() why not do it at register_framebuffer() time and do it
only once? I suspect all this might also be easier to do and generalize
after this series.
 
So as we can see from this series there are tons of drivers that can safely
be moved to use ioremap_wc() already, provided there are no regressions with
the simple ioremap_wc() / arch_phys_wc_add() switch. There are only a few corner
cases to address after that. Addressing both of these *first* would simplify
the code and grammatically make it a bit more consistent while trying to avoid a
generalized regression. I believe a generalized solution is definitely in order
but we also should first address the corner cases.

So how about we:

a) convert all drivers over that are safe to convert to ioremap_wc() /
   arch_phys_wc_add()
b) address all corner cases and try to avoid further combinatorial
   issues
c) after a while push for reverting de33c442e
d) generalize a solution for framebuffer

Ideally, as I mentioned in the other thread with Bjorn, we could even
have the WC be done further below for us but it was very unclear
if we could accomplish this due to the definition of the PCI flags,
the way we'd use it and the way they could be integrated on hardware
by manufacturers. I think generalizing things under the framebuffer
code would be a good intermediate step but I think we need to phase
this in in light of the corner cases, combinatorial issues with
PAT / non-PAT and the eventual goal of reverting commit de33c442e
in order to generalize strong UC.

 Luis


* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-04-01 23:52                     ` Luis R. Rodriguez
@ 2015-04-02  0:04                       ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-04-02  0:04 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > index 8025624..8875e56 100644
>> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> > > >> > >> >
>> > > >> > >> >  #ifdef CONFIG_MTRR
>> > > >> > >> >     par->mtrr_aper = -1;
>> > > >> > >> > -   par->mtrr_reg = -1;
>> > > >> > >> >     if (!nomtrr) {
>> > > >> > >> > -           /* Cover the whole resource. */
>> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> > > >> > >> > +                                     info->fix.smem_len,
>> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
>> > > >> > >>
>> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> > > >> > >
>> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
>> > > >> > > as per my commit log message:
>> > > >> >
>> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> > > >> > power-of-two MTRR sizes.  So I'm confused.
>> > > >>
>> > > >> There should be no confusion, I simply did not know that *was* the
>> > > >> requirement for x86, if that is the case we should add a check for that
>> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> > > >> the cleanest I found was the vesafb driver solution.
>> > > >>
>> > > >> Thoughts?
>> > > >
>> > > > The vesafb solution is bad since you'll only end up covering only
>> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> > > > Which in practice will mean throwing away half the VRAM since you really
>> > > > don't want the massive performance hit from accessing it as UC. And that
>> > > > would mean giving up decent display resolutions as well :(
>> > > >
>> > > > And the other option of trying to cover the remainder with multiple ever
>> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> > > > quickly.
>> > > >
>> > > > This is precisely why I used the hole method in atyfb in the first
>> > > > place.
>> > > >
>> > > > I don't really like the idea of any new mtrr code not supporting that
>> > > > use case, especially as these things tend to be present in older machines
>> > > > where PAT isn't an option.
>> > >
>> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> > > an effective memory type of UC.
>
> This is true but non-PAT systems that use just ioremap() will default to
> _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> on Linux has PCD = 1, PWT = 0. The list comes from:
>
> uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
>         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
>         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
>         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
>         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
>         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
>         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
> };
>
> This can better be read here:
>
>  PAT
>  |PCD
>  ||PWT
>  |||
>  000 WB          _PAGE_CACHE_MODE_WB
>  001 WC          _PAGE_CACHE_MODE_WC
>  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>  011 UC          _PAGE_CACHE_MODE_UC
>
> On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> to consider for non-PAT systems then:
>
> a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
>    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
>    table table 11-6 on non-PAT systems seems to place this situation as
>    "implementation defined" and not encouraged.
>
> a) when commit de33c442e "x86 PAT: fix performance drop for glx, use
>    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
>    case on x86 for both ioremap() and ioremap_nocache() as they will
>    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
>    an effective memory type of UC.
>
> If I've understood this correctly then neither of these situations are good and
> its just by chance that on some systems situation a) has lead to proper WC.
>
> On a PAT system we have a bit different combinatorial results (based on Table
> 11-7):
>
> a) Right now ioremap() and ioremap_nocache() defaulting to
>     _PAGE_CACHE_MODE_UC_MINUS yields + MTRR WC = WC
>
> b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>
> So to be clear right now atyfb should work fine on PAT systems
> with de33c442e in place, once reverted as-is right now we'd end
> up with UC effective memory type.
>
> For both PAT and non-PAT systems when commit de33c442e gets reverted
> we'd end up with UC as the effective memory type for atyfb. Right
> now it shoud work on PAT systems and by chance its suspected to work
> on non-PAT systems. We want to phase MTRR though, specially to avoid
> all this insane combinatorial nightmware.
>
>> > > Hence my suggestion to add
>> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> > > otherwise WC MTRR-covered region.
>
> To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> and after commit de33c442e gets reverted. So for instance if we had on the
> atyfb driver:
>
> ioremap_x86_uc(PCI BAR)
> ioremap_wc(framebuffer)
> arch_phys_wc_add(PCI BAR)
>
> On non-PAT systems, on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
> Sadly though, since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
> MTRR that follows would mean we'd end up with another grey area (but
> similar to before, as technically an effective memory type of WC).
>
> On PAT systems the above would not use MTRRs but we'd be counting on
> overlapping memory types -- it's not clear if aliasing here is a problem.
>
> Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> describes that: "the minimum range size is 4 KiB, the base address must be on
> a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> 2^n and its base address must be aligned on a 2^n boundary where n is a value
> equal to or greater than 12. The base-address alignment value cannot be less
> than its length. For example, an 8-KiB range cannot be aligned on a
> 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
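
As a rough illustration of that rule, a minimal C sketch follows. It is not
taken from the kernel's MTRR code; the helper names are hypothetical, and it
only checks the size and alignment constraints quoted from the SDM.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
static bool sketch_is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Check one variable-range MTRR candidate against the quoted rule:
 * the size must be 2^n with n >= 12, and the base must be aligned
 * on a boundary of at least that size.
 */
static bool sketch_mtrr_range_ok(uint64_t base, uint64_t size)
{
	if (size < 4096 || !sketch_is_pow2(size))
		return false;
	return (base & (size - 1)) == 0;
}
```

With a hypothetical base of 0xd0000000, a size of 0x800000 passes, while
0x7ff000 (just short of 8 MiB) is rejected because it is not a power of two.
An 8 KiB range at base 0x1000 is rejected as well, matching the SDM example.
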
>
> So to answer my own question: indeed, our framebuffer base address must be
> aligned on a 2^n boundary, and the size also has to be a power of 2. MTRR supports
> fixed range sizes and variable range sizes; the MMIO region does
> not need to abide by the power of 2 rule, as a fixed range size of 4 KiB
> could be used, although upon review of our own implementation it's unclear if
> that is what is used for 4 KiB sized MTRRs.
>
> Hence my arch_phys_wc_add(PCI BAR) as above.
>
>> > OK I think I get it now.
>> >
>> > And I take it this would hopefully only be used for non-PAT systems?
>
> Since we would likely want to use ioremap_x86_uc() on PAT systems as well, we
> could make it effective for both PAT and non-PAT systems.  Later, when
> we get ioremap() to default to strong UC, we could drop ioremap_x86_uc() as we'd
> only need it as a transitional helper until then -- that is, unless we want a strong
> UC ioremap primitive which always uses strong UC when available regardless
> of these default transitions.
>
> The big problem I see here is simply the combinatorial issues, so I do think
> it's best to annotate these corner cases well and avoid them.
>
>> > Would there be a use case for PAT systems? I wonder if we can wrap
>> > this under some APIs to make it clean and hide this dirty thing
>> > behind the scenes, it seems fragile and error prone and my hope
>> > would be that we won't need more specialization in this area for
>> > PAT systems.
>>
>> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> the physical address, but PAT applies to the virtual address, so with
>> the WC MTRR you get WC for userspace "for free" as well.
>
> What is the performance impact of having the conversion being done by the
> kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> for us? If this is generally desirable shouldn't we just generalize the cache
> for devices for O(1) access through a generic API?

We're pretty much required to keep the PTE memory types consistent for
aliases of the same page.  I think that the x86 pageattr code is
supposed to take care of this.  IOW, if everything is working right,
then the supposedly uncached mmap should either fail, be promoted to
WC, or cause the existing WC map to degrade to UC.  The code is really
overcomplicated right now.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-04-02  0:04                       ` Andy Lutomirski
@ 2015-04-02 19:45                         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 19:45 UTC (permalink / raw)
  To: Andy Lutomirski, Mel Gorman, Vlastimil Babka
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > index 8025624..8875e56 100644
> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > > >> > >> >
> >> > > >> > >> >  #ifdef CONFIG_MTRR
> >> > > >> > >> >     par->mtrr_aper = -1;
> >> > > >> > >> > -   par->mtrr_reg = -1;
> >> > > >> > >> >     if (!nomtrr) {
> >> > > >> > >> > -           /* Cover the whole resource. */
> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > > >> > >> > +                                     info->fix.smem_len,
> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> >> > > >> > >>
> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > > >> > >
> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
> >> > > >> > > as per my commit log message:
> >> > > >> >
> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
> >> > > >>
> >> > > >> There should be no confusion, I simply did not know that *was* the
> >> > > >> requirement for x86, if that is the case we should add a check for that
> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
> >> > > >> the cleanest I found was the vesafb driver solution.
> >> > > >>
> >> > > >> Thoughts?
> >> > > >
> >> > > > The vesafb solution is bad since you'll end up covering only
> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> >> > > > Which in practice will mean throwing away half the VRAM since you really
> >> > > > don't want the massive performance hit from accessing it as UC. And that
> >> > > > would mean giving up decent display resolutions as well :(
> >> > > >
> >> > > > And the other option of trying to cover the remainder with multiple ever
> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> >> > > > quickly.
> >> > > >
> >> > > > This is precisely why I used the hole method in atyfb in the first
> >> > > > place.
> >> > > >
> >> > > > I don't really like the idea of any new mtrr code not supporting that
> >> > > > use case, especially as these things tend to be present in older machines
> >> > > > where PAT isn't an option.
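
To put a number on "run out of MTRRs very quickly", here is a small C sketch
that counts how many power-of-two, naturally aligned ranges a greedy split
would need. It is an illustration under assumptions: the function name is
made up, the base address in the note after the code is hypothetical, and
the real MTRR code does not necessarily split ranges this way.

```c
#include <stdint.h>

/*
 * Count how many variable-range MTRRs a greedy split needs to cover
 * [base, base + size) exactly. Each piece is the largest power of two
 * that is aligned at the current base and fits in what is left.
 * Assumes base and size are multiples of 4 KiB.
 */
static unsigned int sketch_mtrr_count(uint64_t base, uint64_t size)
{
	unsigned int count = 0;

	while (size > 0) {
		/* Largest power of two allowed by the alignment of base. */
		uint64_t chunk = base ? (base & (~base + 1)) : (UINT64_C(1) << 63);

		/* Shrink until the chunk fits in the remaining size. */
		while (chunk > size)
			chunk >>= 1;

		base += chunk;
		size -= chunk;
		count++;
	}
	return count;
}
```

For a hypothetical aperture at 0xd0000000, a full 8 MiB (0x800000) needs one
range, while 0x7ff000 (8 MiB minus one 4 KiB page) needs eleven, which is
more than the handful of variable-range MTRRs a CPU provides.
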
> >> > >
> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> >> > > an effective memory type of UC.
> >
> > This is true but non-PAT systems that use just ioremap() will default to
> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> > on Linux has PCD = 1, PWT = 0. The list comes from:
> >
> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
> > };
> >
> > This can better be read here:
> >
> >  PAT
> >  |PCD
> >  ||PWT
> >  |||
> >  000 WB          _PAGE_CACHE_MODE_WB
> >  001 WC          _PAGE_CACHE_MODE_WC
> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
> >  011 UC          _PAGE_CACHE_MODE_UC
> >
> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> > to consider for non-PAT systems then:
> >
> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
> >    table 11-6 on non-PAT systems seems to place this situation as
> >    "implementation defined" and not encouraged.
> >
> > b) when commit de33c442e "x86 PAT: fix performance drop for glx, use
> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
> >    case on x86 for both ioremap() and ioremap_nocache() as they will
> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
> >    an effective memory type of UC.
> >
> > If I've understood this correctly then neither of these situations is good and
> > it's just by chance that on some systems situation a) has led to proper WC.
> >
> > On a PAT system we have a bit different combinatorial results (based on Table
> > 11-7):
> >
> > a) Right now ioremap() and ioremap_nocache() default to
> >     _PAGE_CACHE_MODE_UC_MINUS, and _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC
> >
> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
> >
> > So to be clear right now atyfb should work fine on PAT systems
> > with de33c442e in place, once reverted as-is right now we'd end
> > up with UC effective memory type.
> >
> > For both PAT and non-PAT systems when commit de33c442e gets reverted
> > we'd end up with UC as the effective memory type for atyfb. Right
> > now it should work on PAT systems and by chance it's suspected to work
> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
> > all this insane combinatorial nightmare.
> >
> >> > > Hence my suggestion to add
> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> >> > > otherwise WC MTRR-covered region.
> >
> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> > and after commit de33c442e gets reverted. So for instance if we had on the
> > atyfb driver:
> >
> > ioremap_x86_uc(PCI BAR)
> > ioremap_wc(framebuffer)
> > arch_phys_wc_add(PCI BAR)
> >
> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
> > Sadly though since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
> > MTRR that follows would mean we'd end up with another grey area (but
> > similar to before as technically an effective memory type of WC).
> >
> > On PAT systems the above would not use MTRRs but we'd be counting on
> > overlapping memory types -- it's not clear if aliasing here is a problem.
> >
> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> > describes that: "the minimum range size is 4 KiB, the base address must be on
> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
> > equal to or greater than 12. The base-address alignment value cannot be less
> > than its length. For example, an 8-KiB range cannot be aligned on a
> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
> >
> > So to answer my own question: indeed, our framebuffer base address must be
> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
> > fixed range sizes and variable range sizes, in case of the MMIO that does
> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
> > could be used although upon review of our own implementation it's unclear if
> > that is what is used for 4 KiB sized MTRRs.
> >
> > Hence my arch_phys_wc_add(PCI BAR) as above.
> >
> >> > OK I think I get it now.
> >> >
> >> > And I take it this would hopefully only be used for non-PAT systems?
> >
> > Since we would likely want to use ioremap_x86_uc() on PAT systems as well we
> > could make it effective for both PAT and non-PAT then.  Later when
> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
> > only need it as transitory until then -- that is unless we want perhaps a strong
> > UC ioremap primitive which is always following strong UC when available regardless
> > of these default transitions.
> >
> > The big issue I see here is simply the combinatorial issues, so I do think
> > it's best to annotate these corner cases well and avoid them.
> >
> >> > Would there be a use case for PAT systems? I wonder if we can wrap
> >> > this under some APIs to make it clean and hide this dirty thing
> >> > behind the scenes, it seems fragile and error prone and my hope
> >> > would be that we won't need more specialization in this area for
> >> > PAT systems.
> >>
> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
> >> the physical address, but PAT applies to the virtual address, so with
> >> the WC MTRR you get WC for userspace "for free" as well.
> >
> > What is the performance impact of having the conversion being done by the
> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> > for us? If this is generally desirable shouldn't we just generalize the cache
> > for devices for O(1) access through a generic API?
> 
> We're pretty much required to keep the PTE memory types consistent for
> aliases of the same page.

Hrm, OK so overlapping ioremap() calls should be frowned upon?

I think it's important to clarify the few different scenarios we have
for atyfb, both for today when uc- is the default and for when uc becomes
the default. I'll also clarify what this series originally tried to do,
what the MTRR size requirements prevent us from doing, and the
combinatorial issues that would also be present if and when uc becomes
the default. Finally I'll clarify what I am thinking we should do in light
of all this.

_______________________________________________________________________
|							|	      |
|_______________________________________________________|_____________|

\______________________________________________________/ \____________/

		Framebuffer (8 MiB)			    MMIO (4 KiB)

Currently we have:

Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
The atyfb PCI BAR is condensed to:

Framebuffer,MMIO

Keeping in mind:

Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)

Linux PCD, PWT bits:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC

(*)   below denotes grey area as per SDM, implementation-defined
(%)   below denotes not possible due to size / base requirements of MTRRs
(+)   below denotes combinatorial issue

Non-PAT systems use PCD, PWT values, so their respective bit settings are
given in the non-PAT columns, although internally we use _PAGE_CACHE_MODE*
on the ioremap* calls for both non-PAT and PAT. For instance
_PAGE_CACHE_MODE_UC_MINUS is 10, for PCD=1, PWT=0.
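
As a rough illustration of the two-digit codes used in the tables, here is a
minimal userspace C sketch of the mode-to-bits mapping. The names cache_mode,
BIT_PCD, BIT_PWT and cachemode_to_bits() are made up for this example; the
kernel's own table is __cachemode2pte_tbl, and the real _PAGE_PWT / _PAGE_PCD
flags sit at different bit positions in the PTE than the compact values
assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's _PAGE_CACHE_MODE_* values. */
enum cache_mode { CM_WB, CM_WC, CM_UC_MINUS, CM_UC, CM_NUM };

/* Compact encoding assumed for this sketch: PCD is the high digit, PWT the low one. */
#define BIT_PWT 0x1u
#define BIT_PCD 0x2u

static const uint8_t cachemode_bits[CM_NUM] = {
	[CM_WB]       = 0,                 /* 00 */
	[CM_WC]       = BIT_PWT,           /* 01 */
	[CM_UC_MINUS] = BIT_PCD,           /* 10 */
	[CM_UC]       = BIT_PCD | BIT_PWT, /* 11 */
};

/* Return the two-bit (PCD, PWT) code for a cache mode. */
static unsigned int cachemode_to_bits(enum cache_mode mode)
{
	assert((unsigned int)mode < CM_NUM);
	return cachemode_bits[mode];
}
```

With this encoding cachemode_to_bits(CM_UC_MINUS) gives binary 10 and
cachemode_to_bits(CM_UC) gives binary 11, which is how the non-PAT columns
below should be read.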

Today we have:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap(PCI BAR)	| 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
MTRR WC(PCI BAR)	| 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
MTRR UC(MMIO)		| 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

If today we revert commit de33c442e and UC becomes default this would run into
the combinatorial issue:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap(PCI BAR)	| 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
MTRR WC(PCI BAR)	| 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
MTRR UC(MMIO)		| 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
--------------------------------------------------------------------

We ideally would like to do the following but can't because of the restriction
of having to use powers of two for both size and base address for MTRRs. We'd
have two steps, one with mtrr_add(), and another with arch_phys_wc_add(). This is
what this series was proposing for atyfb.
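
The size / base restriction can be sketched in plain C as follows. This is
only an illustration of the rule quoted from the SDM, not the kernel's
mtrr_check() logic; mtrr_range_ok() and round_down_pow2() are hypothetical
helper names, and the base address used in the note below is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_MIN_SIZE 4096ULL	/* 4 KiB minimum range size */

static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Can [base, base + size) be described by a single variable-range MTRR?
 * The size must be a power of two of at least 4 KiB and the base must be
 * aligned to that size.
 */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	if (size < MTRR_MIN_SIZE || !is_power_of_two(size))
		return false;
	return (base & (size - 1)) == 0;
}

/* Largest power of two that is <= size (0 if size is 0). */
static uint64_t round_down_pow2(uint64_t size)
{
	uint64_t p = 1;

	if (size == 0)
		return 0;
	while (p <= size / 2)
		p <<= 1;
	return p;
}
```

For a hypothetical BAR at 0xd0000000, mtrr_range_ok() accepts the full 8 MiB
but rejects 8 MiB - 4 KiB, and round_down_pow2() on the latter gives 4 MiB,
which is the "only the first 4MB" problem Ville described for the vesafb
approach.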

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
MTRR WC(fb)		| 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
--------------------------------------------------------------------

Then we'd change this to arch_phys_wc_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
arch_phys_wc_add(fb)	| 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
--------------------------------------------------------------------

With the above code we also have to consider what happens if we
revert commit de33c442e and UC becomes the default: we'd then run into
both the size issue and a grey area:

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR WC(fb)		| 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
--------------------------------------------------------------------

Then with arch_phys_wc_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_wc_add(fb)	| 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
--------------------------------------------------------------------

So what we *could* do then is add ioremap_uc() (always use strong UC) and
use it on the full PCI BAR, then override the framebuffer area with WC, and
finally use a WC MTRR on the full PCI BAR, relying on the fact that strong
UC won't let the MTRR override the earlier UC on the MMIO area. There is a
grey area here for non-PAT systems but that is also the case as-is today.

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR_WC(PCI BAR)	| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

Finally with arch_phys_wc_add() we'd end up with:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

In this case a revert of de33c442e won't have any effect as the driver
was already well prepared for it by using ioremap_uc().
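
To make the PAT columns above easier to check, here is a small C sketch of
how I read the relevant rows of table 11-7. It only models the three page
types used in this mail (UC, UC-, WC) against a UC or WC MTRR; the enum and
the pat_effective() helper are hypothetical names for illustration, and this
does not cover the non-PAT grey areas from table 11-6.

```c
/* Memory types modelled in this sketch (hypothetical names). */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC };

/*
 * page: type selected through the PAT for the mapping.
 * mtrr: MTRR type covering the physical range; only MT_UC and MT_WC are
 *       meaningful here, anything else is treated like UC.
 */
static enum memtype pat_effective(enum memtype page, enum memtype mtrr)
{
	switch (page) {
	case MT_UC:
		return MT_UC;	/* strong UC is never overridden by an MTRR */
	case MT_WC:
		return MT_WC;	/* PAT WC wins regardless of the MTRR */
	case MT_UC_MINUS:
		return mtrr == MT_WC ? MT_WC : MT_UC;	/* UC- yields to a WC MTRR */
	}
	return MT_UC;
}
```

So a WC MTRR over the whole BAR leaves an ioremap_uc() MMIO mapping as UC,
while a UC- mapping of the same range would become WC, which is why the
strong UC variant is what keeps the "hole" intact on PAT systems.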

> I think that the x86 pageattr code is
> supposed to take care of this.  IOW, if everything is working right,
> then the supposedly uncached mmap should either fail, be promoted to
> WC, or cause the existing WC map to degrade to UC.  The code is really
> overcomplicated right now.

Yeah, the aliasing implications of the above picture are not clear to me;
someone who is knee-deep in this can likely confirm whether there are any
issues with the above tables. Most importantly though, if we believe that
the last two sets above don't have any issues then I think we can move
forward. Since we only have a few drivers that need special handling I think
it makes sense to treat them specially and document this strategy for the
"hole" work around.

Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-04-02 19:45                         ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 19:45 UTC (permalink / raw)
  To: Andy Lutomirski, Mel Gorman, Vlastimil Babka
  Cc: Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > index 8025624..8875e56 100644
> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > > >> > >> >
> >> > > >> > >> >  #ifdef CONFIG_MTRR
> >> > > >> > >> >     par->mtrr_aper = -1;
> >> > > >> > >> > -   par->mtrr_reg = -1;
> >> > > >> > >> >     if (!nomtrr) {
> >> > > >> > >> > -           /* Cover the whole resource. */
> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > > >> > >> > +                                     info->fix.smem_len,
> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> >> > > >> > >>
> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > > >> > >
> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > >> > > will use mtrr_check() to verify the the same requirement. Furthermore,
> >> > > >> > > as per my commit log message:
> >> > > >> >
> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
> >> > > >>
> >> > > >> There should be no confusion, I simply did not know that *was* the
> >> > > >> requirement for x86, if that is the case we should add a check for that
> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
> >> > > >> the cleanest I found was the vesafb driver solution.
> >> > > >>
> >> > > >> Thoughts?
> >> > > >
> >> > > > The vesafb solution is bad since you'll only end up covering only
> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> >> > > > Which in practice will mean throwing away half the VRAM since you really
> >> > > > don't want the massive performance hit from accessing it as UC. And that
> >> > > > would mean giving up decent display resolutions as well :(
> >> > > >
> >> > > > And the other option of trying to cover the remainder with multiple ever
> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> >> > > > quickly.
> >> > > >
> >> > > > This is precisely why I used the hole method in atyfb in the first
> >> > > > place.
> >> > > >
> >> > > > I don't really like the idea of any new mtrr code not supporting that
> >> > > > use case, especially as these things tend to be present in older machines
> >> > > > where PAT isn't an option.
> >> > >
> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> >> > > an effective memory type of UC.
> >
> > This is true but non-PAT systems that use just ioremap() will default to
> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> > on Linux has PCD = 1, PWT = 0. The list comes from:
> >
> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
> > };
> >
> > This can better be read here:
> >
> >  PAT
> >  |PCD
> >  ||PWT
> >  |||
> >  000 WB          _PAGE_CACHE_MODE_WB
> >  001 WC          _PAGE_CACHE_MODE_WC
> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
> >  011 UC          _PAGE_CACHE_MODE_UC
> >
> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> > to consider for non-PAT systems then:
> >
> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
> >    table table 11-6 on non-PAT systems seems to place this situation as
> >    "implementation defined" and not encouraged.
> >
> > a) when commit de33c442e "x86 PAT: fix performance drop for glx, use
> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
> >    case on x86 for both ioremap() and ioremap_nocache() as they will
> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
> >    an effective memory type of UC.
> >
> > If I've understood this correctly then neither of these situations are good and
> > its just by chance that on some systems situation a) has lead to proper WC.
> >
> > On a PAT system we have a bit different combinatorial results (based on Table
> > 11-7):
> >
> > a) Right now ioremap() and ioremap_nocache() defaulting to
> >     _PAGE_CACHE_MODE_UC_MINUS yields + MTRR WC = WC
> >
> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
> >
> > So to be clear right now atyfb should work fine on PAT systems
> > with de33c442e in place, once reverted as-is right now we'd end
> > up with UC effective memory type.
> >
> > For both PAT and non-PAT systems when commit de33c442e gets reverted
> > we'd end up with UC as the effective memory type for atyfb. Right
> > now it shoud work on PAT systems and by chance its suspected to work
> > on non-PAT systems. We want to phase MTRR though, specially to avoid
> > all this insane combinatorial nightmware.
> >
> >> > > Hence my suggestion to add
> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> >> > > otherwise WC MTRR-covered region.
> >
> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> > and after commit de33c442e gets reverted. So for instance if we had on the
> > atyfb driver:
> >
> > ioremap_x86_uc(PCI BAR)
> > ioremap_wc(framebuffer)
> > arch_phys_add_wc(PCI BAR)
> >
> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
> > Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
> > MTRR that follows would mean we'd end up with another grey area (but
> > similar to before as technically an effectivethe memory type of WC).
> >
> > On PAT systems the above would not use MTRRs but we'd be counting on
> > overlapping memory types -- its not clear if aliasing here is a problem.
> >
> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> > describes that: "the minimum range size is 4 KiB, the base address must be on
> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> > 2^n and its base address must be alinged on a 2^n boundary where n is a value
> > equal or greatar then 12. The base-address alignment value cannot be less
> > than its length. For example, an 8-KiB range cannot be aligned on a
> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
> >
> > So to answer my own question: indeed, our framebuffer base address must be
> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
> > fixed range sizes and variable range sizes, in case of the MMIO that does
> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
> > could be used although upon review ouf our own implemetnation its unclear if
> > that is what is used for 4 KiB sized MTRRs.
> >
> > Hence my arch_phys_add_wc(PCI BAR) as above.
> >
> >> > OK I think I get it now.
> >> >
> >> > And I take it this would hopefully only be used for non-PAT systems?
> >
> > Since we likely could care to use ioremap_x86_uc() on PAT systems as well we
> > could make the effective for both PAT and non-PAT obviously then.  Later when
> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
> > only need it as transitory until then -- that is unless we want perhaps a strong
> > UC ioremap primitive which is always following strong UC when available regardless
> > of these default transitions.
> >
> > The big issue I see here is simply the combinatorial issues, so I do think
> > its best to annotate these corner cases well and avoid them.
> >
> >> > Would there be a use case for PAT systems? I wonder if we can wrap
> >> > this under some APIs to make it clean and hide this dirty thing
> >> > behind the scenes, it seems a fragile and error prone and my hope
> >> > would be that we won't need more specialization in this area for
> >> > PAT systems.
> >>
> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
> >> the physical address, but PAT applies to the virtual address, so with
> >> the WC MTRR you get WC for userspace "for free" as well.
> >
> > What is the performance impact of having the conversion being done by the
> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> > for us? If this is generally desirable shouldn't we just generalize the cache
> > for devices for O(1) access through a generic API?
> 
> We're pretty much required to keep the PTE memory types consistent for
> aliasses of the same page. 

Hrm, OK so overlapping ioremap() calls should be frowed upon?

I think its important to clarify the few different scenarios we have
for atyfb, both for today when uc- is default and when uc becomes the
default. I'll also clarify what this series originally tried to do
but the issues that size requirements prohibit us to do along with
combinatorial issues that would also be present when and if uc becomes
default. Finally I'll clarify what I am thinking we should do in light
of all this.

_______________________________________________________________________
|							|	      |
|_______________________________________________________|_____________|

\______________________________________________________/ \____________/

		Framebuffer (8 MiB)			    MMIO (4 KiB)

Currently we have:

Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
The atyfb PCI BAR is condensed to:

Frambuffer,MMIO

Keeping in mind:

Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)

Linux PCD, PWT bits:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC

(*)   below denotes grey area as per SDM, implementation-defined
(%)   below denotes not posislbe due to size / base requirements of MTRRs
(+)   below denotes combinatorial issue

Non-PAT systems use PCD, PWT values, their respective bit settings for
these are given although internally we use _PAGE_CACHE_MODE* on the
ioremap* calls for both non-PAT and PAT. For instance
_PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.

Today we have:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap(PCI BAR)	| 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
MTRR WC(PCI BAR)	| 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
MTRR UC(MMIO)		| 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

If today we revert commit de33c442e and UC becomes default this would run into
the combinatorial issue:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap(PCI BAR)	| 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
MTRR WC(PCI BAR)	| 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
MTRR UC(MMIO)		| 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
--------------------------------------------------------------------

We ideally would like to do the following but can't because of the restriction
of having to use powers of two for both size and base address for MTRRs, we'd
have two steps, one with mtrr_add, and another with arch_phys_add_wc(). This is
what this series was proposing for atyfb.

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
MTRR WC(fb)		| 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
--------------------------------------------------------------------

Then we'd change this to arch_phys_add_wc():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
arch_phys_add_wc(fb)	| 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
--------------------------------------------------------------------

With the above code as well we have to consider the issues if we
revert commit de33c442e and UC becomes default, we'd run into then
both the size issue and also a grey area:

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR WC(fb)		| 01 , 11  | WC , UC  | WC%* ,UC  | WC , UC  |
--------------------------------------------------------------------

Then with arch_phys_add_wc():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_add_wc(fb)	| 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
--------------------------------------------------------------------

So what we *could* do then if we add ioremap_uc() (use strong UC always),
then override the framebuffer area with wc, and finally use MTRR on the
full PCI BAR, relying on that strong UC won't let the MTRR override
the earlier UC on the MMIO area. There is a grey area here for non-PAT
systemes but that is also the case as-is today.

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR_WC(PCI BAR)	| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

Finally, with arch_phys_wc_add() we'd end up with:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

In this case a revert of de33c442e won't have any effect as the driver
was already well prepared for it by using ioremap_uc().

> I think that the x86 pageattr code is
> supposed to take care of this.  IOW, if everything is working right,
> then the supposedly uncached mmap should either fail, be promoted to
> WC, or cause the existing WC map to degrade to UC.  The code is really
> overcomplicated right now.

Yeah, the aliasing implications of the above picture are not clear to me;
someone who is knee-deep in this can likely confirm whether there are any
issues with the above pictures. Most importantly though, if we believe that
the last two sets above don't have any issues then I think we can move
forward. Since we only have a few drivers that need special handling I think
it makes sense to treat them specially and document this strategy for the
"hole" work around.

Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-04-02  0:04                       ` Andy Lutomirski
  (?)
  (?)
@ 2015-04-02 19:45                       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 19:45 UTC (permalink / raw)
  To: Andy Lutomirski, Mel Gorman, Vlastimil Babka
  Cc: Linux Fbdev development list, Daniel Vetter,
	Ville Syrjälä,
	Jan Beulich, H. Peter Anvin, Suresh Siddha, Tomi Valkeinen,
	X86 ML, Ingo Molnar, xen-devel, Ingo Molnar, Borislav Petkov,
	Jean-Christophe Plagniol-Villard, Antonino Daplas, Dave Airlie,
	Bjorn Helgaas, Thomas Gleixner, Juergen Gross, Luis R. Rodriguez,
	linux-kernel, venkatesh.pallipadi

On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > index 8025624..8875e56 100644
> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
> >> > > >> > >> >
> >> > > >> > >> >  #ifdef CONFIG_MTRR
> >> > > >> > >> >     par->mtrr_aper = -1;
> >> > > >> > >> > -   par->mtrr_reg = -1;
> >> > > >> > >> >     if (!nomtrr) {
> >> > > >> > >> > -           /* Cover the whole resource. */
> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
> >> > > >> > >> > +                                     info->fix.smem_len,
> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
> >> > > >> > >>
> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
> >> > > >> > >
> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
> >> > > >> > > as per my commit log message:
> >> > > >> >
> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
> >> > > >>
> >> > > >> There should be no confusion, I simply did not know that *was* the
> >> > > >> requirement for x86, if that is the case we should add a check for that
> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
> >> > > >> the cleanest I found was the vesafb driver solution.
> >> > > >>
> >> > > >> Thoughts?
> >> > > >
> >> > > > The vesafb solution is bad since you'll end up covering only
> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
> >> > > > Which in practice will mean throwing away half the VRAM since you really
> >> > > > don't want the massive performance hit from accessing it as UC. And that
> >> > > > would mean giving up decent display resolutions as well :(
> >> > > >
> >> > > > And the other option of trying to cover the remainder with multiple ever
> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
> >> > > > quickly.
> >> > > >
> >> > > > This is precisely why I used the hole method in atyfb in the first
> >> > > > place.
> >> > > >
> >> > > > I don't really like the idea of any new mtrr code not supporting that
> >> > > > use case, especially as these things tend to be present in older machines
> >> > > > where PAT isn't an option.
> >> > >
> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
> >> > > an effective memory type of UC.
> >
> > This is true but non-PAT systems that use just ioremap() will default to
> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
> > on Linux has PCD = 1, PWT = 0. The list comes from:
> >
> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
> > };
> >
> > This can better be read here:
> >
> >  PAT
> >  |PCD
> >  ||PWT
> >  |||
> >  000 WB          _PAGE_CACHE_MODE_WB
> >  001 WC          _PAGE_CACHE_MODE_WC
> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
> >  011 UC          _PAGE_CACHE_MODE_UC
> >
> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
> > to consider for non-PAT systems then:
> >
> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
> >    table 11-6 on non-PAT systems seems to place this situation as
> >    "implementation defined" and not encouraged.
> >
> > b) when commit de33c442e "x86 PAT: fix performance drop for glx, use
> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
> >    case on x86 for both ioremap() and ioremap_nocache() as they will
> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
> >    an effective memory type of UC.
> >
> > If I've understood this correctly then neither of these situations is good and
> > it's just by chance that on some systems situation a) has led to proper WC.
> >
> > On a PAT system we have a bit different combinatorial results (based on Table
> > 11-7):
> >
> > a) Right now ioremap() and ioremap_nocache() defaulting to
> >     _PAGE_CACHE_MODE_UC_MINUS + MTRR WC yields WC
> >
> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
> >
> > So to be clear right now atyfb should work fine on PAT systems
> > with de33c442e in place, once reverted as-is right now we'd end
> > up with UC effective memory type.
> >
> > For both PAT and non-PAT systems when commit de33c442e gets reverted
> > we'd end up with UC as the effective memory type for atyfb. Right
> > now it should work on PAT systems and by chance it's suspected to work
> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
> > all this insane combinatorial nightmare.
> >
> >> > > Hence my suggestion to add
> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
> >> > > otherwise WC MTRR-covered region.
> >
> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
> > and after commit de33c442e gets reverted. So for instance if we had on the
> > atyfb driver:
> >
> > ioremap_x86_uc(PCI BAR)
> > ioremap_wc(framebuffer)
> > arch_phys_wc_add(PCI BAR)
> >
> > On non-PAT systems the MMIO region, with PWT=1, PCD=1, would end up as UC.
> > Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
> > MTRR that follows would mean we'd end up with another grey area (but
> > similar to before, as technically the effective memory type is WC).
> >
> > On PAT systems the above would not use MTRRs but we'd be counting on
> > overlapping memory types -- it's not clear if aliasing here is a problem.
> >
> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
> > describes that: "the minimum range size is 4 KiB, the base address must be on
> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
> > equal to or greater than 12. The base-address alignment value cannot be less
> > than its length. For example, an 8-KiB range cannot be aligned on a
> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
> >
> > So to answer my own question: indeed, our framebuffer base address must be
> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
> > fixed range sizes and variable range sizes, in case of the MMIO that does
> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
> > could be used, although upon review of our own implementation it's unclear if
> > that is what is used for 4 KiB sized MTRRs.
> >
> > Hence my arch_phys_wc_add(PCI BAR) as above.
> >
> >> > OK I think I get it now.
> >> >
> >> > And I take it this would hopefully only be used for non-PAT systems?
> >
> > Since we'd likely want to use ioremap_x86_uc() on PAT systems as well, we
> > could make it effective for both PAT and non-PAT systems.  Later when
> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
> > only need it as transitory until then -- that is unless we want perhaps a strong
> > UC ioremap primitive which is always following strong UC when available regardless
> > of these default transitions.
> >
> > The big issue I see here is simply the combinatorial issues, so I do think
> > it's best to annotate these corner cases well and avoid them.
> >
> >> > Would there be a use case for PAT systems? I wonder if we can wrap
> >> > this under some APIs to make it clean and hide this dirty thing
> >> > behind the scenes, it seems fragile and error prone and my hope
> >> > would be that we won't need more specialization in this area for
> >> > PAT systems.
> >>
> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
> >> the physical address, but PAT applies to the virtual address, so with
> >> the WC MTRR you get WC for userspace "for free" as well.
> >
> > What is the performance impact of having the conversion being done by the
> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
> > for us? If this is generally desirable shouldn't we just generalize the cache
> > for devices for O(1) access through a generic API?
> 
> We're pretty much required to keep the PTE memory types consistent for
> aliases of the same page.

Hrm, OK so overlapping ioremap() calls should be frowned upon?

I think it's important to clarify the few different scenarios we have
for atyfb, both for today when UC- is the default and for when UC becomes
the default. I'll also clarify what this series originally tried to do,
why the MTRR size requirements prohibit it, and the combinatorial issues
that would also be present if and when UC becomes the default. Finally
I'll clarify what I am thinking we should do in light of all this.

_______________________________________________________________________
|							|	      |
|_______________________________________________________|_____________|

\______________________________________________________/ \____________/

		Framebuffer (8 MiB)			    MMIO (4 KiB)

Currently we have:

The _PAGE_CACHE_MODE_ prefix is dropped from the Page_cache_mode values
below for brevity. The atyfb PCI BAR is condensed to:

Framebuffer,MMIO

Keeping in mind:

Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)

Linux PCD, PWT bits:

 PAT
 |PCD
 ||PWT
 |||
 000 WB          _PAGE_CACHE_MODE_WB
 001 WC          _PAGE_CACHE_MODE_WC
 010 UC-         _PAGE_CACHE_MODE_UC_MINUS
 011 UC          _PAGE_CACHE_MODE_UC

(*)   below denotes grey area as per SDM, implementation-defined
(%)   below denotes not possible due to size / base requirements of MTRRs
(+)   below denotes combinatorial issue

Non-PAT systems use the PCD, PWT values directly, so the Non-PAT columns
give those bit settings, although internally we use _PAGE_CACHE_MODE_* on
the ioremap*() calls for both non-PAT and PAT. For instance
_PAGE_CACHE_MODE_UC_MINUS is shown as 10, that is PCD=1, PWT=0.
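
To make the Non-PAT column notation concrete, here is a minimal standalone C
sketch (not kernel code) of how a cache mode maps onto the PCD and PWT bits,
following the first four entries of the table quoted above. The names
cache_mode, mode2pte and pcd_pwt() are made up for illustration; the bit
positions (PWT is PTE bit 3, PCD is PTE bit 4) are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of PWT and PCD in an x86 page table entry. */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)

/* Illustrative names only; they mirror the _PAGE_CACHE_MODE_* values. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_NUM };

/* Same PCD/PWT assignment as the first four __cachemode2pte_tbl entries. */
static const uint16_t mode2pte[MODE_NUM] = {
	[MODE_WB]       = 0,
	[MODE_WC]       = PTE_PWT,
	[MODE_UC_MINUS] = PTE_PCD,
	[MODE_UC]       = PTE_PCD | PTE_PWT,
};

/*
 * Return the two-bit value PCD:PWT for a cache mode, i.e. the number
 * written in the Non-PAT columns (binary 10 for UC-, binary 11 for UC).
 */
static unsigned int pcd_pwt(enum cache_mode mode)
{
	unsigned int pcd, pwt;

	assert(mode < MODE_NUM);
	pcd = !!(mode2pte[mode] & PTE_PCD);
	pwt = !!(mode2pte[mode] & PTE_PWT);
	return (pcd << 1) | pwt;
}
```

With this, pcd_pwt(MODE_UC_MINUS) gives binary 10 and pcd_pwt(MODE_UC) gives
binary 11, which is what the Non-PAT columns in the tables show.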

Today we have:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap(PCI BAR)	| 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
MTRR WC(PCI BAR)	| 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
MTRR UC(MMIO)		| 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

If today we revert commit de33c442e and UC becomes default this would run into
the combinatorial issue:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap(MMIO)		| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap(PCI BAR)	| 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
MTRR WC(PCI BAR)	| 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
MTRR UC(MMIO)		| 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
--------------------------------------------------------------------

Ideally we would like to do the following, but we can't because MTRRs
require a power-of-two size and a base address aligned to that size. We'd
have two steps, one with mtrr_add() and another with arch_phys_wc_add().
This is what this series was proposing for atyfb.

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
MTRR WC(fb)		| 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
--------------------------------------------------------------------

Then we'd change this to arch_phys_wc_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
ioremap_wc(fb)		| 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
arch_phys_wc_add(fb)	| 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
--------------------------------------------------------------------
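
The (%) entries come from the MTRR size and alignment rule. A minimal
standalone C sketch of that rule follows, assuming a single variable-range
MTRR per region. The helper names are hypothetical and this is not how
mtrr_add() validates its arguments; the example numbers in the note below
assume an 8 MiB BAR whose framebuffer is 8 MiB minus the 4 KiB MMIO area,
which is only one reading of the layout above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) << 10)
#define MIB(x) ((uint64_t)(x) << 20)

/* True if size is a power of two and at least 4 KiB. */
static bool mtrr_size_ok(uint64_t size)
{
	return size >= KIB(4) && (size & (size - 1)) == 0;
}

/* True if [base, base + size) fits a single variable-range MTRR. */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	return mtrr_size_ok(size) && (base & (size - 1)) == 0;
}

/*
 * Number of power-of-two chunks (one MTRR each) needed to cover size
 * exactly, assuming the base is aligned to the next power of two at or
 * above size. This is simply the number of set bits in size.
 */
static unsigned int mtrrs_needed(uint64_t size)
{
	unsigned int n = 0;

	assert(size % KIB(4) == 0);
	for (; size; size &= size - 1)
		n++;
	return n;
}
```

Under those assumptions mtrr_range_ok(base, MIB(8)) holds for an 8 MiB
aligned base, MIB(8) - KIB(4) fails the size check, and
mtrrs_needed(MIB(8) - KIB(4)) comes out at 11, which is why covering the
framebuffer exactly with ever smaller MTRRs is not practical.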

With the above code we also have to consider what happens if we
revert commit de33c442e and UC becomes the default: we'd then run
into both the size issue and a grey area:

With mtrr_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR WC(fb)		| 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
--------------------------------------------------------------------

Then with arch_phys_wc_add():

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_nocache(MMIO)	| xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_wc_add(fb)	| 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
--------------------------------------------------------------------

So what we *could* do, if we add ioremap_uc() (which always uses strong UC),
is map the full PCI BAR with it, then override the framebuffer area with WC,
and finally use a WC MTRR on the full PCI BAR, relying on the fact that strong
UC won't let the MTRR override the earlier UC on the MMIO area. There is a
grey area here for non-PAT systems, but that is also the case as-is today.

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
MTRR_WC(PCI BAR)	| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

Finally, with arch_phys_wc_add() we'd end up with:

--------------------------------------------------------------------
Calls			|    Page_cache_mode  |  Effective memtype  |
------------------------|---------------------|---------------------
			|  Non-PAT |    PAT   |  Non-PAT |    PAT   |
--------------------------------------------------------------------
ioremap_uc(PCI BAR)	| 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
ioremap_wc(fb)		| 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
--------------------------------------------------------------------

In this case a revert of de33c442e won't have any effect as the driver
was already well prepared for it by using ioremap_uc().
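
For the PAT columns, the combinations used in the tables can be sketched as
a small standalone C function. It only models the page types UC, UC- and WC
against an MTRR type of UC, WC or WB, following the results quoted earlier
from table 11-7 (UC- plus a WC MTRR gives WC, UC plus a WC MTRR gives UC).
The names are made up for illustration and the other rows of the SDM table
are deliberately left out.

```c
#include <assert.h>

enum memtype { UC, UC_MINUS, WC, WB };

/*
 * Effective memory type on a PAT system for the page types used in the
 * tables (UC, UC-, WC), given the MTRR type covering the same range.
 */
static enum memtype pat_effective(enum memtype mtrr, enum memtype pat)
{
	assert(mtrr != UC_MINUS);	/* UC- is not an MTRR type */
	assert(pat != WB);		/* WB page types are not modelled here */

	if (pat == UC)
		return UC;		/* strong UC is never overridden */
	if (pat == WC)
		return WC;		/* a WC page type wins over the MTRR */
	/* pat == UC_MINUS: only a WC MTRR may override it */
	return mtrr == WC ? WC : UC;
}
```

With a WC MTRR over the full PCI BAR, pat_effective(WC, UC) stays UC for the
MMIO area while pat_effective(WC, WC) is WC for the framebuffer, which is the
"hole" behaviour the last two tables rely on.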

> I think that the x86 pageattr code is
> supposed to take care of this.  IOW, if everything is working right,
> then the supposedly uncached mmap should either fail, be promoted to
> WC, or cause the existing WC map to degrade to UC.  The code is really
> overcomplicated right now.

Yeah, the aliasing implications of the above picture are not clear to me;
someone who is knee-deep in this can likely confirm whether there are any
issues with the above pictures. Most importantly though, if we believe that
the last two sets above don't have any issues then I think we can move
forward. Since we only have a few drivers that need special handling I think
it makes sense to treat them specially and document this strategy for the
"hole" work around.

Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-04-02 19:45                         ` Luis R. Rodriguez
@ 2015-04-02 19:50                           ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-04-02 19:50 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Mel Gorman, Vlastimil Babka, Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Thu, Apr 2, 2015 at 12:45 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > index 8025624..8875e56 100644
>> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >> > > >> > >> >
>> >> > > >> > >> >  #ifdef CONFIG_MTRR
>> >> > > >> > >> >     par->mtrr_aper = -1;
>> >> > > >> > >> > -   par->mtrr_reg = -1;
>> >> > > >> > >> >     if (!nomtrr) {
>> >> > > >> > >> > -           /* Cover the whole resource. */
>> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> >> > > >> > >> > +                                     info->fix.smem_len,
>> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
>> >> > > >> > >>
>> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> >> > > >> > >
>> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
>> >> > > >> > > as per my commit log message:
>> >> > > >> >
>> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
>> >> > > >>
>> >> > > >> There should be no confusion, I simply did not know that *was* the
>> >> > > >> requirement for x86, if that is the case we should add a check for that
>> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> >> > > >> the cleanest I found was the vesafb driver solution.
>> >> > > >>
>> >> > > >> Thoughts?
>> >> > > >
>> >> > > > The vesafb solution is bad since you'll end up covering only
>> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> >> > > > Which in practice will mean throwing away half the VRAM since you really
>> >> > > > don't want the massive performance hit from accessing it as UC. And that
>> >> > > > would mean giving up decent display resolutions as well :(
>> >> > > >
>> >> > > > And the other option of trying to cover the remainder with multiple ever
>> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> >> > > > quickly.
>> >> > > >
>> >> > > > This is precisely why I used the hole method in atyfb in the first
>> >> > > > place.
>> >> > > >
>> >> > > > I don't really like the idea of any new mtrr code not supporting that
>> >> > > > use case, especially as these things tend to be present in older machines
>> >> > > > where PAT isn't an option.
>> >> > >
>> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> >> > > an effective memory type of UC.
>> >
>> > This is true but non-PAT systems that use just ioremap() will default to
>> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
>> > on Linux has PCD = 1, PWT = 0. The list comes from:
>> >
>> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
>> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
>> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
>> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
>> > };
>> >
>> > This can better be read here:
>> >
>> >  PAT
>> >  |PCD
>> >  ||PWT
>> >  |||
>> >  000 WB          _PAGE_CACHE_MODE_WB
>> >  001 WC          _PAGE_CACHE_MODE_WC
>> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>> >  011 UC          _PAGE_CACHE_MODE_UC
>> >
>> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
>> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
>> > to consider for non-PAT systems then:
>> >
>> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
>> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
>> >    table 11-6 on non-PAT systems seems to place this situation as
>> >    "implementation defined" and not encouraged.
>> >
>> > b) when commit de33c442e "x86 PAT: fix performance drop for glx, use
>> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
>> >    case on x86 for both ioremap() and ioremap_nocache() as they will
>> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
>> >    an effective memory type of UC.
>> >
>> > If I've understood this correctly then neither of these situations is good and
>> > it's just by chance that on some systems situation a) has led to proper WC.
>> >
>> > On a PAT system we have a bit different combinatorial results (based on Table
>> > 11-7):
>> >
>> > a) Right now ioremap() and ioremap_nocache() defaulting to
>> >     _PAGE_CACHE_MODE_UC_MINUS + MTRR WC yields WC
>> >
>> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>> >
>> > So to be clear right now atyfb should work fine on PAT systems
>> > with de33c442e in place, once reverted as-is right now we'd end
>> > up with UC effective memory type.
>> >
>> > For both PAT and non-PAT systems when commit de33c442e gets reverted
>> > we'd end up with UC as the effective memory type for atyfb. Right
>> > now it should work on PAT systems and by chance it's suspected to work
>> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
>> > all this insane combinatorial nightmare.
>> >
>> >> > > Hence my suggestion to add
>> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> >> > > otherwise WC MTRR-covered region.
>> >
>> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
>> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
>> > and after commit de33c442e gets reverted. So for instance if we had on the
>> > atyfb driver:
>> >
>> > ioremap_x86_uc(PCI BAR)
>> > ioremap_wc(framebuffer)
>> > arch_phys_wc_add(PCI BAR)
>> >
>> > On non-PAT systems the MMIO region, with PWT=1, PCD=1, would end up as UC.
>> > Sadly though, since _PAGE_CACHE_MODE_WC on non-PAT has PWT=1, PCD=0, the WC
>> > MTRR that follows would mean we'd end up with another grey area (but
>> > similar to before, as technically the effective memory type is WC).
>> >
>> > On PAT systems the above would not use MTRRs but we'd be counting on
>> > overlapping memory types -- it's not clear if aliasing here is a problem.
>> >
>> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
>> > describes that: "the minimum range size is 4 KiB, the base address must be on
>> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
>> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
>> > equal to or greater than 12. The base-address alignment value cannot be less
>> > than its length. For example, an 8-KiB range cannot be aligned on a
>> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
>> >
>> > So to answer my own question: indeed, our framebuffer base address must be
>> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
>> > fixed range sizes and variable range sizes, in case of the MMIO that does
>> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
>> > could be used, although upon review of our own implementation it's unclear if
>> > that is what is used for 4 KiB sized MTRRs.
>> >
>> > Hence my arch_phys_wc_add(PCI BAR) as above.
>> >
>> >> > OK I think I get it now.
>> >> >
>> >> > And I take it this would hopefully only be used for non-PAT systems?
>> >
>> > Since we'd likely want to use ioremap_x86_uc() on PAT systems as well, we
>> > could make it effective for both PAT and non-PAT systems.  Later when
>> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
>> > only need it as transitory until then -- that is unless we want perhaps a strong
>> > UC ioremap primitive which is always following strong UC when available regardless
>> > of these default transitions.
>> >
>> > The big issue I see here is simply the combinatorial issues, so I do think
>> > it's best to annotate these corner cases well and avoid them.
>> >
>> >> > Would there be a use case for PAT systems? I wonder if we can wrap
>> >> > this under some APIs to make it clean and hide this dirty thing
>> >> > behind the scenes, it seems fragile and error prone and my hope
>> >> > would be that we won't need more specialization in this area for
>> >> > PAT systems.
>> >>
>> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> >> the physical address, but PAT applies to the virtual address, so with
>> >> the WC MTRR you get WC for userspace "for free" as well.
>> >
>> > What is the performance impact of having the conversion being done by the
>> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
>> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
>> > for us? If this is generally desirable shouldn't we just generalize the cache
>> > for devices for O(1) access through a generic API?
>>
>> We're pretty much required to keep the PTE memory types consistent for
>> aliases of the same page.
>
> Hrm, OK so overlapping ioremap() calls should be frowned upon?
>
> I think it's important to clarify the few different scenarios we have
> for atyfb, both for today when uc- is default and when uc becomes the
> default. I'll also clarify what this series originally tried to do
> but the issues that size requirements prohibit us to do along with
> combinatorial issues that would also be present when and if uc becomes
> default. Finally I'll clarify what I am thinking we should do in light
> of all this.
>
> _______________________________________________________________________
> |                                                       |             |
> |_______________________________________________________|_____________|
>
> \______________________________________________________/ \____________/
>
>                 Framebuffer (8 MiB)                         MMIO (4 KiB)
>
> Currently we have:
>
> Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
> The atyfb PCI BAR is condensed to:
>
> Framebuffer,MMIO
>
> Keeping in mind:
>
> Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
> Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)
>
> Linux PCD, PWT bits:
>
>  PAT
>  |PCD
>  ||PWT
>  |||
>  000 WB          _PAGE_CACHE_MODE_WB
>  001 WC          _PAGE_CACHE_MODE_WC
>  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>  011 UC          _PAGE_CACHE_MODE_UC
>
> (*)   below denotes grey area as per SDM, implementation-defined
> (%)   below denotes not possible due to size / base requirements of MTRRs
> (+)   below denotes combinatorial issue
>
> Non-PAT systems use PCD, PWT values, their respective bit settings for
> these are given although internally we use _PAGE_CACHE_MODE* on the
> ioremap* calls for both non-PAT and PAT. For instance
> _PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.
>
> Today we have:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap(PCI BAR)        | 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
> MTRR WC(PCI BAR)        | 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
> MTRR UC(MMIO)           | 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> If today we revert commit de33c442e and UC becomes default this would run into
> the combinatorial issue:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap(PCI BAR)        | 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
> MTRR WC(PCI BAR)        | 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
> MTRR UC(MMIO)           | 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
> --------------------------------------------------------------------
>
> We ideally would like to do the following but can't because of the restriction
> of having to use powers of two for both size and base address for MTRRs, we'd
> have two steps, one with mtrr_add, and another with arch_phys_wc_add(). This is
> what this series was proposing for atyfb.
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> MTRR WC(fb)             | 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
> --------------------------------------------------------------------
>
> Then we'd change this to arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> arch_phys_wc_add(fb)    | 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
> --------------------------------------------------------------------
>
> With the above code as well we have to consider the issues if we
> revert commit de33c442e and UC becomes default, we'd run into then
> both the size issue and also a grey area:
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR WC(fb)             | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Then with arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(fb)    | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> So what we *could* do then if we add ioremap_uc() (use strong UC always),
> then override the framebuffer area with wc, and finally use MTRR on the
> full PCI BAR, relying on that strong UC won't let the MTRR override
> the earlier UC on the MMIO area. There is a grey area here for non-PAT
> systems but that is also the case as-is today.
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR_WC(PCI BAR)        | 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Finally with the arch_phys_wc_add() we'd end up with:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> In this case a revert of de33c442e won't have any effect as the driver
> was already well prepared for it by using ioremap_uc().
>
>> I think that the x86 pageattr code is
>> supposed to take care of this.  IOW, if everything is working right,
>> then the supposedly uncached mmap should either fail, be promoted to
>> WC, or cause the existing WC map to degrade to UC.  The code is really
>> overcomplicated right now.
>
> Yeah aliasing things are not clear for the above picture for me, someone
> who is knee-deep in this can likely confirm of any issues with the above
> pictures. But most importantly if we believe however that the last two sets
> above don't have any issues then I think we can move forward. Since we only
> have a few drivers that need special handling I think it makes sense to treat
> them specially and document this strategy for the "hole" work around.
>

Seems reasonable to me.

--Andy

> Thoughts?
>
>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
@ 2015-04-02 19:50                           ` Andy Lutomirski
  0 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-04-02 19:50 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Mel Gorman, Vlastimil Babka, Ville Syrjälä,
	Bjorn Helgaas, Luis R. Rodriguez, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	Linux Fbdev development list, X86 ML, xen-devel, Ingo Molnar,
	Linus Torvalds, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Thu, Apr 2, 2015 at 12:45 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > index 8025624..8875e56 100644
>> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >> > > >> > >> >
>> >> > > >> > >> >  #ifdef CONFIG_MTRR
>> >> > > >> > >> >     par->mtrr_aper = -1;
>> >> > > >> > >> > -   par->mtrr_reg = -1;
>> >> > > >> > >> >     if (!nomtrr) {
>> >> > > >> > >> > -           /* Cover the whole resource. */
>> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> >> > > >> > >> > +                                     info->fix.smem_len,
>> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
>> >> > > >> > >>
>> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> >> > > >> > >
>> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
>> >> > > >> > > as per my commit log message:
>> >> > > >> >
>> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
>> >> > > >>
>> >> > > >> There should be no confusion, I simply did not know that *was* the
>> >> > > >> requirement for x86, if that is the case we should add a check for that
>> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> >> > > >> the cleanest I found was the vesafb driver solution.
>> >> > > >>
>> >> > > >> Thoughts?
>> >> > > >
>> >> > > > The vesafb solution is bad since you'll end up covering only
>> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> >> > > > Which in practice will mean throwing away half the VRAM since you really
>> >> > > > don't want the massive performance hit from accessing it as UC. And that
>> >> > > > would mean giving up decent display resolutions as well :(
>> >> > > >
>> >> > > > And the other option of trying to cover the remainder with multiple ever
>> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> >> > > > quickly.
>> >> > > >
>> >> > > > This is precisely why I used the hole method in atyfb in the first
>> >> > > > place.
>> >> > > >
>> >> > > > I don't really like the idea of any new mtrr code not supporting that
>> >> > > > use case, especially as these things tend to be present in older machines
>> >> > > > where PAT isn't an option.
>> >> > >
>> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> >> > > an effective memory type of UC.
>> >
>> > This is true but non-PAT systems that use just ioremap() will default to
>> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
>> > on Linux has PCD = 1, PWT = 0. The list comes from:
>> >
>> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
>> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
>> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
>> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
>> > };
>> >
>> > This can better be read here:
>> >
>> >  PAT
>> >  |PCD
>> >  ||PWT
>> >  |||
>> >  000 WB          _PAGE_CACHE_MODE_WB
>> >  001 WC          _PAGE_CACHE_MODE_WC
>> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>> >  011 UC          _PAGE_CACHE_MODE_UC
>> >
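As a rough illustration of the bit table above, here is a standalone user-space sketch, not the kernel's code. The names PAGE_PWT, PAGE_PCD, cache_mode, cachemode2pte and pcd_pwt_index() are made up for this example; the only facts it relies on are the table entries quoted above and the assumption that PWT and PCD sit at PTE bits 3 and 4 for 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 PTE bit positions for the caching attributes (4 KiB pages). */
#define PAGE_PWT (1u << 3)
#define PAGE_PCD (1u << 4)

/* Hypothetical stand-ins for the first four _PAGE_CACHE_MODE_* values. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_NUM };

/* Same PWT/PCD assignments as the table quoted in the mail. */
static const uint16_t cachemode2pte[MODE_NUM] = {
    [MODE_WB]       = 0        | 0,
    [MODE_WC]       = PAGE_PWT | 0,
    [MODE_UC_MINUS] = 0        | PAGE_PCD,
    [MODE_UC]       = PAGE_PWT | PAGE_PCD,
};

/* Return the two-bit "PCD,PWT" value used in the tables, e.g. 2 for "10". */
static unsigned int pcd_pwt_index(enum cache_mode mode)
{
    assert((unsigned int)mode < MODE_NUM);
    uint16_t pte = cachemode2pte[mode];
    return ((pte & PAGE_PCD) ? 2u : 0u) | ((pte & PAGE_PWT) ? 1u : 0u);
}
```

With this encoding, pcd_pwt_index(MODE_UC_MINUS) gives 2, which is the "10" written for UC- in the tables of this thread, and pcd_pwt_index(MODE_UC) gives 3, the "11" written for strong UC.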
>> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
>> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
>> > to consider for non-PAT systems then:
>> >
>> > a) Right now as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
>> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
>> >    table 11-6 on non-PAT systems seems to place this situation as
>> >    "implementation defined" and not encouraged.
>> >
>> > b) when commit de33c442e "x86 PAT: fix performance drop for glx, use
>> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
>> >    case on x86 for both ioremap() and ioremap_nocache() as they will
>> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
>> >    an effective memory type of UC.
>> >
>> > If I've understood this correctly then neither of these situations is good and
>> > it's just by chance that on some systems situation a) has led to proper WC.
>> >
>> > On a PAT system we have a bit different combinatorial results (based on Table
>> > 11-7):
>> >
>> > a) Right now ioremap() and ioremap_nocache() default to
>> >     _PAGE_CACHE_MODE_UC_MINUS, and _PAGE_CACHE_MODE_UC_MINUS + MTRR WC = WC
>> >
>> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>> >
>> > So to be clear right now atyfb should work fine on PAT systems
>> > with de33c442e in place, once reverted as-is right now we'd end
>> > up with UC effective memory type.
>> >
>> > For both PAT and non-PAT systems when commit de33c442e gets reverted
>> > we'd end up with UC as the effective memory type for atyfb. Right
>> > now it should work on PAT systems and by chance it's suspected to work
>> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
>> > all this insane combinatorial nightmare.
>> >
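The PAT-system combinations stated above can be sketched as a small lookup. This is only an illustration under assumptions: the enum and the function pat_effective() are hypothetical names, and only three PAT entry types are modelled (UC, UC- and WC), following the rows of SDM table 11-7 that this discussion relies on.

```c
#include <assert.h>

/* Hypothetical memory type labels for this sketch. */
enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WB };

/*
 * Effective memory type on a PAT system for a given PAT entry type and
 * MTRR type. Only PAT UC, UC- and WC are modelled here.
 */
static enum memtype pat_effective(enum memtype pat, enum memtype mtrr)
{
    assert(mtrr != MT_UC_MINUS);        /* UC- is not an MTRR type */

    switch (pat) {
    case MT_UC:                         /* strong UC always wins */
        return MT_UC;
    case MT_WC:                         /* PAT WC wins over any MTRR type */
        return MT_WC;
    case MT_UC_MINUS:                   /* UC- can be overridden by a WC MTRR */
        return mtrr == MT_WC ? MT_WC : MT_UC;
    default:
        assert(!"PAT type not modelled in this sketch");
        return MT_UC;
    }
}
```

Case a) above corresponds to pat_effective(MT_UC_MINUS, MT_WC), which yields MT_WC; case b) corresponds to pat_effective(MT_UC, MT_WC), which yields MT_UC.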
>> >> > > Hence my suggestion to add
>> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> >> > > otherwise WC MTRR-covered region.
>> >
>> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
>> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
>> > and after commit de33c442e gets reverted. So for instance if we had on the
>> > atyfb driver:
>> >
>> > ioremap_x86_uc(PCI BAR)
>> > ioremap_wc(framebuffer)
>> > arch_phys_wc_add(PCI BAR)
>> >
>> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
>> > Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
>> > MTRR that follows would mean we'd end up with another grey area (but
>> > similar to before as technically an effective memory type of WC).
>> >
>> > On PAT systems the above would not use MTRRs but we'd be counting on
>> > overlapping memory types -- it's not clear if aliasing here is a problem.
>> >
>> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
>> > describes that: "the minimum range size is 4 KiB, the base address must be on
>> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
>> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
>> > equal to or greater than 12. The base-address alignment value cannot be less
>> > than its length. For example, an 8-KiB range cannot be aligned on a
>> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
>> >
>> > So to answer my own question: indeed, our framebuffer base address must be
>> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
>> > fixed range sizes and variable range sizes, in case of the MMIO that does
>> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
>> > could be used although upon review of our own implementation it's unclear if
>> > that is what is used for 4 KiB sized MTRRs.
>> >
>> > Hence my arch_phys_wc_add(PCI BAR) as above.
>> >
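The size and alignment rule quoted from the SDM can be checked with a few lines of C. This is a minimal user-space sketch, not the kernel's mtrr_check(); the function names mtrr_range_ok() and round_down_pow2() and the base address 0xfd000000 are invented for the example. The second helper shows why rounding an "almost 8 MiB" framebuffer down to a power of two, as discussed earlier in the thread for vesafb, leaves only 4 MiB covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [base, base + size) satisfies the variable-range rule quoted
 * above: size is a power of two of at least 4 KiB and base is aligned
 * to size.
 */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
    if (size < 4096)
        return false;
    if (size & (size - 1))              /* not a power of two */
        return false;
    return (base & (size - 1)) == 0;    /* base aligned to size */
}

/* Largest power of two that is less than or equal to size (size > 0). */
static uint64_t round_down_pow2(uint64_t size)
{
    assert(size != 0);
    uint64_t p = 1;
    while (p <= size / 2)
        p *= 2;
    return p;
}
```

For a hypothetical BAR at 0xfd000000, mtrr_range_ok(0xfd000000, 0x800000) holds for the full 8 MiB, while the framebuffer alone at 8 MiB minus 4 KiB (0x7ff000) fails the check, and round_down_pow2(0x7ff000) is 0x400000, i.e. 4 MiB.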
>> >> > OK I think I get it now.
>> >> >
>> >> > And I take it this would hopefully only be used for non-PAT systems?
>> >
>> > Since we likely could care to use ioremap_x86_uc() on PAT systems as well we
>> > could make this effective for both PAT and non-PAT obviously then.  Later when
>> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
>> > only need it as transitory until then -- that is unless we want perhaps a strong
>> > UC ioremap primitive which is always following strong UC when available regardless
>> > of these default transitions.
>> >
>> > The big issue I see here is simply the combinatorial issues, so I do think
>> > it's best to annotate these corner cases well and avoid them.
>> >
>> >> > Would there be a use case for PAT systems? I wonder if we can wrap
>> >> > this under some APIs to make it clean and hide this dirty thing
>> >> > behind the scenes, it seems fragile and error prone and my hope
>> >> > would be that we won't need more specialization in this area for
>> >> > PAT systems.
>> >>
>> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> >> the physical address, but PAT applies to the virtual address, so with
>> >> the WC MTRR you get WC for userspace "for free" as well.
>> >
>> > What is the performance impact of having the conversion being done by the
>> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
>> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
>> > for us? If this is generally desirable shouldn't we just generalize the cache
>> > for devices for O(1) access through a generic API?
>>
>> We're pretty much required to keep the PTE memory types consistent for
>> aliases of the same page.
>
> Hrm, OK so overlapping ioremap() calls should be frowned upon?
>
> I think it's important to clarify the few different scenarios we have
> for atyfb, both for today when uc- is default and when uc becomes the
> default. I'll also clarify what this series originally tried to do
> but the issues that size requirements prohibit us to do along with
> combinatorial issues that would also be present when and if uc becomes
> default. Finally I'll clarify what I am thinking we should do in light
> of all this.
>
> _______________________________________________________________________
> |                                                       |             |
> |_______________________________________________________|_____________|
>
> \______________________________________________________/ \____________/
>
>                 Framebuffer (8 MiB)                         MMIO (4 KiB)
>
> Currently we have:
>
> Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
> The atyfb PCI BAR is condensed to:
>
> Framebuffer,MMIO
>
> Keeping in mind:
>
> Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
> Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)
>
> Linux PCD, PWT bits:
>
>  PAT
>  |PCD
>  ||PWT
>  |||
>  000 WB          _PAGE_CACHE_MODE_WB
>  001 WC          _PAGE_CACHE_MODE_WC
>  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>  011 UC          _PAGE_CACHE_MODE_UC
>
> (*)   below denotes grey area as per SDM, implementation-defined
> (%)   below denotes not possible due to size / base requirements of MTRRs
> (+)   below denotes combinatorial issue
>
> Non-PAT systems use PCD, PWT values, their respective bit settings for
> these are given although internally we use _PAGE_CACHE_MODE* on the
> ioremap* calls for both non-PAT and PAT. For instance
> _PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.
>
> Today we have:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap(PCI BAR)        | 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
> MTRR WC(PCI BAR)        | 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
> MTRR UC(MMIO)           | 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> If today we revert commit de33c442e and UC becomes default this would run into
> the combinatorial issue:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap(PCI BAR)        | 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
> MTRR WC(PCI BAR)        | 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
> MTRR UC(MMIO)           | 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
> --------------------------------------------------------------------
>
> We ideally would like to do the following but can't because of the restriction
> of having to use powers of two for both size and base address for MTRRs, we'd
> have two steps, one with mtrr_add, and another with arch_phys_wc_add(). This is
> what this series was proposing for atyfb.
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> MTRR WC(fb)             | 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
> --------------------------------------------------------------------
>
> Then we'd change this to arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> arch_phys_wc_add(fb)    | 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
> --------------------------------------------------------------------
>
> With the above code as well we have to consider the issues if we
> revert commit de33c442e and UC becomes default, we'd run into then
> both the size issue and also a grey area:
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR WC(fb)             | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Then with arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(fb)    | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> So what we *could* do then if we add ioremap_uc() (use strong UC always),
> then override the framebuffer area with wc, and finally use MTRR on the
> full PCI BAR, relying on that strong UC won't let the MTRR override
> the earlier UC on the MMIO area. There is a grey area here for non-PAT
> systems but that is also the case as-is today.
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR_WC(PCI BAR)        | 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Finally with the arch_phys_wc_add() we'd end up with:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> In this case a revert of de33c442e won't have any effect as the driver
> was already well prepared for it by using ioremap_uc().
>
>> I think that the x86 pageattr code is
>> supposed to take care of this.  IOW, if everything is working right,
>> then the supposedly uncached mmap should either fail, be promoted to
>> WC, or cause the existing WC map to degrade to UC.  The code is really
>> overcomplicated right now.
>
> Yeah aliasing things are not clear for the above picture for me, someone
> who is knee-deep in this can likely confirm of any issues with the above
> pictures. But most importantly if we believe however that the last two sets
> above don't have any issues then I think we can move forward. Since we only
> have a few drivers that need special handling I think it makes sense to treat
> them specially and document this strategy for the "hole" work around.
>

Seems reasonable to me.

--Andy

> Thoughts?
>
>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around
  2015-04-02 19:45                         ` Luis R. Rodriguez
  (?)
@ 2015-04-02 19:50                         ` Andy Lutomirski
  -1 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-04-02 19:50 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Linux Fbdev development list, Daniel Vetter,
	Ville Syrjälä,
	Jan Beulich, H. Peter Anvin, Suresh Siddha, Tomi Valkeinen,
	X86 ML, Ingo Molnar, Mel Gorman, xen-devel, Ingo Molnar,
	Borislav Petkov, Jean-Christophe Plagniol-Villard,
	Antonino Daplas, Dave Airlie, Bjorn Helgaas, Thomas Gleixner,
	Vlastimil Babka, Juergen Gross, Luis R. Rodriguez,
	linux-kernel@vger.kernel.org

On Thu, Apr 2, 2015 at 12:45 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Apr 01, 2015 at 05:04:08PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 1, 2015 at 4:52 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> > On Sat, Mar 28, 2015 at 02:23:34PM +0200, Ville Syrjälä wrote:
>> >> On Sat, Mar 28, 2015 at 01:28:18AM +0100, Luis R. Rodriguez wrote:
>> >> > On Fri, Mar 27, 2015 at 03:02:10PM -0700, Andy Lutomirski wrote:
>> >> > > On Fri, Mar 27, 2015 at 2:56 PM, Ville Syrjälä <syrjala@sci.fi> wrote:
>> >> > > > On Fri, Mar 27, 2015 at 08:57:59PM +0100, Luis R. Rodriguez wrote:
>> >> > > >> On Fri, Mar 27, 2015 at 12:43:55PM -0700, Andy Lutomirski wrote:
>> >> > > >> > On Fri, Mar 27, 2015 at 12:38 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> >> > > >> > > On Sat, Mar 21, 2015 at 11:15:14AM +0200, Ville Syrjälä wrote:
>> >> > > >> > >> On Fri, Mar 20, 2015 at 04:17:59PM -0700, Luis R. Rodriguez wrote:
>> >> > > >> > >> > diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > index 8025624..8875e56 100644
>> >> > > >> > >> > --- a/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > +++ b/drivers/video/fbdev/aty/atyfb_base.c
>> >> > > >> > >> > @@ -2630,21 +2630,10 @@ static int aty_init(struct fb_info *info)
>> >> > > >> > >> >
>> >> > > >> > >> >  #ifdef CONFIG_MTRR
>> >> > > >> > >> >     par->mtrr_aper = -1;
>> >> > > >> > >> > -   par->mtrr_reg = -1;
>> >> > > >> > >> >     if (!nomtrr) {
>> >> > > >> > >> > -           /* Cover the whole resource. */
>> >> > > >> > >> > -           par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
>> >> > > >> > >> > +           par->mtrr_aper = mtrr_add(info->fix.smem_start,
>> >> > > >> > >> > +                                     info->fix.smem_len,
>> >> > > >> > >> >                                       MTRR_TYPE_WRCOMB, 1);
>> >> > > >> > >>
>> >> > > >> > >> MTRRs need power of two size, so how is this supposed to work?
>> >> > > >> > >
>> >> > > >> > > As per mtrr_add_page() [0] the base and size are just supposed to be in units
>> >> > > >> > > of 4 KiB, although the practice is to use powers of 2 in *some* drivers this
>> >> > > >> > > is not standardized and by no means recorded as a requirement. Obviously
>> >> > > >> > > powers of 2 will work too and you'd end up neatly aligned as well. mtrr_add()
>> >> > > >> > > will use mtrr_check() to verify the same requirement. Furthermore,
>> >> > > >> > > as per my commit log message:
>> >> > > >> >
>> >> > > >> > Whatever the code may or may not do, the x86 architecture uses
>> >> > > >> > power-of-two MTRR sizes.  So I'm confused.
>> >> > > >>
>> >> > > >> There should be no confusion, I simply did not know that *was* the
>> >> > > >> requirement for x86, if that is the case we should add a check for that
>> >> > > >> and perhaps generalize a helper that does the power of two helper changes,
>> >> > > >> the cleanest I found was the vesafb driver solution.
>> >> > > >>
>> >> > > >> Thoughts?
>> >> > > >
>> >> > > > The vesafb solution is bad since you'll end up covering only
>> >> > > > the first 4MB of the framebuffer instead of the almost 8MB you want.
>> >> > > > Which in practice will mean throwing away half the VRAM since you really
>> >> > > > don't want the massive performance hit from accessing it as UC. And that
>> >> > > > would mean giving up decent display resolutions as well :(
>> >> > > >
>> >> > > > And the other option of trying to cover the remainder with multiple ever
>> >> > > > smaller MTRRs doesn't work either since you'll run out of MTRRs very
>> >> > > > quickly.
>> >> > > >
>> >> > > > This is precisely why I used the hole method in atyfb in the first
>> >> > > > place.
>> >> > > >
>> >> > > > I don't really like the idea of any new mtrr code not supporting that
>> >> > > > use case, especially as these things tend to be present in older machines
>> >> > > > where PAT isn't an option.
>> >> > >
>> >> > > According to the Intel SDM, volume 3, section 11.5.2.1, table 11-6,
>> >> > > non-PAT CPUs that have a WC MTRR, PCD = 1, and PWT = 1 (aka UC) have
>> >> > > an effective memory type of UC.
>> >
>> > This is true but non-PAT systems that use just ioremap() will default to
>> > _PAGE_CACHE_MODE_UC_MINUS, not _PAGE_CACHE_MODE_UC, and _PAGE_CACHE_MODE_UC_MINUS
>> > on Linux has PCD = 1, PWT = 0. The list comes from:
>> >
>> > uint16_t __cachemode2pte_tbl[_PAGE_CACHE_MODE_NUM] = {
>> >         [_PAGE_CACHE_MODE_WB      ]     = 0         | 0        ,
>> >         [_PAGE_CACHE_MODE_WC      ]     = _PAGE_PWT | 0        ,
>> >         [_PAGE_CACHE_MODE_UC_MINUS]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_UC      ]     = _PAGE_PWT | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WT      ]     = 0         | _PAGE_PCD,
>> >         [_PAGE_CACHE_MODE_WP      ]     = 0         | _PAGE_PCD,
>> > };
>> >
>> > This can better be read here:
>> >
>> >  PAT
>> >  |PCD
>> >  ||PWT
>> >  |||
>> >  000 WB          _PAGE_CACHE_MODE_WB
>> >  001 WC          _PAGE_CACHE_MODE_WC
>> >  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>> >  011 UC          _PAGE_CACHE_MODE_UC
>> >
>> > On x86 ioremap() defaults to ioremap_nocache() and right now that uses
>> > _PAGE_CACHE_MODE_UC_MINUS not _PAGE_CACHE_MODE_UC. We have two cases
>> > to consider for non-PAT systems then:
>> >
>> > a) Right now, as ioremap() and ioremap_nocache() default to _PAGE_CACHE_MODE_UC_MINUS
>> >    on x86. In this case using a WC MTRR seems to use PWT=0, PCD=1, and
>> >    table 11-6 on non-PAT systems seems to place this situation as
>> >    "implementation defined" and not encouraged.
>> >
>> > b) When commit de33c442e "x86 PAT: fix performance drop for glx, use
>> >    UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()"
>> >    gets reverted and we use _PAGE_CACHE_MODE_UC by default. In this
>> >    case on x86 for both ioremap() and ioremap_nocache() as they will
>> >    both default to _PAGE_CACHE_MODE_UC we'll end up as you note with
>> >    an effective memory type of UC.
>> >
>> > If I've understood this correctly then neither of these situations is good and
>> > it's just by chance that on some systems situation a) has led to proper WC.
>> >
>> > On a PAT system we have a bit different combinatorial results (based on Table
>> > 11-7):
>> >
>> > a) Right now ioremap() and ioremap_nocache() defaulting to
>> >     _PAGE_CACHE_MODE_UC_MINUS + MTRR WC yields WC
>> >
>> > b) When commit de33c442e gets reverted _PAGE_CACHE_MODE_UC + MTRR WC = UC
>> >
>> > So to be clear right now atyfb should work fine on PAT systems
>> > with de33c442e in place, once reverted as-is right now we'd end
>> > up with UC effective memory type.
>> >
>> > For both PAT and non-PAT systems when commit de33c442e gets reverted
>> > we'd end up with UC as the effective memory type for atyfb. Right
>> > now it should work on PAT systems and by chance it's suspected to work
>> > on non-PAT systems. We want to phase out MTRR though, especially to avoid
>> > all this insane combinatorial nightmare.
>> >
>> >> > > Hence my suggestion to add
>> >> > > ioremap_x86_uc and/or set_memory_x86_uc to punch a UC hole in an
>> >> > > otherwise WC MTRR-covered region.
>> >
>> > To be clear I think you mean then that ioremap_x86_uc() would help us avoid the
>> > jumps between combinatorial issues with MTRR on PAT / non-PAT systems before
>> > and after commit de33c442e gets reverted. So for instance if we had on the
>> > atyfb driver:
>> >
>> > ioremap_x86_uc(PCI BAR)
>> > ioremap_wc(framebuffer)
>> > arch_phys_wc_add(PCI BAR)
>> >
>> > On non-PAT systems on the MMIO region with PWT=1, PCD=1 we'd end up with UC.
>> > Sadly though since _PAGE_CACHE_WC on non-PAT has PWT=1, PCD=0, the WC
>> > MTRR that follows would mean we'd end up with another grey area (but
>> > similar to before, as technically an effective memory type of WC).
>> >
>> > On PAT systems the above would not use MTRRs but we'd be counting on
>> > overlapping memory types -- it's not clear if aliasing here is a problem.
>> >
>> > Also Intel SDM, volume 3, section "11.11.4 Range Size and Alignment Requirement"
>> > describes that: "the minimum range size is 4 KiB, the base address must be on
>> > a 4 KiB boundary. For ranges greater than 4 KiB each range must be of length
>> > 2^n and its base address must be aligned on a 2^n boundary where n is a value
>> > equal to or greater than 12. The base-address alignment value cannot be less
>> > than its length. For example, an 8-KiB range cannot be aligned on a
>> > 4-KiB boundary. It must be aligned on at least an 8-KiB boundary"
>> >
>> > So to answer my own question: indeed, our framebuffer base address must be
>> > aligned on a 2^n boundary, the size also has to be a power of 2. MTRR supports
>> > fixed range sizes and variable range sizes, in case of the MMIO that does
>> > not need to abide by the power of 2 rule as a fixed range size of 4 KiB
>> > could be used although upon review of our own implementation it's unclear if
>> > that is what is used for 4 KiB sized MTRRs.
>> >
>> > Hence my arch_phys_wc_add(PCI BAR) as above.
>> >
>> >> > OK I think I get it now.
>> >> >
>> >> > And I take it this would hopefully only be used for non-PAT systems?
>> >
>> > Since we would likely want to use ioremap_x86_uc() on PAT systems as well, we
>> > could make it effective for both PAT and non-PAT then.  Later when
>> > we get ioremap() to default to strong UC we could drop ioremap_x86_uc() as we'd
>> > only need it as transitory until then -- that is unless we want perhaps a strong
>> > UC ioremap primitive which is always following strong UC when available regardless
>> > of these default transitions.
>> >
>> > The big issue I see here is simply the combinatorial issues, so I do think
>> > it's best to annotate these corner cases well and avoid them.
>> >
>> >> > Would there be a use case for PAT systems? I wonder if we can wrap
>> >> > this under some APIs to make it clean and hide this dirty thing
>> >> > behind the scenes, it seems a fragile and error prone and my hope
>> >> > would be that we won't need more specialization in this area for
>> >> > PAT systems.
>> >>
>> >> One potential complication is kernel vs. userspace mmap. MTRR applies to
>> >> the physical address, but PAT applies to the virtual address, so with
>> >> the WC MTRR you get WC for userspace "for free" as well.
>> >
>> > What is the performance impact of having the conversion being done by the
>> > kernel? Has anyone done measurements? If significant can't the subsystem mmap()
>> > cache the phys address for PAT? Shouldn't the TLB take care of those considerations
>> > for us? If this is generally desirable shouldn't we just generalize the cache
>> > for devices for O(1) access through a generic API?
>>
>> We're pretty much required to keep the PTE memory types consistent for
>> aliases of the same page.
>
> Hrm, OK so overlapping ioremap() calls should be frowned upon?
>
> I think it's important to clarify the few different scenarios we have
> for atyfb, both for today when uc- is default and when uc becomes the
> default. I'll also clarify what this series originally tried to do,
> why the MTRR size requirements prevent us from doing it, and the
> combinatorial issues that would also be present when and if uc becomes
> default. Finally I'll clarify what I am thinking we should do in light
> of all this.
>
> _______________________________________________________________________
> |                                                       |             |
> |_______________________________________________________|_____________|
>
> \______________________________________________________/ \____________/
>
>                 Framebuffer (8 MiB)                         MMIO (4 KiB)
>
> Currently we have:
>
> Page_cache_mode's _PAGE_CACHE_MODE_ is removed below for brevity.
> The atyfb PCI BAR is condensed to:
>
> Framebuffer,MMIO
>
> Keeping in mind:
>
> Intel SDM, volume 3, section 11.5.2.1, table 11-6 (NonPAT combinatorial)
> Intel SDM, volume 3, section 11.5.2.2, table 11-7 (PAT    combinatorial)
>
> Linux PCD, PWT bits:
>
>  PAT
>  |PCD
>  ||PWT
>  |||
>  000 WB          _PAGE_CACHE_MODE_WB
>  001 WC          _PAGE_CACHE_MODE_WC
>  010 UC-         _PAGE_CACHE_MODE_UC_MINUS
>  011 UC          _PAGE_CACHE_MODE_UC
>
> (*)   below denotes grey area as per SDM, implementation-defined
> (%)   below denotes not possible due to size / base requirements of MTRRs
> (+)   below denotes combinatorial issue
>
> Non-PAT systems use PCD, PWT values, their respective bit settings for
> these are given although internally we use _PAGE_CACHE_MODE* on the
> ioremap* calls for both non-PAT and PAT. For instance
> _PAGE_CACHE_MODE_UC_MINUS is 10 for PCD=1, PWT=0.
>
> Today we have:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap(PCI BAR)        | 10 , 10  | UC-, UC- | UC,  UC  | UC-, UC- |
> MTRR WC(PCI BAR)        | 10 , 10  | UC-, UC- | WC*, WC* | WC , WC  |
> MTRR UC(MMIO)           | 10 , 10  | UC-, UC- | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> If today we revert commit de33c442e and UC becomes default this would run into
> the combinatorial issue:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap(MMIO)           | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap(PCI BAR)        | 11 , 11  | UC , UC  | UC,  UC  | UC , UC  |
> MTRR WC(PCI BAR)        | 11 , 11  | UC,  UC  | UC+, UC+ | UC+, UC+ |
> MTRR UC(MMIO)           | 11 , 11  | UC,  UC  | UC+, UC  | UC+, UC  |
> --------------------------------------------------------------------
>
> We ideally would like to do the following but can't because of the restriction
> of having to use powers of two for both size and base address for MTRRs. We'd
> have two steps, one with mtrr_add(), and another with arch_phys_wc_add(). This is
> what this series was proposing for atyfb.
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | xxx, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> MTRR WC(fb)             | 01 , 10  | UC-, WC  | WC%*,UC  | WC%, UC- |
> --------------------------------------------------------------------
>
> Then we'd change this to arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 10  | xxx, UC- | xxx, UC  | UC-, UC- |
> ioremap_wc(fb)          | 01 , 10  | WC , UC- | UC , UC  | WC , UC- |
> arch_phys_wc_add(fb)    | 01 , 10  | WC , WC  | WC%*,UC  | WC , UC- |
> --------------------------------------------------------------------
>
> With the above code we also have to consider what happens if we
> revert commit de33c442e and UC becomes default: we'd then run into
> both the size issue and also a grey area:
>
> With mtrr_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR WC(fb)             | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Then with arch_phys_wc_add():
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_nocache(MMIO)   | xxx, 11  | xxx, UC  | xxx, UC  | xxx, UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(fb)    | 01 , 11  | WC , UC  | WC%*,UC  | WC , UC  |
> --------------------------------------------------------------------
>
> So what we *could* do then is add ioremap_uc() (use strong UC always),
> then override the framebuffer area with wc, and finally use MTRR on the
> full PCI BAR, relying on the fact that strong UC won't let the MTRR override
> the earlier UC on the MMIO area. There is a grey area here for non-PAT
> systems but that is also the case as-is today.
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> MTRR_WC(PCI BAR)        | 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> Finally with arch_phys_wc_add() we'd end up with:
>
> --------------------------------------------------------------------
> Calls                   |    Page_cache_mode  |  Effective memtype  |
> ------------------------|---------------------|---------------------
>                         |  Non-PAT |    PAT   |  Non-PAT |    PAT   |
> --------------------------------------------------------------------
> ioremap_uc(PCI BAR)     | 11 , 11  | UC , UC  | UC , UC  | UC , UC  |
> ioremap_wc(fb)          | 01 , 11  | WC , UC  | UC , UC  | WC , UC  |
> arch_phys_wc_add(PCIBAR)| 01 , 11  | WC , UC  | WC*, UC  | WC , UC  |
> --------------------------------------------------------------------
>
> In this case a revert of de33c442e won't have any effect as the driver
> was already well prepared for it by using ioremap_uc().
>
>> I think that the x86 pageattr code is
>> supposed to take care of this.  IOW, if everything is working right,
>> then the supposedly uncached mmap should either fail, be promoted to
>> WC, or cause the existing WC map to degrade to UC.  The code is really
>> overcomplicated right now.
>
> Yeah, the aliasing implications of the above picture are not clear to me;
> someone who is knee-deep in this can likely confirm any issues with the above
> pictures. But most importantly, if we believe that the last two sets
> above don't have any issues then I think we can move forward. Since we only
> have a few drivers that need special handling I think it makes sense to treat
> them specially and document this strategy for the "hole" work around.
>

Seems reasonable to me.

--Andy

> Thoughts?
>
>   Luis



-- 
Andy Lutomirski
AMA Capital Management, LLC
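
To make the size constraint from the thread concrete, here is a small
user-space sketch, not kernel code. It checks whether a (base, size) pair
satisfies the rule quoted from the SDM (size a power of two of at least 4 KiB,
base aligned to the size) and counts how many non-overlapping, naturally
aligned power-of-two ranges it would take to cover a region exactly. The helper
names mtrr_range_ok() and mtrr_ranges_needed() and the base address used in the
note below are made up for illustration; the framebuffer size assumes the
layout in the diagram above, with the last 4 KiB of an 8 MiB BAR used for MMIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB(x) ((uint64_t)(x) << 10)
#define MIB(x) ((uint64_t)(x) << 20)

/* Hypothetical helper: can one variable-range MTRR describe (base, size)? */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	/* The size must be at least 4 KiB and a power of two. */
	if (size < KIB(4) || (size & (size - 1)) != 0)
		return false;
	/* The base must be aligned to the size. */
	return (base & (size - 1)) == 0;
}

/*
 * Hypothetical helper: count the naturally aligned power-of-two chunks
 * needed to cover [base, base + size) exactly, without overlap.
 * Both arguments must be multiples of 4 KiB.
 */
static unsigned int mtrr_ranges_needed(uint64_t base, uint64_t size)
{
	unsigned int n = 0;

	assert((base & (KIB(4) - 1)) == 0 && (size & (KIB(4) - 1)) == 0);

	while (size != 0) {
		uint64_t chunk = KIB(4);

		/* Grow the chunk while it still fits and base stays aligned. */
		while (chunk * 2 <= size && (base & (chunk * 2 - 1)) == 0)
			chunk *= 2;

		base += chunk;
		size -= chunk;
		n++;
	}
	return n;
}
```

For a BAR at a made-up base of 0x80000000, mtrr_range_ok() accepts the full
8 MiB but rejects 8 MiB - 4 KiB, and mtrr_ranges_needed() reports 11 ranges
(4 MiB, 2 MiB, ..., 4 KiB) for the latter. CPUs typically expose only a
handful of variable-range MTRRs, which is why covering the remainder with ever
smaller MTRRs runs out so quickly and why a single WC MTRR over the whole BAR
plus a UC "hole" is attractive.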

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-26 23:35       ` Luis R. Rodriguez
  (?)
@ 2015-04-02 20:13         ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:13 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Konrad Rzeszutek Wilk, Luis R. Rodriguez, Andy Lutomirski,
	Ingo Molnar, Thomas Gleixner, H. Peter Anvin, jgross,
	Jan Beulich, Borislav Petkov, Suresh Siddha, venkatesh.pallipadi,
	Dave Airlie, linux-kernel, linux-fbdev, x86, xen-devel,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen, Dave Hansen,
	Stefan Bader, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:

> I'll rephrase this to:
>
> ---
> It is possible to enable CONFIG_MTRR and up with it
> disabled at run time and yet CONFIG_X86_PAT continues
> to kick through with all functionally enabled. This
> can happen for instance on Xen where MTRR is not
> supported but PAT is, this can happen now on Linux as
> of commit 47591df50 by Juergen introduced as of v3.19.

I still can't parse this.  What does "up with it disabled at run time"
mean?  And "... continues to kick through"?  Probably some idiomatic
usage I'm just too old to understand :)

Please use the conventional citation format:

  47591df50512 ("xen: Support Xen pv-domains using PAT")

A one-character typo in a SHA1 makes it completely useless, so it's
nice to have the summary line both for readability and a bit of
redundancy.

Bjorn

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 20:13         ` Bjorn Helgaas
  (?)
@ 2015-04-02 20:20           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 20:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader,
	Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>
> > I'll rephrase this to:
> >
> > ---
> > It is possible to enable CONFIG_MTRR and up with it
> > disabled at run time and yet CONFIG_X86_PAT continues
> > to kick through with all functionally enabled. This
> > can happen for instance on Xen where MTRR is not
> > supported but PAT is, this can happen now on Linux as
> > of commit 47591df50 by Juergen introduced as of v3.19.
>
> I still can't parse this.  What does "up with it disabled at run time"
> mean?

It means that even if your CPU/BIOS/system supports MTRR and you use a
kernel with MTRR support enabled, you might end up with MTRR enabled in
one run time scenario and disabled in another, with the same exact
kernel and system. Such is the case for example when booting with Xen,
which disables the CPU bits in the hypervisor code. If you boot the
same system without Xen you'll get MTRR.

>  And "... continues to kick through"?  Probably some idiomatic
> usage I'm just too old to understand :)

That means, for example, that in both of the above circumstances the
kernel went through with enabling PAT, even when MTRR ended up disabled
at run time with Xen.

> Please use the conventional citation format:
>
>   47591df50512 ("xen: Support Xen pv-domains using PAT")
>
> A one-character typo in a SHA1 makes it completely useless, so it's
> nice to have the summary line both for readability and a bit of
> redundancy.

Sure, fixed.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-03-20 23:17   ` Luis R. Rodriguez
@ 2015-04-02 20:21     ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:21 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	jgross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie, linux-kernel, linux-fbdev, x86,
	xen-devel, Luis R. Rodriguez, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Ideally on systems using PAT we can expect a swift
> transition away from MTRR. There can be a few exceptions
> to this, one is where device drivers are known to exist
> on PATs with errata,

This probably makes sense to someone steeped in MTRR and PAT, but not
otherwise.  "One exception is where drivers are known to exist on PATs
with errata"?  The drivers exist, independent of PAT/MTRR/errata.  Do
you mean there are drivers that can't be converted from MTRR to PAT
because some PATs are broken?

I don't really know anything about MTRR or PAT; I'm just trying to
figure out how to parse this paragraph.

> another situation is observed on
> old device drivers where devices had combined MMIO
> register access with whatever area they typically
> later wanted to end up using MTRR for on the same
> PCI BAR. This situation can still be addressed by
> splitting up ioremap'd PCI BAR into two ioremap'd
> calls, one for MMIO registers, and another for whatever
> is desirable for write-combining -- in order to
> accomplish this though quite a bit of driver
> restructuring is required.
>
> Device drivers which are known to require large
> amount of re-work in order to split ioremap'd areas
> can use __arch_phys_wc_add() to avoid regressions
> when PAT is enabled.
>
> For a good example driver where things are neatly
> split up on a PCI BAR refer the infiniband qib
> driver. For a good example of a driver where good
> amount of work is required refer to the infiniband
> ipath driver.
>
> This is *only* a transitive API -- and as such no new
> drivers are ever expected to use this.

"transient"?  Do you mean you intend to remove this API in the near future?

Bjorn

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 20:20           ` Luis R. Rodriguez
  (?)
@ 2015-04-02 20:28             ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 20:28 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader,
	Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Apr 2, 2015 at 3:20 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>
>> > I'll rephrase this to:
>> >
>> > ---
>> > It is possible to enable CONFIG_MTRR and up with it
>> > disabled at run time and yet CONFIG_X86_PAT continues
>> > to kick through with all functionally enabled. This
>> > can happen for instance on Xen where MTRR is not
>> > supported but PAT is, this can happen now on Linux as
>> > of commit 47591df50 by Juergen introduced as of v3.19.
>>
>> I still can't parse this.  What does "up with it disabled at run time"
>> mean?
>
> It  means that technically even if your CPU/BIOS/system did support
> MTRR if you use a kernel with MTRR support enabled you might end up
> with a situation where under one situation MTRR  might be enabled and
> at another run time scenario with the same exact kernel and system you
> will end up with MTRR disabled. Such is the case for example when
> booting with Xen, which disables the CPU bits on the hypervisor code.
> If you boot the same system without Xen you'll get MTRR.

Your text is missing some words.  You seem to be using "up" as a verb,
but it's not a verb.  Maybe you meant "end up"?  Even then, it
wouldn't make sense for CONFIG_MTRR to be "disabled at run time"
because CONFIG_MTRR is a compile-time switch.  The MTRR
*functionality* could certainly be disabled at run-time, but not
CONFIG_MTRR itself.
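
To make the distinction concrete, here is a minimal userspace sketch, not
kernel code. The macro below only stands in for the Kconfig symbol, and
mtrr_usable, mtrr_probe(), mtrr_built_in() and mtrr_try_add() are
hypothetical names invented for this illustration. It shows a build where
MTRR support is compiled in but the functionality is still unavailable at
run time, as would be the case under Xen.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig symbol; defined here only for illustration. */
#define CONFIG_MTRR 1

/* Hypothetical run-time state: can MTRR actually be used on this boot? */
static bool mtrr_usable;

/* Compile-time question: was MTRR support built into this binary? */
static bool mtrr_built_in(void)
{
#ifdef CONFIG_MTRR
	return true;
#else
	return false;
#endif
}

/*
 * Run-time question: does the platform expose MTRR on this boot?
 * cpu_reports_mtrr models what the CPU (or a hypervisor) advertises.
 */
static void mtrr_probe(bool cpu_reports_mtrr)
{
	mtrr_usable = mtrr_built_in() && cpu_reports_mtrr;
}

/* Hypothetical request for an MTRR; returns 0 on success, -1 if unusable. */
static int mtrr_try_add(unsigned long base, unsigned long size)
{
	(void)base;
	assert(size > 0);
	return mtrr_usable ? 0 : -1;
}
```

With the same binary, mtrr_probe(false) leaves MTRR compiled in but
unusable, which is the situation the commit log is trying to describe.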

>>  And "... continues to kick through"?  Probably some idiomatic
>> usage I'm just too old to understand :)
>
> That means for example that in both the above circumstances even if
> MTRR went disabled at run time with Xen, the kernel went through with
> getting PAT enabled.

"CONFIG_X86_PAT continues to kick through" doesn't seem a very precise
way of describing this.  But maybe it's enough for experts in this
area (which I'm not).

Bjorn

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-04-02 20:21     ` Bjorn Helgaas
@ 2015-04-02 20:55       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 20:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> <mcgrof@do-not-panic.com> wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> >
> > Ideally on systems using PAT we can expect a swift
> > transition away from MTRR. There can be a few exceptions
> > to this, one is where device drivers are known to exist
> > on PATs with errata,
> 
> This probably makes sense to someone steeped in MTRR and PAT, but not
> otherwise.  "One exception is where drivers are known to exist on PATs
> with errata"?  The drivers exist, independent of PAT/MTRR/errata.  Do
> you mean there are drivers that can't be converted from MTRR to PAT
> because some PATs are broken?

Well, there is that, but it seems we have motivation to
address systems with broken PAT, so this would be one of the
lower-priority reasons to consider adding this API. The
more important reason is below.

> I don't really know anything about MTRR or PAT; I'm just trying to
> figure out how to parse this paragraph.

Sure.

> > another situation is observed on
> > old device drivers where devices had combined MMIO
> > register access with whatever area they typically
> > later wanted to end up using MTRR for on the same
> > PCI BAR. This situation can still be addressed by
> > splitting up ioremap'd PCI BAR into two ioremap'd
> > calls, one for MMIO registers, and another for whatever
> > is desirable for write-combining -- in order to
> > accomplish this though quite a bit of driver
> > restructuring is required.
> >
> > Device drivers which are known to require large
> > amount of re-work in order to split ioremap'd areas
> > can use __arch_phys_wc_add() to avoid regressions
> > when PAT is enabled.
> >
> > For a good example driver where things are neatly
> > split up on a PCI BAR refer the infiniband qib
> > driver. For a good example of a driver where good
> > amount of work is required refer to the infiniband
> > ipath driver.
> >
> > This is *only* a transitive API -- and as such no new
> > drivers are ever expected to use this.
> 
> "transient"?  Do you mean you intend to remove this API in the near future?

That's correct. The problem is that in order to use PAT cleanly we'd need to
change these drivers so that their ioremap'd areas do not overlap; otherwise
things can get quite complex, and changing the way a driver does its ioremap()
calls might require a bit of work. The atyfb driver changes I did are an example
of the types of changes that are expected. In the most complex cases MTRR "hole"
tricks are used and, as can be observed with the atyfb driver changes, there are
a series of things to consider when this is done, especially in light of
eventually making strong UC the default instead of UC-.

I might be able to avoid adding this API by reviewing the users I had
in this series again and seeing if something similar to what I will do on atyfb
can be done in the meantime by using ioremap_uc(). It's not clear to me yet.
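
As a rough illustration of the split itself, the address arithmetic can be
sketched in plain userspace C. This is not taken from any driver: struct
bar_layout, struct io_range, split_bar() and ranges_overlap() are
hypothetical names, and the sketch assumes the simplest layout, with the
MMIO registers at the start of the BAR and the write-combined area after
them. In a real driver the first range would go to an uncached ioremap
variant and the second to ioremap_wc(), with arch_phys_wc_add() covering
only the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one PCI BAR; not a kernel structure. */
struct bar_layout {
	uint64_t start;   /* bus address where the BAR begins */
	uint64_t len;     /* total BAR length in bytes */
	uint64_t wc_off;  /* offset at which the write-combined area begins */
};

/* One contiguous range that would get its own ioremap-style call. */
struct io_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Split a BAR into a register range (to be mapped uncached) and a
 * write-combining range. Returns false when the split point would leave
 * either range empty.
 */
static bool split_bar(const struct bar_layout *bar,
		      struct io_range *regs, struct io_range *wc)
{
	assert(bar && regs && wc);

	if (bar->wc_off == 0 || bar->wc_off >= bar->len)
		return false;

	regs->start = bar->start;
	regs->len = bar->wc_off;

	wc->start = bar->start + bar->wc_off;
	wc->len = bar->len - bar->wc_off;
	return true;
}

/* Two ranges overlap if each one begins before the other one ends. */
static bool ranges_overlap(const struct io_range *a, const struct io_range *b)
{
	return a->start < b->start + b->len && b->start < a->start + a->len;
}
```

The point of the check in ranges_overlap() is the property discussed above:
once the two mappings are disjoint, each can carry its own cache attribute
without depending on MTRR to override part of a single large mapping.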

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 20:28             ` Bjorn Helgaas
  (?)
@ 2015-04-02 21:02               ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 21:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader,
	Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Apr 02, 2015 at 03:28:51PM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 2, 2015 at 3:20 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >>
> >> > I'll rephrase this to:
> >> >
> >> > ---
> >> > It is possible to enable CONFIG_MTRR and up with it
> >> > disabled at run time and yet CONFIG_X86_PAT continues
> >> > to kick through with all functionally enabled. This
> >> > can happen for instance on Xen where MTRR is not
> >> > supported but PAT is, this can happen now on Linux as
> >> > of commit 47591df50 by Juergen introduced as of v3.19.
> >>
> >> I still can't parse this.  What does "up with it disabled at run time"
> >> mean?
> >
> > It  means that technically even if your CPU/BIOS/system did support
> > MTRR if you use a kernel with MTRR support enabled you might end up
> > with a situation where under one situation MTRR  might be enabled and
> > at another run time scenario with the same exact kernel and system you
> > will end up with MTRR disabled. Such is the case for example when
> > booting with Xen, which disables the CPU bits on the hypervisor code.
> > If you boot the same system without Xen you'll get MTRR.
> 
> Your text is missing some words.  You seem to be using "up" as a verb,
> but it's not a verb.  Maybe you meant "end up"? 

Indeed.

> Even then, it
> wouldn't make sense for CONFIG_MTRR to be "disabled at run time"
> because CONFIG_MTRR is a compile-time switch.  The MTRR
> *functionality* could certainly be disabled at run-time, but not
> CONFIG_MTRR itself.

I'll clarify.

> >>  And "... continues to kick through"?  Probably some idiomatic
> >> usage I'm just too old to understand :)
> >
> > That means for example that in both the above circumstances even if
> > MTRR went disabled at run time with Xen, the kernel went through with
> > getting PAT enabled.
> 
> "CONFIG_X86_PAT continues to kick through" doesn't seem a very precise
> way of describing this.  But maybe it's enough for experts in this
> area (which I'm not).

I've rephrased this to:

---
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
and end up with a system with MTRR functionality disabled
but PAT functionality enabled. This can happen for instance
on Xen where MTRR is not supported but PAT is. This can
happen on Linux as of commit 47591df50 ("xen: Support Xen
pv-domains using PAT") by Juergen, introduced as of v3.19.
---

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
@ 2015-04-02 21:02               ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 21:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader,
	Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Apr 02, 2015 at 03:28:51PM -0500, Bjorn Helgaas wrote:
> On Thu, Apr 2, 2015 at 3:20 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Thu, Apr 2, 2015 at 1:13 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> On Thu, Mar 26, 2015 at 6:35 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >>
> >> > I'll rephrase this to:
> >> >
> >> > ---
> >> > It is possible to enable CONFIG_MTRR and up with it
> >> > disabled at run time and yet CONFIG_X86_PAT continues
> >> > to kick through with all functionally enabled. This
> >> > can happen for instance on Xen where MTRR is not
> >> > supported but PAT is, this can happen now on Linux as
> >> > of commit 47591df50 by Juergen introduced as of v3.19.
> >>
> >> I still can't parse this.  What does "up with it disabled at run time"
> >> mean?
> >
> > It  means that technically even if your CPU/BIOS/system did support
> > MTRR if you use a kernel with MTRR support enabled you might end up
> > with a situation where under one situation MTRR  might be enabled and
> > at another run time scenario with the same exact kernel and system you
> > will end up with MTRR disabled. Such is the case for example when
> > booting with Xen, which disables the CPU bits on the hypervisor code.
> > If you boot the same system without Xen you'll get MTRR.
> 
> Your text is missing some words.  You seem to be using "up" as a verb,
> but it's not a verb.  Maybe you meant "end up"? 

Indeed.

> Even then, it
> wouldn't make sense for CONFIG_MTRR to be "disabled at run time"
> because CONFIG_MTRR is a compile-time switch.  The MTRR
> *functionality* could certainly be disabled at run-time, but not
> CONFIG_MTRR itself.

I'll clarify.

> >>  And "... continues to kick through"?  Probably some idiomatic
> >> usage I'm just too old to understand :)
> >
> > That means for example that in both the above circumstances even if
> > MTRR went disabled at run time with Xen, the kernel went through with
> > getting PAT enabled.
> 
> "CONFIG_X86_PAT continues to kick through" doesn't seem a very precise
> way of describing this.  But maybe it's enough for experts in this
> area (which I'm not).

I've rephrased this to:

---
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
and end up with a system with MTRR functionality disabled
PAT functionality enabled. This can happen for instance
on Xen where MTRR is not supported but PAT is. This can
happen on Linux as of commit 47591df50 ("xen: Support Xen
pv-domains using PAT") by Juergen, introduced as of v3.19.
---
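
To make the compile-time versus run-time distinction concrete, here is a
minimal userspace sketch. It is not kernel code: the CONFIG_MTRR define
stands in for the Kconfig symbol, and mtrr_probe(), mtrr_is_enabled() and
the hw_has_mtrr parameter are hypothetical names used only for
illustration. Only the mtrr_enabled name is borrowed from the patch.

```c
#include <stdbool.h>

/*
 * Assumption: stands in for the Kconfig symbol. In a real kernel build
 * this is fixed when the kernel is compiled and cannot change later.
 */
#define CONFIG_MTRR 1

/* Run-time state, named after the mtrr_enabled flag in the patch. */
static bool mtrr_enabled;

/*
 * Hypothetical probe. hw_has_mtrr models what the CPU, BIOS or
 * hypervisor exposes at boot: true on bare metal, false under Xen.
 */
static void mtrr_probe(bool hw_has_mtrr)
{
#ifdef CONFIG_MTRR
	/* Code is built in, but the functionality may still be off. */
	mtrr_enabled = hw_has_mtrr;
#else
	/* Code is not built in, so the functionality can never be on. */
	(void)hw_has_mtrr;
	mtrr_enabled = false;
#endif
}

/* Hypothetical query helper for callers that only care about run time. */
static bool mtrr_is_enabled(void)
{
	return mtrr_enabled;
}
```

With the same binary, mtrr_probe(true) followed by mtrr_is_enabled()
yields true, while mtrr_probe(false) yields false, which is the
"same kernel, different run-time outcome" case described above.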

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-03-27 23:56       ` Luis R. Rodriguez
@ 2015-04-02 21:49         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 21:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
	xen-devel

On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> >  :
> > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > >  	}
> > >  
> > >  	if (mtrr_if) {
> > > +		mtrr_enabled = true;
> > >  		set_num_var_ranges();
> > >  		init_table();
> > >  		if (use_intel()) {
> >                         get_mtrr_state();
> > 
> > After setting mtrr_enabled to true, get_mtrr_state() reads
> > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > MTRRs are enabled or not on the system.  So, potentially, we could have
> > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > to disabled when MTRRs are disabled by BIOS.
> 
> Thanks for the review, in this case then we should update mtrr_enabled to false.
> 
> > ps.
> > I recently cleaned up this part of the MTRR code in the patch below,
> > which is currently available in the -mm & -next trees.
> > https://lkml.org/lkml/2015/3/24/1063
> 
> Great I will rebase and work with that and try to address this
> consideration you have raised.

OK I'll mesh in this change as well in my next respin:

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index a83f27a..ecf7cb9 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
 }
 
 /* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
 {
 	struct mtrr_var_range *vrs;
 	unsigned long flags;
@@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
 
 	post_set();
 	local_irq_restore(flags);
+
+	return !!mtrr_state.enabled;
 }
 
 /* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..f96195e 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -734,22 +742,25 @@ void __init mtrr_bp_init(void)
 	}
 
 	if (mtrr_if) {
+		mtrr_enabled = true;
 		set_num_var_ranges();
 		init_table();
 		if (use_intel()) {
-			get_mtrr_state();
+			/* BIOS may override */
+			mtrr_enabled = get_mtrr_state();
 
 			if (mtrr_cleanup(phys_addr)) {
 				changed_by_mtrr_cleanup = 1;
@@ -745,11 +755,14 @@ void __init mtrr_bp_init(void)
                        }
                }
        }
+
+       if (!mtrr_enabled)
+               pr_info("mtrr: system does not support MTRR\n");
 }
 

diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f..951884d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
 
 void fill_mtrr_var_range(unsigned int index,
 		u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
 
 extern void set_mtrr_ops(const struct mtrr_ops *ops);
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 21:02               ` Luis R. Rodriguez
  (?)
@ 2015-04-02 22:09                 ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 22:09 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Konrad Rzeszutek Wilk, Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Juergen Gross, Jan Beulich,
	Borislav Petkov, Suresh Siddha, venkatesh.pallipadi, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader,
	Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Thu, Apr 2, 2015 at 4:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:

> ---
> It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
> and end up with a system with MTRR functionality disabled
> PAT functionality enabled.

This is missing a conjunction or something in "MTRR functionality
disabled PAT functionality."

Bjorn

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [Xen-devel] [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 22:09                 ` Bjorn Helgaas
  (?)
@ 2015-04-02 22:12                   ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 22:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-fbdev, Daniel Vetter, Dave Hansen, Jan Beulich,
	H. Peter Anvin, Ville Syrjälä,
	xen-devel, Suresh Siddha, x86, Tomi Valkeinen, xen-devel,
	Ingo Molnar, Borislav Petkov, Jean-Christophe Plagniol-Villard,
	Antonino Daplas, Stefan Bader, Dave Airlie, Thomas Gleixner,
	Ingo Molnar, Juergen Gross, Toshi Kani, linux-kernel,
	Andy Lutomirski, David Vrabel, venkatesh.pallipadi,
	Roger Pau Monné

On Thu, Apr 2, 2015 at 3:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, Apr 2, 2015 at 4:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>
>> ---
>> It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT
>> and end up with a system with MTRR functionality disabled
>> PAT functionality enabled.
>
> This is missing a conjunction or something in "MTRR functionality
> disabled PAT functionality."

"and PAT functionality" -- fixed. Thanks.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-04-02 20:55       ` Luis R. Rodriguez
@ 2015-04-02 22:35         ` Bjorn Helgaas
  -1 siblings, 0 replies; 710+ messages in thread
From: Bjorn Helgaas @ 2015-04-02 22:35 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Dave Airlie, linux-kernel, linux-fbdev, x86, xen-devel,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

[-cc Venkatesh, Suresh]

On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>

>> > This is *only* a transitive API -- and as such no new
>> > drivers are ever expected to use this.
>>
>> "transient"?  Do you mean you intend to remove this API in the near future?
>
> That's correct, the problem is that in order to use PAT cleanly we'd need to
> change these drivers ...

I was just trying to ask whether you intended to write "transient"
instead of "transitive."  But I'm not doing a very good job :)

Bjorn

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 06/47] mtrr: add __arch_phys_wc_add()
  2015-04-02 22:35         ` Bjorn Helgaas
@ 2015-04-02 22:54           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-02 22:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Dave Airlie,
	linux-kernel, linux-fbdev, x86, xen-devel, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen

On Thu, Apr 2, 2015 at 3:35 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [-cc Venkatesh, Suresh]
>
> On Thu, Apr 2, 2015 at 3:55 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Thu, Apr 02, 2015 at 03:21:22PM -0500, Bjorn Helgaas wrote:
>>> On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
>>> <mcgrof@do-not-panic.com> wrote:
>>> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
>>> > This is *only* a transitive API -- and as such no new
>>> > drivers are ever expected to use this.
>>>
>>> "transient"?  Do you mean you intend to remove this API in the near future?
>>
>> That's correct, the problem is that in order to use PAT cleanly we'd need to
>> change these drivers ...
>
> I was just trying to ask whether you intended to write "transient"
> instead of "transitive."  But I'm not doing a very good job :)

Yes, corrected, thanks.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 21:49         ` Luis R. Rodriguez
@ 2015-04-02 23:52           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-04-02 23:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
	xen-devel

On Thu, 2015-04-02 at 23:49 +0200, Luis R. Rodriguez wrote:
> On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> > On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> > >  :
> > > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > > >  	}
> > > >  
> > > >  	if (mtrr_if) {
> > > > +		mtrr_enabled = true;
> > > >  		set_num_var_ranges();
> > > >  		init_table();
> > > >  		if (use_intel()) {
> > >                         get_mtrr_state();
> > > 
> > > After setting mtrr_enabled to true, get_mtrr_state() reads
> > > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > > MTRRs are enabled or not on the system.  So, potentially, we could have
> > > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > > to disabled when MTRRs are disabled by BIOS.
> > 
> > Thanks for the review, in this case then we should update mtrr_enabled to false.
> > 
> > > ps.
> > > I recently cleaned up this part of the MTRR code in the patch below,
> > > which is currently available in the -mm & -next trees.
> > > https://lkml.org/lkml/2015/3/24/1063
> > 
> > Great I will rebase and work with that and try to address this
> > consideration you have raised.
> 
> OK I'll mesh in this change as well in my next respin:
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index a83f27a..ecf7cb9 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
>  }
>  
>  /* Grab all of the MTRR state for this CPU into *state */
> -void __init get_mtrr_state(void)
> +bool __init get_mtrr_state(void)
>  {
>  	struct mtrr_var_range *vrs;
>  	unsigned long flags;
> @@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
>  
>  	post_set();
>  	local_irq_restore(flags);
> +
> +	return !!mtrr_state.enabled;

This should be:
	return mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED;

because the MTRR_STATE_MTRR_FIXED_ENABLED flag is ignored when the
MTRR_STATE_MTRR_ENABLED flag is clear.
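
As a rough userspace illustration of the difference (not kernel code):
the two flag values below are assumed to match the definitions from the
cleanup patch mentioned earlier in the thread, and the helper names
mtrr_on_any_bit() and mtrr_on_masked() are hypothetical.

```c
#include <stdbool.h>

/*
 * Assumed values: mtrr_state.enabled is taken to hold the fixed-range
 * enable bit in bit 0 and the global MTRR enable bit in bit 1.
 */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
#define MTRR_STATE_MTRR_ENABLED		0x02

/* Mirrors "return !!mtrr_state.enabled": true if any bit is set. */
static bool mtrr_on_any_bit(unsigned char enabled)
{
	return !!enabled;
}

/* Mirrors the masked return: true only if the global enable bit is set. */
static bool mtrr_on_masked(unsigned char enabled)
{
	return enabled & MTRR_STATE_MTRR_ENABLED;
}
```

For a state of 0x01 (fixed-range bit set, global enable clear) the first
helper reports MTRRs as on while the second reports them as off. The two
agree for 0x00, 0x02 and 0x03.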

Thanks,
-Toshi




^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
  2015-04-02 23:52           ` Toshi Kani
@ 2015-04-03  1:08             ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-03  1:08 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Luis R. Rodriguez, luto, mingo, tglx, hpa, jgross, JBeulich, bp,
	suresh.b.siddha, venkatesh.pallipadi, airlied, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Stefan Bader, konrad.wilk,
	ville.syrjala, david.vrabel, bhelgaas, Roger Pau Monné,
	xen-devel

On Thu, Apr 02, 2015 at 05:52:16PM -0600, Toshi Kani wrote:
> On Thu, 2015-04-02 at 23:49 +0200, Luis R. Rodriguez wrote:
> > On Sat, Mar 28, 2015 at 12:56:30AM +0100, Luis R. Rodriguez wrote:
> > > On Fri, Mar 27, 2015 at 02:40:17PM -0600, Toshi Kani wrote:
> > > > On Fri, 2015-03-20 at 16:17 -0700, Luis R. Rodriguez wrote:
> > > >  :
> > > > > @@ -734,6 +742,7 @@ void __init mtrr_bp_init(void)
> > > > >  	}
> > > > >  
> > > > >  	if (mtrr_if) {
> > > > > +		mtrr_enabled = true;
> > > > >  		set_num_var_ranges();
> > > > >  		init_table();
> > > > >  		if (use_intel()) {
> > > >                         get_mtrr_state();
> > > > 
> > > > After setting mtrr_enabled to true, get_mtrr_state() reads
> > > > MSR_MTRRdefType and sets 'mtrr_state.enabled', which also indicates if
> > > > MTRRs are enabled or not on the system.  So, potentially, we could have
> > > > a case that mtrr_enabled is set to true, but mtrr_state.enabled is set
> > > > to disabled when MTRRs are disabled by BIOS.
> > > 
> > > Thanks for the review, in this case then we should update mtrr_enabled to false.
> > > 
> > > > ps.
> > > > I recently cleaned up this part of the MTRR code in the patch below,
> > > > which is currently available in the -mm & -next trees.
> > > > https://lkml.org/lkml/2015/3/24/1063
> > > 
> > > Great I will rebase and work with that and try to address this
> > > consideration you have raised.
> > 
> > OK I'll mesh in this change as well in my next respin:
> > 
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index a83f27a..ecf7cb9 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -438,7 +438,7 @@ static void __init print_mtrr_state(void)
> >  }
> >  
> >  /* Grab all of the MTRR state for this CPU into *state */
> > -void __init get_mtrr_state(void)
> > +bool __init get_mtrr_state(void)
> >  {
> >  	struct mtrr_var_range *vrs;
> >  	unsigned long flags;
> > @@ -482,6 +482,8 @@ void __init get_mtrr_state(void)
> >  
> >  	post_set();
> >  	local_irq_restore(flags);
> > +
> > +	return !!mtrr_state.enabled;
> 
> This should be:
> 	return mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED;
> 
> because the MTRR_STATE_MTRR_FIXED_ENABLED flag is ignored when the
> MTRR_STATE_MTRR_ENABLED flag is clear.

Thanks, I've used

	return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);

Amended.

  Luis
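
A rough sketch of how a caller might consume the new boolean return value
follows. This is not the actual mtrr_bp_init() code: the mock_* names, the
struct, and the simplified control flow are all hypothetical, and the flag
value is an assumption. It only illustrates the idea from the review that
mtrr_enabled should end up false when the BIOS left MTRRs disabled.

```c
#include <stdbool.h>

#define MTRR_STATE_MTRR_ENABLED	0x02	/* assumed value, illustration only */

/* Hypothetical stand-in for the kernel's MTRR state. */
struct mock_mtrr_state {
	unsigned char enabled;
};

/* Mirrors the amended return expression quoted above. */
bool mock_get_mtrr_state(const struct mock_mtrr_state *s)
{
	return !!(s->enabled & MTRR_STATE_MTRR_ENABLED);
}

/*
 * Simplified init path: instead of setting the flag to true up front,
 * take it from what the hardware state actually reports.
 */
bool mock_mtrr_bp_init(bool have_mtrr_if, const struct mock_mtrr_state *s)
{
	bool mtrr_enabled = false;

	if (have_mtrr_if)
		mtrr_enabled = mock_get_mtrr_state(s);

	return mtrr_enabled;
}
```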

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
  2015-03-24 22:43   ` Andrew Morton
@ 2015-04-03  6:33     ` Ingo Molnar
  -1 siblings, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-04-03  6:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Toshi Kani, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 24 Mar 2015 16:08:34 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > This patchset enhances MTRR checks for the kernel huge I/O mapping,
> > which was enabled by the patchset below:
> >   https://lkml.org/lkml/2015/3/3/589
> > 
> > The following functional changes are made in patch 7/7.
> >  - Allow pud_set_huge() and pmd_set_huge() to create a huge page
> >    mapping to a range covered by a single MTRR entry of any memory
> >    type.
> >  - Log a pr_warn() message when a specified PMD map range spans more
> >    than a single MTRR entry.  Drivers should make a mapping request
> >    aligned to a single MTRR entry when the range is covered by MTRRs.
> > 
> 
> OK, I grabbed these after barely looking at them, to get them a bit of
> runtime testing.
> 
> I'll await guidance from the x86 maintainers regarding next steps?

Could you please send the current version of them over to us if your 
testing didn't find any problems?

I'd like to take a final look and have them cook in the x86 tree as 
well for a while and want to preserve your testing effort.

Thanks!

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
  2015-04-03  6:33     ` Ingo Molnar
@ 2015-04-03 15:22       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-04-03 15:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle

On Fri, 2015-04-03 at 08:33 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Tue, 24 Mar 2015 16:08:34 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
> > 
> > > This patchset enhances MTRR checks for the kernel huge I/O mapping,
> > > which was enabled by the patchset below:
> > >   https://lkml.org/lkml/2015/3/3/589
> > > 
> > > The following functional changes are made in patch 7/7.
> > >  - Allow pud_set_huge() and pmd_set_huge() to create a huge page
> > >    mapping to a range covered by a single MTRR entry of any memory
> > >    type.
> > >  - Log a pr_warn() message when a specified PMD map range spans more
> > >    than a single MTRR entry.  Drivers should make a mapping request
> > >    aligned to a single MTRR entry when the range is covered by MTRRs.
> > > 
> > 
> > OK, I grabbed these after barely looking at them, to get them a bit of
> > runtime testing.
> > 
> > I'll await guidance from the x86 maintainers regarding next steps?
> 
> Could you please send the current version of them over to us if your 
> testing didn't find any problems?
> 
> I'd like to take a final look and have them cook in the x86 tree as 
> well for a while and want to preserve your testing effort.

This patchset is on top of the following patches in the -mm tree.
(Patches apply from the bottom to the top.)

2. Build error fixes and cleanups
http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-kva-mappings-on-x86-fix.patch
http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-vunmap-to-tear-down-huge-kva-mappings-fix.patch
http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-ioremap-to-set-up-huge-i-o-mappings-fix.patch
http://ozlabs.org/~akpm/mmotm/broken-out/lib-add-huge-i-o-map-capability-interfaces-fix.patch

1. Kernel huge I/O mapping support
http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-kva-mappings-on-x86.patch
http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-i-o-mapping-capability-i-f.patch
http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-vunmap-to-tear-down-huge-kva-mappings.patch
http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-ioremap-to-set-up-huge-i-o-mappings.patch
http://ozlabs.org/~akpm/mmotm/broken-out/lib-add-huge-i-o-map-capability-interfaces.patch
http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-__get_vm_area_node-to-use-fls_long.patch

Thanks,
-Toshi
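
The rule in the quoted cover letter (a huge mapping is acceptable when the
range is covered by a single MTRR entry, and a warning is logged when it
spans more than one) can be sketched as a plain range check. This is not the
code from patch 7/7: the struct, enum, and function names are hypothetical,
and the sketch assumes the MTRR entries do not overlap each other.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MTRR entry; 'end' is exclusive. */
struct mtrr_range {
	uint64_t start;
	uint64_t end;
};

enum mtrr_cover {
	COVER_NONE,	/* range touches no MTRR entry */
	COVER_SINGLE,	/* range lies entirely inside one entry */
	COVER_MIXED,	/* range crosses an entry boundary */
};

/* Classify [start, end) against a table of non-overlapping entries. */
enum mtrr_cover classify_range(uint64_t start, uint64_t end,
			       const struct mtrr_range *r, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (end <= r[i].start || start >= r[i].end)
			continue;		/* no overlap with this entry */
		if (start >= r[i].start && end <= r[i].end)
			return COVER_SINGLE;	/* fully contained */
		return COVER_MIXED;		/* partial overlap */
	}
	return COVER_NONE;
}
```

Under this sketch a caller would create the huge mapping for COVER_NONE and
COVER_SINGLE, and treat COVER_MIXED as the case that deserves a warning.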



^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH] x86/kaslr: Fix typo in documentation
@ 2015-04-14 11:35 Miroslav Benes
  2015-04-14 11:37 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Miroslav Benes @ 2015-04-14 11:35 UTC (permalink / raw)
  To: bp, corbet
  Cc: mingo, hpa, tglx, jkosina, x86, linux-kernel, linux-doc, Miroslav Benes

Documentation/x86/boot.txt labels the bit in boot_params.hdr.loadflags
as ALSR_FLAG while it should be KASLR_FLAG.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
---
 Documentation/x86/boot.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 88b8589..69e1397 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -406,7 +406,7 @@ Protocol:	2.00+
 	- If 0, the protected-mode code is loaded at 0x10000.
 	- If 1, the protected-mode code is loaded at 0x100000.
 
-  Bit 1 (kernel internal): ALSR_FLAG
+  Bit 1 (kernel internal): KASLR_FLAG
 	- Used internally by the compressed kernel to communicate
 	  KASLR status to kernel proper.
 	  If 1, KASLR enabled.
-- 
2.1.4
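
For readers unfamiliar with the field being documented, here is a minimal
sketch of testing individual bits of loadflags. The macro values follow the
bit numbers given in the documentation text (bit 0 and bit 1); the helper
names are hypothetical and not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in Documentation/x86/boot.txt. */
#define LOADED_HIGH	(1 << 0)	/* protected-mode code at 0x100000 */
#define KASLR_FLAG	(1 << 1)	/* kernel internal: KASLR enabled */

/* True if the compressed kernel signalled that KASLR is enabled. */
bool loadflags_kaslr_enabled(uint8_t loadflags)
{
	return !!(loadflags & KASLR_FLAG);
}

/* True if the protected-mode code is loaded at 0x100000. */
bool loadflags_loaded_high(uint8_t loadflags)
{
	return !!(loadflags & LOADED_HIGH);
}
```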


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86/kaslr: Fix typo in documentation
  2015-04-14 11:35 [PATCH] x86/kaslr: Fix typo in documentation Miroslav Benes
@ 2015-04-14 11:37 ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-04-14 11:37 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: corbet, mingo, hpa, tglx, jkosina, x86, linux-kernel, linux-doc

On Tue, Apr 14, 2015 at 01:35:24PM +0200, Miroslav Benes wrote:
> Documentation/x86/boot.txt labels the bit in boot_params.hdr.loadflags
> as ALSR_FLAG while it should be KASLR_FLAG.
> 
> Signed-off-by: Miroslav Benes <mbenes@suse.cz>
> ---
>  Documentation/x86/boot.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
> index 88b8589..69e1397 100644
> --- a/Documentation/x86/boot.txt
> +++ b/Documentation/x86/boot.txt
> @@ -406,7 +406,7 @@ Protocol:	2.00+
>  	- If 0, the protected-mode code is loaded at 0x10000.
>  	- If 1, the protected-mode code is loaded at 0x100000.
>  
> -  Bit 1 (kernel internal): ALSR_FLAG
> +  Bit 1 (kernel internal): KASLR_FLAG
>  	- Used internally by the compressed kernel to communicate
>  	  KASLR status to kernel proper.
>  	  If 1, KASLR enabled.

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-26  3:00       ` Luis R. Rodriguez
  (?)
@ 2015-04-21 17:52       ` Luis R. Rodriguez
  2015-04-21 18:46         ` Michael S. Tsirkin
  2015-04-21 18:46         ` Michael S. Tsirkin
  -1 siblings, 2 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 17:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
	Julia Lawall, Peter Senna Tschudin, Sarah Sharp
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Michael S. Tsirkin,
	Stefan Bader, Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	Benjamin Poirier, linux-pci

On Thu, Mar 26, 2015 at 04:00:54AM +0100, Luis R. Rodriguez wrote:
> On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> > Hi Luis,
> > 
> > This seems OK to me, 
> 
> Great.
> 
> > but I'm curious about a few things.
> > 
> > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> > <mcgrof@do-not-panic.com> wrote:
> > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > >
> > > This allows drivers to take advantage of write-combining
> > > when possible. Ideally we'd have pci_read_bases() just
> > > peg an IORESOURCE_WC flag for us
> > 
> > We do set IORESOURCE_PREFETCH.  Do you mean something different?
> 
> I did not think we had a WC IORESOURCE flag. Are you saying that we can use
> IORESOURCE_PREFETCH for that purpose? If so then great.  As I read a PCI BAR
> can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg
> IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.
> 
> > >  but where exactly
> > > video devices memory lie varies *largely* and at times things
> > > are mixed with MMIO registers, sometimes we can address
> > > the changes in drivers, other times the change requires
> > > intrusive changes.
> > 
> > What does a video device address have to do with this?  I do see that
> > if a BAR maps only a frame buffer, the device might be able to mark it
> > prefetchable, while if the BAR mapped both a frame buffer and some
> > registers, it might not be able to make it prefetchable.  But that
> > doesn't seem like it depends on the *address*.
> 
> I meant the offsets for each of those, either registers or framebuffer,
> and that typically they are mixed (primarily on older devices), so indeed your
> summary of the problem is what I meant. Let's remember that we are trying to
> take advantage of PAT here when available and avoid MTRR in that case, do we
> know that the same PCI BARs that have always historically used MTRRs had
> IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are
> different things -- but it's precisely why I ask.
> 
> > pci_iomap_range() already makes a cacheable mapping if
> > IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> > automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
> > 
> >   if (flags & IORESOURCE_CACHEABLE)
> >     return ioremap(start, len);
> >   if (flags & IORESOURCE_PREFETCH)
> >     return ioremap_wc(start, len);
> >   return ioremap_nocache(start, len);
> 
> Indeed, that's exactly what I think we should strive towards.
> 
> > Is there a reason not to do that?
> 
> This depends on the exact definition of IORESOURCE_PREFETCH and
> PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
> across *all devices*. This didn't look promising for starters:
> 
> include/uapi/linux/pci_regs.h:#define  PCI_BASE_ADDRESS_MEM_PREFETCH    0x08    /* prefetchable? */
> 
> PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
> 
> 1) Can we rest assured for instance that if we check for
> PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full
> PCI BAR if the full PCI BAR does want WC? If not this can regress
> functionality. That seems risky. It however would not be risky if we used
> another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() --
> that way only drivers we know that do use the full PCI bar would use this API.
> There's a bit of a problem with this though:
> 
> 2) Do we know that if a *full PCI BAR* is used for WC that
> PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
> the API usage would be restricted only to devices that we know *do* adhere to
> this. That reduces the possible uses for older drivers and can create
> regressions if used loosely without verification... but..
> 
> 3) If from now on we get folks to commit to use PCI_BASE_ADDRESS_MEM_PREFETCH
> for full PCI BARs that do want WC perhaps newer devices / drivers will use
> this very consistently ? Can we bank on that and is it worth it ?
> 
> 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it
> must never want WC?
> 
> If we don't have certainty on any of the above I'm afraid we can't do much
> right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
> and hope folks will only use this for the full PCI BAR only if WC is desired.
> 
> Thoughts?

Bjorn, now that you're done schooling me on English, any thoughts on the above?

> > > Although there is also arch_phys_wc_add() that makes use of
> > > architecture specific write-combining alternatives (MTRR on
> > > x86 when a system does not have PAT) we avoid polluting
> > > pci_iomap() space with it and force drivers and subsystems
> > > that want to use it to be explicit.
> > >
> > > There are a few motivations for this:
> > >
> > > a) Take advantage of PAT when available
> > >
> > > b) Help bury MTRR code away, MTRR is architecture specific and on
> > >    x86 it's replaced by PAT
> > >
> > > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > >    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > > ...
> > 
> > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > > +                                int bar,
> > > +                                unsigned long offset,
> > > +                                unsigned long maxlen)
> > > +{
> > > +       resource_size_t start = pci_resource_start(dev, bar);
> > > +       resource_size_t len = pci_resource_len(dev, bar);
> > > +       unsigned long flags = pci_resource_flags(dev, bar);
> > > +
> > > +       if (len <= offset || !start)
> > > +               return NULL;
> > > +       len -= offset;
> > > +       start += offset;
> > > +       if (maxlen && len > maxlen)
> > > +               len = maxlen;
> > > +       if (flags & IORESOURCE_IO)
> > > +               return __pci_ioport_map(dev, start, len);
> > > +       if (flags & IORESOURCE_MEM)
> > 
> > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> >  I know the driver might know it's safe even if the device didn't mark
> > the BAR as prefetchable, but it does seem like an easy way for a
> > driver to shoot itself in the foot.
> 
> You tell me. I would fear this may not be consistent and we'd end up
> having bug reports open for something that has historically been a
> non-issue. The above questions can help us gauge the risk of this.

Now, I'll tell you what I *think* but these are just guesstimates (TM):

  * Likely PCI_BASE_ADDRESS_MEM_PREFETCH can imply IORESOURCE_PREFETCH
    and use of ioremap_wc() on a full PCI BAR only, but this strict
    definition likely cannot be 100% guaranteed and could break some
    devices. We need something a bit more concrete and well known so
    that next generation industry standards embrace and let us in
    the kernel automatically detect specific ranges and their respective
    page attribute requirements. Might be good to address here x86 and
    ARM families
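
The flag-driven policy Bjorn suggested in the quoted text can be sketched in
user space as below. This is only an illustration of the decision order, not
kernel code: the enum and function names are hypothetical, and the flag
values are assumptions that should be checked against include/linux/ioport.h.
Whether IORESOURCE_PREFETCH is a safe trigger for write-combining is exactly
the open question in this thread.

```c
/* Assumed flag values, for illustration only. */
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_CACHEABLE	0x00008000UL

/* Hypothetical labels for the three ioremap flavours discussed. */
enum map_kind {
	MAP_CACHED,	/* would call ioremap() */
	MAP_WC,		/* would call ioremap_wc() */
	MAP_UNCACHED,	/* would call ioremap_nocache() */
};

/* Pick a mapping type from resource flags, in the order proposed. */
enum map_kind pick_mapping(unsigned long flags)
{
	if (flags & IORESOURCE_CACHEABLE)
		return MAP_CACHED;
	if (flags & IORESOURCE_PREFETCH)
		return MAP_WC;
	return MAP_UNCACHED;
}
```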

Curious: Sarah, how does USB address these different page attribute
needs on USB 3.0?

  Luis
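
The offset and maxlen handling in the quoted pci_iomap_wc_range() hunk can
be followed more easily as standalone arithmetic. The sketch below mirrors
those checks in user space; struct bar_window and clamp_bar_window() are
hypothetical names, and uint64_t stands in for resource_size_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the sub-range of the BAR that would be mapped. */
struct bar_window {
	uint64_t start;
	uint64_t len;
	bool valid;
};

/*
 * Same checks as the quoted hunk: reject an unassigned BAR or an offset
 * at or beyond its end, then shrink the length to maxlen if maxlen != 0.
 */
struct bar_window clamp_bar_window(uint64_t start, uint64_t len,
				   unsigned long offset, unsigned long maxlen)
{
	struct bar_window w = { 0, 0, false };

	if (len <= offset || !start)
		return w;

	w.start = start + offset;
	w.len = len - offset;
	if (maxlen && w.len > maxlen)
		w.len = maxlen;
	w.valid = true;
	return w;
}
```

A maxlen of 0 means "map everything from offset to the end of the BAR",
which matches how the quoted code treats it.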

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-26  3:00       ` Luis R. Rodriguez
  (?)
  (?)
@ 2015-04-21 17:52       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 17:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
	Julia Lawall, Peter Senna Tschudin, Sarah Sharp
  Cc: linux-fbdev, Michael S. Tsirkin, Daniel Vetter, Dave Hansen,
	Jan Beulich, H. Peter Anvin, Ville Syrjälä,
	Suresh Siddha, x86, Tomi Valkeinen, linux-pci, xen-devel,
	Ingo Molnar, Borislav Petkov, Jean-Christophe Plagniol-Villard,
	Benjamin Poirier, Antonino Daplas, Stefan Bader, Dave Airlie,
	Thomas Gleixner, Ingo Molnar, jgross, Toshi Kani

On Thu, Mar 26, 2015 at 04:00:54AM +0100, Luis R. Rodriguez wrote:
> On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> > Hi Luis,
> > 
> > This seems OK to me, 
> 
> Great.
> 
> > but I'm curious about a few things.
> > 
> > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> > <mcgrof@do-not-panic.com> wrote:
> > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > >
> > > This allows drivers to take advantage of write-combining
> > > when possible. Ideally we'd have pci_read_bases() just
> > > peg an IORESOURCE_WC flag for us
> > 
> > We do set IORESOURCE_PREFETCH.  Do you mean something different?
> 
> I did not think we had a WC IORESOURCE flag. Are you saying that we can use
> IORESOURCE_PREFETCH for that purpose? If so then great.  As I read a PCI BAR
> can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg
> IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.
> 
> > >  but where exactly
> > > video devices memory lie varies *largely* and at times things
> > > are mixed with MMIO registers, sometimes we can address
> > > the changes in drivers, other times the change requires
> > > intrusive changes.
> > 
> > What does a video device address have to do with this?  I do see that
> > if a BAR maps only a frame buffer, the device might be able to mark it
> > prefetchable, while if the BAR mapped both a frame buffer and some
> > registers, it might not be able to make it prefetchable.  But that
> > doesn't seem like it depends on the *address*.
> 
> I meant the offsets for each of those, either registers or framebuffer,
> and that typically they are mixed (primarily on older devices), so indeed your
> summary of the problem is what I meant. Let's remember that we are trying to
> take advantage of PAT here when available and avoid MTRR in that case, do we
> know that the same PCI BARs that have always historically used MTRRs had
> IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are
> different things -- but it's precisely why I ask.
> 
> > pci_iomap_range() already makes a cacheable mapping if
> > IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> > automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
> > 
> >   if (flags & IORESOURCE_CACHEABLE)
> >     return ioremap(start, len);
> >   if (flags & IORESOURCE_PREFETCH)
> >     return ioremap_wc(start, len);
> >   return ioremap_nocache(start, len);
> 
> Indeed, that's exactly what I think we should strive towards.
> 
> > Is there a reason not to do that?
> 
> This depends on the exact definition of IORESOURCE_PREFETCH and
> PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
> across *all devices*. This didn't look promising for starters:
> 
> include/uapi/linux/pci_regs.h:#define  PCI_BASE_ADDRESS_MEM_PREFETCH    0x08    /* prefetchable? */
> 
> PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
> 
> 1) Can we rest assured for instance that if we check for
> PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full
> PCI BAR if the full PCI BAR does want WC? If not this can regress
> functionality. That seems risky. It however would not be risky if we used
> another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() --
> that way only drivers we know that do use the full PCI bar would use this API.
> There's a bit of a problem with this though:
> 
> 2) Do we know that if a *full PCI BAR* is used for WC that
> PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
> the API usage would be restricted only to devices that we know *do* adhere to
> this. That reduces the possible uses for older drivers and can create
> regressions if used loosely without verification... but..
> 
> 3) If from now on we get folks to commit to use PCI_BASE_ADDRESS_MEM_PREFETCH
> for full PCI BARs that do want WC perhaps newer devices / drivers will use
> this very consistently ? Can we bank on that and is it worth it ?
> 
> 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it
> must never want WC ?
> 
> If we don't have certainty on any of the above I'm afraid we can't do much
> right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
> and hope folks will only use this for the full PCI BAR only if WC is desired.
> 
> Thoughts?

Bjorn, now that you're done schooling me on English, any thoughts on the above?

> > > Although there is also arch_phys_wc_add() that makes use of
> > > architecture specific write-combining alternatives (MTRR on
> > > x86 when a system does not have PAT) we avoid polluting
> > > pci_iomap() space with it and force drivers and subsystems
> > > that want to use it to be explicit.
> > >
> > > There are a few motivations for this:
> > >
> > > a) Take advantage of PAT when available
> > >
> > > b) Help bury MTRR code away, MTRR is architecture specific and on
> > >    x86 it's replaced by PAT
> > >
> > > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > >    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > > ...
> > 
> > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > > +                                int bar,
> > > +                                unsigned long offset,
> > > +                                unsigned long maxlen)
> > > +{
> > > +       resource_size_t start = pci_resource_start(dev, bar);
> > > +       resource_size_t len = pci_resource_len(dev, bar);
> > > +       unsigned long flags = pci_resource_flags(dev, bar);
> > > +
> > > +       if (len <= offset || !start)
> > > +               return NULL;
> > > +       len -= offset;
> > > +       start += offset;
> > > +       if (maxlen && len > maxlen)
> > > +               len = maxlen;
> > > +       if (flags & IORESOURCE_IO)
> > > +               return __pci_ioport_map(dev, start, len);
> > > +       if (flags & IORESOURCE_MEM)
> > 
> > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> >  I know the driver might know it's safe even if the device didn't mark
> > the BAR as prefetchable, but it does seem like an easy way for a
> > driver to shoot itself in the foot.
> 
> You tell me. I would fear this may not be consistent and we'd end up
> having bug reports open for something that has historically been a
> non-issue. The above questions can help us gauge the risk of this.
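
For reference, the offset/maxlen handling at the top of the quoted
pci_iomap_wc_range() can be modelled in plain userspace C. This is only a
sketch of the arithmetic: struct model_range and model_clamp_bar_range()
are hypothetical names made up for illustration, not kernel API, and no
mapping is actually performed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the sub-range of the BAR that would be mapped. */
struct model_range {
	bool valid;	/* false where the quoted code returns NULL */
	uint64_t start;	/* physical start after applying the offset */
	uint64_t len;	/* length after applying offset and maxlen */
};

/*
 * Mirrors the checks in the quoted patch: reject an unassigned BAR or an
 * offset at or beyond the end of the BAR, then skip 'offset' bytes and
 * cap the length at 'maxlen' (0 means "no cap").
 */
struct model_range model_clamp_bar_range(uint64_t bar_start, uint64_t bar_len,
					 uint64_t offset, uint64_t maxlen)
{
	struct model_range r = { false, 0, 0 };

	if (bar_len <= offset || !bar_start)
		return r;

	r.valid = true;
	r.start = bar_start + offset;
	r.len = bar_len - offset;
	if (maxlen && r.len > maxlen)
		r.len = maxlen;
	return r;
}
```

A driver whose framebuffer sits in the upper half of a 16 MiB BAR would
pass offset 0x800000 and get back an 8 MiB range; the register half would
be mapped separately with the regular, non-WC call.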

Now, I'll tell you what I *think* but these are just guesstimates (TM):

  * Likely PCI_BASE_ADDRESS_MEM_PREFETCH can imply IORESOURCE_PREFETCH
    and use of ioremap_wc() on a full PCI BAR only, but this strict
    definition likely cannot be 100% guaranteed and could break some
    devices. We need something a bit more concrete and well known that
    next generation industry standards can embrace, and that lets us in
    the kernel automatically detect specific ranges and their respective
    page attribute requirements. It might be good to address both the x86
    and ARM families here.

Curious: Sarah, how does USB address these different page attribute
needs on USB 3.0?

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-04-21 17:52       ` Luis R. Rodriguez
@ 2015-04-21 18:46         ` Michael S. Tsirkin
  2015-04-21 18:46         ` Michael S. Tsirkin
  1 sibling, 0 replies; 710+ messages in thread
From: Michael S. Tsirkin @ 2015-04-21 18:46 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Bjorn Helgaas, Arnd Bergmann, Linus Walleij, Stefano Stabellini,
	Julia Lawall, Peter Senna Tschudin, Sarah Sharp,
	Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
	Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	Benjamin Poirier, linux-pci

On Tue, Apr 21, 2015 at 07:52:49PM +0200, Luis R. Rodriguez wrote:
> On Thu, Mar 26, 2015 at 04:00:54AM +0100, Luis R. Rodriguez wrote:
> > On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> > > Hi Luis,
> > > 
> > > This seems OK to me, 
> > 
> > Great.
> > 
> > > but I'm curious about a few things.
> > > 
> > > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez
> > > <mcgrof@do-not-panic.com> wrote:
> > > > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > > >
> > > > This allows drivers to take advantage of write-combining
> > > > when possible. Ideally we'd have pci_read_bases() just
> > > > peg an IORESOURCE_WC flag for us
> > > 
> > > We do set IORESOURCE_PREFETCH.  Do you mean something different?
> > 
> > I did not think we had a WC IORESOURCE flag. Are you saying that we can use
> > IORESOURCE_PREFETCH for that purpose? If so then great.  As I read a PCI BAR
> > can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg
> > IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below.
> > 
> > > >  but where exactly
> > > > video devices memory lie varies *largely* and at times things
> > > > are mixed with MMIO registers, sometimes we can address
> > > > the changes in drivers, other times the change requires
> > > > intrusive changes.
> > > 
> > > What does a video device address have to do with this?  I do see that
> > > if a BAR maps only a frame buffer, the device might be able to mark it
> > > prefetchable, while if the BAR mapped both a frame buffer and some
> > > registers, it might not be able to make it prefetchable.  But that
> > > doesn't seem like it depends on the *address*.
> > 
> > I meant the offsets for each of those, either registers or framebuffer,
> > and that typically they are mixed (primarily on older devices), so indeed your
> > summary of the problem is what I meant. Let's remember that we are trying to
> > take advantage of PAT here when available and avoid MTRR in that case, do we
> > know that the same PCI BARs that have always historically used MTRRs had
> > IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are
> > different things -- but it's precisely why I ask.
> > 
> > > pci_iomap_range() already makes a cacheable mapping if
> > > IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> > > automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
> > > 
> > >   if (flags & IORESOURCE_CACHEABLE)
> > >     return ioremap(start, len);
> > >   if (flags & IORESOURCE_PREFETCH)
> > >     return ioremap_wc(start, len);
> > >   return ioremap_nocache(start, len);
> > 
> > Indeed, that's exactly what I think we should strive towards.
> > 
> > > Is there a reason not to do that?
> > 
> > This depends on the exact definition of IORESOURCE_PREFETCH and
> > PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and
> > across *all devices*. This didn't look promising for starters:
> > 
> > include/uapi/linux/pci_regs.h:#define  PCI_BASE_ADDRESS_MEM_PREFETCH    0x08    /* prefetchable? */
> > 
> > PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions:
> > 
> > 1) Can we rest assured for instance that if we check for
> > PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full
> > PCI BAR if the full PCI BAR does want WC? If not this can regress
> > functionality. That seems risky. It however would not be risky if we used
> > another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() --
> > that way only drivers we know that do use the full PCI bar would use this API.
> > There's a bit of a problem with this though:
> > 
> > 2) Do we know that if a *full PCI BAR* is used for WC that
> > PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then
> > the API usage would be restricted only to devices that we know *do* adhere to
> > this. That reduces the possible uses for older drivers and can create
> > regressions if used loosely without verification... but..
> > 

In theory, the PCI spec says this about prefetchable memory:
	Bridges are permitted to merge writes into this range (refer to Section 3.2.6).

Exceptions could be:
	- devices not behind a bridge (e.g. integrated in a root
	  complex)
	- devices behind a virtual bridge from the same vendor
	  (which know the bridge won't prefetch)

I worry that WC might also cause more reordering though.  I don't
remember whether this is true, off-hand.  Bridges can only reorder
transactions according to very specific rules.

> > 3) If from now on we get folks to commit to use PCI_BASE_ADDRESS_MEM_PREFETCH
> > for full PCI BARs that do want WC perhaps newer devices / drivers will use
> > this very consistently ? Can we bank on that and is it worth it ?

Unfortunately there's a separate good reason to set memory as prefetchable:
it's the only way to get 64-bit addresses for devices behind bridges.
So WC might be *safe* for prefetch BARs, but might not be a good idea.
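
To make the distinction concrete, here is a small userspace sketch that
decodes the low bits of a BAR register. The PCI_BASE_ADDRESS_* values are
the ones from include/uapi/linux/pci_regs.h; struct model_bar_info and
model_decode_bar() are hypothetical helpers for illustration only. Note
that nothing in the decoded bits says whether the device wants WC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low BAR bits, same values as include/uapi/linux/pci_regs.h */
#define PCI_BASE_ADDRESS_SPACE_IO	0x01
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK	0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_64	0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH	0x08

/* Hypothetical summary of what the BAR itself advertises. */
struct model_bar_info {
	bool is_io;		/* I/O port BAR, the memory bits do not apply */
	bool is_64bit;		/* 64-bit memory BAR */
	bool prefetchable;	/* prefetch bit set; not a request for WC */
};

struct model_bar_info model_decode_bar(uint32_t bar_low)
{
	struct model_bar_info info = { false, false, false };

	if (bar_low & PCI_BASE_ADDRESS_SPACE_IO) {
		info.is_io = true;
		return info;
	}
	info.is_64bit = (bar_low & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
			PCI_BASE_ADDRESS_MEM_TYPE_64;
	info.prefetchable = (bar_low & PCI_BASE_ADDRESS_MEM_PREFETCH) != 0;
	return info;
}
```

A device that only set the prefetch bit to get a 64-bit window behind a
bridge decodes exactly like one whose BAR is a frame buffer, which is why
the bit alone cannot drive the choice of mapping.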

> > 
> > 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it
> > must never want WC ?

That's not true, I think. It means the device can't allow prefetch, but
maybe it does allow combining.

> > 
> > If we don't have certainty on any of the above I'm afraid we can't do much
> > right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH
> > and hope folks will only use this for the full PCI BAR only if WC is desired.
> > 
> > Thoughts?
> 
> Bjorn, now that you're done schooling me on English, any thoughts on the above?
>
> > > > Although there is also arch_phys_wc_add() that makes use of
> > > > architecture specific write-combining alternatives (MTRR on
> > > > x86 when a system does not have PAT) we avoid polluting
> > > > pci_iomap() space with it and force drivers and subsystems
> > > > that want to use it to be explicit.
> > > >
> > > > There are a few motivations for this:
> > > >
> > > > a) Take advantage of PAT when available
> > > >
> > > > b) Help bury MTRR code away, MTRR is architecture specific and on
> > > >    x86 it's replaced by PAT
> > > >
> > > > c) Help with the goal of eventually using _PAGE_CACHE_UC over
> > > >    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
> > > > ...
> > > 
> > > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev,
> > > > +                                int bar,
> > > > +                                unsigned long offset,
> > > > +                                unsigned long maxlen)
> > > > +{
> > > > +       resource_size_t start = pci_resource_start(dev, bar);
> > > > +       resource_size_t len = pci_resource_len(dev, bar);
> > > > +       unsigned long flags = pci_resource_flags(dev, bar);
> > > > +
> > > > +       if (len <= offset || !start)
> > > > +               return NULL;
> > > > +       len -= offset;
> > > > +       start += offset;
> > > > +       if (maxlen && len > maxlen)
> > > > +               len = maxlen;
> > > > +       if (flags & IORESOURCE_IO)
> > > > +               return __pci_ioport_map(dev, start, len);
> > > > +       if (flags & IORESOURCE_MEM)
> > > 
> > > Should we log a note in dmesg if the BAR is *not* IORESOURCE_PREFETCH?
> > >  I know the driver might know it's safe even if the device didn't mark
> > > the BAR as prefetchable, but it does seem like an easy way for a
> > > driver to shoot itself in the foot.
> > 
> > You tell me. I would fear this may not be consistent and we'd end up
> > having bug reports open for something that has historically been a
> > non-issue. The above questions can help us gauge the risk of this.
> 
> Now, I'll tell you what I *think* but these are just guesstimates (TM):
> 
>   * Likely PCI_BASE_ADDRESS_MEM_PREFETCH can imply IORESOURCE_PREFETCH
>     and use of ioremap_wc() on a full PCI BAR only, but this strict
>     definition likely cannot be 100% guaranteed and could break some
>     devices. We need something a bit more concrete and well known that
>     next generation industry standards can embrace, and that lets us in
>     the kernel automatically detect specific ranges and their respective
>     page attribute requirements. It might be good to address both the x86
>     and ARM families here.
> 
> Curious: Sarah, how does USB address these different page attribute
> needs on USB 3.0?
> 
>   Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-03-23 17:20     ` Bjorn Helgaas
  (?)
@ 2015-04-21 19:25       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 710+ messages in thread
From: Michael S. Tsirkin @ 2015-04-21 19:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Luis R. Rodriguez, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, jgross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Luis R. Rodriguez, Ingo Molnar,
	Daniel Vetter, Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
	Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote:
> pci_iomap_range() already makes a cacheable mapping if
> IORESOURCE_CACHEABLE; I'm guessing that you would like it to
> automatically use WC if the BAR is IORESOURCE_PREFETCH, e.g.,
> 
>   if (flags & IORESOURCE_CACHEABLE)
>     return ioremap(start, len);
>   if (flags & IORESOURCE_PREFETCH)
>     return ioremap_wc(start, len);
>   return ioremap_nocache(start, len);
> 
> Is there a reason not to do that?

I think that's wrong and will break a bunch of things.
The PCI prefetch bit merely means bridges can combine writes and prefetch
reads.  Prefetch does not affect ordering rules and does not allow
writes to be collapsed.

WC is stronger: it allows collapsing and changes ordering rules.

WC can also hurt latency as small writes are buffered.

To summarise, driver needs to know what it's doing,
we can't set WC in the pci core automatically.
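
A minimal userspace sketch of that policy, not the kernel code: the
model_* names and the MODEL_RES_* flag values are made up for
illustration. The point is only that the prefetchable flag never selects
WC by itself; the driver has to ask for it explicitly.

```c
#include <stdbool.h>

/* Hypothetical mapping kinds a PCI iomap helper could end up using. */
enum model_map_type {
	MODEL_MAP_NONE,		/* nothing to map */
	MODEL_MAP_IOPORT,	/* I/O port BAR */
	MODEL_MAP_UC,		/* uncached MMIO, the default */
	MODEL_MAP_WC,		/* write-combining, opt-in only */
};

/* Illustrative stand-ins for resource flags, not the kernel's values. */
#define MODEL_RES_IO		0x1UL
#define MODEL_RES_MEM		0x2UL
#define MODEL_RES_PREFETCH	0x4UL

enum model_map_type model_pick_mapping(unsigned long flags,
				       bool driver_wants_wc)
{
	if (flags & MODEL_RES_IO)
		return MODEL_MAP_IOPORT;
	if (!(flags & MODEL_RES_MEM))
		return MODEL_MAP_NONE;
	/*
	 * MODEL_RES_PREFETCH is deliberately not consulted: prefetchable
	 * does not imply that collapsing or reordering writes is safe.
	 */
	return driver_wants_wc ? MODEL_MAP_WC : MODEL_MAP_UC;
}
```

With this shape a prefetchable BAR still gets an uncached mapping unless
the driver opts in, and a driver that knows its device can request WC even
on a BAR that is not marked prefetchable.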

-- 
MST

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 05/47] pci: add pci_iomap_wc() variants
  2015-04-21 19:25       ` Michael S. Tsirkin
  (?)
@ 2015-04-21 19:27         ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 19:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Bjorn Helgaas, Andy Lutomirski, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Juergen Gross, Jan Beulich, Borislav Petkov,
	Suresh Siddha, venkatesh.pallipadi, Dave Airlie, linux-kernel,
	linux-fbdev, x86, xen-devel, Ingo Molnar, Daniel Vetter,
	Antonino Daplas, Jean-Christophe Plagniol-Villard,
	Tomi Valkeinen, Dave Hansen, Arnd Bergmann, Stefan Bader,
	Konrad Rzeszutek Wilk, Ville Syrjälä,
	David Vrabel, Toshi Kani, Roger Pau Monné,
	xen-devel

On Tue, Apr 21, 2015 at 12:25 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> To summarise, driver needs to know what it's doing,
> we can't set WC in the pci core automatically.

Thanks, I'll document this and proceed with device driver helpers to
aid with this.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v4] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index()
@ 2015-04-22 17:12 ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 17:12 UTC (permalink / raw)
  To: mingo, tglx, hpa, plagnioj, tomi.valkeinen
  Cc: linux-fbdev, luto, mst, linux-kernel, Luis R. Rodriguez,
	Toshi Kani, Ingo Molnar, Will Deacon, Thierry Reding,
	Andrew Morton, Dave Hansen, Greg Kroah-Hartman, Catalin Marinas,
	Abhilash Kesavan, Matthias Brugger, Cristian Stoica, dri-devel,
	Suresh Siddha, Linus Torvalds, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Ville Syrjälä,
	Mel Gorman, Vlastimil Babka, Borislav Petkov, Davidlohr Bueso,
	x86

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There is only one user, but since we're going to bury
MTRR out of drivers' reach next, expose this last
piece of API to drivers in a general fashion, so that
only io.h is needed for access to the helpers.

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: x86@kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---

This v4 adds a missing #endif.
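
For readers unfamiliar with the override pattern used here, the following
is a minimal user-space sketch, not kernel code. HAVE_ARCH_MTRR is a
hypothetical stand-in for CONFIG_MTRR, and the offset value of 1000 is
assumed for illustration only. The arch side provides the real function and
announces it with a self-referential #define; the generic side supplies a
stub returning -1 only when no such announcement was seen.

```c
/* User-space sketch of the "#define foo foo" override pattern. */

/* Hypothetical stand-in for CONFIG_MTRR; remove it to get the stub. */
#define HAVE_ARCH_MTRR 1

/* Assumed value, for illustration only. */
#define MTRR_TO_PHYS_WC_OFFSET 1000

#ifdef HAVE_ARCH_MTRR
/* "Arch header" side: translate a handle into an MTRR index. */
int arch_phys_wc_index(int handle)
{
	if (handle < MTRR_TO_PHYS_WC_OFFSET)
		return -1;	/* not backed by an MTRR */
	return handle - MTRR_TO_PHYS_WC_OFFSET;
}
/* Announce that an arch-specific version exists. */
#define arch_phys_wc_index arch_phys_wc_index
#endif

/* "Generic header" side: stub used when nothing was announced. */
#ifndef arch_phys_wc_index
static inline int arch_phys_wc_index(int handle)
{
	(void)handle;
	return -1;
}
#define arch_phys_wc_index arch_phys_wc_index
#endif
```

A caller such as drm_getmap() can then call arch_phys_wc_index()
unconditionally, with no #ifdef at the call site.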

 arch/x86/include/asm/io.h       |  3 +++
 arch/x86/include/asm/mtrr.h     |  5 -----
 arch/x86/kernel/cpu/mtrr/main.c |  6 +++---
 drivers/gpu/drm/drm_ioctl.c     | 14 +-------------
 include/linux/io.h              |  7 +++++++
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 4afc05f..a2b9740 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -339,6 +339,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 #define IO_SPACE_LIMIT 0xffff
 
 #ifdef CONFIG_MTRR
+extern int __must_check arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
+
 extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index da8dff1..27e3dc0 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,7 +48,6 @@ extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
 #  else
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
@@ -85,10 +84,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-static inline int phys_wc_to_mtrr_index(int handle)
-{
-	return -1;
-}
 
 #define mtrr_ap_init() do {} while (0)
 #define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 12abdbe..d8c106c 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -580,7 +580,7 @@ void arch_phys_wc_del(int handle)
 EXPORT_SYMBOL(arch_phys_wc_del);
 
 /*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
  * @handle: Return value from arch_phys_wc_add
  *
  * This will turn the return value from arch_phys_wc_add into an mtrr
@@ -590,14 +590,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
  * in printk line.  Alas there is an illegitimate use in some ancient
  * drm ioctls.
  */
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
 {
 	if (handle < MTRR_TO_PHYS_WC_OFFSET)
 		return -1;
 	else
 		return handle - MTRR_TO_PHYS_WC_OFFSET;
 }
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
 
 /*
  * HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 266dcd6..0a95782 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
 
 #include <linux/pci.h>
 #include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 
 static int drm_version(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
 	map->type = r_list->map->type;
 	map->flags = r_list->map->flags;
 	map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
-	/*
-	 * There appears to be exactly one user of the mtrr index: dritest.
-	 * It's easy enough to keep it working on non-PAT systems.
-	 */
-	map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
-	map->mtrr = -1;
-#endif
+	map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/include/linux/io.h b/include/linux/io.h
index 986f2bf..04cce4d 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,13 @@ static inline void arch_phys_wc_del(int handle)
 }
 
 #define arch_phys_wc_add arch_phys_wc_add
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+	return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
+#endif
 #endif
 
 #endif /* _LINUX_IO_H */
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
  2015-04-03 15:22       ` Toshi Kani
@ 2015-04-27 14:31         ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-04-27 14:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle

On Fri, 2015-04-03 at 09:22 -0600, Toshi Kani wrote:
> On Fri, 2015-04-03 at 08:33 +0200, Ingo Molnar wrote:
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > On Tue, 24 Mar 2015 16:08:34 -0600 Toshi Kani <toshi.kani@hp.com> wrote:
> > > 
> > > > This patchset enhances MTRR checks for the kernel huge I/O mapping,
> > > > which was enabled by the patchset below:
> > > >   https://lkml.org/lkml/2015/3/3/589
> > > > 
> > > > The following functional changes are made in patch 7/7.
> > > >  - Allow pud_set_huge() and pmd_set_huge() to create a huge page
> > > >    mapping to a range covered by a single MTRR entry of any memory
> > > >    type.
> > > >  - Log a pr_warn() message when a specified PMD map range spans more
> > > >    than a single MTRR entry.  Drivers should make a mapping request
> > > >    aligned to a single MTRR entry when the range is covered by MTRRs.
> > > > 
> > > 
> > > OK, I grabbed these after barely looking at them, to get them a bit of
> > > runtime testing.
> > > 
> > > I'll await guidance from the x86 maintainers regarding next steps?
> > 
> > Could you please send the current version of them over to us if your 
> > testing didn't find any problems?
> > 
> > I'd like to take a final look and have them cook in the x86 tree as 
> > well for a while and want to preserve your testing effort.
> 
> This patchset is on top of the following patches in the -mm tree.
> (Patches apply from the bottom to the top.)

Ingo,

The following patches (2 got squashed to 1) went to 4.1-rc1, but this
patch-set is still sitting in the -mm tree.  I confirmed that the
patch-set applies cleanly to 4.1-rc1.  Please take a final look and let
me know if you have any comments.

Thanks,
-Toshi


> 2. Build error fixes and cleanups
> http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-kva-mappings-on-x86-fix.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-vunmap-to-tear-down-huge-kva-mappings-fix.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-ioremap-to-set-up-huge-i-o-mappings-fix.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/lib-add-huge-i-o-map-capability-interfaces-fix.patch
> 
> 1. Kernel huge I/O mapping support
> http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-kva-mappings-on-x86.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/x86-mm-support-huge-i-o-mapping-capability-i-f.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-vunmap-to-tear-down-huge-kva-mappings.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-ioremap-to-set-up-huge-i-o-mappings.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/lib-add-huge-i-o-map-capability-interfaces.patch
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-change-__get_vm_area_node-to-use-fls_long.patch
> 
> Thanks,
> -Toshi
> 



^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH] x86: improve algorithm in clflush_cache_range
@ 2015-04-28 22:13 Ross Zwisler
  2015-04-29 10:28 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Ross Zwisler @ 2015-04-28 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86,
	Borislav Petkov

The current algorithm used in clflush_cache_range() can cause the last
cache line of the buffer to be flushed twice.  Fix that algorithm so
that each cache line will only be flushed once.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: H. Peter Anvin <hpa@zytor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@suse.de>
---
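
The double flush described above can be reproduced in user space. The
following is only a sketch under stated assumptions: it replaces clflushopt()
with a counter per cache line, fixes the line size at 64 bytes, and uses
hypothetical names (flush_log, record_flush, flush_range_old,
flush_range_new) that do not exist in the kernel.

```c
#include <stdint.h>

#define LINE_SIZE 64u	/* assumed cache line size */
#define MAX_LINES 64u	/* size of the toy address space, in lines */

/* Counts how often each cache line would have been flushed. */
struct flush_log {
	unsigned int hits[MAX_LINES];
};

/* Stand-in for clflushopt(): record the line that contains addr. */
void record_flush(struct flush_log *log, uintptr_t addr)
{
	uintptr_t line = addr / LINE_SIZE;

	if (line < MAX_LINES)
		log->hits[line]++;
}

/* Old algorithm: step from vaddr, then flush the last byte again. */
void flush_range_old(struct flush_log *log, uintptr_t vaddr, unsigned int size)
{
	uintptr_t vend = vaddr + size - 1;

	for (; vaddr < vend; vaddr += LINE_SIZE)
		record_flush(log, vaddr);
	record_flush(log, vend);
}

/* New algorithm: align the start down, then visit each line once. */
void flush_range_new(struct flush_log *log, uintptr_t vaddr, unsigned int size)
{
	uintptr_t mask = LINE_SIZE - 1;
	uintptr_t vend = vaddr + size;
	uintptr_t p;

	for (p = vaddr & ~mask; p < vend; p += LINE_SIZE)
		record_flush(log, p);
}

/* Highest flush count seen for any single line. */
unsigned int max_hits(const struct flush_log *log)
{
	unsigned int i, max = 0;

	for (i = 0; i < MAX_LINES; i++)
		if (log->hits[i] > max)
			max = log->hits[i];
	return max;
}
```

For a 128-byte buffer starting at address 0, the old loop visits lines 0 and
1 and then flushes line 1 a second time through the trailing flush of vend;
the new loop visits each of the two lines exactly once.
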
 arch/x86/mm/pageattr.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 89af288ec674..338e507f95b8 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,16 +129,15 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	void *vend = vaddr + size - 1;
+	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	char *vend = (char *)vaddr + size;
+	char *p;
 
 	mb();
 
-	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
-		clflushopt(vaddr);
-	/*
-	 * Flush any possible final partial cacheline:
-	 */
-	clflushopt(vend);
+	for (p = (char *)((unsigned long)vaddr & ~clflush_mask);
+	     p < vend; p += boot_cpu_data.x86_clflush_size)
+		clflushopt(p);
 
 	mb();
 }
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v2] x86: Add kerneldoc for pcommit_sfence()
@ 2015-04-28 22:46 Ross Zwisler
  2015-04-29 14:23 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Ross Zwisler @ 2015-04-28 22:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ross Zwisler, H Peter Anvin, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov

Add kerneldoc comments for pcommit_sfence() describing the purpose of
the pcommit instruction and demonstrating the usage of that instruction.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
---
 arch/x86/include/asm/special_insns.h | 37 ++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeb4666e0c0a..c9f2ebec33ac 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -215,6 +215,43 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+/**
+ * pcommit_sfence() - persistent commit and fence
+ *
+ * The PCOMMIT instruction ensures that data that has been flushed from the
+ * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
+ * memory and is durable on the DIMM.  The primary use case for this is
+ * persistent memory.
+ *
+ * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
+ * with appropriate fencing:
+ *
+ * void flush_and_commit_buffer(void *vaddr, unsigned int size)
+ * {
+ *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+ *         void *vend = vaddr + size;
+ *         void *p;
+ *
+ *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+ *              p < vend; p += boot_cpu_data.x86_clflush_size)
+ *                 clwb(p);
+ *
+ *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
+ *         // MFENCE via mb() also works
+ *         wmb();
+ *
+ *         // PCOMMIT and the required SFENCE for ordering
+ *         pcommit_sfence();
+ * }
+ *
+ * After this function completes the data pointed to by 'vaddr' has been
+ * accepted to memory and will be durable if the 'vaddr' points to persistent
+ * memory.
+ *
+ * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
+ * things we include both the PCOMMIT and the required SFENCE in the
+ * alternatives generated by pcommit_sfence().
+ */
 static inline void pcommit_sfence(void)
 {
 	alternative(ASM_NOP7,
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86: improve algorithm in clflush_cache_range
  2015-04-28 22:13 [PATCH] x86: improve algorithm in clflush_cache_range Ross Zwisler
@ 2015-04-29 10:28 ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-04-29 10:28 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: linux-kernel, H. Peter Anvin, Thomas Gleixner, Ingo Molnar, x86

On Tue, Apr 28, 2015 at 04:13:12PM -0600, Ross Zwisler wrote:
> The current algorithm used in clflush_cache_range() can cause the last
> cache line of the buffer to be flushed twice.  Fix that algorithm so
> that each cache line will only be flushed once.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: x86@kernel.org
> Cc: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/mm/pageattr.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v2] x86: Add kerneldoc for pcommit_sfence()
  2015-04-28 22:46 [PATCH v2] x86: Add kerneldoc for pcommit_sfence() Ross Zwisler
@ 2015-04-29 14:23 ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-04-29 14:23 UTC (permalink / raw)
  To: Ross Zwisler; +Cc: linux-kernel, H Peter Anvin, Ingo Molnar, Thomas Gleixner

On Tue, Apr 28, 2015 at 04:46:36PM -0600, Ross Zwisler wrote:
> Add kerneldoc comments for pcommit_sfence() describing the purpose of
> the pcommit instruction and demonstrating the usage of that instruction.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: H Peter Anvin <h.peter.anvin@intel.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> ---

Applied, thanks.

I added the "Example:" thing because apparently kernel-doc parses that,
see below.

Doing

$ scripts/kernel-doc -html arch/x86/include/asm/special_insns.h > /tmp/doc.html

and then looking at doc.html doesn't make me go all full of joy and clap my
hands but whatever, it is better than nothing.

---
From: Ross Zwisler <ross.zwisler@linux.intel.com>
Date: Tue, 28 Apr 2015 16:46:36 -0600
Subject: [PATCH] x86/mm: Add kerneldoc comments for pcommit_sfence()

Add kerneldoc comments for pcommit_sfence() describing the purpose of
the PCOMMIT instruction and demonstrating its usage with an example.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1430261196-2401-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/special_insns.h | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeb4666e0c0a..2270e41b32fd 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -215,6 +215,44 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+/**
+ * pcommit_sfence() - persistent commit and fence
+ *
+ * The PCOMMIT instruction ensures that data that has been flushed from the
+ * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
+ * memory and is durable on the DIMM.  The primary use case for this is
+ * persistent memory.
+ *
+ * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
+ * with appropriate fencing.
+ *
+ * Example:
+ * void flush_and_commit_buffer(void *vaddr, unsigned int size)
+ * {
+ *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+ *         void *vend = vaddr + size;
+ *         void *p;
+ *
+ *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+ *              p < vend; p += boot_cpu_data.x86_clflush_size)
+ *                 clwb(p);
+ *
+ *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
+ *         // MFENCE via mb() also works
+ *         wmb();
+ *
+ *         // PCOMMIT and the required SFENCE for ordering
+ *         pcommit_sfence();
+ * }
+ *
+ * After this function completes the data pointed to by 'vaddr' has been
+ * accepted to memory and will be durable if the 'vaddr' points to persistent
+ * memory.
+ *
+ * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
+ * things we include both the PCOMMIT and the required SFENCE in the
+ * alternatives generated by pcommit_sfence().
+ */
 static inline void pcommit_sfence(void)
 {
 	alternative(ASM_NOP7,
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 0/6] x86: document and address MTRR corner cases
@ 2015-04-29 21:44 Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                   ` (6 more replies)
  0 siblings, 7 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This series addresses one comment fix on the table for the mtrr_add()
effect in the PAT case when UC- is used. Other than that it is
the same as v4.

Luis R. Rodriguez (6):
  x86: add ioremap_uc() - force strong UC, PCD=1, PWT=1
  x86: document WC MTRR effects on PAT / non-PAT pages
  video: fbdev: atyfb: move framebuffer length fudging to helper
  video: fbdev: atyfb: clarify ioremap() base and length used
  video: fbdev: atyfb: replace MTRR UC hole with strong UC
  video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()

 Documentation/x86/mtrr.txt           | 18 +++++--
 Documentation/x86/pat.txt            | 40 ++++++++++++++-
 arch/x86/include/asm/io.h            |  1 +
 arch/x86/kernel/cpu/mtrr/main.c      |  3 ++
 arch/x86/mm/ioremap.c                | 36 ++++++++++++-
 arch/x86/mm/pageattr.c               |  3 ++
 drivers/video/fbdev/aty/atyfb.h      |  5 +-
 drivers/video/fbdev/aty/atyfb_base.c | 98 ++++++++++++++----------------------
 include/asm-generic/io.h             |  8 +++
 9 files changed, 143 insertions(+), 69 deletions(-)

-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v4 1/6] x86: add ioremap_uc() - force strong UC, PCD=1, PWT=1
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Bjorn Helgaas, Suresh Siddha,
	Juergen Gross, Daniel Vetter, Dave Airlie, Antonino Daplas,
	Will Deacon, Thierry Reding, Mike Travis, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, x86, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

ioremap_nocache() currently uses UC- by default.
Our goal is to eventually make UC the default.
Linux maps UC- to PCD=1, PWT=0 page attributes on
non-PAT systems. Linux maps UC to PCD=1, PWT=1
page attributes on non-PAT systems. On non-PAT
and PAT systems a WC MTRR has different effects on
pages with either of these attributes. In order to
help with a smooth transition it's best to enable
use of UC (PCD=1, PWT=1) on a region, as that ensures
a WC MTRR will have no effect on the region. This
however requires us to have a way to declare a
region as UC, and we currently do not have a way
to do this.

WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.

WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.

A flip of the default ioremap_nocache() behaviour
from UC- to UC can therefore regress a memory
region from effective memory type WC to UC if MTRRs
are used. Use of MTRRs should be phased out and in
the best case only arch_phys_wc_add() use will remain.
Even if this happens, arch_phys_wc_add() will still have
an effect on non-PAT systems, and changes to default
ioremap_nocache() behaviour could regress drivers.

Now, ideally we'd use ioremap_nocache() on the regions
in which we'd need uncachable memory types and avoid
any MTRRs on those regions. There are however some
restrictions on MTRR use, such as the requirement that
the base and size of variable sized MTRRs be powers
of two, which could mean having to use
a WC MTRR over a large area which includes a region
in which write-combining effects are undesirable.
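
To make the size restriction concrete, here is a rough user-space sketch
that computes the smallest power-of-two sized, size-aligned window covering
a given range. The names mtrr_cover and mtrr_cover_range() are hypothetical,
and the 4 KiB minimum granularity and the 52-bit address limit are
assumptions of this sketch rather than something taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: one power-of-two window, aligned to its size. */
struct mtrr_cover {
	uint64_t base;
	uint64_t size;
};

/*
 * Find the smallest window of power-of-two size, aligned to that size,
 * which contains [base, base + size). Assumes a 4 KiB minimum granularity
 * and physical addresses below 2^52.
 */
static struct mtrr_cover mtrr_cover_range(uint64_t base, uint64_t size)
{
	struct mtrr_cover c;
	uint64_t end = base + size;	/* exclusive end of the range */
	uint64_t sz = 4096;

	assert(size != 0 && end > base && end <= (UINT64_C(1) << 52));

	for (;;) {
		uint64_t start = base & ~(sz - 1);

		/* The window [start, start + sz) must reach the end. */
		if (end - start <= sz) {
			c.base = start;
			c.size = sz;
			return c;
		}
		sz <<= 1;
	}
}
```

For a hypothetical 8 KiB range at 0x1ff000, which straddles the 2 MiB
boundary, the helper returns base 0 and size 0x400000, so a single WC MTRR
would span far more than the region that actually wants write-combining.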

Add ioremap_uc() to help both with phasing out
MTRR use and to provide a way to blacklist small
regions where WC is undesirable in devices with mixed
regions that are forced by size constraints to use large
WC MTRRs. Use of ioremap_uc() helps phase out MTRR use
by avoiding regressions with an eventual flip of the
default behaviour of ioremap_nocache() from UC- to UC.

Drivers working with WC MTRRs can use the below table
to review and consider the use of ioremap*() and similar
helpers to ensure appropriate behaviour long term even
if default ioremap_nocache() behaviour changes from UC-
to UC.

Although ioremap_uc() is being added we leave set_memory_uc()
to use UC-, as only initial memory type setup is required
to be able to accommodate existing device drivers and phase
out MTRR use. It should also be clarified that set_memory_uc()
cannot be used with IO memory: even though its use will
not return any errors, it has no effect.

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------
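
The table can also be restated as a small lookup, which makes the effect
of a UC- to UC flip easy to check. This is a sketch only: the enum, struct
and function names are made up for illustration and are not kernel
identifiers. The '*' entries (implementation defined, per the pat.txt text
in this series) are carried as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; these are not the kernel's enums. */
enum mem_type { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

struct wc_mtrr_row {
	unsigned int bits;		/* PAT << 2 | PCD << 1 | PWT */
	enum mem_type requested;	/* the table's "PAT" column */
	enum mem_type eff_nonpat;	/* effective type, non-PAT system */
	bool nonpat_impl_defined;	/* the '*' in the table */
	enum mem_type eff_pat;		/* effective type, PAT system */
};

/* One row per line of the table, all under a WC MTRR. */
static const struct wc_mtrr_row wc_mtrr_table[] = {
	{ 0x0, MT_WB,       MT_WC, false, MT_WC },
	{ 0x1, MT_WC,       MT_WC, true,  MT_WC },
	{ 0x2, MT_UC_MINUS, MT_WC, true,  MT_WC },
	{ 0x3, MT_UC,       MT_UC, false, MT_UC },
};

static const struct wc_mtrr_row *wc_mtrr_lookup(bool pcd, bool pwt)
{
	unsigned int bits = ((unsigned int)pcd << 1) | (unsigned int)pwt;

	assert(bits < sizeof(wc_mtrr_table) / sizeof(wc_mtrr_table[0]));
	return &wc_mtrr_table[bits];
}

/* Would flipping a mapping from UC- to UC change what a WC MTRR yields? */
static bool uc_flip_changes_effective_type(bool pat_enabled)
{
	const struct wc_mtrr_row *uc_minus = wc_mtrr_lookup(true, false);
	const struct wc_mtrr_row *uc = wc_mtrr_lookup(true, true);

	if (pat_enabled)
		return uc_minus->eff_pat != uc->eff_pat;
	return uc_minus->eff_nonpat != uc->eff_nonpat;
}
```

Going by the table, the flip changes the effective type in both columns
(WC becomes UC), which is the regression the text above warns about.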

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: x86@kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/io.h |  1 +
 arch/x86/mm/ioremap.c     | 36 +++++++++++++++++++++++++++++++++++-
 arch/x86/mm/pageattr.c    |  3 +++
 include/asm-generic/io.h  |  8 ++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93..4afc05f 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -177,6 +177,7 @@ static inline unsigned int isa_virt_to_bus(volatile void *address)
  * look at pci_iomap().
  */
 extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
 				unsigned long prot_val);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 70e7444..a493bb8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -237,7 +237,8 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
-	 * UC MINUS.
+	 * UC MINUS. Drivers that are certain they need or can already
+	 * be converted over to strong UC can use ioremap_uc().
 	 */
 	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
@@ -247,6 +248,39 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 EXPORT_SYMBOL(ioremap_nocache);
 
 /**
+ * ioremap_uc     -   map bus memory into CPU space as strongly uncachable
+ * @phys_addr:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_uc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked with a strong
+ * preference as completely uncachable on the CPU when possible. For non-PAT
+ * systems this ends up setting page-attribute flags PCD=1, PWT=1. For PAT
+ * systems this will set the PAT entry for the pages as strong UC.  This call
+ * will honor existing caching rules from things like the PCI bus. Note that
+ * there are other caches and buffers on many busses. In particular driver
+ * authors should read up on PCI writes.
+ *
+ * It's useful if some control registers are in such an area and
+ * write combining or read caching is not desirable.
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
+{
+	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
+
+	return __ioremap_caller(phys_addr, size, pcm,
+				__builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(ioremap_uc);
+
+/**
  * ioremap_wc	-	map memory into CPU space write combined
  * @phys_addr:	bus address of the memory
  * @size:	size of the resource to map
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 89af288..49660c0 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1468,6 +1468,9 @@ int _set_memory_uc(unsigned long addr, int numpages)
 {
 	/*
 	 * for now UC MINUS. see comments in ioremap_nocache()
+	 * If you really need strong UC use ioremap_uc(), but note
+	 * that you cannot override IO areas with set_memory_*() as
+	 * these helpers cannot work with IO memory.
 	 */
 	return change_page_attr_set(&addr, numpages,
 				    cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS),
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 9db0423..90ccba7 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -769,6 +769,14 @@ static inline void __iomem *ioremap_nocache(phys_addr_t offset, size_t size)
 }
 #endif
 
+#ifndef ioremap_uc
+#define ioremap_uc ioremap_uc
+static inline void __iomem *ioremap_uc(phys_addr_t offset, size_t size)
+{
+	return ioremap_nocache(offset, size);
+}
+#endif
+
 #ifndef ioremap_wc
 #define ioremap_wc ioremap_wc
 static inline void __iomem *ioremap_wc(phys_addr_t offset, size_t size)
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Jonathan Corbet, Dave Hansen,
	Suresh Siddha, Juergen Gross, Daniel Vetter, Dave Airlie,
	Antonino Daplas, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

As part of the effort to phase out MTRR use, document
write-combining MTRR effects on pages with different
non-PAT page attribute flags and different PAT entry
values. Extend arch_phys_wc_add() documentation to
clarify power of two size / boundary requirements as
we phase out mtrr_add() use.
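
As a rough illustration of the power of two size / boundary requirement,
the check below can be run in user space. Both helper names are
hypothetical and not part of the kernel API. "Boundary" is read here as
the base being aligned to the size, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if size is a power of two and base sits on a multiple of size. */
static bool wc_range_is_pow2_aligned(uint64_t base, uint64_t size)
{
	if (size == 0 || (size & (size - 1)) != 0)
		return false;

	return (base & (size - 1)) == 0;
}

/* Round a non-zero size up to the next power of two. */
static uint64_t wc_size_roundup_pow2(uint64_t size)
{
	uint64_t sz = 1;

	assert(size != 0 && size <= (UINT64_C(1) << 63));

	while (sz < size)
		sz <<= 1;

	return sz;
}
```

A hypothetical 0x7ff000 byte region at 0xfd000000 fails the check, while
the same base with the size rounded up to 0x800000 passes.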

Lastly, hint towards ioremap_uc() for corner cases in
device drivers working with devices with mixed regions
where MTRR size requirements would otherwise not
enable write-combining effective memory types.

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
 Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/main.c |  3 +++
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc..a111a6c 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
 MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
+MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
 
   On Intel P6 family processors (Pentium Pro, Pentium II and later)
   the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9f..7e183e3 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_cache          |    --    |    WB      |       WB         |
                        |          |            |                  |
+ioremap_uc             |    --    |    UC      |       UC         |
+                       |          |            |                  |
 ioremap_nocache        |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_wc             |    --    |    --      |       WC         |
@@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
 as step 0 above and also track the usage of those pages and use set_memory_wb()
 before the page is freed to free pool.
 
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+is made should already have been ioremap'd with write-combining page attributes
+or PAT entries; this can be done by using ioremap_wc() or respective helpers.
+Devices which combine areas of IO memory desired to remain uncachable with
+areas where write-combining is desirable and are restricted by the size
+requirements of MTRRs should consider splitting up their IO memory space
+cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
+encompassing both regions. Such use is nevertheless heavily discouraged as
+the effective memory type is considered implementation defined. This strategy
+should only be used as a last resort on devices with size-constrained regions
+where otherwise MTRR write-combining would not be effective.
+
+Note that you cannot use set_memory_wc() to override / whitelist IO remapped
+memory space mapped with ioremap*() calls, set_memory_wc() can only be used
+on RAM.
+
+----------------------------------------------------------------------
+MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
+----------------------------------------------------------------------
+                                                  Non-PAT |  PAT
+     PAT
+     |PCD
+     ||PWT
+     |||
+WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
+WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
+WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
+WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
 
 Notes:
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..12abdbe 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
  * attempts to add a WC MTRR covering size bytes starting at base and
  * logs an error if this fails.
  *
+ * The caller should expect to need to provide a power of two size on an
+ * equivalent power of two boundary.
+ *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Jonathan Corbet, Dave Hansen,
	Suresh Siddha, Juergen Gross, Daniel Vetter, Dave Airlie,
	Antonino Daplas, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

As part of the effort to phase out MTRR use, document
write-combining MTRR effects on pages with different
non-PAT page attribute flags and different PAT entry
values. Extend arch_phys_wc_add() documentation to
clarify power of two size / boundary requirements as
we phase out mtrr_add() use.

Lastly, hint towards ioremap_uc() for corner cases in
device drivers working with devices with mixed regions
where MTRR size requirements would otherwise not
enable write-combining effective memory types.

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
 Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/main.c |  3 +++
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc..a111a6c 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
 MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
+MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
 
   On Intel P6 family processors (Pentium Pro, Pentium II and later)
   the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9f..7e183e3 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_cache          |    --    |    WB      |       WB         |
                        |          |            |                  |
+ioremap_uc             |    --    |    UC      |       UC         |
+                       |          |            |                  |
 ioremap_nocache        |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_wc             |    --    |    --      |       WC         |
@@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
 as step 0 above and also track the usage of those pages and use set_memory_wb()
 before the page is freed to free pool.
 
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+is made should already have been ioremap'd with write-combining page attributes
+or PAT entries; this can be done by using ioremap_wc() or respective helpers.
+Devices which combine areas of IO memory desired to remain uncachable with
+areas where write-combining is desirable and are restricted by the size
+requirements of MTRRs should consider splitting up their IO memory space
+cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
+encompassing both regions. Such use is nevertheless heavily discouraged as
+the effective memory type is considered implementation defined. This strategy
+should only be used as last resort on devices with size-constrained regions
+where otherwise MTRR write-combining would not be effective.
+
+Note that you cannot use set_memory_wc() to override / whitelist IO remapped
+memory space mapped with ioremap*() calls, set_memory_wc() can only be used
+on RAM.
+
+----------------------------------------------------------------------
+MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
+----------------------------------------------------------------------
+                                                  Non-PAT |  PAT
+     PAT
+     |PCD
+     ||PWT
+     |||
+WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
+WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
+WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
+WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
 
 Notes:
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..12abdbe 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
  * attempts to add a WC MTRR covering size bytes starting at base and
  * logs an error if this fails.
  *
+ * The caller should expect to need to provide a power of two size on an
+ * equivalent power of two boundary.
+ *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 3/6] video: fbdev: atyfb: move framebuffer length fudging to helper
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Suresh Siddha, Linus Torvalds,
	Juergen Gross, Daniel Vetter, Dave Airlie, Antonino Daplas,
	Rob Clark, Mathias Krause, Andrzej Hajda, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

The size of the framebuffer to be used needs to
be fudged to account for the different types of
devices that are out there. This helper captures
what is required; we'll reuse it later.

This has no functional changes.
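
For readers who want to poke at the rule outside the kernel, here is
a minimal user-space sketch of the same check. Everything prefixed
with sketch_/SKETCH_ is hypothetical and only models the fields the
helper reads; the 4 KiB reserve is an assumption, the driver's own
GUI_RESERVE is authoritative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the reserved MMIO tail is one 4 KiB page. */
#define SKETCH_GUI_RESERVE 0x1000u

enum sketch_bus_type { SKETCH_PCI, SKETCH_ISA };

/* Hypothetical stand-in for the fb_info / atyfb_par fields involved. */
struct sketch_fb {
	uint32_t smem_len;		/* models info->fix.smem_len */
	unsigned long aux_start;	/* models par->aux_start, 0 if unused */
	enum sketch_bus_type bus_type;	/* models par->bus_type */
};

/*
 * Trim the trailing MMIO page off a full-size aperture (8 MiB, or
 * 4 MiB on ISA) unless the auxiliary register aperture is in use.
 */
static void sketch_fudge_framebuffer_len(struct sketch_fb *fb)
{
	bool full_aperture;

	assert(fb != NULL);

	full_aperture = fb->smem_len == 0x800000 ||
		(fb->bus_type == SKETCH_ISA && fb->smem_len == 0x400000);

	if (!fb->aux_start && full_aperture)
		fb->smem_len -= SKETCH_GUI_RESERVE;
}
```

Under these assumptions an 8 MiB PCI aperture without an auxiliary
aperture shrinks to 0x7ff000, while a 4 MiB length is only trimmed
when the bus type is ISA.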

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb_base.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8789e48..16936bb 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -427,6 +427,20 @@ static struct {
 #endif /* CONFIG_FB_ATY_CT */
 };
 
+/*
+ * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
+ * unless the auxiliary register aperture is used.
+ */
+static void aty_fudge_framebuffer_len(struct fb_info *info)
+{
+	struct atyfb_par *par = (struct atyfb_par *) info->par;
+
+	if (!par->aux_start &&
+	    (info->fix.smem_len == 0x800000 ||
+	     (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
+		info->fix.smem_len -= GUI_RESERVE;
+}
+
 static int correct_chipset(struct atyfb_par *par)
 {
 	u8 rev;
@@ -2603,14 +2617,7 @@ static int aty_init(struct fb_info *info)
 	if (par->pll_ops->resume_pll)
 		par->pll_ops->resume_pll(info, &par->pll);
 
-	/*
-	 * Last page of 8 MB (4 MB on ISA) aperture is MMIO,
-	 * unless the auxiliary register aperture is used.
-	 */
-	if (!par->aux_start &&
-	    (info->fix.smem_len == 0x800000 ||
-	     (par->bus_type == ISA && info->fix.smem_len == 0x400000)))
-		info->fix.smem_len -= GUI_RESERVE;
+	aty_fudge_framebuffer_len(info);
 
 	/*
 	 * Disable register access through the linear aperture
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 4/6] video: fbdev: atyfb: clarify ioremap() base and length used
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Suresh Siddha, Linus Torvalds,
	Juergen Gross, Daniel Vetter, Dave Airlie, Antonino Daplas,
	Rob Clark, Mathias Krause, Andrzej Hajda, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This has no functional changes. It just adjusts
the ioremap() call for the framebuffer to use
the same values we later use for the framebuffer,
which will make it easier to review the next change.

The size of the framebuffer varies but since this is
for PCI we *know* this defaults to 0x800000.
atyfb_setup_generic() is *only* used on PCI probe.

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb_base.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 16936bb..8025624 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3489,7 +3489,9 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 
 	/* Map in frame buffer */
 	info->fix.smem_start = addr;
-	info->screen_base = ioremap(addr, 0x800000);
+	info->fix.smem_len = 0x800000;
+
+	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 5/6] video: fbdev: atyfb: replace MTRR UC hole with strong UC
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Suresh Siddha, Linus Torvalds,
	Juergen Gross, Daniel Vetter, Dave Airlie, Antonino Daplas,
	Rob Clark, Mathias Krause, Andrzej Hajda, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Replace a WC MTRR call followed by a UC MTRR "hole" call
with a single WC MTRR call and use strong UC to protect
the MMIO region and account for the device's architecture
and MTRR size requirements.

The atyfb driver relies on two overlapping MTRRs. It
does this because on some devices the MMIO region is
bundled together with the framebuffer on the same PCI BAR,
and because the hardware requires both the base and the
size of an MTRR to be powers of two. In the atyfb driver's
worst case the PCI BAR is 16 MiB while the MMIO region is
on the last 4 KiB of the same PCI BAR. If we use just one
MTRR for WC we can only end up with an 8 MiB or 16 MiB
framebuffer. Using a 16 MiB WC framebuffer area is
unacceptable since we need the MMIO region to not be
write-combined. An 8 MiB WC framebuffer option would not
let us use quite a bit of framebuffer space; it would
reduce the resolution capability of the device
considerably. An alternative is to use many MTRRs, but on
some systems that could mean not having enough MTRRs to
cover the framebuffer. The current driver solution is to
issue a 16 MiB WC MTRR followed by a 4 KiB UC MTRR on the
last 4 KiB.

It's worth documenting the current ioremap*() strategy as
well: the first ioremap() is used only for the MMIO
region, and a second ioremap() call is used for the
framebuffer *and* the MMIO region, so the MMIO region ends
up mapped twice. Two ioremap() calls are used since in
some situations the framebuffer actually ends up on a
separate auxiliary PCI BAR, but this is not always true:
in the worst case the PCI BAR is shared for both MMIO and
the framebuffer. By allowing overlapping ioremap() calls
the driver supports both types of devices with one simple
ioremap() strategy.
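
To make the sizing problem above concrete, here is a small
user-space sketch of the power-of-two constraint. The sketch_*
helpers are hypothetical and deliberately simplified; real MTRR
validation has more rules than the single alignment check shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v is a non-zero power of two. */
static bool sketch_is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Simplified form of the constraint: the size is a power of two
 * and the base is aligned to that size.
 */
static bool sketch_mtrr_range_ok(uint64_t base, uint64_t size)
{
	return sketch_is_pow2(size) && (base & (size - 1)) == 0;
}

/* Largest power of two that is <= len; len must be non-zero. */
static uint64_t sketch_pow2_floor(uint64_t len)
{
	uint64_t p = 1;

	assert(len != 0);
	while (p <= len / 2)
		p *= 2;
	return p;
}
```

With these helpers a 16 MiB range on a 16 MiB boundary passes the
check, a range of 16 MiB minus 4 KiB does not, and the largest
power of two that fits under it is 8 MiB. That is the 8 MiB versus
16 MiB choice described above.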

For non PAT systems:

As per Intel SDM "11.5.2.1 Selecting Memory Types for Pentium
Pro and Pentium II Processors" [0] the effect of a WC MTRR for
a region with page attribute settings set to PCD=1, PWT=1
(Linux _PAGE_CACHE_MODE_UC) will render the effective memory
type to UC. A WC MTRR for a region with page attribute settings
set to PCD=1, PWT=0 (Linux _PAGE_CACHE_MODE_UC_MINUS) will render
the effective memory type to WC *but* this is considered
implementation defined -- that is, "system designers are
encouraged to avoid these implementation-defined combinations".
A WC MTRR for a region with page attribute settings set to
PCD=0, PWT=1 (Linux _PAGE_CACHE_MODE_WC) will render the
effective memory type to WC *but* this is also implementation
defined. Such is the case for non-PAT systems.

For PAT systems:

As per Intel SDM "11.5.2.2 Selecting Memory Types for Pentium
III and More Recent Processor Families" the effect of a WC MTRR
for a region with a PAT entry value of UC will be UC. The effect
of a WC MTRR on a region with a PAT entry UC- will be WC. The
effect of a WC MTRR on a region with PAT entry WC is WC.

This can all be summarized in the following table:

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------

 (*) denotes implementation defined
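
The same rules can be written as a small lookup, which may be
easier to reason about than the table. This is only a user-space
sketch following the two prose paragraphs above (so a WC MTRR over
a UC- mapping yields WC on PAT systems); the sketch_* names are
hypothetical and nothing here is kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Page-level cache modes considered in the discussion above. */
enum sketch_memtype { SKETCH_WB, SKETCH_WC, SKETCH_UC_MINUS, SKETCH_UC };

/* Effective type plus a flag for "implementation defined". */
struct sketch_effective {
	enum sketch_memtype type;
	bool impl_defined;
};

/* Effect of a WC MTRR over a mapping with the given page-level mode. */
static struct sketch_effective
sketch_wc_mtrr_effect(enum sketch_memtype page_mode, bool pat_enabled)
{
	struct sketch_effective e = { SKETCH_WC, false };

	switch (page_mode) {
	case SKETCH_UC:
		/* Strong UC wins over the WC MTRR in both cases. */
		e.type = SKETCH_UC;
		break;
	case SKETCH_WC:
	case SKETCH_UC_MINUS:
		/* WC either way, implementation defined without PAT. */
		e.impl_defined = !pat_enabled;
		break;
	case SKETCH_WB:
		break;
	default:
		assert(0 && "unknown page mode");
	}
	return e;
}
```

The property the patch depends on is the SKETCH_UC case: it is the
only mode for which the WC MTRR has no effect on either kind of
system.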

By default Linux today uses _PAGE_CACHE_MODE_UC_MINUS for
both ioremap() and ioremap_nocache(). On x86 ioremap() aliases
ioremap_nocache(). The preferred value for Linux may soon
change, however; the goal is to use _PAGE_CACHE_MODE_UC by
default in the future.

We can use ioremap_uc() to set PCD=1, PWT=1 on non-PAT systems
and use a PAT value of UC for PAT systems. This will ensure the
same settings are in place regardless of what Linux decides to
use by default later and to not regress our MTRR strategy since
the effective memory type will differ depending on the value used.
Using a WC MTRR on such an area will be nullified. This technique
can be used to protect the MMIO region in this driver's case and
address the restrictions of the device's architecture as well as
restrictions set upon us by powers of 2 when using MTRRs.

This allows us to replace the two MTRR calls with a single
16 MiB WC MTRR and use page-attribute settings for non-PAT
and PAT entry values for PAT systems to ensure the
appropriate effective memory type won't have a write-combined
effect on the MMIO region on both non-PAT and PAT systems.
The framebuffer area will be sure to get the write-combined
effective memory type by white-listing it with ioremap_wc().

We ensure the desired effective memory types are set by:

0) Using one ioremap_uc() for the MMIO region alone.
   This will set the page attribute settings for the MMIO
   region to PCD=1, PWT=1 for non-PAT systems while using a
   strong UC value on PAT systems.

1) Fixing the framebuffer ioremap'd area to exclude the
   MMIO region and using ioremap_wc() instead to whitelist
   the area we want for write-combining.

In both cases an implementation defined (as per above table)
effective memory type of WC is used for the framebuffer for
non-PAT systems.
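
As a rough illustration of the split in steps 0) and 1), the
following user-space sketch carves a shared aperture into a
framebuffer range and a trailing MMIO range that do not overlap.
The sketch_* types are hypothetical, and the sizes are parameters
rather than driver constants; in the driver the two ranges would be
handed to ioremap_wc() and ioremap_uc() respectively.

```c
#include <assert.h>
#include <stdint.h>

/* A physical address range. */
struct sketch_range {
	uint64_t start;
	uint64_t len;
};

/* Result of splitting one aperture into its two mappings. */
struct sketch_split {
	struct sketch_range fb_wc;	/* would be mapped write-combined */
	struct sketch_range mmio_uc;	/* would be mapped strong UC */
};

/* Place the MMIO range in the last mmio_len bytes of the aperture. */
static struct sketch_split
sketch_split_aperture(uint64_t start, uint64_t aperture_len,
		      uint64_t mmio_len)
{
	struct sketch_split s;

	assert(mmio_len < aperture_len);

	s.fb_wc.start = start;
	s.fb_wc.len = aperture_len - mmio_len;
	s.mmio_uc.start = start + s.fb_wc.len;
	s.mmio_uc.len = mmio_len;
	return s;
}
```

The framebuffer range ends exactly where the MMIO range begins, so
a single WC MTRR can span both while the page attributes decide the
effective type of each part.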

[0] https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  1 -
 drivers/video/fbdev/aty/atyfb_base.c | 36 ++++++++++++++----------------------
 2 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 1f39a62..89ec439 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -184,7 +184,6 @@ struct atyfb_par {
 	spinlock_t int_lock;
 #ifdef CONFIG_MTRR
 	int mtrr_aper;
-	int mtrr_reg;
 #endif
 	u32 mem_cntl;
 	struct crtc saved_crtc;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8025624..546f5af 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -2630,21 +2630,13 @@ static int aty_init(struct fb_info *info)
 
 #ifdef CONFIG_MTRR
 	par->mtrr_aper = -1;
-	par->mtrr_reg = -1;
 	if (!nomtrr) {
-		/* Cover the whole resource. */
+		/*
+		 * Only the ioremap_wc()'d area will get WC here
+		 * since ioremap_uc() was used on the entire PCI BAR.
+		 */
 		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
 					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr_aper >= 0 && !par->aux_start) {
-			/* Make a hole for mmio. */
-			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
-						 GUI_RESERVE, GUI_RESERVE,
-						 MTRR_TYPE_UNCACHABLE, 1);
-			if (par->mtrr_reg < 0) {
-				mtrr_del(par->mtrr_aper, 0, 0);
-				par->mtrr_aper = -1;
-			}
-		}
 	}
 #endif
 
@@ -2776,10 +2768,6 @@ aty_init_exit:
 	par->pll_ops->set_pll(info, &par->saved_pll);
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
@@ -3466,7 +3454,11 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	}
 
 	info->fix.mmio_start = raddr;
-	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+	/*
+	 * By using strong UC we force the MTRR to never have an
+	 * effect on the MMIO region on both non-PAT and PAT systems.
+	 */
+	par->ati_regbase = ioremap_uc(info->fix.mmio_start, 0x1000);
 	if (par->ati_regbase == NULL)
 		return -ENOMEM;
 
@@ -3491,7 +3483,10 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	info->fix.smem_start = addr;
 	info->fix.smem_len = 0x800000;
 
-	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+	aty_fudge_framebuffer_len(info);
+
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
@@ -3563,6 +3558,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	}
 	par = info->par;
+	par->bus_type = PCI;
 	info->fix = atyfb_fix;
 	info->device = &pdev->dev;
 	par->pci_id = pdev->device;
@@ -3732,10 +3728,6 @@ static void atyfb_remove(struct fb_info *info)
 #endif
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 5/6] video: fbdev: atyfb: replace MTRR UC hole with strong UC
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Suresh Siddha, Linus Torvalds,
	Juergen Gross, Daniel Vetter, Dave Airlie, Antonino Daplas,
	Rob Clark, Mathias Krause, Andrzej Hajda, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Replace a WC MTRR call followed by a UC MTRR "hole" call
with a single WC MTRR call and use strong UC to protect
the MMIO region and account for the device's architecture
and MTRR size requirements.

The atyfb driver relies on two overlapping MTRRs. It
does this because on some devices the MMIO region is
bundled together with the framebuffer on the same PCI BAR,
and because the hardware requires both the base and the
size of an MTRR to be powers of two. In the atyfb driver's
worst case the PCI BAR is 16 MiB while the MMIO region is
on the last 4 KiB of the same PCI BAR. If we use just one
MTRR for WC we can only end up with an 8 MiB or 16 MiB
framebuffer. Using a 16 MiB WC framebuffer area is
unacceptable since we need the MMIO region to not be
write-combined. An 8 MiB WC framebuffer option would not
let us use quite a bit of framebuffer space; it would
reduce the resolution capability of the device
considerably. An alternative is to use many MTRRs, but on
some systems that could mean not having enough MTRRs to
cover the framebuffer. The current driver solution is to
issue a 16 MiB WC MTRR followed by a 4 KiB UC MTRR on the
last 4 KiB.

It's worth documenting the current ioremap*() strategy as
well: the first ioremap() is used only for the MMIO
region, and a second ioremap() call is used for the
framebuffer *and* the MMIO region, so the MMIO region ends
up mapped twice. Two ioremap() calls are used since in
some situations the framebuffer actually ends up on a
separate auxiliary PCI BAR, but this is not always true:
in the worst case the PCI BAR is shared for both MMIO and
the framebuffer. By allowing overlapping ioremap() calls
the driver supports both types of devices with one simple
ioremap() strategy.

For non PAT systems:

As per Intel SDM "11.5.2.1 Selecting Memory Types for Pentium
Pro and Pentium II Processors" [0] the effect of a WC MTRR for
a region with page attribute settings set to PCD=1, PWT=1
(Linux _PAGE_CACHE_MODE_UC) will render the effective memory
type to UC. A WC MTRR for a region with page attribute settings
set to PCD=1, PWT=0 (Linux _PAGE_CACHE_MODE_UC_MINUS) will render
the effective memory type to WC *but* this is considered
implementation defined -- that is, "system designers are
encouraged to avoid these implementation-defined combinations".
A WC MTRR for a region with page attribute settings set to
PCD=0, PWT=1 (Linux _PAGE_CACHE_MODE_WC) will render the
effective memory type to WC *but* this is also implementation
defined. Such is the case for non-PAT systems.

For PAT systems:

As per Intel SDM "11.5.2.2 Selecting Memory Types for Pentium
III and More Recent Processor Families" the effect of a WC MTRR
for a region with a PAT entry value of UC will be UC. The effect
of a WC MTRR on a region with a PAT entry UC- will be WC. The
effect of a WC MTRR on a region with PAT entry WC is WC.

This can all be summarized in the following table:

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------

 (*) denotes implementation defined

By default Linux today uses _PAGE_CACHE_MODE_UC_MINUS for
both ioremap() and ioremap_nocache(). On x86 ioremap() aliases
ioremap_nocache(). The preferred value for Linux may soon
change, however; the goal is to use _PAGE_CACHE_MODE_UC by
default in the future.

We can use ioremap_uc() to set PCD=1, PWT=1 on non-PAT systems
and use a PAT value of UC for PAT systems. This will ensure the
same settings are in place regardless of what Linux decides to
use by default later and to not regress our MTRR strategy since
the effective memory type will differ depending on the value used.
Using a WC MTRR on such an area will be nullified. This technique
can be used to protect the MMIO region in this driver's case and
address the restrictions of the device's architecture as well as
restrictions set upon us by powers of 2 when using MTRRs.

This allows us to replace the two MTRR calls with a single
16 MiB WC MTRR, relying on page-attribute settings on non-PAT
systems and PAT entry values on PAT systems to ensure the
MMIO region never gets a write-combined effective memory
type. The framebuffer area gets the write-combined
effective memory type by whitelisting it with ioremap_wc().
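
To make the power-of-two restriction concrete, here is a
user-space C sketch. The helper names, the example BAR address
and the 4 KiB GUI_RESERVE value are assumptions made for
illustration; they are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example values, chosen only for illustration. */
#define EX_BAR_BASE    UINT64_C(0xfd000000) /* assumed 16 MiB aligned BAR */
#define EX_FB_LEN      UINT64_C(0x800000)   /* 8 MiB, as in smem_len */
#define EX_GUI_RESERVE UINT64_C(0x1000)     /* assumed one 4 KiB page */

static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * A variable-range MTRR needs a power-of-two size and a base
 * aligned to that size.
 */
static bool mtrr_range_ok(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}

/* Smallest power of two that is >= x. */
static uint64_t round_up_pow2(uint64_t x)
{
	uint64_t p = 1;

	assert(x > 0 && x <= (UINT64_C(1) << 63));
	while (p < x)
		p <<= 1;
	return p;
}
```

With these values the framebuffer minus the MMIO page is 0x7ff000
bytes, which no single variable MTRR can describe exactly, while
mtrr_range_ok(EX_BAR_BASE, 0x1000000) holds for the whole 16 MiB.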

We ensure the desired effective memory types are set by:

0) Using one ioremap_uc() for the MMIO region alone.
   This will set the page attribute settings for the MMIO
   region to PCD=1, PWT=1 for non-PAT systems while using a
   strong UC value on PAT systems.

1) Fixing the framebuffer ioremap'd area to exclude the
   MMIO region and using ioremap_wc() instead to whitelist
   the area we want for write-combining.

In both cases an implementation-defined (as per the above table)
effective memory type of WC is used for the framebuffer on
non-PAT systems.

[0] https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  1 -
 drivers/video/fbdev/aty/atyfb_base.c | 36 ++++++++++++++----------------------
 2 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 1f39a62..89ec439 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -184,7 +184,6 @@ struct atyfb_par {
 	spinlock_t int_lock;
 #ifdef CONFIG_MTRR
 	int mtrr_aper;
-	int mtrr_reg;
 #endif
 	u32 mem_cntl;
 	struct crtc saved_crtc;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 8025624..546f5af 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -2630,21 +2630,13 @@ static int aty_init(struct fb_info *info)
 
 #ifdef CONFIG_MTRR
 	par->mtrr_aper = -1;
-	par->mtrr_reg = -1;
 	if (!nomtrr) {
-		/* Cover the whole resource. */
+		/*
+		 * Only the ioremap_wc()'d area will get WC here
+		 * since ioremap_uc() was used on the entire PCI BAR.
+		 */
 		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
 					  MTRR_TYPE_WRCOMB, 1);
-		if (par->mtrr_aper >= 0 && !par->aux_start) {
-			/* Make a hole for mmio. */
-			par->mtrr_reg = mtrr_add(par->res_start + 0x800000 -
-						 GUI_RESERVE, GUI_RESERVE,
-						 MTRR_TYPE_UNCACHABLE, 1);
-			if (par->mtrr_reg < 0) {
-				mtrr_del(par->mtrr_aper, 0, 0);
-				par->mtrr_aper = -1;
-			}
-		}
 	}
 #endif
 
@@ -2776,10 +2768,6 @@ aty_init_exit:
 	par->pll_ops->set_pll(info, &par->saved_pll);
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
@@ -3466,7 +3454,11 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	}
 
 	info->fix.mmio_start = raddr;
-	par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+	/*
+	 * By using strong UC we force the MTRR to never have an
+	 * effect on the MMIO region on both non-PAT and PAT systems.
+	 */
+	par->ati_regbase = ioremap_uc(info->fix.mmio_start, 0x1000);
 	if (par->ati_regbase == NULL)
 		return -ENOMEM;
 
@@ -3491,7 +3483,10 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
 	info->fix.smem_start = addr;
 	info->fix.smem_len = 0x800000;
 
-	info->screen_base = ioremap(info->fix.smem_start, info->fix.smem_len);
+	aty_fudge_framebuffer_len(info);
+
+	info->screen_base = ioremap_wc(info->fix.smem_start,
+				       info->fix.smem_len);
 	if (info->screen_base == NULL) {
 		ret = -ENOMEM;
 		goto atyfb_setup_generic_fail;
@@ -3563,6 +3558,7 @@ static int atyfb_pci_probe(struct pci_dev *pdev,
 		return -ENOMEM;
 	}
 	par = info->par;
+	par->bus_type = PCI;
 	info->fix = atyfb_fix;
 	info->device = &pdev->dev;
 	par->pci_id = pdev->device;
@@ -3732,10 +3728,6 @@ static void atyfb_remove(struct fb_info *info)
 #endif
 
 #ifdef CONFIG_MTRR
-	if (par->mtrr_reg >= 0) {
-		mtrr_del(par->mtrr_reg, 0, 0);
-		par->mtrr_reg = -1;
-	}
 	if (par->mtrr_aper >= 0) {
 		mtrr_del(par->mtrr_aper, 0, 0);
 		par->mtrr_aper = -1;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 6/6] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-04-29 21:44   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-29 21:44 UTC (permalink / raw)
  To: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Rob Clark,
	Mathias Krause, Andrzej Hajda, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This driver uses strong UC for the MMIO region, and ioremap_wc()
for the framebuffer to whitelist for the WC MTRR what can be changed
to WC. On PAT systems we don't need the MTRR call, so just use
arch_phys_wc_add() there; this lets us remove all those ifdefs.
Let's also be consistent and use ioremap_wc() for ATARI as well.

There are a few motivations for this:

a) Take advantage of PAT when available

b) Help bury MTRR code away; MTRR is architecture specific and on
   x86 it's replaced by PAT

c) Help with the goal of eventually using _PAGE_CACHE_UC over
   _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (see commit
   de33c442e titled "x86 PAT: fix performance drop for glx,
   use UC minus for ioremap(), ioremap_nocache() and
   pci_mmap_page_range()")

The conversion done is expressed by the following Coccinelle
SmPL patch. It additionally required manual intervention to
address all the #ifdeffery and to remove redundant things which
arch_phys_wc_add() already handles, such as the verbose message
printed when adding an MTRR fails and doing nothing when we
didn't get an MTRR.

@ mtrr_found @
expression index, base, size;
@@

-index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
+index = arch_phys_wc_add(base, size);

@ mtrr_rm depends on mtrr_found @
expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
@@

-mtrr_del(index, base, size);
+arch_phys_wc_del(index);

@ mtrr_rm_zero_arg depends on mtrr_found @
expression mtrr_found.index;
@@

-mtrr_del(index, 0, 0);
+arch_phys_wc_del(index);

@ mtrr_rm_fb_info depends on mtrr_found @
struct fb_info *info;
expression mtrr_found.index;
@@

-mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
+arch_phys_wc_del(index);

@ ioremap_replace_nocache depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap_nocache(base, size);
+info->screen_base = ioremap_wc(base, size);

@ ioremap_replace_default depends on mtrr_found @
struct fb_info *info;
expression base, size;
@@

-info->screen_base = ioremap(base, size);
+info->screen_base = ioremap_wc(base, size);
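
The manual cleanups are possible because of how the cookie
returned by arch_phys_wc_add() behaves. Below is a user-space C
model of those semantics as understood here. The model_* names,
the offset value and the 8-slot MTRR pool are illustrative
assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_WC_OFFSET 1000 /* assumed offset so a valid cookie is never 0 */
#define MODEL_NUM_MTRRS 8    /* assumed size of the MTRR pool */

struct model_cpu {
	bool pat_enabled;
	bool mtrr_used[MODEL_NUM_MTRRS];
};

/* Grab a free MTRR slot; returns its index or -1 when none is left. */
static int model_mtrr_add(struct model_cpu *cpu)
{
	for (int i = 0; i < MODEL_NUM_MTRRS; i++) {
		if (!cpu->mtrr_used[i]) {
			cpu->mtrr_used[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Returns 0 when PAT makes an MTRR unnecessary, a negative value on
 * failure, or an offset cookie when an MTRR was really added.
 */
static int model_phys_wc_add(struct model_cpu *cpu)
{
	int idx;

	if (cpu->pat_enabled)
		return 0;
	idx = model_mtrr_add(cpu);
	if (idx < 0)
		return idx;
	return idx + MODEL_WC_OFFSET;
}

/* Safe to call with any cookie: 0 and negative values are ignored. */
static void model_phys_wc_del(struct model_cpu *cpu, int cookie)
{
	if (cookie >= 1) {
		assert(cookie >= MODEL_WC_OFFSET);
		cpu->mtrr_used[cookie - MODEL_WC_OFFSET] = false;
	}
}
```

Because deletion tolerates every cookie value, callers in this
model need no ">= 0" check and no reset to -1 around the del call.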

Generated-by: Coccinelle SmPL
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/video/fbdev/aty/atyfb.h      |  4 +---
 drivers/video/fbdev/aty/atyfb_base.c | 37 +++++++-----------------------------
 2 files changed, 8 insertions(+), 33 deletions(-)

diff --git a/drivers/video/fbdev/aty/atyfb.h b/drivers/video/fbdev/aty/atyfb.h
index 89ec439..63c4842 100644
--- a/drivers/video/fbdev/aty/atyfb.h
+++ b/drivers/video/fbdev/aty/atyfb.h
@@ -182,9 +182,7 @@ struct atyfb_par {
 	unsigned long irq_flags;
 	unsigned int irq;
 	spinlock_t int_lock;
-#ifdef CONFIG_MTRR
-	int mtrr_aper;
-#endif
+	int wc_cookie;
 	u32 mem_cntl;
 	struct crtc saved_crtc;
 	union aty_pll saved_pll;
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 546f5af..b75c974 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -98,9 +98,6 @@
 #ifdef CONFIG_PMAC_BACKLIGHT
 #include <asm/backlight.h>
 #endif
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
 
 /*
  * Debug flags.
@@ -303,9 +300,6 @@ static struct fb_ops atyfb_ops = {
 };
 
 static bool noaccel;
-#ifdef CONFIG_MTRR
-static bool nomtrr;
-#endif
 static int vram;
 static int pll;
 static int mclk;
@@ -2628,17 +2622,13 @@ static int aty_init(struct fb_info *info)
 		aty_st_le32(BUS_CNTL, aty_ld_le32(BUS_CNTL, par) |
 			    BUS_APER_REG_DIS, par);
 
-#ifdef CONFIG_MTRR
-	par->mtrr_aper = -1;
-	if (!nomtrr) {
+	if (!nomtrr)
 		/*
 		 * Only the ioremap_wc()'d area will get WC here
 		 * since ioremap_uc() was used on the entire PCI BAR.
 		 */
-		par->mtrr_aper = mtrr_add(par->res_start, par->res_size,
-					  MTRR_TYPE_WRCOMB, 1);
-	}
-#endif
+		par->wc_cookie = arch_phys_wc_add(par->res_start,
+						  par->res_size);
 
 	info->fbops = &atyfb_ops;
 	info->pseudo_palette = par->pseudo_palette;
@@ -2766,13 +2756,8 @@ aty_init_exit:
 	/* restore video mode */
 	aty_set_crtc(par, &par->saved_crtc);
 	par->pll_ops->set_pll(info, &par->saved_pll);
+	arch_phys_wc_del(par->wc_cookie);
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_aper >= 0) {
-		mtrr_del(par->mtrr_aper, 0, 0);
-		par->mtrr_aper = -1;
-	}
-#endif
 	return ret;
 }
 
@@ -3660,7 +3645,8 @@ static int __init atyfb_atari_probe(void)
 		 * Map the video memory (physical address given)
 		 * to somewhere in the kernel address space.
 		 */
-		info->screen_base = ioremap(phys_vmembase[m64_num], phys_size[m64_num]);
+		info->screen_base = ioremap_wc(phys_vmembase[m64_num],
+					       phys_size[m64_num]);
 		info->fix.smem_start = (unsigned long)info->screen_base; /* Fake! */
 		par->ati_regbase = ioremap(phys_guiregbase[m64_num], 0x10000) +
 						0xFC00ul;
@@ -3726,13 +3712,8 @@ static void atyfb_remove(struct fb_info *info)
 	if (M64_HAS(MOBIL_BUS))
 		aty_bl_exit(info->bl_dev);
 #endif
+	arch_phys_wc_del(par->wc_cookie);
 
-#ifdef CONFIG_MTRR
-	if (par->mtrr_aper >= 0) {
-		mtrr_del(par->mtrr_aper, 0, 0);
-		par->mtrr_aper = -1;
-	}
-#endif
 #ifndef __sparc__
 	if (par->ati_regbase)
 		iounmap(par->ati_regbase);
@@ -3848,10 +3829,8 @@ static int __init atyfb_setup(char *options)
 	while ((this_opt = strsep(&options, ",")) != NULL) {
 		if (!strncmp(this_opt, "noaccel", 7)) {
 			noaccel = 1;
-#ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
 			nomtrr = 1;
-#endif
 		} else if (!strncmp(this_opt, "vram:", 5))
 			vram = simple_strtoul(this_opt + 5, NULL, 0);
 		else if (!strncmp(this_opt, "pll:", 4))
@@ -4021,7 +4000,5 @@ module_param(comp_sync, int, 0);
 MODULE_PARM_DESC(comp_sync, "Set composite sync signal to low (0) or high (1)");
 module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Specify resolution as \"<xres>x<yres>[-<bpp>][@<refresh>]\" ");
-#ifdef CONFIG_MTRR
 module_param(nomtrr, bool, 0);
 MODULE_PARM_DESC(nomtrr, "bool: disable use of MTRR registers");
-#endif
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 1/6] x86: add ioremap_uc() - force strong UC, PCD=1, PWT=1
  2015-04-29 21:44   ` Luis R. Rodriguez
@ 2015-04-30 10:18     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-04-30 10:18 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Toshi Kani, Bjorn Helgaas,
	Suresh Siddha, Juergen Gross, Daniel Vetter, Dave Airlie,
	Antonino Daplas, Will Deacon, Thierry Reding, Mike Travis,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, x86, linux-fbdev

On Wed, Apr 29, 2015 at 02:44:06PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> ioremap_nocache() currently uses UC- by default.
> Our goal is to eventually make UC the default.
> Linux maps UC- to PCD=1, PWT=0 page attributes on
> non-PAT systems. Linux maps UC to PCD=1, PWT=1
> page attributes on non-PAT systems. On non-PAT
> and PAT systems a WC MTRR has different effects on
> pages with either of these attributes. In order to
> help with a smooth transition it's best to enable
> use of UC (PCD=1, PWT=1) on a region as that ensures
> a WC MTRR will have no effect on a region; this
> however requires us to have a way to declare a
> region as UC and we currently do not have a way
> to do this.
> 
> WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
> WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.
> 
> WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
> WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.
> 
> A flip of the default ioremap_nocache() behaviour
> from UC- to UC can therefore regress a memory
> region from effective memory type WC to UC if MTRRs
> are used. Use of MTRRs should be phased out and in
> the best case only arch_phys_wc_add() use will remain.
> Even if this happens, arch_phys_wc_add() will have an
> effect on non-PAT systems and changes to default
> ioremap_nocache() behaviour could regress drivers.
> 
> Now, ideally we'd use ioremap_nocache() on the regions
> in which we'd need uncachable memory types and avoid
> any MTRRs on those regions. There are however some
> restrictions on MTRRs use, such as the requirement of
> having the base and size of variable sized MTRRs
> to be powers of two, which could mean having to use
> a WC MTRR over a large area which includes a region
> in which write-combining effects are undesirable.
> 
> Add ioremap_uc() to help with both the phasing out of
> MTRR use and to provide a way to blacklist small
> WC-undesirable regions in devices with mixed regions
> which are size-implicated to use large WC MTRRs. Use
> of ioremap_uc() helps phase out MTRR use by avoiding
> regressions with an eventual flip of default behaviour
> of ioremap_nocache() from UC- to UC.
> 
> Drivers working with WC MTRRs can use the below table
> to review and consider the use of ioremap*() and similar
> helpers to ensure appropriate behaviour long term even
> if default ioremap_nocache() behaviour changes from UC-
> to UC.
> 
> Although ioremap_uc() is being added we leave set_memory_uc()
> to use UC- as only initial memory type setup is required
> to be able to accommodate existing device drivers and phase
> out MTRR use. It should also be clarified that set_memory_uc()
> cannot be used with IO memory, even though its use will
> not return any errors, it really has no effect.
> 
> ----------------------------------------------------------------------
> MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> ----------------------------------------------------------------------
>                                                   Non-PAT |  PAT
>      PAT
>      |PCD
>      ||PWT
>      |||
> WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> ----------------------------------------------------------------------
> 
> Cc: Toshi Kani <toshi.kani@hp.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Suresh Siddha <sbsiddha@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Ville Syrjälä <syrjala@sci.fi>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Thierry Reding <treding@nvidia.com>
> Cc: Mike Travis <travis@sgi.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: x86@kernel.org
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  arch/x86/include/asm/io.h |  1 +
>  arch/x86/mm/ioremap.c     | 36 +++++++++++++++++++++++++++++++++++-
>  arch/x86/mm/pageattr.c    |  3 +++
>  include/asm-generic/io.h  |  8 ++++++++
>  4 files changed, 47 insertions(+), 1 deletion(-)

Looks ok to me. Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v1 23/47] staging: xgifb: use arch_phys_wc_add() and ioremap_wc()
  2015-03-20 23:18   ` Luis R. Rodriguez
@ 2015-04-30 17:40     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 17:40 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Juergen Gross, Jan Beulich, Borislav Petkov, Suresh Siddha,
	venkatesh.pallipadi, Dave Airlie
  Cc: linux-kernel, linux-fbdev, X86 ML, xen-devel, Luis R. Rodriguez,
	Ingo Molnar, Daniel Vetter, Antonino Daplas,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

On Fri, Mar 20, 2015 at 4:18 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> The same area used for ioremap() is used for the MTRR area.
> Convert the driver from using the x86 specific MTRR code to
> the architecture agnostic arch_phys_wc_add(). arch_phys_wc_add()
> will avoid MTRR if write-combining is available. In order to
> take advantage of that, also ensure the ioremap'd area is requested
> as write-combining.
>
> There are a few motivations for this:
>
> a) Take advantage of PAT when available
>
> b) Help bury MTRR code away; MTRR is architecture specific and on
>    x86 it's replaced by PAT
>
> c) Help with the goal of eventually using _PAGE_CACHE_UC over
>    _PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e)
>
> The conversion done is expressed by the following Coccinelle
> SmPL patch. It additionally required manual intervention to
> address all the #ifdefery and to remove redundant things which
> arch_phys_wc_add() already addresses, such as the verbose message
> when MTRR fails and doing nothing when we didn't get
> an MTRR.
>
> @ mtrr_found @
> expression index, base, size;
> @@
>
> -index = mtrr_add(base, size, MTRR_TYPE_WRCOMB, 1);
> +index = arch_phys_wc_add(base, size);
>
> @ mtrr_rm depends on mtrr_found @
> expression mtrr_found.index, mtrr_found.base, mtrr_found.size;
> @@
>
> -mtrr_del(index, base, size);
> +arch_phys_wc_del(index);
>
> @ mtrr_rm_zero_arg depends on mtrr_found @
> expression mtrr_found.index;
> @@
>
> -mtrr_del(index, 0, 0);
> +arch_phys_wc_del(index);
>
> @ mtrr_rm_fb_info depends on mtrr_found @
> struct fb_info *info;
> expression mtrr_found.index;
> @@
>
> -mtrr_del(index, info->fix.smem_start, info->fix.smem_len);
> +arch_phys_wc_del(index);
>
> @ ioremap_replace_nocache depends on mtrr_found @
> struct fb_info *info;
> expression base, size;
> @@
>
> -info->screen_base = ioremap_nocache(base, size);
> +info->screen_base = ioremap_wc(base, size);
>
> @ ioremap_replace_default depends on mtrr_found @
> struct fb_info *info;
> expression base, size;
> @@
>
> -info->screen_base = ioremap(base, size);
> +info->screen_base = ioremap_wc(base, size);
>
> Generated-by: Coccinelle SmPL
> Cc: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>

Hey folks, can this be considered for merging?

Thanks,

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v5 0/6] x86: address drivers that do not work with PAT
@ 2015-04-30 20:25 Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
                   ` (5 more replies)
  0 siblings, 6 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez

From: "Luis R. Rodriguez" <mcgrof@suse.com>

This v5 drops the addition of new early_param_*() helpers
and their use on pat_enabled as we are sticking with
__read_mostly, and as per review this should be selectively
used only on well established hot paths. pat_enabled turns
out to be a common hot path, so we want to keep that. This
v5 also changes the pr_info() patch slightly to address the
feedback. The other patches do not change at all.

Luis R. Rodriguez (6):
  x86/mm/pat: use pr_info() and friends
  x86/mm/pat: redefine pat_enabled
  arch/x86/mm/pat: export pat_enabled()
  ivtv: use arch_phys_wc_add() and require PAT disabled
  IB/ipath: add counting for MTRR
  IB/ipath: use arch_phys_wc_add() and require PAT disabled

 arch/x86/include/asm/pat.h                    |  7 +--
 arch/x86/kernel/cpu/mtrr/main.c               |  2 +-
 arch/x86/mm/iomap_32.c                        |  2 +-
 arch/x86/mm/ioremap.c                         |  4 +-
 arch/x86/mm/pageattr.c                        |  2 +-
 arch/x86/mm/pat.c                             | 75 +++++++++++++--------------
 arch/x86/mm/pat_internal.h                    |  2 +-
 arch/x86/mm/pat_rbtree.c                      |  5 +-
 arch/x86/pci/i386.c                           |  6 +--
 drivers/infiniband/hw/ipath/Kconfig           |  3 ++
 drivers/infiniband/hw/ipath/ipath_driver.c    | 18 +++++--
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +-
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 43 ++++-----------
 drivers/media/pci/ivtv/Kconfig                |  3 ++
 drivers/media/pci/ivtv/ivtvfb.c               | 58 ++++++++-------------
 15 files changed, 103 insertions(+), 131 deletions(-)

-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
@ 2015-04-30 20:25 ` Luis R. Rodriguez
  2015-05-04 14:58   ` Borislav Petkov
  2015-05-07  3:36   ` Elliott, Robert (Server Storage)
  2015-04-30 20:25 ` [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled Luis R. Rodriguez
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, x86

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Use pr_info() instead of the old printk() to
prefix messages with the component they come
from. With this readers will know exactly where
a message is coming from. Since pr_fmt is
already defined in this case, we redefine it to
"PAT: ".

We leave the users of dprintk() in place; these
will print only when the debugpat kernel parameter
is enabled. We want to leave those enabled as a
debug feature, but also make them use the same
prefix.
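
As a rough illustration of the prefixing described above, here is a
userspace sketch of how a pr_fmt() redefinition ends up in front of
every message. demo_pr_info(), demo_dprintk() and demo_debug_enable are
hypothetical stand-ins (they format into a buffer instead of calling
printk(), and the flag stands in for debugpat); they are not kernel
APIs.

```c
#include <stdio.h>

/* Same shape as the redefinition in this patch: prefix every format string. */
#undef pr_fmt
#define pr_fmt(fmt) "PAT: " fmt

/*
 * Hypothetical stand-in for pr_info(): writes into buf instead of the
 * kernel log so the result can be inspected. Uses the GNU ##__VA_ARGS__
 * extension, as the kernel's own macros do.
 */
#define demo_pr_info(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Stand-in for the debugpat switch; off by default. */
int demo_debug_enable;

/* Debug variant: formats only when the switch is on, otherwise yields 0. */
#define demo_dprintk(buf, len, fmt, ...) \
	(demo_debug_enable ? demo_pr_info(buf, len, fmt, ##__VA_ARGS__) : 0)
```

Because the prefix is glued on by string-literal concatenation at
compile time, the debug macro picks it up for free once it goes through
the same pr_info()-style path.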

Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/mm/pat.c          | 41 +++++++++++++++++++++--------------------
 arch/x86/mm/pat_internal.h |  2 +-
 arch/x86/mm/pat_rbtree.c   |  5 ++++-
 3 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 372ad42..8f88c6a 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,16 @@
 #include "pat_internal.h"
 #include "mm_internal.h"
 
+#undef pr_fmt
+#define pr_fmt(fmt)	"PAT: " fmt
+
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
 static inline void pat_disable(const char *reason)
 {
 	pat_enabled = 0;
-	printk(KERN_INFO "%s\n", reason);
+	pr_info("%s\n", reason);
 }
 
 static int __init nopat(char *str)
@@ -211,8 +214,7 @@ void pat_init(void)
 			 * switched to PAT on the boot CPU. We have no way to
 			 * undo PAT.
 			 */
-			printk(KERN_ERR "PAT enabled, "
-			       "but not supported by secondary CPU\n");
+			pr_err("PAT enabled, but not supported by secondary CPU\n");
 			BUG();
 		}
 	}
@@ -451,9 +453,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
-		printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
-		       start, end - 1,
-		       cattr_name(new->type), cattr_name(req_type));
+		pr_info("reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+			start, end - 1,
+			cattr_name(new->type), cattr_name(req_type));
 		kfree(new);
 		spin_unlock(&memtype_lock);
 
@@ -497,8 +499,8 @@ int free_memtype(u64 start, u64 end)
 	spin_unlock(&memtype_lock);
 
 	if (!entry) {
-		printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
-		       current->comm, current->pid, start, end - 1);
+		pr_info("%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+			current->comm, current->pid, start, end - 1);
 		return -EINVAL;
 	}
 
@@ -628,8 +630,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 
 	while (cursor < to) {
 		if (!devmem_is_allowed(pfn)) {
-			printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
-			       current->comm, from, to - 1);
+			pr_info("Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+				current->comm, from, to - 1);
 			return 0;
 		}
 		cursor += PAGE_SIZE;
@@ -698,8 +700,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
 				size;
 
 	if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
-		printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
-			"for [mem %#010Lx-%#010Lx]\n",
+		pr_info("%s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
 			current->comm, current->pid,
 			cattr_name(pcm),
 			base, (unsigned long long)(base + size-1));
@@ -734,7 +735,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 
 		pcm = lookup_memtype(paddr);
 		if (want_pcm != pcm) {
-			printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+			pr_warn("%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
 				current->comm, current->pid,
 				cattr_name(want_pcm),
 				(unsigned long long)paddr,
@@ -755,13 +756,13 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 		if (strict_prot ||
 		    !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
 			free_memtype(paddr, paddr + size);
-			printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
-				" for [mem %#010Lx-%#010Lx], got %s\n",
-				current->comm, current->pid,
-				cattr_name(want_pcm),
-				(unsigned long long)paddr,
-				(unsigned long long)(paddr + size - 1),
-				cattr_name(pcm));
+			pr_err("%s:%d map pfn expected mapping type %s"
+			       " for [mem %#010Lx-%#010Lx], got %s\n",
+			       current->comm, current->pid,
+			       cattr_name(want_pcm),
+			       (unsigned long long)paddr,
+			       (unsigned long long)(paddr + size - 1),
+			       cattr_name(pcm));
 			return -EINVAL;
 		}
 		/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f641162..ea7fbf0 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
 extern int pat_debug_enable;
 
 #define dprintk(fmt, arg...) \
-	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+	do { if (pat_debug_enable) pr_info(fmt, ##arg); } while (0)
 
 struct memtype {
 	u64			start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adc..374539b 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -21,6 +21,9 @@
 
 #include "pat_internal.h"
 
+#undef pr_fmt
+#define pr_fmt(fmt)	"PAT: " fmt
+
 /*
  * The memtype tree keeps track of memory type for specific
  * physical memory areas. Without proper tracking, conflicting memory
@@ -160,7 +163,7 @@ success:
 	return 0;
 
 failure:
-	printk(KERN_INFO "%s:%d conflicting memory types "
+	pr_info("%s:%d conflicting memory types "
 		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
 		end, cattr_name(found_type), cattr_name(match->type));
 	return -EBUSY;
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
@ 2015-04-30 20:25 ` Luis R. Rodriguez
  2015-05-04 15:22   ` Borislav Petkov
  2015-04-30 20:25 ` [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled() Luis R. Rodriguez
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Christoph Lameter, Kyle McMartin,
	Juergen Gross, Daniel Vetter, Dave Airlie, Bjorn Helgaas, x86

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We use pat_enabled in x86-specific code to see if PAT
is enabled or not. We are however granting full access to
the variable even though readers do not need to set it.
If for instance we later granted modules access to it,
they could then override the variable setting... no bueno.

This renames pat_enabled to a new static variable,
__pat_enabled. To see if PAT is enabled or disabled, folks
can just use pat_enabled() now. Code that sets this can
only be internal to pat.c. Apart from the early kernel
parameter "nopat" to disable PAT, we also have a few
cases that disable it later and make use of a helper,
pat_disable(). This helper is wrapped under an ifdef, but
since that code cannot run unless PAT was enabled it's not
required to wrap it with ifdefs, so unwrap that. Likewise,
since "nopat" doesn't really change non-PAT systems,
just remove that ifdef as well.

Although we could add and use an early_param_off(),
these helpers don't use __read_mostly and we want to
keep __read_mostly for __pat_enabled as this is a hot
path -- upon boot for instance a simple guest may see
~4k accesses to pat_enabled(). Since __read_mostly
early boot params are not that common we don't add a
helper for them just yet.
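
The encapsulation described above can be sketched in plain userspace C.
This is only an illustration of the pattern, not the patch itself: the
demo_* names are hypothetical, and the kernel version additionally marks
the variable __read_mostly and registers the handler with early_param().

```c
#include <stdbool.h>

/*
 * Stand-in for __pat_enabled: file-local, so other translation units
 * cannot write to it.
 */
static int demo_pat_state = 1;

/* Only code in this file can flip the state. */
static void demo_pat_disable(void)
{
	demo_pat_state = 0;
}

/* Mirrors the "nopat" handler: disable and report success. */
int demo_nopat(void)
{
	demo_pat_disable();
	return 0;
}

/* Read-only accessor, the only thing outside code gets to use. */
bool demo_pat_enabled(void)
{
	return !!demo_pat_state;
}
```

Callers elsewhere can ask demo_pat_enabled() as often as they like, but
there is no symbol they could assign to.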

Cc: Borislav Petkov <bp@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/include/asm/pat.h      |  7 +------
 arch/x86/kernel/cpu/mtrr/main.c |  2 +-
 arch/x86/mm/iomap_32.c          |  2 +-
 arch/x86/mm/ioremap.c           |  4 ++--
 arch/x86/mm/pageattr.c          |  2 +-
 arch/x86/mm/pat.c               | 33 +++++++++++++++------------------
 arch/x86/pci/i386.c             |  6 +++---
 7 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba..cdcff7f 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
 #include <linux/types.h>
 #include <asm/pgtable_types.h>
 
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
 extern void pat_init(void);
 void pat_init_cache_modes(void);
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index bfef424..f094d36 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -556,7 +556,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled || !mtrr_enabled)
+	if (pat_enabled() || !mtrr_enabled)
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc..3a2ec87 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
 	 * MTRR is UC or WC.  UC_MINUS gets the real intention, of the
 	 * user, which is "WC if the MTRR is WC, UC if you can't do that."
 	 */
-	if (!pat_enabled && pgprot_val(prot) ==
+	if (!pat_enabled() && pgprot_val(prot) ==
 	    (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a493bb8..82d63ed 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 {
 	/*
 	 * Ideally, this should be:
-	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+	 *	pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
 	 * UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
  */
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
 					__builtin_return_address(0));
 	else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 49660c0..0aa8dd8 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1574,7 +1574,7 @@ int set_memory_wc(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return set_memory_uc(addr, numpages);
 
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8f88c6a..f64785e 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -36,12 +36,11 @@
 #undef pr_fmt
 #define pr_fmt(fmt)	"PAT: " fmt
 
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
 
 static inline void pat_disable(const char *reason)
 {
-	pat_enabled = 0;
+	__pat_enabled = false;
 	pr_info("%s\n", reason);
 }
 
@@ -51,13 +50,11 @@ static int __init nopat(char *str)
 	return 0;
 }
 early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
 {
-	(void)reason;
+	return !!__pat_enabled;
 }
-#endif
-
 
 int pat_debug_enable;
 
@@ -201,7 +198,7 @@ void pat_init(void)
 	u64 pat;
 	bool boot_cpu = !boot_pat_state;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return;
 
 	if (!cpu_has_pat) {
@@ -402,7 +399,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	BUG_ON(start >= end); /* end is exclusive */
 
-	if (!pat_enabled) {
+	if (!pat_enabled()) {
 		/* This is identical to page table setting without PAT */
 		if (new_type) {
 			if (req_type == _PAGE_CACHE_MODE_WC)
@@ -477,7 +474,7 @@ int free_memtype(u64 start, u64 end)
 	int is_range_ram;
 	struct memtype *entry;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Low ISA region is always mapped WB. No need to track */
@@ -625,7 +622,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 	u64 to = from + size;
 	u64 cursor = from;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 1;
 
 	while (cursor < to) {
@@ -661,7 +658,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 	 * caching for the high addresses through the KEN pin, but
 	 * we maintain the tradition of paranoia in this code.
 	 */
-	if (!pat_enabled &&
+	if (!pat_enabled() &&
 	    !(boot_cpu_has(X86_FEATURE_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_K6_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -730,7 +727,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 	 * the type requested matches the type of first page in the range.
 	 */
 	if (is_ram) {
-		if (!pat_enabled)
+		if (!pat_enabled())
 			return 0;
 
 		pcm = lookup_memtype(paddr);
@@ -845,7 +842,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 		return ret;
 	}
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/*
@@ -873,7 +870,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 {
 	enum page_cache_mode pcm;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Set prot based on lookup */
@@ -914,7 +911,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WC));
 	else
@@ -997,7 +994,7 @@ static const struct file_operations memtype_fops = {
 
 static int __init pat_memtype_list_init(void)
 {
-	if (pat_enabled) {
+	if (pat_enabled()) {
 		debugfs_create_file("pat_memtype_list", S_IRUSR,
 				    arch_debugfs_dir, NULL, &memtype_fops);
 	}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d3..0a9f2ca 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
  	 * Caller can followup with UC MINUS request and add a WC mtrr if there
  	 * is a free mtrr slot.
  	 */
-	if (!pat_enabled && write_combine)
+	if (!pat_enabled() && write_combine)
 		return -EINVAL;
 
-	if (pat_enabled && write_combine)
+	if (pat_enabled() && write_combine)
 		prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
-	else if (pat_enabled || boot_cpu_data.x86 > 3)
+	else if (pat_enabled() || boot_cpu_data.x86 > 3)
 		/*
 		 * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
 		 * To avoid attribute conflicts, request UC MINUS here
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled()
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled Luis R. Rodriguez
@ 2015-04-30 20:25 ` Luis R. Rodriguez
  2015-05-04 15:29   ` Borislav Petkov
  2015-04-30 20:25   ` Luis R. Rodriguez
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, x86

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Two Linux device drivers cannot work with PAT and the work
required to make them work is significant. There is not
enough motivation to convert these drivers over to use
PAT properly. The compromise reached is to let drivers
that cannot be ported to PAT check if PAT was enabled
and if so fail on probe with a recommendation to boot
with the "nopat" kernel parameter.
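
A minimal userspace sketch of the probe-time check described above. All
sketch_* names are hypothetical, the -ENODEV return value is an
assumption made for illustration, and sketch_boot() exists only to
simulate booting with or without "nopat".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the state behind the exported pat_enabled(). */
static bool sketch_pat_on = true;

bool sketch_pat_enabled(void)
{
	return sketch_pat_on;
}

/* Demo-only: simulate a boot with or without the "nopat" parameter. */
void sketch_boot(bool nopat)
{
	sketch_pat_on = !nopat;
}

/* Hypothetical probe: refuse to bind while PAT is active. */
int sketch_probe(void)
{
	if (sketch_pat_enabled()) {
		fprintf(stderr,
			"sketch: needs PAT disabled, boot with nopat\n");
		return -ENODEV;	/* assumed error code */
	}
	return 0;
}
```

Failing early keeps the driver from running with a mapping it cannot
get right, and the message tells the user what to change.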

Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 arch/x86/mm/pat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index f64785e..3d60207 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -55,6 +55,7 @@ bool pat_enabled(void)
 {
 	return !!__pat_enabled;
 }
+EXPORT_SYMBOL_GPL(pat_enabled);
 
 int pat_debug_enable;
 
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 4/6] ivtv: use arch_phys_wc_add() and require PAT disabled
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
@ 2015-04-30 20:25   ` Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled() Luis R. Rodriguez
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Mauro Carvalho Chehab, Suresh Siddha,
	Juergen Gross, Daniel Vetter, Dave Airlie, Bjorn Helgaas,
	Antonino Daplas, Dave Hansen, Arnd Bergmann, Stefan Bader,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, konrad.wilk,
	ville.syrjala, david.vrabel, jbeulich, toshi.kani,
	Roger Pau Monné,
	linux-fbdev, ivtv-devel, linux-media, xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We are burying direct access to MTRR code support on
x86 in order to take advantage of PAT. In the future we
also want to make the default behaviour of ioremap_nocache()
use strong UC; use of mtrr_add() on those systems
would then make write-combining void.

To let us later make strong UC the default, and to phase
out direct MTRR access code, port the driver over to
arch_phys_wc_add() and annotate that the device driver
requires systems to boot with PAT disabled, using the
nopat kernel parameter.

This is a worthy compromise given that the hardware is
really rare these days, and perhaps only some lost souls
in some third world country are expected to be using this
feature of the device driver.

Acked-by: Andy Walls <awalls@md.metrocast.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: ivtv-devel@ivtvdriver.org
Cc: linux-media@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/media/pci/ivtv/Kconfig  |  3 +++
 drivers/media/pci/ivtv/ivtvfb.c | 58 ++++++++++++++++-------------------------
 2 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/drivers/media/pci/ivtv/Kconfig b/drivers/media/pci/ivtv/Kconfig
index dd6ee57e..b2a7f88 100644
--- a/drivers/media/pci/ivtv/Kconfig
+++ b/drivers/media/pci/ivtv/Kconfig
@@ -57,5 +57,8 @@ config VIDEO_FB_IVTV
 	  This is used in the Hauppauge PVR-350 card. There is a driver
 	  homepage at <http://www.ivtvdriver.org>.
 
+	  If you have this hardware you will need to boot with PAT disabled
+	  on your x86 systems; use the nopat kernel parameter.
+
 	  To compile this driver as a module, choose M here: the
 	  module will be called ivtvfb.
diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..7685ae3 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,8 +44,8 @@
 #include <linux/ivtvfb.h>
 #include <linux/slab.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
 #endif
 
 #include "ivtv-driver.h"
@@ -155,12 +155,11 @@ struct osd_info {
 	/* Buffer size */
 	u32 video_buffer_size;
 
-#ifdef CONFIG_MTRR
 	/* video_base rounded down as required by hardware MTRRs */
 	unsigned long fb_start_aligned_physaddr;
 	/* video_base rounded up as required by hardware MTRRs */
 	unsigned long fb_end_aligned_physaddr;
-#endif
+	int wc_cookie;
 
 	/* Store the buffer offset */
 	int set_osd_coords_x;
@@ -1099,6 +1098,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
 static int ivtvfb_init_io(struct ivtv *itv)
 {
 	struct osd_info *oi = itv->osd_info;
+	/* Find the largest power of two that maps the whole buffer */
+	int size_shift = 31;
 
 	mutex_lock(&itv->serialize_lock);
 	if (ivtv_init_on_first_open(itv)) {
@@ -1132,29 +1133,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
 			oi->video_pbase, oi->video_vbase,
 			oi->video_buffer_size / 1024);
 
-#ifdef CONFIG_MTRR
-	{
-		/* Find the largest power of two that maps the whole buffer */
-		int size_shift = 31;
-
-		while (!(oi->video_buffer_size & (1 << size_shift))) {
-			size_shift--;
-		}
-		size_shift++;
-		oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
-		oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
-		oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
-		oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
-		if (mtrr_add(oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
-			     MTRR_TYPE_WRCOMB, 1) < 0) {
-			IVTVFB_INFO("disabled mttr\n");
-			oi->fb_start_aligned_physaddr = 0;
-			oi->fb_end_aligned_physaddr = 0;
-		}
-	}
-#endif
-
+	while (!(oi->video_buffer_size & (1 << size_shift)))
+		size_shift--;
+	size_shift++;
+	oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+	oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+	oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+	oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+	oi->wc_cookie = arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+					 oi->fb_end_aligned_physaddr -
+					 oi->fb_start_aligned_physaddr);
 	/* Blank the entire osd. */
 	memset_io(oi->video_vbase, 0, oi->video_buffer_size);
 
@@ -1172,14 +1160,7 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
 
 	/* Release pseudo palette */
 	kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
-	if (oi->fb_end_aligned_physaddr) {
-		mtrr_del(-1, oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
-	}
-#endif
-
+	arch_phys_wc_del(oi->wc_cookie);
 	kfree(oi);
 	itv->osd_info = NULL;
 }
@@ -1284,6 +1265,13 @@ static int __init ivtvfb_init(void)
 	int registered = 0;
 	int err;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled(),
+		 "ivtvfb needs PAT disabled, boot with nopat kernel parameter\n")) {
+		return -EINVAL;
+	}
+#endif
+
 	if (ivtvfb_card_id < -1 || ivtvfb_card_id >= IVTV_MAX_CARDS) {
 		printk(KERN_ERR "ivtvfb:  ivtvfb_card_id parameter is out of range (valid range: -1 - %d)\n",
 		     IVTV_MAX_CARDS - 1);
-- 
2.3.2.209.gd67f9d5.dirty


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 5/6] IB/ipath: add counting for MTRR
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
@ 2015-04-30 20:25   ` Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled Luis R. Rodriguez
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Roland Dreier, Sean Hefty,
	Hal Rosenstock, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, infinipath, linux-rdma,
	linux-fbdev

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Pass increment=1 to mtrr_add() so that the region is usage
counted. There is no good reason not to, as we eventually
delete it as well.

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 4ad0b93..70c1f3a 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -127,7 +127,7 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 			   "(addr %llx, len=0x%llx)\n",
 			   (unsigned long long) pioaddr,
 			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 0);
+		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
 		if (cookie < 0) {
 			{
 				dev_info(&dd->pcidev->dev,
-- 
2.3.2.209.gd67f9d5.dirty

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 6/6] IB/ipath: use arch_phys_wc_add() and require PAT disabled
  2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
@ 2015-04-30 20:25   ` Luis R. Rodriguez
  2015-04-30 20:25 ` [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled() Luis R. Rodriguez
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Linus Torvalds, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, Dave Hansen, Arnd Bergmann, Stefan Bader

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We are burying direct access to MTRR code support on
x86 in order to take advantage of PAT. In the future we
also want to make the default behaviour of ioremap_nocache()
use strong UC; use of mtrr_add() on those systems
would then make write-combining void.

To let us later make strong UC the default, and to phase
out direct MTRR access code, port the driver over to
arch_phys_wc_add() and annotate that the device driver
requires systems to boot with PAT disabled, using the
nopat kernel parameter.

This is a worthy compromise given that the ipath device
driver powers the old HTX bus cards that only work in
AMD systems, while the newer IB/qib device driver
powers all PCI-e cards. The ipath device driver is
obsolete and the hardware is hard to find, so it is
reasonable to require users of ipath to boot with nopat.

Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/Kconfig           |  3 ++
 drivers/infiniband/hw/ipath/ipath_driver.c    | 18 +++++++----
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +--
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 43 ++++++---------------------
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig
index 1d9bb11..8fe54ff 100644
--- a/drivers/infiniband/hw/ipath/Kconfig
+++ b/drivers/infiniband/hw/ipath/Kconfig
@@ -9,3 +9,6 @@ config INFINIBAND_IPATH
 	as IP-over-InfiniBand as well as with userspace applications
 	(in conjunction with InfiniBand userspace access).
 	For QLogic PCIe QLE based cards, use the QIB driver instead.
+
+	If you have this hardware you will need to boot with PAT disabled
+	on your x86-64 systems; use the nopat kernel parameter.
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..441cfe5 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -42,6 +42,9 @@
 #include <linux/bitmap.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
+#endif
 
 #include "ipath_kernel.h"
 #include "ipath_verbs.h"
@@ -395,6 +398,14 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	unsigned long long addr;
 	u32 bar0 = 0, bar1 = 0;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled(),
+		 "ipath needs PAT disabled, boot with nopat kernel parameter\n")) {
+		ret = -EINVAL;
+		goto bail;
+	}
+#endif
+
 	dd = ipath_alloc_devdata(pdev);
 	if (IS_ERR(dd)) {
 		ret = PTR_ERR(dd);
@@ -542,6 +553,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split this properly to enable on PAT */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +599,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..7b6e4c8 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,14 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
-		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
-			   "(addr %llx, len=0x%llx)\n",
-			   (unsigned long long) pioaddr,
-			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie < 0) {
+			ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+			ret = -ENODEV;
+		} else if (dd->wc_cookie == 0)
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC not needed\n");
+		else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +140,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
-- 
2.3.2.209.gd67f9d5.dirty

* [PATCH v5 6/6] IB/ipath: use arch_phys_wc_add() and require PAT disabled
@ 2015-04-30 20:25   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Linus Torvalds, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, Dave Hansen, Arnd Bergmann, Stefan Bader,
	konrad.wilk, ville.syrjala, david.vrabel, jbeulich, toshi.kani,
	Roger Pau Monné,
	infinipath, linux-rdma, linux-fbdev, xen-devel

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We are burying direct access to MTRR code on x86 in
order to take advantage of PAT. In the future we also
want ioremap_nocache() to default to strong UC; on such
systems, use of mtrr_add() would make write-combining
void.

To help us later make strong UC the default, and to
phase out direct MTRR access code, port the driver over
to arch_phys_wc_add() and annotate that the device driver
requires systems to boot with PAT disabled, using the
nopat kernel parameter.

This is a worthy compromise given that the ipath device
driver powers the old HTX bus cards that only work in
AMD systems, while the newer IB/qib device driver
powers all PCI-e cards. The ipath device driver is
obsolete and its hardware is hard to find, so it is
reasonable to require users of ipath to boot with nopat.

Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/Kconfig           |  3 ++
 drivers/infiniband/hw/ipath/ipath_driver.c    | 18 +++++++----
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +--
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 43 ++++++---------------------
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig
index 1d9bb11..8fe54ff 100644
--- a/drivers/infiniband/hw/ipath/Kconfig
+++ b/drivers/infiniband/hw/ipath/Kconfig
@@ -9,3 +9,6 @@ config INFINIBAND_IPATH
 	as IP-over-InfiniBand as well as with userspace applications
 	(in conjunction with InfiniBand userspace access).
 	For QLogic PCIe QLE based cards, use the QIB driver instead.
+
+	If you have this hardware you will need to boot with PAT disabled
+	on your x86-64 systems; use the nopat kernel parameter.
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..441cfe5 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -42,6 +42,9 @@
 #include <linux/bitmap.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
+#endif
 
 #include "ipath_kernel.h"
 #include "ipath_verbs.h"
@@ -395,6 +398,14 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	unsigned long long addr;
 	u32 bar0 = 0, bar1 = 0;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled(),
+		 "ipath needs PAT disabled, boot with nopat kernel parameter\n")) {
+		ret = -EINVAL;
+		goto bail;
+	}
+#endif
+
 	dd = ipath_alloc_devdata(pdev);
 	if (IS_ERR(dd)) {
 		ret = PTR_ERR(dd);
@@ -542,6 +553,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split this properly to enable on PAT */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +599,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..7b6e4c8 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,14 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
-		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
-			   "(addr %llx, len=0x%llx)\n",
-			   (unsigned long long) pioaddr,
-			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie < 0) {
+			ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+			ret = -ENODEV;
+		} else if (dd->wc_cookie == 0)
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC not needed\n");
+		else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +140,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
-- 
2.3.2.209.gd67f9d5.dirty
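
For anyone reading along without the tree handy, the three-way cookie
check in ipath_enable_wc() above can be modelled in userspace roughly as
follows. This is only a sketch: enum wc_state and classify_wc_cookie()
are names made up for illustration, not kernel API, and the mapping just
mirrors the branches in the hunk (negative: failure, zero: no MTRR
needed, positive: MTRR set).

```c
/*
 * Userspace model of the cookie check in the patched ipath_enable_wc().
 * All names here are hypothetical; nothing below is kernel code.
 */
enum wc_state {
	WC_FAILED,	/* cookie < 0: the patch logs an error, sets -ENODEV */
	WC_NOT_NEEDED,	/* cookie == 0: the patch logs "not needed" */
	WC_MTRR_SET,	/* cookie > 0: the patch logs that the MTRR was set */
};

/* Map a cookie value onto the branch the patch would take. */
static enum wc_state classify_wc_cookie(int cookie)
{
	if (cookie < 0)
		return WC_FAILED;
	if (cookie == 0)
		return WC_NOT_NEEDED;
	return WC_MTRR_SET;
}
```

The driver only uses this distinction for logging; the stored cookie is
passed unchanged to arch_phys_wc_del() in ipath_disable_wc().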


* [PATCH v5 6/6] IB/ipath: use arch_phys_wc_add() and require PAT disabled
@ 2015-04-30 20:25   ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-04-30 20:25 UTC (permalink / raw)
  To: bp, mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Linus Torvalds, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, Dave Hansen, Arnd Bergmann, Stefan Bader

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We are burying direct access to MTRR code on x86 in
order to take advantage of PAT. In the future we also
want ioremap_nocache() to default to strong UC; on such
systems, use of mtrr_add() would make write-combining
void.

To help us later make strong UC the default, and to
phase out direct MTRR access code, port the driver over
to arch_phys_wc_add() and annotate that the device driver
requires systems to boot with PAT disabled, using the
nopat kernel parameter.

This is a worthy compromise given that the ipath device
driver powers the old HTX bus cards that only work in
AMD systems, while the newer IB/qib device driver
powers all PCI-e cards. The ipath device driver is
obsolete and its hardware is hard to find, so it is
reasonable to require users of ipath to boot with nopat.

Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: infinipath@intel.com
Cc: linux-rdma@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 drivers/infiniband/hw/ipath/Kconfig           |  3 ++
 drivers/infiniband/hw/ipath/ipath_driver.c    | 18 +++++++----
 drivers/infiniband/hw/ipath/ipath_kernel.h    |  4 +--
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c | 43 ++++++---------------------
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig
index 1d9bb11..8fe54ff 100644
--- a/drivers/infiniband/hw/ipath/Kconfig
+++ b/drivers/infiniband/hw/ipath/Kconfig
@@ -9,3 +9,6 @@ config INFINIBAND_IPATH
 	as IP-over-InfiniBand as well as with userspace applications
 	(in conjunction with InfiniBand userspace access).
 	For QLogic PCIe QLE based cards, use the QIB driver instead.
+
+	If you have this hardware you will need to boot with PAT disabled
+	on your x86-64 systems; use the nopat kernel parameter.
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..441cfe5 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -42,6 +42,9 @@
 #include <linux/bitmap.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
+#endif
 
 #include "ipath_kernel.h"
 #include "ipath_verbs.h"
@@ -395,6 +398,14 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	unsigned long long addr;
 	u32 bar0 = 0, bar1 = 0;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled(),
+		 "ipath needs PAT disabled, boot with nopat kernel parameter\n")) {
+		ret = -EINVAL;
+		goto bail;
+	}
+#endif
+
 	dd = ipath_alloc_devdata(pdev);
 	if (IS_ERR(dd)) {
 		ret = PTR_ERR(dd);
@@ -542,6 +553,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split this properly to enable on PAT */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +599,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..7b6e4c8 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,14 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
-		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
-			   "(addr %llx, len=0x%llx)\n",
-			   (unsigned long long) pioaddr,
-			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie < 0) {
+			ipath_dev_err(dd, "Setting mtrr failed on PIO buffers\n");
+			ret = -ENODEV;
+		} else if (dd->wc_cookie == 0)
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC not needed\n");
+		else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +140,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
-- 
2.3.2.209.gd67f9d5.dirty


* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-04-29 21:44   ` Luis R. Rodriguez
@ 2015-04-30 22:01     ` Randy Dunlap
  -1 siblings, 0 replies; 710+ messages in thread
From: Randy Dunlap @ 2015-04-30 22:01 UTC (permalink / raw)
  To: Luis R. Rodriguez, mingo, tglx, hpa, bp, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Toshi Kani, Jonathan Corbet, Dave Hansen,
	Suresh Siddha, Juergen Gross, Daniel Vetter, Dave Airlie,
	Antonino Daplas, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	linux-fbdev

On 04/29/15 14:44, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 

> ---
>  Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
>  Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/main.c |  3 +++
>  3 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> index cf08c9f..7e183e3 100644
> --- a/Documentation/x86/pat.txt
> +++ b/Documentation/x86/pat.txt
> @@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
>  as step 0 above and also track the usage of those pages and use set_memory_wb()
>  before the page is freed to free pool.
>  
> -
> +MTRR effects on PAT / non-PAT systems
> +-------------------------------------
> +
> +The following table provides the effects of using write-combining MTRRs when
> +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
> +mtrr_add() usage will be phased in favor of arch_phys_wc_add() which will
> +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
> +is made should already have be ioremap'd with write-combining page attributes
> +or PAT entries, this can be done by using ioremap_wc() / or respective helpers.
> +Devices which combine areas of IO memory desired to remain uncachable with

I would spell it uncacheable.  In kernel Documentation/, grep uncacheable finds
14 hits vs. 6 hits for uncachable.  No big deal.

> +areas where write-combining is desirable and are restricted by the size
> +requirements of MTRRs should consider splitting up their IO memory space
> +cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
> +encompassing both regions. Such use is nevertheless heavily discouraged as
> +the effective memory type is considered implementation defined. This strategy
> +should only be used as last resort on devices with size-contrained regions

                                                      size-constrained

> +where otherwise MTRR write-combining would not be effective.
> +
> +Note that you cannot use set_memory_wc() to override / whitelist IO remapped
> +memory space mapped with ioremap*() calls, set_memory_wc() can only be used
> +on RAM.
> +
> +----------------------------------------------------------------------
> +MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> +----------------------------------------------------------------------
> +                                                  Non-PAT |  PAT
> +     PAT
> +     |PCD
> +     ||PWT
> +     |||
> +WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> +WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> +WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> +WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> +----------------------------------------------------------------------
> +
> +(*) denotes implementation defined and is discouraged
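
If it helps when reading the table: the rows could be written down as a
small lookup in plain C. This is just an illustration, not something
proposed for the tree. The struct, its fields and wc_mtrr_lookup() are
invented names, and the values are copied from the table above, keyed on
the PAT entry column.

```c
#include <stddef.h>
#include <string.h>

/* One table row: a WC MTRR combined with a given page attribute. */
struct wc_mtrr_effect {
	unsigned int pte_bits;		/* non-PAT page flags, PAT|PCD|PWT */
	const char *pat_entry;		/* PAT entry: WB, WC, UC- or UC */
	const char *non_pat_type;	/* effective type, non-PAT systems */
	const char *pat_type;		/* effective type, PAT systems */
	int impl_defined;		/* 1 where non-PAT is marked (*) */
};

/* Values transcribed from the table in the quoted patch. */
static const struct wc_mtrr_effect wc_mtrr_table[] = {
	{ 0x0, "WB",  "WC", "WC", 0 },
	{ 0x1, "WC",  "WC", "WC", 1 },
	{ 0x2, "UC-", "WC", "WC", 1 },
	{ 0x3, "UC",  "UC", "UC", 0 },
};

/* Return the row for a PAT entry name, or NULL if it is not listed. */
static const struct wc_mtrr_effect *wc_mtrr_lookup(const char *pat_entry)
{
	size_t i;

	for (i = 0; i < sizeof(wc_mtrr_table) / sizeof(wc_mtrr_table[0]); i++)
		if (strcmp(wc_mtrr_table[i].pat_entry, pat_entry) == 0)
			return &wc_mtrr_table[i];
	return NULL;
}
```

Looking up "UC" gives UC on both kinds of system, which is the one row
where the WC MTRR does not win.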
>  
>  Notes:
>  
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index ea5f363..12abdbe 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
>   * attempts to add a WC MTRR covering size bytes starting at base and
>   * logs an error if this fails.
>   *
> + * The caller should expect to need to provide a power of two size on an

    * The caller should provide a power of two size on an equivalent
    * power of two boundary.

> + * equivalent power of two boundary.
> + *
>   * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
>   * but drivers should not try to interpret that return value.
>   */
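
To make the size/boundary wording concrete, here is a sketch of the
check a caller could do before asking for a WC MTRR. It assumes that
"equivalent power of two boundary" means the base is aligned to the
size; wc_range_ok() and is_power_of_two() are hypothetical helpers
written for this mail, not existing kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x has exactly one bit set. */
static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * True if [base, base + size) has a power of two size and the base is
 * aligned to that size (assumed reading of the kernel-doc wording).
 */
static bool wc_range_ok(uint64_t base, uint64_t size)
{
	return is_power_of_two(size) && (base & (size - 1)) == 0;
}
```

For example, an 8 MiB region at 0xd0000000 passes, while the same size
at 0xd0100000 or a 6 MiB region does not.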
> 


-- 
~Randy

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-04-29 21:44   ` Luis R. Rodriguez
@ 2015-05-04 12:23     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-04 12:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Toshi Kani, Jonathan Corbet,
	Dave Hansen, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

On Wed, Apr 29, 2015 at 02:44:07PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> As part of the effort to phase out MTRR use document
> write-combining MTRR effects on pages with different
> non-PAT page attributes flags and different PAT entry
> values. Extend arch_phys_wc_add() documentation to
> clarify power of two sizes / boundary requirements as
> we phase out mtrr_add() use.
> 
> Lastly hint towards ioremap_uc() for corner cases on
> device drivers working with devices with mixed regions
> where MTRR size requirements would otherwise not
> enable write-combining effective memory types.
> 
> Cc: Toshi Kani <toshi.kani@hp.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <sbsiddha@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Ville Syrjälä <syrjala@sci.fi>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
>  Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/main.c |  3 +++
>  3 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
> index cc071dc..a111a6c 100644
> --- a/Documentation/x86/mtrr.txt
> +++ b/Documentation/x86/mtrr.txt
> @@ -1,7 +1,19 @@
>  MTRR (Memory Type Range Register) control
> -3 Jun 1999
> -Richard Gooch
> -<rgooch@atnf.csiro.au>
> +
> +Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
> +Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
> +
> +===============================================================================
> +Phasing MTRR use

"Phasing out...".

> +
> +MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
> +of effective MTRR that is expected to be supported will be for write-combining.
> +As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
> +MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
> +
> +For details refer to Documentation/x86/pat.txt.
> +
> +===============================================================================
>  
>    On Intel P6 family processors (Pentium Pro, Pentium II and later)
>    the Memory Type Range Registers (MTRRs) may be used to control
> diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> index cf08c9f..7e183e3 100644
> --- a/Documentation/x86/pat.txt
> +++ b/Documentation/x86/pat.txt
> @@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
>                         |          |            |                  |
>  ioremap_cache          |    --    |    WB      |       WB         |
>                         |          |            |                  |
> +ioremap_uc             |    --    |    UC      |       UC         |
> +                       |          |            |                  |
>  ioremap_nocache        |    --    |    UC-     |       UC-        |
>                         |          |            |                  |
>  ioremap_wc             |    --    |    --      |       WC         |
> @@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
>  as step 0 above and also track the usage of those pages and use set_memory_wb()
>  before the page is freed to free pool.
>  
> -
> +MTRR effects on PAT / non-PAT systems
> +-------------------------------------
> +
> +The following table provides the effects of using write-combining MTRRs when
> +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
> +mtrr_add() usage will be phased in favor of arch_phys_wc_add() which will

				out

> +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
> +is made should already have be ioremap'd with write-combining page attributes

	 , 		have been ioremapped with WC attributes...

> +or PAT entries, this can be done by using ioremap_wc() / or respective helpers.
> +Devices which combine areas of IO memory desired to remain uncachable with
> +areas where write-combining is desirable and are restricted by the size
> +requirements of MTRRs should consider splitting up their IO memory space
> +cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
> +encompassing both regions. Such use is nevertheless heavily discouraged as
> +the effective memory type is considered implementation defined. This strategy
> +should only be used as last resort on devices with size-contrained regions
> +where otherwise MTRR write-combining would not be effective.
> +
> +Note that you cannot use set_memory_wc() to override / whitelist IO remapped
> +memory space mapped with ioremap*() calls, set_memory_wc() can only be used
> +on RAM.
> +
> +----------------------------------------------------------------------
> +MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> +----------------------------------------------------------------------
> +                                                  Non-PAT |  PAT
> +     PAT
> +     |PCD
> +     ||PWT
> +     |||
> +WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> +WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> +WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> +WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> +----------------------------------------------------------------------
> +
> +(*) denotes implementation defined and is discouraged
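
For readers who prefer code to tables, the rows above can be
transcribed into a small lookup. This is only a userspace sketch of the
table as posted; the enum, struct and function names are made up for
illustration and are not kernel APIs.

```c
#include <stdbool.h>

/* Page attribute (non-PAT) or PAT entry used by the mapping. */
enum mem_type { MT_WB, MT_WC, MT_UC_MINUS, MT_UC };

struct effective_type {
	enum mem_type type;
	bool impl_defined;	/* the "(*)" marker in the table */
};

/*
 * Effective memory type for a mapping that sits under a WC MTRR,
 * transcribed row by row from the table in this patch.
 * Hypothetical helper, not part of the kernel.
 */
static struct effective_type wc_mtrr_effective(enum mem_type page_type,
					       bool pat)
{
	struct effective_type e = { MT_WC, false };

	switch (page_type) {
	case MT_WB:
		/* WC on both non-PAT and PAT systems. */
		break;
	case MT_WC:
	case MT_UC_MINUS:
		/* WC on both, but implementation defined without PAT. */
		e.impl_defined = !pat;
		break;
	case MT_UC:
		/* UC wins over the WC MTRR on both. */
		e.type = MT_UC;
		break;
	}
	return e;
}
```
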
>  
>  Notes:
>  
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index ea5f363..12abdbe 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
>   * attempts to add a WC MTRR covering size bytes starting at base and
>   * logs an error if this fails.
>   *
> + * The caller should expect to need to provide a power of two size on an
> + * equivalent power of two boundary.
> + *
>   * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
>   * but drivers should not try to interpret that return value.
>   */
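
To make the size / boundary requirement concrete, a driver-side sanity
check could look roughly like the sketch below. It reads "equivalent
power of two boundary" as "base aligned to size", which is an
assumption on my part, and wc_range_fits_one_mtrr() is a hypothetical
helper, not something this patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: returns true if [base, base + size) has a
 * power-of-two size and starts on a boundary that is a multiple of
 * that size.
 */
static bool wc_range_fits_one_mtrr(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}
```

For example, a 16 MiB region at 0xd0000000 passes, while a 12 MiB
region at the same base, or a 16 MiB region at 0xd0800000, does not.
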
> -- 
> 2.3.2.209.gd67f9d5.dirty
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
@ 2015-05-04 12:23     ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-04 12:23 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Toshi Kani, Jonathan Corbet,
	Dave Hansen, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

On Wed, Apr 29, 2015 at 02:44:07PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> As part of the effort to phase out MTRR use document
> write-combining MTRR effects on pages with different
> non-PAT page attributes flags and different PAT entry
> values. Extend arch_phys_wc_add() documentation to
> clarify power of two sizes / boundary requirements as
> we phase out mtrr_add() use.
> 
> Lastly hint towards ioremap_uc() for corner cases on
> device drivers working with devices with mixed regions
> where MTRR size requirements would otherwise not
> enable write-combining effective memory types.
> 
> Cc: Toshi Kani <toshi.kani@hp.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Suresh Siddha <sbsiddha@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Antonino Daplas <adaplas@gmail.com>
> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> Cc: Ville Syrjälä <syrjala@sci.fi>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Davidlohr Bueso <dbueso@suse.de>
> Cc: linux-fbdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
>  Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/cpu/mtrr/main.c |  3 +++
>  3 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
> index cc071dc..a111a6c 100644
> --- a/Documentation/x86/mtrr.txt
> +++ b/Documentation/x86/mtrr.txt
> @@ -1,7 +1,19 @@
>  MTRR (Memory Type Range Register) control
> -3 Jun 1999
> -Richard Gooch
> -<rgooch@atnf.csiro.au>
> +
> +Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
> +Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
> +
> +===============================================================================
> +Phasing MTRR use

"Phasing out...".

> +
> +MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
> +of effective MTRR that is expected to be supported will be for write-combining.
> +As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
> +MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
> +
> +For details refer to Documentation/x86/pat.txt.
> +
> +===============================================================================
>  
>    On Intel P6 family processors (Pentium Pro, Pentium II and later)
>    the Memory Type Range Registers (MTRRs) may be used to control
> diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> index cf08c9f..7e183e3 100644
> --- a/Documentation/x86/pat.txt
> +++ b/Documentation/x86/pat.txt
> @@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
>                         |          |            |                  |
>  ioremap_cache          |    --    |    WB      |       WB         |
>                         |          |            |                  |
> +ioremap_uc             |    --    |    UC      |       UC         |
> +                       |          |            |                  |
>  ioremap_nocache        |    --    |    UC-     |       UC-        |
>                         |          |            |                  |
>  ioremap_wc             |    --    |    --      |       WC         |
> @@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
>  as step 0 above and also track the usage of those pages and use set_memory_wb()
>  before the page is freed to free pool.
>  
> -
> +MTRR effects on PAT / non-PAT systems
> +-------------------------------------
> +
> +The following table provides the effects of using write-combining MTRRs when
> +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
> +mtrr_add() usage will be phased in favor of arch_phys_wc_add() which will

				out

> +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
> +is made should already have be ioremap'd with write-combining page attributes

	 , 		have been ioremapped with WC attributes...

> +or PAT entries, this can be done by using ioremap_wc() / or respective helpers.
> +Devices which combine areas of IO memory desired to remain uncachable with
> +areas where write-combining is desirable and are restricted by the size
> +requirements of MTRRs should consider splitting up their IO memory space
> +cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
> +encompassing both regions. Such use is nevertheless heavily discouraged as
> +the effective memory type is considered implementation defined. This strategy
> +should only be used as last resort on devices with size-contrained regions
> +where otherwise MTRR write-combining would not be effective.
> +
> +Note that you cannot use set_memory_wc() to override / whitelist IO remapped
> +memory space mapped with ioremap*() calls, set_memory_wc() can only be used
> +on RAM.
> +
> +----------------------------------------------------------------------
> +MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> +----------------------------------------------------------------------
> +                                                  Non-PAT |  PAT
> +     PAT
> +     |PCD
> +     ||PWT
> +     |||
> +WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> +WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> +WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> +WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> +----------------------------------------------------------------------
> +
> +(*) denotes implementation defined and is discouraged
>  
>  Notes:
>  
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index ea5f363..12abdbe 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
>   * attempts to add a WC MTRR covering size bytes starting at base and
>   * logs an error if this fails.
>   *
> + * The caller should expect to need to provide a power of two size on an
> + * equivalent power of two boundary.
> + *
>   * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
>   * but drivers should not try to interpret that return value.
>   */
> -- 
> 2.3.2.209.gd67f9d5.dirty
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
@ 2015-05-04 14:58   ` Borislav Petkov
  2015-05-07  3:36   ` Elliott, Robert (Server Storage)
  1 sibling, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-04 14:58 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Juergen Gross, Daniel Vetter,
	Dave Airlie, Bjorn Helgaas, x86

On Thu, Apr 30, 2015 at 01:25:15PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> Use pr_info() instead of the old printk to
> prefix the component where things are coming
> from. With this readers will know exactly where
> the message is coming from. Since pr_fmt is
> already defined in this case we redefine it to
> "PAT: ".
> 
> We leave the users of dprintk() in place, this
> will print only when the debugpat kernel parameter
> is enabled. We want to leave those enabled as a
> debug feature, but also make them use the same
> prefix.
> 
> Cc: Andy Walls <awalls@md.metrocast.net>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: x86@kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  arch/x86/mm/pat.c          | 41 +++++++++++++++++++++--------------------
>  arch/x86/mm/pat_internal.h |  2 +-
>  arch/x86/mm/pat_rbtree.c   |  5 ++++-
>  3 files changed, 26 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index 372ad42..8f88c6a 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -33,13 +33,16 @@
>  #include "pat_internal.h"
>  #include "mm_internal.h"
>  
> +#undef pr_fmt
> +#define pr_fmt(fmt)	"PAT: " fmt

Hmm, ok, so those pr_* helpers with the prefix actually make grepping
for the error message not fun. So I take that back about the pr_fmt
thing - it is a bad idea.

Instead, we should still use pr_* because they're shorter but simply add
the prefix before each message so that you can grep for it.
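
To illustrate the grepping problem, here is a userspace sketch with
snprintf() standing in for pr_info(). The message text and function
names are invented for the example. With pr_fmt() the string that ends
up in the log never appears verbatim in the source, while with the
inline prefix it does.

```c
#include <stddef.h>
#include <stdio.h>

/* Same shape as the kernel macro: the prefix is pasted at compile time. */
#define pr_fmt(fmt) "PAT: " fmt

/* Variant 1: prefix injected by pr_fmt(); the literal below lacks it. */
static int msg_via_pr_fmt(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len, pr_fmt("not supported by CPU %d\n"), cpu);
}

/* Variant 2: prefix spelled out in the literal, so grep finds it. */
static int msg_inline_prefix(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len, "x86/PAT: not supported by CPU %d\n", cpu);
}
```

Grepping the source for "PAT: not supported" only matches if the
prefix is written out as in the second variant.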

I went and I did that, see below.

---
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Date: Thu, 30 Apr 2015 13:25:15 -0700
Subject: [PATCH] x86/mm/pat: Convert to pr_* usage

Use pr_info() instead of the old printk to prefix the component where
things are coming from. With this, readers will know exactly where the
message is coming from. Rather than redefining pr_fmt, add the
"x86/PAT: " prefix to each message directly so that the full string
stays greppable.

We leave the users of dprintk() in place; this will print only when the
debugpat kernel parameter is enabled. We want to leave those enabled as
a debug feature, but also make them use the same prefix.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: x86@kernel.org
Cc: cocci@systeme.lip6.fr
Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: 
---
 arch/x86/mm/pat.c          | 41 +++++++++++++++++++----------------------
 arch/x86/mm/pat_internal.h |  2 +-
 arch/x86/mm/pat_rbtree.c   |  2 +-
 3 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..8269c784b61c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -39,7 +39,7 @@ int __read_mostly pat_enabled = 1;
 static inline void pat_disable(const char *reason)
 {
 	pat_enabled = 0;
-	printk(KERN_INFO "%s\n", reason);
+	pr_info("x86/PAT: %s\n", reason);
 }
 
 static int __init nopat(char *str)
@@ -188,7 +188,7 @@ void pat_init_cache_modes(void)
 					   pat_msg + 4 * i);
 		update_cache_mode_entry(i, cache);
 	}
-	pr_info("PAT configuration [0-7]: %s\n", pat_msg);
+	pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
 }
 
 #define PAT(x, y)	((u64)PAT_ ## y << ((x)*8))
@@ -211,8 +211,7 @@ void pat_init(void)
 			 * switched to PAT on the boot CPU. We have no way to
 			 * undo PAT.
 			 */
-			printk(KERN_ERR "PAT enabled, "
-			       "but not supported by secondary CPU\n");
+			pr_err("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
 			BUG();
 		}
 	}
@@ -347,7 +346,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != -1) {
-			pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
+			pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
 				start, end - 1, type, req_type);
 			if (new_type)
 				*new_type = type;
@@ -451,9 +450,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
-		printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
-		       start, end - 1,
-		       cattr_name(new->type), cattr_name(req_type));
+		pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+			start, end - 1,
+			cattr_name(new->type), cattr_name(req_type));
 		kfree(new);
 		spin_unlock(&memtype_lock);
 
@@ -497,8 +496,8 @@ int free_memtype(u64 start, u64 end)
 	spin_unlock(&memtype_lock);
 
 	if (!entry) {
-		printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
-		       current->comm, current->pid, start, end - 1);
+		pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+			current->comm, current->pid, start, end - 1);
 		return -EINVAL;
 	}
 
@@ -628,8 +627,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 
 	while (cursor < to) {
 		if (!devmem_is_allowed(pfn)) {
-			printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
-			       current->comm, from, to - 1);
+			pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+				current->comm, from, to - 1);
 			return 0;
 		}
 		cursor += PAGE_SIZE;
@@ -698,8 +697,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
 				size;
 
 	if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
-		printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
-			"for [mem %#010Lx-%#010Lx]\n",
+		pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
 			current->comm, current->pid,
 			cattr_name(pcm),
 			base, (unsigned long long)(base + size-1));
@@ -734,7 +732,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 
 		pcm = lookup_memtype(paddr);
 		if (want_pcm != pcm) {
-			printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+			pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
 				current->comm, current->pid,
 				cattr_name(want_pcm),
 				(unsigned long long)paddr,
@@ -755,13 +753,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 		if (strict_prot ||
 		    !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
 			free_memtype(paddr, paddr + size);
-			printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
-				" for [mem %#010Lx-%#010Lx], got %s\n",
-				current->comm, current->pid,
-				cattr_name(want_pcm),
-				(unsigned long long)paddr,
-				(unsigned long long)(paddr + size - 1),
-				cattr_name(pcm));
+			pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
+			       current->comm, current->pid,
+			       cattr_name(want_pcm),
+			       (unsigned long long)paddr,
+			       (unsigned long long)(paddr + size - 1),
+			       cattr_name(pcm));
 			return -EINVAL;
 		}
 		/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f6411620305d..a739bfc40690 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
 extern int pat_debug_enable;
 
 #define dprintk(fmt, arg...) \
-	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+	do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
 
 struct memtype {
 	u64			start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adcc8bd9..82b8c6aaf260 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,7 +160,7 @@ success:
 	return 0;
 
 failure:
-	printk(KERN_INFO "%s:%d conflicting memory types "
+	pr_info("x86/PAT: %s:%d conflicting memory types "
 		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
 		end, cattr_name(found_type), cattr_name(match->type));
 	return -EBUSY;
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled
  2015-04-30 20:25 ` [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled Luis R. Rodriguez
@ 2015-05-04 15:22   ` Borislav Petkov
  2015-05-05  0:42     ` Luis R. Rodriguez
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-04 15:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Christoph Lameter,
	Kyle McMartin, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, x86

On Thu, Apr 30, 2015 at 01:25:16PM -0700, Luis R. Rodriguez wrote:
> diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> index bfef424..f094d36 100644
> --- a/arch/x86/kernel/cpu/mtrr/main.c
> +++ b/arch/x86/kernel/cpu/mtrr/main.c
> @@ -556,7 +556,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
>  {
>  	int ret;
>  
> -	if (pat_enabled || !mtrr_enabled)
> +	if (pat_enabled() || !mtrr_enabled)

What's going on here? I got a reject about mtrr_enabled which is nowhere
to be found. Am I missing a patch?

Anyway, I applied that:

---
From: "Luis R. Rodriguez" <mcgrof@suse.com>
Date: Thu, 30 Apr 2015 13:25:16 -0700
Subject: [PATCH] x86/mm/pat: Redefine pat_enabled

We use pat_enabled in x86-specific code to see if PAT is enabled or
not. However, we are granting full access to the variable even though
readers do not need to set it. If, for instance, we later granted
modules access to it, they could override the variable setting... no
bueno.

This renames pat_enabled to a new static variable __pat_enabled. To
see if PAT is enabled / disabled folks can just use the nice accessor
pat_enabled() now.
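
As a rough userspace illustration of that accessor pattern (the
*_sketch names are hypothetical, and the kernel's __read_mostly
annotation is left out because it has no userspace equivalent):

```c
#include <stdbool.h>

/* File-local state: nothing outside this translation unit can write it. */
static int pat_enabled_state = 1;

/* Only code in this file can flip the state, mirroring pat_disable(). */
static void pat_disable_sketch(void)
{
	pat_enabled_state = 0;
}

/* Read-only view that other files (and, if exported, modules) would call. */
bool pat_enabled_sketch(void)
{
	return !!pat_enabled_state;
}
```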

Code that sets this can only be internal to pat.c. Apart from the early
kernel parameter "nopat" to disable PAT, we also have a few cases that
disable it later and make use of a helper, pat_disable(). This helper is
wrapped under an ifdef, but since that code cannot run unless PAT was
enabled, it's not required to wrap it with ifdefs; unwrap that. Likewise,
since "nopat" doesn't really change non-PAT systems, just remove that
ifdef as well.

Although we could add and use an early_param_off() these helpers don't
use __read_mostly and we want to keep __read_mostly for __pat_enabled as
this is a hot path -- upon boot for instance a simple guest may see ~4k
accesses to pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: syrjala@sci.fi
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: 
---
 arch/x86/include/asm/pat.h      |  7 +------
 arch/x86/kernel/cpu/mtrr/main.c |  2 +-
 arch/x86/mm/iomap_32.c          |  2 +-
 arch/x86/mm/ioremap.c           |  4 ++--
 arch/x86/mm/pageattr.c          |  2 +-
 arch/x86/mm/pat.c               | 33 +++++++++++++++------------------
 arch/x86/pci/i386.c             |  6 +++---
 7 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba95f91..cdcff7f7f694 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
 #include <linux/types.h>
 #include <asm/pgtable_types.h>
 
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
 extern void pat_init(void);
 void pat_init_cache_modes(void);
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363a1948..96fa7b38af5e 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -545,7 +545,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled)
+	if (pat_enabled())
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc60cfe..3a2ec8790ca7 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
 	 * MTRR is UC or WC.  UC_MINUS gets the real intention, of the
 	 * user, which is "WC if the MTRR is WC, UC if you can't do that."
 	 */
-	if (!pat_enabled && pgprot_val(prot) ==
+	if (!pat_enabled() && pgprot_val(prot) ==
 	    (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index fc08431a387b..ea379c06cc4c 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 {
 	/*
 	 * Ideally, this should be:
-	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+	 *	pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
 	 * UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
  */
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
 					__builtin_return_address(0));
 	else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index d35148acdc05..e07686633ce4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1573,7 +1573,7 @@ int set_memory_wc(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return set_memory_uc(addr, numpages);
 
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8269c784b61c..6e05a071ffc8 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,12 +33,11 @@
 #include "pat_internal.h"
 #include "mm_internal.h"
 
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
 
 static inline void pat_disable(const char *reason)
 {
-	pat_enabled = 0;
+	__pat_enabled = 0;
 	pr_info("x86/PAT: %s\n", reason);
 }
 
@@ -48,13 +47,11 @@ static int __init nopat(char *str)
 	return 0;
 }
 early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
 {
-	(void)reason;
+	return !!__pat_enabled;
 }
-#endif
-
 
 int pat_debug_enable;
 
@@ -198,7 +195,7 @@ void pat_init(void)
 	u64 pat;
 	bool boot_cpu = !boot_pat_state;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return;
 
 	if (!cpu_has_pat) {
@@ -399,7 +396,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	BUG_ON(start >= end); /* end is exclusive */
 
-	if (!pat_enabled) {
+	if (!pat_enabled()) {
 		/* This is identical to page table setting without PAT */
 		if (new_type) {
 			if (req_type == _PAGE_CACHE_MODE_WC)
@@ -474,7 +471,7 @@ int free_memtype(u64 start, u64 end)
 	int is_range_ram;
 	struct memtype *entry;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Low ISA region is always mapped WB. No need to track */
@@ -622,7 +619,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 	u64 to = from + size;
 	u64 cursor = from;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 1;
 
 	while (cursor < to) {
@@ -658,7 +655,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 	 * caching for the high addresses through the KEN pin, but
 	 * we maintain the tradition of paranoia in this code.
 	 */
-	if (!pat_enabled &&
+	if (!pat_enabled() &&
 	    !(boot_cpu_has(X86_FEATURE_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_K6_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -727,7 +724,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 	 * the type requested matches the type of first page in the range.
 	 */
 	if (is_ram) {
-		if (!pat_enabled)
+		if (!pat_enabled())
 			return 0;
 
 		pcm = lookup_memtype(paddr);
@@ -841,7 +838,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 		return ret;
 	}
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/*
@@ -869,7 +866,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 {
 	enum page_cache_mode pcm;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Set prot based on lookup */
@@ -910,7 +907,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WC));
 	else
@@ -993,7 +990,7 @@ static const struct file_operations memtype_fops = {
 
 static int __init pat_memtype_list_init(void)
 {
-	if (pat_enabled) {
+	if (pat_enabled()) {
 		debugfs_create_file("pat_memtype_list", S_IRUSR,
 				    arch_debugfs_dir, NULL, &memtype_fops);
 	}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d32cc0b..0a9f2caf358f 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
  	 * Caller can followup with UC MINUS request and add a WC mtrr if there
  	 * is a free mtrr slot.
  	 */
-	if (!pat_enabled && write_combine)
+	if (!pat_enabled() && write_combine)
 		return -EINVAL;
 
-	if (pat_enabled && write_combine)
+	if (pat_enabled() && write_combine)
 		prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
-	else if (pat_enabled || boot_cpu_data.x86 > 3)
+	else if (pat_enabled() || boot_cpu_data.x86 > 3)
 		/*
 		 * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
 		 * To avoid attribute conflicts, request UC MINUS here
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled()
  2015-04-30 20:25 ` [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled() Luis R. Rodriguez
@ 2015-05-04 15:29   ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-04 15:29 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Luis R. Rodriguez, Juergen Gross, Daniel Vetter,
	Dave Airlie, Bjorn Helgaas, x86

On Thu, Apr 30, 2015 at 01:25:17PM -0700, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
> 
> Two Linux device drivers cannot work with PAT and the work
> required to make them work is significant. There is not
> enough motivation to convert these drivers over to use
> PAT properly, the compromise reached is to let drivers
> that cannot be ported to PAT check if PAT was enabled
> and if so fail on probe with a recommendation to boot
> with the "nopat" kernel parameter.
> 
> Cc: Andy Walls <awalls@md.metrocast.net>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: x86@kernel.org
> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> ---
>  arch/x86/mm/pat.c | 1 +
>  1 file changed, 1 insertion(+)

Applied, thanks.
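
For readers following the thread, the compromise in the commit message (fail on probe when PAT is enabled and point the user at "nopat") might look roughly like the sketch below. It is a user-space model, not the code of either driver: pat_enabled() is stubbed out here, and example_probe(), fake_pat_state and the message text are hypothetical names made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stub standing in for the kernel's pat_enabled(); an assumption of this sketch. */
static bool fake_pat_state = true;

static bool pat_enabled(void)
{
	return fake_pat_state;
}

/* Hypothetical probe routine for a driver that cannot be ported to PAT. */
static int example_probe(void)
{
	if (pat_enabled()) {
		/* Refuse to bind and tell the user how to proceed. */
		fprintf(stderr,
			"example: PAT is enabled, boot with the \"nopat\" kernel parameter\n");
		return -ENODEV;
	}

	/* PAT is off, so the legacy MTRR-based setup can continue. */
	return 0;
}
```

The point of exporting pat_enabled() is that a modular driver can make this check at all; the error code chosen here is only one plausible option.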

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled
  2015-05-04 15:22   ` Borislav Petkov
@ 2015-05-05  0:42     ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-05  0:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luis R. Rodriguez, mingo, tglx, hpa, plagnioj, tomi.valkeinen,
	daniel.vetter, airlied, dledford, awalls, syrjala, luto, mst,
	cocci, linux-kernel, Christoph Lameter, Kyle McMartin,
	Juergen Gross, Daniel Vetter, Dave Airlie, Bjorn Helgaas, x86

On Mon, May 04, 2015 at 05:22:08PM +0200, Borislav Petkov wrote:
> On Thu, Apr 30, 2015 at 01:25:16PM -0700, Luis R. Rodriguez wrote:
> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> > index bfef424..f094d36 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -556,7 +556,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
> >  {
> >  	int ret;
> >  
> > -	if (pat_enabled || !mtrr_enabled)
> > +	if (pat_enabled() || !mtrr_enabled)
> 
> What's going on here? I got a reject about mtrr_enabled which is nowhere
> to be found. Am I missing a patch?

Yes, the patch titled "x86: mtrr: generalize run time disabling of MTRR"
should be applied first, or amended to fit the new style. Let me know what
you prefer.
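
To make the dependency concrete, here is a small user-space model of the guard in the quoted hunk. It only illustrates the control flow: pat_on and mtrr_on are stand-in variables, model_phys_wc_add() is a hypothetical name, and the return value 1 is a placeholder for whatever the real function does once it decides an MTRR is needed.

```c
#include <stdbool.h>

/* Stand-ins for kernel state; both are assumptions of this sketch. */
static bool pat_on;
static bool mtrr_on;

/* Accessor form, mirroring the pat_enabled -> pat_enabled() conversion. */
static bool pat_enabled(void)
{
	return pat_on;
}

/*
 * Hypothetical model of the early return: with PAT enabled, or with MTRR
 * disabled at run time, there is nothing to add and the call is a no-op.
 */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;

	if (pat_enabled() || !mtrr_on)
		return 0;	/* nothing to do */

	return 1;		/* placeholder: a WC MTRR would be requested here */
}
```

Without the run-time MTRR patch the second half of the condition has nothing to refer to, which would explain the reject.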

> Anyway, I applied that:

Great, thanks. Is there a tree I can use to rebase / fetch?

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-04-30 22:01     ` Randy Dunlap
@ 2015-05-05  0:45       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-05  0:45 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Luis R. Rodriguez, mingo, tglx, hpa, bp, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied, dledford, awalls,
	syrjala, luto, mst, cocci, linux-kernel, Toshi Kani,
	Jonathan Corbet, Dave Hansen, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

On Thu, Apr 30, 2015 at 03:01:12PM -0700, Randy Dunlap wrote:
> On 04/29/15 14:44, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> > index cf08c9f..7e183e3 100644
> > --- a/Documentation/x86/pat.txt
> > +++ b/Documentation/x86/pat.txt
> > @@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
> >  as step 0 above and also track the usage of those pages and use set_memory_wb()
> >  before the page is freed to free pool.
> >  
> > -
> > +MTRR effects on PAT / non-PAT systems
> > +-------------------------------------
> > +
> > +The following table provides the effects of using write-combining MTRRs when
> > +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
> > +mtrr_add() usage will be phased in favor of arch_phys_wc_add() which will
> > +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
> > +is made should already have be ioremap'd with write-combining page attributes
> > +or PAT entries, this can be done by using ioremap_wc() / or respective helpers.
> > +Devices which combine areas of IO memory desired to remain uncachable with
> 
> I would spell it uncacheable.  In kernel Documentation/, grep uncacheable finds
> 14 hits vs. 6 hits for uncachable.  No big deal.
> 
> > +areas where write-combining is desirable and are restricted by the size
> > +requirements of MTRRs should consider splitting up their IO memory space
> > +cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
> > +encompassing both regions. Such use is nevertheless heavily discouraged as
> > +the effective memory type is considered implementation defined. This strategy
> > +should only be used as last resort on devices with size-contrained regions
> 
>                                                       size-constrained
> 
> > +where otherwise MTRR write-combining would not be effective.
> > +
> > +Note that you cannot use set_memory_wc() to override / whitelist IO remapped
> > +memory space mapped with ioremap*() calls, set_memory_wc() can only be used
> > +on RAM.
> > +
> > +----------------------------------------------------------------------
> > +MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> > +----------------------------------------------------------------------
> > +                                                  Non-PAT |  PAT
> > +     PAT
> > +     |PCD
> > +     ||PWT
> > +     |||
> > +WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> > +WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> > +WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> > +WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> > +----------------------------------------------------------------------
> > +
> > +(*) denotes implementation defined and is discouraged
> >  
> >  Notes:
> >  
> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> > index ea5f363..12abdbe 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
> >   * attempts to add a WC MTRR covering size bytes starting at base and
> >   * logs an error if this fails.
> >   *
> > + * The caller should expect to need to provide a power of two size on an
> 
>     * The caller should provide a power of two size on an equivalent
>     * power of two boundary.
> 

Thanks. Since Boris took this already, I'll let him amend unless he wishes for
me to send a new version.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-05-05  0:45       ` Luis R. Rodriguez
@ 2015-05-05  7:22         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05  7:22 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Randy Dunlap, Luis R. Rodriguez, mingo, tglx, hpa, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied, dledford, awalls,
	syrjala, luto, mst, cocci, linux-kernel, Toshi Kani,
	Jonathan Corbet, Dave Hansen, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

On Tue, May 05, 2015 at 02:45:06AM +0200, Luis R. Rodriguez wrote:
> Thanks since Boris took this already I'll let him amend unless he wishes for
> me to send a new version.

Haven't. I'm waiting for v2.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-04-30 22:01     ` Randy Dunlap
@ 2015-05-05  7:31       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-05  7:31 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Luis R. Rodriguez, mingo, tglx, hpa, bp, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied, dledford, awalls,
	syrjala, luto, mst, cocci, linux-kernel, Toshi Kani,
	Jonathan Corbet, Dave Hansen, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

On Thu, Apr 30, 2015 at 03:01:12PM -0700, Randy Dunlap wrote:
> On 04/29/15 14:44, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> 
> > ---
> >  Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
> >  Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
> >  arch/x86/kernel/cpu/mtrr/main.c |  3 +++
> >  3 files changed, 57 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
> > index cf08c9f..7e183e3 100644
> > --- a/Documentation/x86/pat.txt
> > +++ b/Documentation/x86/pat.txt
> > @@ -102,7 +104,43 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
> >  as step 0 above and also track the usage of those pages and use set_memory_wb()
> >  before the page is freed to free pool.
> >  
> > -
> > +MTRR effects on PAT / non-PAT systems
> > +-------------------------------------
> > +
> > +The following table provides the effects of using write-combining MTRRs when
> > +using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
> > +mtrr_add() usage will be phased in favor of arch_phys_wc_add() which will
> > +be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
> > +is made should already have be ioremap'd with write-combining page attributes
> > +or PAT entries, this can be done by using ioremap_wc() / or respective helpers.
> > +Devices which combine areas of IO memory desired to remain uncachable with
> 
> I would spell it uncacheable.  In kernel Documentation/, grep uncacheable finds
> 14 hits vs. 6 hits for uncachable.  No big deal.

Fixed.

> > +areas where write-combining is desirable and are restricted by the size
> > +requirements of MTRRs should consider splitting up their IO memory space
> > +cleanly with ioremap_uc() and ioremap_wc() followed by an arch_phys_wc_add()
> > +encompassing both regions. Such use is nevertheless heavily discouraged as
> > +the effective memory type is considered implementation defined. This strategy
> > +should only be used as last resort on devices with size-contrained regions
> 
>                                                       size-constrained

Fixed.

> > +where otherwise MTRR write-combining would not be effective.
> > +
> > +Note that you cannot use set_memory_wc() to override / whitelist IO remapped
> > +memory space mapped with ioremap*() calls, set_memory_wc() can only be used
> > +on RAM.
> > +
> > +----------------------------------------------------------------------
> > +MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> > +----------------------------------------------------------------------
> > +                                                  Non-PAT |  PAT
> > +     PAT
> > +     |PCD
> > +     ||PWT
> > +     |||
> > +WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> > +WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> > +WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
> > +WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> > +----------------------------------------------------------------------
> > +
> > +(*) denotes implementation defined and is discouraged
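
The table above can also be read as a small lookup. The sketch below transcribes it as posted, under the assumption that a WC MTRR covers the region; the enum names, struct effective and wc_mtrr_effective() are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Page cache modes from the "Linux ioremap value" column. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC };

/* Effective memory types that appear in the table. */
enum mem_type { TYPE_WC, TYPE_UC };

struct effective {
	enum mem_type type;
	bool impl_defined;	/* the (*) marker: implementation defined */
};

/* Effective type of a mapping that sits under a WC MTRR, per the table. */
static struct effective wc_mtrr_effective(enum cache_mode mode, bool pat)
{
	struct effective e = { TYPE_WC, false };

	if (mode == MODE_UC)
		e.type = TYPE_UC;	/* UC wins on both kinds of system */
	else if (!pat && (mode == MODE_WC || mode == MODE_UC_MINUS))
		e.impl_defined = true;	/* WC* rows in the non-PAT column */

	return e;
}
```

Only the two starred non-PAT rows come back flagged, which is the combination the text discourages.
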
> >  
> >  Notes:
> >  
> > diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
> > index ea5f363..12abdbe 100644
> > --- a/arch/x86/kernel/cpu/mtrr/main.c
> > +++ b/arch/x86/kernel/cpu/mtrr/main.c
> > @@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
> >   * attempts to add a WC MTRR covering size bytes starting at base and
> >   * logs an error if this fails.
> >   *
> > + * The caller should expect to need to provide a power of two size on an
> 
>     * The caller should provide a power of two size on an equivalent
>     * power of two boundary.

Fixed.
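
As an illustration of the wording above, a caller could sanity-check a range before handing it over. This is a plain C sketch with hypothetical helper names (is_power_of_two() and wc_range_ok() are not kernel functions), and it assumes "equivalent power of two boundary" means the base is aligned to the size.

```c
#include <stdbool.h>

/* True when n has exactly one bit set. */
static bool is_power_of_two(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical check: the size must be a power of two and the base must
 * be aligned to that same size.
 */
static bool wc_range_ok(unsigned long base, unsigned long size)
{
	return is_power_of_two(size) && (base & (size - 1)) == 0;
}
```

For example, an 8 MiB region at 0xd0000000 passes, while the same size at 0xd0400000 does not, because the base is only 4 MiB aligned.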

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-05-04 12:23     ` Borislav Petkov
@ 2015-05-05  7:35       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-05  7:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luis R. Rodriguez, mingo, tglx, hpa, bp, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied, dledford, awalls,
	syrjala, luto, mst, cocci, linux-kernel, Toshi Kani,
	Jonathan Corbet, Dave Hansen, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, linux-fbdev

On Mon, May 04, 2015 at 02:23:03PM +0200, Borislav Petkov wrote:
> On Wed, Apr 29, 2015 at 02:44:07PM -0700, Luis R. Rodriguez wrote:
> > From: "Luis R. Rodriguez" <mcgrof@suse.com>
> > 
> > As part of the effort to phase out MTRR use document
> > write-combining MTRR effects on pages with different
> > non-PAT page attributes flags and different PAT entry
> > values. Extend arch_phys_wc_add() documentation to
> > clarify power of two sizes / boundary requirements as
> > we phase out mtrr_add() use.
> > 
> > Lastly hint towards ioremap_uc() for corner cases on
> > device drivers working with devices with mixed regions
> > where MTRR size requirements would otherwise not
> > enable write-combining effective memory types.
> > 
> > Cc: Toshi Kani <toshi.kani@hp.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Suresh Siddha <sbsiddha@gmail.com>
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Juergen Gross <jgross@suse.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Antonino Daplas <adaplas@gmail.com>
> > Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
> > Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
> > Cc: Ville Syrjälä <syrjala@sci.fi>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Davidlohr Bueso <dbueso@suse.de>
> > Cc: linux-fbdev@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
> > ---
> >  Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
> >  Documentation/x86/pat.txt       | 40 +++++++++++++++++++++++++++++++++++++++-
> >  arch/x86/kernel/cpu/mtrr/main.c |  3 +++
> >  3 files changed, 57 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
> > index cc071dc..a111a6c 100644
> > --- a/Documentation/x86/mtrr.txt
> > +++ b/Documentation/x86/mtrr.txt
> > @@ -1,7 +1,19 @@
> >  MTRR (Memory Type Range Register) control
> > -3 Jun 1999
> > -Richard Gooch
> > -<rgooch@atnf.csiro.au>
> > +
> > +Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
> > +Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
> > +
> > +===============================================================================
> > +Phasing MTRR use
> 
> "Phasing out...".

Fixed all, will send another version.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-05-05  7:22         ` Borislav Petkov
@ 2015-05-05  7:46           ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-05  7:46 UTC (permalink / raw)
  To: Borislav Petkov, Ville Syrjälä, luto
  Cc: Randy Dunlap, Luis R. Rodriguez, mingo, tglx, hpa, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied, dledford, awalls,
	syrjala, mst, cocci, linux-kernel, Toshi Kani, Jonathan Corbet,
	Dave Hansen, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

On Tue, May 05, 2015 at 09:22:14AM +0200, Borislav Petkov wrote:
> On Tue, May 05, 2015 at 02:45:06AM +0200, Luis R. Rodriguez wrote:
> > Thanks since Boris took this already I'll let him amend unless he wishes for
> > me to send a new version.
> 
> Haven't. I'm waiting for v2.

OK thanks, it'll be a v5 actually. I am only resending the documentation patch.

Ville, are you OK with the other atyfb patches that follow up on top of this?
If so, since they depend on ioremap_uc(), should they go through Boris as he's
taking that in?

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages
  2015-05-05  7:46           ` Luis R. Rodriguez
@ 2015-05-05  7:53             ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05  7:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ville Syrjälä,
	luto, Randy Dunlap, Luis R. Rodriguez, mingo, tglx, hpa,
	plagnioj, tomi.valkeinen, daniel.vetter, airlied, dledford,
	awalls, mst, cocci, linux-kernel, Toshi Kani, Jonathan Corbet,
	Dave Hansen, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

On Tue, May 05, 2015 at 09:46:34AM +0200, Luis R. Rodriguez wrote:
> If so since they depend on ioremap_uc() should it go through Boris as he's
> taking that in?

Let's slow down a bit first, ok? First let's have all the x86 changes
ready, in and tested. Drivers can convert to them in a following step.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 1/7] mm, x86: Document return values of mapping funcs
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-05 11:19     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05 11:19 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:35PM -0600, Toshi Kani wrote:
> Document the return values of KVA mapping functions,

KVA?

Please write it out.

> pud_set_huge(), pmd_set_huge, pud_clear_huge() and
> pmd_clear_huge().
> 
> Simplify the conditions to select HAVE_ARCH_HUGE_VMAP
> in the Kconfig, since X86_PAE depends on X86_32.
> 
> There is no functional change in this patch.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/Kconfig      |    2 +-
>  arch/x86/mm/pgtable.c |   36 ++++++++++++++++++++++++++++--------
>  2 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cb23206..2ea27da 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -99,7 +99,7 @@ config X86
>  	select IRQ_FORCED_THREADING
>  	select HAVE_BPF_JIT if X86_64
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> -	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
> +	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
>  	select ARCH_HAS_SG_CHAIN
>  	select CLKEVT_I8253
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG

This is an unrelated change, please carve it out in a separate patch.

> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 0b97d2c..4891fa1 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -563,14 +563,19 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
>  }
>  
>  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +/**
> + * pud_set_huge - setup kernel PUD mapping
> + *
> + * MTRR can override PAT memory types with 4KB granularity.  Therefore,
> + * it does not set up a huge page when the range is covered by a non-WB

"it" is what exactly?

> + * type of MTRR.  0xFF indicates that MTRR are disabled.

So this shows that this patch shouldn't be the first one in the series.

IMO you want to start with cleaning up mtrr_type_lookup(), add the
defines for its retval and *then* document its users. This way you won't
have to touch the same place twice, the net-size of your patchset will
go down and it will be easier for reviewers.
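
For illustration only, the ordering suggested above (define the return
values first, then document the users) could end up reading roughly like
the sketch below. The name MTRR_TYPE_INVALID, the stubbed
mtrr_type_lookup() and the helper huge_mapping_allowed() are assumptions
made up for this example; none of them is taken from the patch under
review.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE 0x00
#define MTRR_TYPE_WRBACK     0x06
/* Hypothetical name for the bare 0xFF "MTRRs are disabled" return value. */
#define MTRR_TYPE_INVALID    0xFF

/* Stub standing in for the real lookup; it returns whatever the caller set. */
static uint8_t stub_mtrr_type = MTRR_TYPE_WRBACK;

static uint8_t mtrr_type_lookup(uint64_t start, uint64_t end)
{
	assert(start < end);
	return stub_mtrr_type;
}

/* Returns 1 if a huge mapping may cover [addr, addr + size), 0 otherwise. */
static int huge_mapping_allowed(uint64_t addr, uint64_t size)
{
	uint8_t mtrr = mtrr_type_lookup(addr, addr + size);

	/* Only WB, or no MTRR information at all, permits a huge page. */
	if (mtrr != MTRR_TYPE_WRBACK && mtrr != MTRR_TYPE_INVALID)
		return 0;
	return 1;
}
```

With a named constant the condition no longer needs a comment explaining
what 0xFF means, which is presumably the point of doing the cleanup
before the documentation patch.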

> + *
> + * Return 1 on success, and 0 when no PUD was set.

"Returns 1 on success and 0 on failure."

> + */
>  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  {
>  	u8 mtrr;
>  
> -	/*
> -	 * Do not use a huge page when the range is covered by non-WB type
> -	 * of MTRRs.
> -	 */
>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
>  	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
>  		return 0;

Ditto for the rest.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 1/7] mm, x86: Document return values of mapping funcs
  2015-05-05 11:19     ` Borislav Petkov
@ 2015-05-05 13:46       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 13:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 13:19 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:35PM -0600, Toshi Kani wrote:
> > Document the return values of KVA mapping functions,
> 
> KVA?
> Please write it out.

Will expand it as Kernel Virtual Address.

> > pud_set_huge(), pmd_set_huge, pud_clear_huge() and
> > pmd_clear_huge().
> > 
> > Simplify the conditions to select HAVE_ARCH_HUGE_VMAP
> > in the Kconfig, since X86_PAE depends on X86_32.
> > 
> > There is no functional change in this patch.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/Kconfig      |    2 +-
> >  arch/x86/mm/pgtable.c |   36 ++++++++++++++++++++++++++++--------
> >  2 files changed, 29 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index cb23206..2ea27da 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -99,7 +99,7 @@ config X86
> >  	select IRQ_FORCED_THREADING
> >  	select HAVE_BPF_JIT if X86_64
> >  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> > -	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
> > +	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
> >  	select ARCH_HAS_SG_CHAIN
> >  	select CLKEVT_I8253
> >  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
> 
> This is an unrelated change, please carve it out in a separate patch.

Will do.

> > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > index 0b97d2c..4891fa1 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -563,14 +563,19 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> >  }
> >  
> >  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> > +/**
> > + * pud_set_huge - setup kernel PUD mapping
> > + *
> > + * MTRR can override PAT memory types with 4KB granularity.  Therefore,
> > + * it does not set up a huge page when the range is covered by a non-WB
> 
> "it" is what exactly?

Will change to "this function".

> > + * type of MTRR.  0xFF indicates that MTRR are disabled.
> 
> So this shows that this patch shouldn't be the first one in the series.
> 
> IMO you want to start with cleaning up mtrr_type_lookup(), add the
> defines for its retval and *then* document its users. This way you won't
> have to touch the same place twice, the net-size of your patchset will
> > go down and it will be easier for reviewers.

Agreed.  This patch-set was originally a small set of patches, but was
extended later with additional patches, which ended up with touching the
same place again.  I will reorganize the patch-set. 

> > + *
> > + * Return 1 on success, and 0 when no PUD was set.
> 
> "Returns 1 on success and 0 on failure."

Will do.

> > + */
> >  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
> >  {
> >  	u8 mtrr;
> >  
> > -	/*
> > -	 * Do not use a huge page when the range is covered by non-WB type
> > -	 * of MTRRs.
> > -	 */
> >  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> >  	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
> >  		return 0;
> 
> Ditto for the rest.

Will do.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 1/7] mm, x86: Document return values of mapping funcs
  2015-05-05 14:19         ` Borislav Petkov
@ 2015-05-05 14:14           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 14:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 16:19 +0200, Borislav Petkov wrote:
> On Tue, May 05, 2015 at 07:46:36AM -0600, Toshi Kani wrote:
> > Agreed.  This patch-set was originally a small set of patches, but was
> > extended later with additional patches, which ended up with touching the
> > same place again.  I will reorganize the patch-set.
> 
> Ok, but please wait until I take a look at the rest.

Sure, I will wait for your review.  

> 
> Thanks.
> 
> Btw, is there anything else MTRR-related pending for tip?

Not exactly MTRR-related, but I am planning to re-submit my WT patchset
after checking to see if Luis's patchset (which you are reviewing) has
any conflict with this.

https://lkml.org/lkml/2015/2/24/773

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 1/7] mm, x86: Document return values of mapping funcs
  2015-05-05 13:46       ` Toshi Kani
@ 2015-05-05 14:19         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05 14:19 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, May 05, 2015 at 07:46:36AM -0600, Toshi Kani wrote:
> Agreed.  This patch-set was originally a small set of patches, but was
> extended later with additional patches, which ended up with touching the
> same place again.  I will reorganize the patch-set.

Ok, but please wait until I take a look at the rest.

Thanks.

Btw, is there anything else MTRR-related pending for tip?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-05 17:11     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05 17:11 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:36PM -0600, Toshi Kani wrote:
> When an MTRR entry is inclusive to a requested range, i.e.
> the start and end of the request are not within the MTRR
> entry range but the range contains the MTRR entry entirely,
> __mtrr_type_lookup() ignores such a case because both
> start_state and end_state are set to zero.
> 
> This bug can cause the following issues:
> 1) reserve_memtype() tracks an effective memory type in case
>    a request type is WB (e.g. /dev/mem blindly uses WB). Failing
>    to track the effective type causes a subsequent request
>    to map the same range with the effective type to fail.
> 2) pud_set_huge() and pmd_set_huge() check if a requested range
>    has any overlap with MTRRs. Failing to detect an overlap may
>    cause a performance penalty or undefined behavior.
> 
> This patch fixes the bug by adding a new flag, 'inclusive',
> to detect the inclusive case.  This case is then handled in
> the same way as (!start_state && end_state).  With this fix,
> __mtrr_type_lookup() handles the inclusive case properly.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 7d74f7b..a82e370 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  
>  	prev_match = 0xFF;
>  	for (i = 0; i < num_var_ranges; ++i) {
> -		unsigned short start_state, end_state;
> +		unsigned short start_state, end_state, inclusive;
>  
>  		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
>  			continue;
> @@ -166,15 +166,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  
>  		start_state = ((start & mask) == (base & mask));
>  		end_state = ((end & mask) == (base & mask));
> +		inclusive = ((start < base) && (end > base));
>  
> -		if (start_state != end_state) {
> +		if ((start_state != end_state) || inclusive) {
>  			/*
>  			 * We have start:end spanning across an MTRR.
> -			 * We split the region into
> -			 * either
> -			 * (start:mtrr_end) (mtrr_end:end)
> -			 * or
> -			 * (start:mtrr_start) (mtrr_start:end)
> +			 * We split the region into either
> +			 * - start_state:1
> +			 *     (start:mtrr_end) (mtrr_end:end)
> +			 * - end_state:1 or inclusive:1
> +			 *     (start:mtrr_start) (mtrr_start:end)

Ok, I'm confused. Shouldn't the inclusive:1 case be

			(start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)

?

If so, this function would need more changes...

>  			 * depending on kind of overlap.
>  			 * Return the type for first region and a pointer to
>  			 * the start of second region so that caller will
> @@ -195,7 +196,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  			*repeat = 1;
>  		}
>  
> -		if ((start & mask) != (base & mask))
> +		if (!start_state)
>  			continue;

That change actually makes the code less readable because you have to
go and look up what start_state was and the previous version actually
shows the check that start is within the range, exactly like it is
documented in the CPU manuals.

And I'd leave it this way because gcc is smart enough to reload the
result saved in start_state and not compute it again.
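
As a rough, standalone illustration of the mask check discussed here and
of the case the new 'inclusive' flag is meant to catch, consider the
sketch below. It assumes a single variable-range MTRR described by a
base/mask pair and treats 'end' as the inclusive last byte of the
request; struct overlap, classify() and needs_split() are hypothetical
helpers for this example, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

struct overlap {
	int start_state;	/* request start lies inside the MTRR */
	int end_state;		/* request end lies inside the MTRR */
	int inclusive;		/* start below the MTRR base, end above it */
};

/* 'end' is the inclusive last byte of the requested range. */
static struct overlap classify(uint64_t start, uint64_t end,
			       uint64_t base, uint64_t mask)
{
	struct overlap o;

	assert(start <= end);
	/* An address is inside the MTRR when its masked bits match the base. */
	o.start_state = ((start & mask) == (base & mask));
	o.end_state = ((end & mask) == (base & mask));
	o.inclusive = ((start < base) && (end > base));
	return o;
}

/* Mirrors the patched condition: split on partial overlap or enclosure. */
static int needs_split(struct overlap o)
{
	return (o.start_state != o.end_state) || o.inclusive;
}
```

For a 1 MB MTRR at 0x100000 (mask ~0xFFFFF) and a request covering
0x0..0x2FFFFF, both start_state and end_state come out as 0, so the
original (start_state != end_state) test alone would not notice the
MTRR sitting in the middle of the request.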

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-05 17:11     ` Borislav Petkov
@ 2015-05-05 17:32       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 19:11 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:36PM -0600, Toshi Kani wrote:
> > When an MTRR entry is inclusive to a requested range, i.e.
> > the start and end of the request are not within the MTRR
> > entry range but the range contains the MTRR entry entirely,
> > __mtrr_type_lookup() ignores such a case because both
> > start_state and end_state are set to zero.
> > 
> > This bug can cause the following issues:
> > 1) reserve_memtype() tracks an effective memory type in case
> >    a request type is WB (e.g. /dev/mem blindly uses WB). Failing
> >    to track the effective type causes a subsequent request
> >    to map the same range with the effective type to fail.
> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> >    has any overlap with MTRRs. Failing to detect an overlap may
> >    cause a performance penalty or undefined behavior.
> > 
> > This patch fixes the bug by adding a new flag, 'inclusive',
> > to detect the inclusive case.  This case is then handled in
> > the same way as (!start_state && end_state).  With this fix,
> > __mtrr_type_lookup() handles the inclusive case properly.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/kernel/cpu/mtrr/generic.c |   17 +++++++++--------
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index 7d74f7b..a82e370 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  
> >  	prev_match = 0xFF;
> >  	for (i = 0; i < num_var_ranges; ++i) {
> > -		unsigned short start_state, end_state;
> > +		unsigned short start_state, end_state, inclusive;
> >  
> >  		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
> >  			continue;
> > @@ -166,15 +166,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  
> >  		start_state = ((start & mask) == (base & mask));
> >  		end_state = ((end & mask) == (base & mask));
> > +		inclusive = ((start < base) && (end > base));
> >  
> > -		if (start_state != end_state) {
> > +		if ((start_state != end_state) || inclusive) {
> >  			/*
> >  			 * We have start:end spanning across an MTRR.
> > -			 * We split the region into
> > -			 * either
> > -			 * (start:mtrr_end) (mtrr_end:end)
> > -			 * or
> > -			 * (start:mtrr_start) (mtrr_start:end)
> > +			 * We split the region into either
> > +			 * - start_state:1
> > +			 *     (start:mtrr_end) (mtrr_end:end)
> > +			 * - end_state:1 or inclusive:1
> > +			 *     (start:mtrr_start) (mtrr_start:end)
> 
> Ok, I'm confused. Shouldn't the inclusive:1 case be
> 
> 			(start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
> 
> ?
> 
> If so, this function would need more changes...

Yes, that's how it gets separated eventually.  Since *repeat is set in
this case, the code only needs to separate the first part at a time.
The 2nd part gets separated in the next call with the *repeat.
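
A simplified model of that repeat-driven splitting, for readers who want
to see it step by step, might look like the sketch below. It assumes a
single MTRR covering [MTRR_START, MTRR_END), an inclusive 'end', and
made-up type values; lookup_once() and split_range() are hypothetical
stand-ins for __mtrr_type_lookup() and its caller, and they record the
pieces instead of combining memory types.

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_UC 0x00	/* type of the single hypothetical MTRR */
#define TYPE_WB 0x06	/* stand-in default type outside the MTRR */

#define MTRR_START 0x100000ULL
#define MTRR_END   0x200000ULL	/* exclusive */

struct piece {
	uint64_t start, end;	/* end is inclusive */
	uint8_t type;
};

/* Returns the type of the first piece; sets *repeat if more remains. */
static uint8_t lookup_once(uint64_t start, uint64_t end,
			   uint64_t *partial_end, int *repeat)
{
	int start_state = (start >= MTRR_START && start < MTRR_END);
	int end_state = (end >= MTRR_START && end < MTRR_END);
	int inclusive = (start < MTRR_START && end > MTRR_START);

	*repeat = 0;
	if ((start_state != end_state) || inclusive) {
		/* Cut at the first MTRR boundary after 'start'. */
		*partial_end = start_state ? MTRR_END : MTRR_START;
		*repeat = 1;
	}
	return start_state ? TYPE_UC : TYPE_WB;
}

/* Calls lookup_once() until no split remains; returns the piece count. */
static int split_range(uint64_t start, uint64_t end,
		       struct piece *out, int max)
{
	uint64_t partial_end = 0;
	int repeat, n = 0;

	assert(start <= end);
	do {
		uint8_t type = lookup_once(start, end, &partial_end, &repeat);

		assert(n < max);
		out[n].start = start;
		out[n].end = repeat ? partial_end - 1 : end;
		out[n].type = type;
		n++;
		if (repeat)
			start = partial_end;
	} while (repeat);
	return n;
}
```

For a request of 0x0..0x2FFFFF the first call only cuts off
(start:mtrr_start); the second call then starts inside the MTRR and cuts
at mtrr_end, and the third call sees no overlap, which gives the three
pieces asked about above without any single call handling more than one
boundary.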


> >  			 * depending on kind of overlap.
> >  			 * Return the type for first region and a pointer to
> >  			 * the start of second region so that caller will
> > @@ -195,7 +196,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			*repeat = 1;
> >  		}
> >  
> > -		if ((start & mask) != (base & mask))
> > +		if (!start_state)
> >  			continue;
> 
> That change actually makes the code less readable because you have to
> go and look up what start_state was and the previous version actually
> shows the check that start is within the range, exactly like it is
> documented in the CPU manuals.
> 
> And I'd leave it this way because gcc is smart enough to reload the
> result saved in start_state and not compute it again.

When I see such re-calculation, it makes me look at the code again to
see if there is a case that updates the parameters after the first
calculation...  That said, I am OK as long as gcc is smart enough to
reload the value.  I will put it back to the original.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
@ 2015-05-05 17:32       ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 19:11 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:36PM -0600, Toshi Kani wrote:
> > When an MTRR entry is inclusive to a requested range, i.e.
> > the start and end of the request are not within the MTRR
> > entry range but the range contains the MTRR entry entirely,
> > __mtrr_type_lookup() ignores such a case because both
> > start_state and end_state are set to zero.
> > 
> > This bug can cause the following issues:
> > 1) reserve_memtype() tracks an effective memory type in case
> >    a request type is WB (e.g. /dev/mem blindly uses WB). Failing
> >    to track the effective type causes a subsequent request
> >    to map the same range with the effective type to fail.
> > 2) pud_set_huge() and pmd_set_huge() check if a requested range
> >    has any overlap with MTRRs. Failing to detect an overlap may
> >    cause a performance penalty or undefined behavior.
> > 
> > This patch fixes the bug by adding a new flag, 'inclusive',
> > to detect the inclusive case.  This case is then handled in
> > the same way as (!start_state && end_state).  With this fix,
> > __mtrr_type_lookup() handles the inclusive case properly.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/kernel/cpu/mtrr/generic.c |   17 +++++++++--------
> >  1 file changed, 9 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index 7d74f7b..a82e370 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  
> >  	prev_match = 0xFF;
> >  	for (i = 0; i < num_var_ranges; ++i) {
> > -		unsigned short start_state, end_state;
> > +		unsigned short start_state, end_state, inclusive;
> >  
> >  		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
> >  			continue;
> > @@ -166,15 +166,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  
> >  		start_state = ((start & mask) == (base & mask));
> >  		end_state = ((end & mask) == (base & mask));
> > +		inclusive = ((start < base) && (end > base));
> >  
> > -		if (start_state != end_state) {
> > +		if ((start_state != end_state) || inclusive) {
> >  			/*
> >  			 * We have start:end spanning across an MTRR.
> > -			 * We split the region into
> > -			 * either
> > -			 * (start:mtrr_end) (mtrr_end:end)
> > -			 * or
> > -			 * (start:mtrr_start) (mtrr_start:end)
> > +			 * We split the region into either
> > +			 * - start_state:1
> > +			 *     (start:mtrr_end) (mtrr_end:end)
> > +			 * - end_state:1 or inclusive:1
> > +			 *     (start:mtrr_start) (mtrr_start:end)
> 
> Ok, I'm confused. Shouldn't the inclusive:1 case be
> 
> 			(start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
> 
> ?
> 
> If so, this function would need more changes...

Yes, that's how it gets separated eventually.  Since *repeat is set in
this case, the code only needs to separate the first part at a time.
The 2nd part gets separated in the next call with the *repeat.
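
A rough model of that repeat mechanism, assuming a single MTRR entry and an exclusive end address. The names `one_mtrr`, `lookup_once` and `count_pieces` are hypothetical; this is a simplified sketch, not the code in generic.c:

```c
#include <stdint.h>

#define TYPE_UC 0	/* stands in for the default type in this model */
#define TYPE_WB 6

/* Hypothetical model: one MTRR covering [m_start, m_end). */
struct one_mtrr {
	uint64_t m_start, m_end;
	uint8_t type;
};

/*
 * Return the type of the first piece of [start, end).  When the request
 * crosses an MTRR boundary, set *repeat and report the split point in
 * *partial_end so the caller can look up the remainder.
 */
static uint8_t lookup_once(const struct one_mtrr *m, uint64_t start,
			   uint64_t end, uint64_t *partial_end, int *repeat)
{
	int start_in = start >= m->m_start && start < m->m_end;

	*repeat = 0;
	if (start_in && end > m->m_end) {
		/* start_state:1 -> (start:mtrr_end) (mtrr_end:end) */
		*partial_end = m->m_end;
		*repeat = 1;
	} else if (start < m->m_start && end > m->m_start) {
		/* end_state:1 or inclusive:1 -> split at mtrr_start first */
		*partial_end = m->m_start;
		*repeat = 1;
	}
	return start_in ? m->type : TYPE_UC;
}

/* Caller loop: count how many pieces the request ends up split into. */
static int count_pieces(const struct one_mtrr *m, uint64_t start, uint64_t end)
{
	uint64_t partial_end;
	int repeat, pieces = 0;

	for (;;) {
		(void)lookup_once(m, start, end, &partial_end, &repeat);
		pieces++;
		if (!repeat)
			break;
		start = partial_end;	/* continue with the remainder */
	}
	return pieces;
}
```

In this model, a request that contains the whole entry is split in two on the first call and the remainder is split again on the second, giving the three intervals (start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end) overall.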


> >  			 * depending on kind of overlap.
> >  			 * Return the type for first region and a pointer to
> >  			 * the start of second region so that caller will
> > @@ -195,7 +196,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			*repeat = 1;
> >  		}
> >  
> > -		if ((start & mask) != (base & mask))
> > +		if (!start_state)
> >  			continue;
> 
> That change actually makes the code more unreadable because you have to
> go and look up what start_state was and the previous version actually
> shows the check that start is within the range, exactly like it is
> documented in the CPU manuals.
> 
> And I'd leave it this way because gcc is smart enough to reload the
> result saved in start_state and not compute it again.

When I see such re-calculation, it makes me look at the code again to
see if there is a case that updates the parameters after the first
calculation...  That said, I am OK as long as gcc is smart enough to
reload the value.  I will put it back to the original.

Thanks,
-Toshi

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-05 17:32       ` Toshi Kani
@ 2015-05-05 18:39         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05 18:39 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, May 05, 2015 at 11:32:08AM -0600, Toshi Kani wrote:
> > Ok, I'm confused. Shouldn't the inclusive:1 case be
> > 
> > 			(start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
> > 
> > ?
> > 
> > If so, this function would need more changes...
> 
> Yes, that's how it gets separated eventually.  Since *repeat is set in
> this case, the code only needs to separate the first part at a time.
> The 2nd part gets separated in the next call with the *repeat.

Aah, right, the caller is supposed to adjust the interval limits on
subsequent calls. Please reflect this in the comment because:

		*     (start:mtrr_start) (mtrr_start:end)

is misleading for inclusive:1.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-05 18:39         ` Borislav Petkov
@ 2015-05-05 19:31           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 19:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 20:39 +0200, Borislav Petkov wrote:
> On Tue, May 05, 2015 at 11:32:08AM -0600, Toshi Kani wrote:
> > > Ok, I'm confused. Shouldn't the inclusive:1 case be
> > > 
> > > 			(start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
> > > 
> > > ?
> > > 
> > > If so, this function would need more changes...
> > 
> > Yes, that's how it gets separated eventually.  Since *repeat is set in
> > this case, the code only needs to separate the first part at a time.
> > The 2nd part gets separated in the next call with the *repeat.
> 
> Aah, right, the caller is supposed to adjust the interval limits on
> subsequent calls. Please reflect this in the comment because:
> 
> 		*     (start:mtrr_start) (mtrr_start:end)
> 
> is misleading for inclusive:1.

Well, the comment kinda says it already, but I will try to clarify it.

           /*
            * We have start:end spanning across an MTRR.
            * We split the region into either
            * - start_state:1
            *     (start:mtrr_end) (mtrr_end:end)
            * - end_state:1 or inclusive:1
            *     (start:mtrr_start) (mtrr_start:end)
            * depending on kind of overlap.
            * Return the type for first region and a pointer to
            * the start of second region so that caller will
            * lookup again on the second region.
            * Note: This way we handle multiple overlaps as well.
            */

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-05 20:09             ` Borislav Petkov
@ 2015-05-05 20:06               ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-05 20:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-05 at 22:09 +0200, Borislav Petkov wrote:
> On Tue, May 05, 2015 at 01:31:32PM -0600, Toshi Kani wrote:
> > Well, the comment kinda says it already, but I will try to clarify it.
> > 
> >            /*
> >             * We have start:end spanning across an MTRR.
> >             * We split the region into either
> >             * - start_state:1
> >             *     (start:mtrr_end) (mtrr_end:end)
> >             * - end_state:1 or inclusive:1
> >             *     (start:mtrr_start) (mtrr_start:end)
> 
> What I mean is this:
> 
> 		* - start_state:1
> 		*     (start:mtrr_end) (mtrr_end:end)
> 		* - end_state:1
> 		*     (start:mtrr_start) (mtrr_start:end)
> 		* - inclusive:1
> 		*     (start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
> 		*
> 		* depending on kind of overlap.
> 		*
> 		* Return the type of the first region and a pointer to the start
> 		* of next region so that caller will be advised to lookup again
> 		* after having adjusted start and end.
> 		*
> 		* Note: This way we handle multiple overlaps as well.
> 		*/
> 
> We add comments so that people can read them and can quickly understand
> what the function does. Not to make them parse it and wonder why
> inclusive:1 is listed together with end_state:1 which returns two
> intervals.
> 
> Note that I changed the text to talk about the *next* region and not
> about the *second* region, to make it even more clear.

Thanks for the suggestion.  I see your point.  I will update it
accordingly.
-Toshi




^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-05 19:31           ` Toshi Kani
@ 2015-05-05 20:09             ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-05 20:09 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, May 05, 2015 at 01:31:32PM -0600, Toshi Kani wrote:
> Well, the comment kinda says it already, but I will try to clarify it.
> 
>            /*
>             * We have start:end spanning across an MTRR.
>             * We split the region into either
>             * - start_state:1
>             *     (start:mtrr_end) (mtrr_end:end)
>             * - end_state:1 or inclusive:1
>             *     (start:mtrr_start) (mtrr_start:end)

What I mean is this:

		* - start_state:1
		*     (start:mtrr_end) (mtrr_end:end)
		* - end_state:1
		*     (start:mtrr_start) (mtrr_start:end)
		* - inclusive:1
		*     (start:mtrr_start) (mtrr_start:mtrr_end) (mtrr_end:end)
		*
		* depending on kind of overlap.
		*
		* Return the type of the first region and a pointer to the start
		* of next region so that caller will be advised to lookup again
		* after having adjusted start and end.
		*
		* Note: This way we handle multiple overlaps as well.
		*/

We add comments so that people can read them and can quickly understand
what the function does. Not to make them parse it and wonder why
inclusive:1 is listed together with end_state:1 which returns two
intervals.

Note that I changed the text to talk about the *next* region and not
about the *second* region, to make it even more clear.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 3/7] mtrr, x86: Remove a wrong address check in __mtrr_type_lookup()
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-06 10:46     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 10:46 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:37PM -0600, Toshi Kani wrote:
> __mtrr_type_lookup() checks MTRR fixed ranges when
> mtrr_state.have_fixed is set and start is less than
> 0x100000.  However, the 'else if (start < 0x1000000)'
> in the code checks with a wrong address as it has
> an extra-zero in the address.  The code still runs
> correctly as this check is meaningless, though.
> 
> This patch replaces the wrong address check with 'else'
> with no condition.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-06 11:47     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 11:47 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:38PM -0600, Toshi Kani wrote:
> 'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
> and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
> section 11.11.2.1, defines these flags as follows:
>  - All MTRRs are disabled when the E flag is clear.
>    The FE flag has no effect when the E flag is clear.
>  - The default type is enabled when the E flag is set.
>  - MTRR variable ranges are enabled when the E flag is set.
>  - MTRR fixed ranges are enabled when both E and FE flags
>    are set.
> 
> MTRR state checks in __mtrr_type_lookup() do not match with
> SDM.  Hence, this patch makes the following changes:
>  - The current code detects MTRRs disabled when both E and
>    FE flags are clear in mtrr_state.enabled.  Fix to detect
>    MTRRs disabled when the E flag is clear.
>  - The current code does not check if the FE bit is set in
>    mtrr_state.enabled when looking into the fixed entries.
>    Fix to check the FE flag.
>  - The current code returns the default type when the E flag
>    is clear in mtrr_state.enabled.  However, the default type
>    is also disabled when the E flag is clear.  Fix to remove
>    the code as this case is handled as MTRR disabled with
>    the 1st change.
> 
> In addition, this patch defines the E and FE flags in
> mtrr_state.enabled as follows.
>  - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
>  - E  flag: MTRR_STATE_MTRR_ENABLED
> 
> print_mtrr_state() is also updated accordingly.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/include/uapi/asm/mtrr.h   |    4 ++++
>  arch/x86/kernel/cpu/mtrr/generic.c |   15 ++++++++-------
>  2 files changed, 12 insertions(+), 7 deletions(-)

You missed a spot in the conversion in
arch/x86/kernel/cpu/mtrr/cleanup.c::x86_get_mtrr_mem_range():

There we have

	if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
	    (mtrr_state.enabled & 1)) {

which should be mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED.

> diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
> index d0acb65..66ba88d 100644
> --- a/arch/x86/include/uapi/asm/mtrr.h
> +++ b/arch/x86/include/uapi/asm/mtrr.h
> @@ -88,6 +88,10 @@ struct mtrr_state_type {
>         mtrr_type def_type;
>  };
> 
> +/* Bit fields for enabled in struct mtrr_state_type */
> +#define MTRR_STATE_MTRR_FIXED_ENABLED  0x01
> +#define MTRR_STATE_MTRR_ENABLED                0x02
> +
>  #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
>  #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)

Please add those to arch/x86/include/asm/mtrr.h instead. They have no
place in the uapi header.
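
The E/FE semantics quoted above from the SDM can be sketched as two small predicates. The macro values mirror the ones proposed in the patch; the short names `E_FLAG` and `FE_FLAG` and both helper functions are hypothetical and exist only for this illustration:

```c
#include <stdint.h>

#define FE_FLAG 0x01	/* same value as MTRR_STATE_MTRR_FIXED_ENABLED */
#define E_FLAG  0x02	/* same value as MTRR_STATE_MTRR_ENABLED */

/* MTRRs (default type and variable ranges) are active only when E is set. */
static int mtrrs_enabled(uint8_t enabled)
{
	return (enabled & E_FLAG) != 0;
}

/* Fixed ranges are active only when both E and FE are set. */
static int fixed_ranges_enabled(uint8_t enabled)
{
	return (enabled & (E_FLAG | FE_FLAG)) == (E_FLAG | FE_FLAG);
}
```

Under this reading, an `enabled` value of 0x01 (FE set, E clear) means everything is disabled, which is the case the old "both flags clear" test got wrong.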

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-06 13:41     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 13:41 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:40PM -0600, Toshi Kani wrote:
> MTRRs contain fixed and variable entries.  mtrr_type_lookup()
> may repeatedly call __mtrr_type_lookup() to handle a request
> that overlaps with variable entries.  However,
> __mtrr_type_lookup() also handles the fixed entries, which
> do not have to be repeated.  Therefore, this patch creates
> separate functions, mtrr_type_lookup_fixed() and
> mtrr_type_lookup_variable(), to handle the fixed and variable
> ranges respectively.
> 
> The patch also updates the function headers to clarify the
> return values and output argument.  It updates comments to
> clarify that the repeating is necessary to handle overlaps
> with the default type, since overlaps with multiple entries
> alone can be handled without such repeating.
> 
> There is no functional change in this patch.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c |  137 +++++++++++++++++++++++-------------
>  1 file changed, 86 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> index 8bd1298..3652e2b 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -102,55 +102,69 @@ static int check_type_overlap(u8 *prev, u8 *curr)
>  	return 0;
>  }
>  
> -/*
> - * Error/Semi-error returns:
> - * MTRR_TYPE_INVALID - when MTRR is not enabled
> - * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
> - *		corresponds only to [start:*partial_end].
> - *		Caller has to lookup again for [*partial_end:end].
> +/**
> + * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
> + *
> + * MTRR fixed entries are divided into the following ways:
> + *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
> + *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
> + *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges

No need for those - simply a pointer to either the SDM or APM manuals'
section suffices as they both describe it well.
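
For reference, a standalone sketch of the index arithmetic implied by that layout, assuming the 88-entry table order of eight 64KB, sixteen 16KB and sixty-four 4KB sub-ranges described in the quoted comment. The function name `fixed_range_index` and the -1 return for out-of-range addresses are hypothetical:

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a physical address below 1 MiB to its slot in
 * an 88-entry fixed-range table (8 x 64KB, 16 x 16KB, 64 x 4KB).
 * Returns -1 when the address is not covered by the fixed ranges.
 */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;

	if (start < 0x80000)		/* 0x00000 - 0x7FFFF, 64KB each */
		return (int)(start >> 16);

	if (start < 0xC0000)		/* 0x80000 - 0xBFFFF, 16KB each */
		return 8 + (int)((start - 0x80000) >> 14);

	/* 0xC0000 - 0xFFFFF, 4KB each */
	return 24 + (int)((start - 0xC0000) >> 12);
}
```

The three bands occupy slots 0-7, 8-23 and 24-87 respectively, so the last byte below 1 MiB maps to slot 87.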

> + *
> + * Return Values:
> + * MTRR_TYPE_(type)  - Matched memory type
> + * MTRR_TYPE_INVALID - Unmatched or fixed entries are disabled
>   */
> -static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> +static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
> +{
> +	int idx;
> +
> +	if (start >= 0x100000)
> +		return MTRR_TYPE_INVALID;
> +
> +	if (!(mtrr_state.have_fixed) ||
> +	    !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> +		return MTRR_TYPE_INVALID;
> +
> +	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
> +		idx = 0;
> +		idx += (start >> 16);
> +		return mtrr_state.fixed_ranges[idx];
> +
> +	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
> +		idx = 1 * 8;
> +		idx += ((start - 0x80000) >> 14);
> +		return mtrr_state.fixed_ranges[idx];
> +	}
> +
> +	/* 0xC0000 - 0xFFFFF */
> +	idx = 3 * 8;
> +	idx += ((start - 0xC0000) >> 12);
> +	return mtrr_state.fixed_ranges[idx];
> +}
> +
> +/**
> + * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
> + *
> + * Return Value:
> + * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
> + *
> + * Output Argument:
> + * repeat - Set to 1 when [start:end] spanned across MTRR range and type
> + *	    returned corresponds only to [start:*partial_end].  Caller has
> + *	    to lookup again for [*partial_end:end].
> + */
> +static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> +				    int *repeat)
>  {
>  	int i;
>  	u64 base, mask;
>  	u8 prev_match, curr_match;
>  
>  	*repeat = 0;
> -	if (!mtrr_state_set)
> -		return MTRR_TYPE_INVALID;
> -
> -	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> -		return MTRR_TYPE_INVALID;
>  
>  	/* Make end inclusive end, instead of exclusive */
>  	end--;
>  
> -	/* Look in fixed ranges. Just return the type as per start */
> -	if ((start < 0x100000) &&
> -	    (mtrr_state.have_fixed) &&
> -	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
> -		int idx;
> -
> -		if (start < 0x80000) {
> -			idx = 0;
> -			idx += (start >> 16);
> -			return mtrr_state.fixed_ranges[idx];
> -		} else if (start < 0xC0000) {
> -			idx = 1 * 8;
> -			idx += ((start - 0x80000) >> 14);
> -			return mtrr_state.fixed_ranges[idx];
> -		} else {
> -			idx = 3 * 8;
> -			idx += ((start - 0xC0000) >> 12);
> -			return mtrr_state.fixed_ranges[idx];
> -		}
> -	}
> -
> -	/*
> -	 * Look in variable ranges
> -	 * Look of multiple ranges matching this address and pick type
> -	 * as per MTRR precedence
> -	 */
>  	prev_match = MTRR_TYPE_INVALID;
>  	for (i = 0; i < num_var_ranges; ++i) {
>  		unsigned short start_state, end_state, inclusive;
> @@ -179,7 +193,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  			 * Return the type for first region and a pointer to
>  			 * the start of second region so that caller will
>  			 * lookup again on the second region.
> -			 * Note: This way we handle multiple overlaps as well.
> +			 * Note: This way we handle overlaps with multiple
> +			 * entries and the default type properly.
>  			 */
>  			if (start_state)
>  				*partial_end = base + get_mtrr_size(mask);
> @@ -208,21 +223,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
>  			return curr_match;
>  	}
>  
> -	if (mtrr_tom2) {
> -		if (start >= (1ULL<<32) && (end < mtrr_tom2))
> -			return MTRR_TYPE_WRBACK;
> -	}
> -
>  	if (prev_match != MTRR_TYPE_INVALID)
>  		return prev_match;
>  
>  	return mtrr_state.def_type;
>  }
>  
> -/*
> - * Returns the effective MTRR type for the region
> - * Error return:
> - * MTRR_TYPE_INVALID - when MTRR is not enabled
> +/**
> + * mtrr_type_lookup - look up memory type in MTRR
> + *
> + * Return Values:
> + * MTRR_TYPE_(type)  - The effective MTRR type for the region
> + * MTRR_TYPE_INVALID - MTRR is disabled
>   */
>  u8 mtrr_type_lookup(u64 start, u64 end)
>  {
> @@ -230,22 +242,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
>  	int repeat;
>  	u64 partial_end;
>  
> -	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
> +	if (!mtrr_state_set)
> +		return MTRR_TYPE_INVALID;
> +
> +	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> +		return MTRR_TYPE_INVALID;
> +
> +	/*
> +	 * Look up the fixed ranges first, which take priority over
> +	 * the variable ranges.
> +	 */
> +	type = mtrr_type_lookup_fixed(start, end);
> +	if (type != MTRR_TYPE_INVALID)
> +		return type;

Huh, why are we not looking at start?

I mean, fixed MTRRs cover the first 1MB so we can simply do:

        if ((start < 0x100000) &&
            (mtrr_state.have_fixed) &&
            (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
		return mtrr_type_lookup_fixed(start, end);

and for all the other ranges we would do the variable lookup:

	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
	...

?

Although I don't know what the code is supposed to do when a region
starts in the fixed range and extends past its end, i.e., something like
this:

	[ start ... 0x100000 ... end ]

The current code would return the type of the fixed-range entry that
covers start, and that would not really be correct.

OTOH, this has been like this forever so maybe we don't care...
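
To make the concern concrete, here is a minimal standalone sketch (not
kernel code) of the index computation from the patch, plus a check for
the straddling case. The names fixed_range_index() and
straddles_fixed_end() are made up for illustration, and the 88-entry
layout (8 + 16 + 64) is assumed from the sub-range sizes listed in the
patch comment:

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_RANGE_END	0x100000ULL	/* fixed MTRRs cover the first 1MB */

/* Index of the fixed entry covering 'start'; mirrors the patch's arithmetic. */
static int fixed_range_index(uint64_t start)
{
	assert(start < FIXED_RANGE_END);

	if (start < 0x80000)		/* 0x00000 - 0x7FFFF: eight 64KB entries */
		return (int)(start >> 16);
	if (start < 0xC0000)		/* 0x80000 - 0xBFFFF: sixteen 16KB entries */
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	/* 0xC0000 - 0xFFFFF: sixty-four 4KB entries */
	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}

/* Returns 1 when [start, end) begins below 1MB and extends beyond it. */
static int straddles_fixed_end(uint64_t start, uint64_t end)
{
	return start < FIXED_RANGE_END && end > FIXED_RANGE_END;
}
```

With this, a request such as [0xF0000, 0x200000) is flagged by
straddles_fixed_end(), while the lookup as posted would only consult
the entry that covers 0xF0000.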

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-06 11:47     ` Borislav Petkov
@ 2015-05-06 15:23       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-06 15:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, 2015-05-06 at 13:47 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:38PM -0600, Toshi Kani wrote:
> > 'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
> > and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
> > section 11.11.2.1, defines these flags as follows:
> >  - All MTRRs are disabled when the E flag is clear.
> > >    The FE flag has no effect when the E flag is clear.
> >  - The default type is enabled when the E flag is set.
> >  - MTRR variable ranges are enabled when the E flag is set.
> >  - MTRR fixed ranges are enabled when both E and FE flags
> >    are set.
> > 
> > MTRR state checks in __mtrr_type_lookup() do not match with
> > SDM.  Hence, this patch makes the following changes:
> >  - The current code detects MTRRs disabled when both E and
> >    FE flags are clear in mtrr_state.enabled.  Fix to detect
> >    MTRRs disabled when the E flag is clear.
> >  - The current code does not check if the FE bit is set in
> >    mtrr_state.enabled when looking into the fixed entries.
> >    Fix to check the FE flag.
> >  - The current code returns the default type when the E flag
> >    is clear in mtrr_state.enabled.  However, the default type
> >    is also disabled when the E flag is clear.  Fix to remove
> >    the code as this case is handled as MTRR disabled with
> >    the 1st change.
> > 
> > In addition, this patch defines the E and FE flags in
> > mtrr_state.enabled as follows.
> >  - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
> >  - E  flag: MTRR_STATE_MTRR_ENABLED
> > 
> > print_mtrr_state() is also updated accordingly.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/include/uapi/asm/mtrr.h   |    4 ++++
> >  arch/x86/kernel/cpu/mtrr/generic.c |   15 ++++++++-------
> >  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> You missed a spot in the conversion in
> arch/x86/kernel/cpu/mtrr/cleanup.c::x86_get_mtrr_mem_range():
> 
> There we have
> 
> 	if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
> 	    (mtrr_state.enabled & 1)) {
> 
> which should be mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED.

Right.  I will also check both the MTRR_STATE_MTRR_ENABLED and
MTRR_STATE_MTRR_FIXED_ENABLED bits here.
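
A rough standalone sketch of such a check, following the rule quoted
above that fixed ranges are only in effect when both E and FE are set.
The flag values are the ones from the patch; the helper name
fixed_ranges_active() is hypothetical and not part of the patch:

```c
#include <assert.h>

#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

static_assert((MTRR_STATE_MTRR_FIXED_ENABLED & MTRR_STATE_MTRR_ENABLED) == 0,
	      "E and FE must be distinct bits");

/* Hypothetical helper: fixed ranges count only if E and FE are both set. */
static int fixed_ranges_active(unsigned int enabled, int have_fixed)
{
	const unsigned int both = MTRR_STATE_MTRR_ENABLED |
				  MTRR_STATE_MTRR_FIXED_ENABLED;

	return have_fixed && (enabled & both) == both;
}
```

Testing against the combined mask avoids treating FE alone as
"fixed ranges enabled", which is the case the old "& 1" test let
through.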

> > diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
> > index d0acb65..66ba88d 100644
> > --- a/arch/x86/include/uapi/asm/mtrr.h
> > +++ b/arch/x86/include/uapi/asm/mtrr.h
> > @@ -88,6 +88,10 @@ struct mtrr_state_type {
> >         mtrr_type def_type;
> >  };
> > 
> > +/* Bit fields for enabled in struct mtrr_state_type */
> > +#define MTRR_STATE_MTRR_FIXED_ENABLED  0x01
> > +#define MTRR_STATE_MTRR_ENABLED                0x02
> > +
> >  #define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
> >  #define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
> 
> Please add those to arch/x86/include/asm/mtrr.h instead. They have no
> place in the uapi header.

I have a question.  Those bits define the bit field of 'enabled' in struct
mtrr_state_type, which is defined in this header.  Is it OK to move only
those definitions to another header?

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-06 13:41     ` Borislav Petkov
@ 2015-05-06 16:00       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-06 16:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, 2015-05-06 at 15:41 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:40PM -0600, Toshi Kani wrote:
> > MTRRs contain fixed and variable entries.  mtrr_type_lookup()
> > may repeatedly call __mtrr_type_lookup() to handle a request
> > that overlaps with variable entries.  However,
> > __mtrr_type_lookup() also handles the fixed entries, which
> > do not have to be repeated.  Therefore, this patch creates
> > separate functions, mtrr_type_lookup_fixed() and
> > mtrr_type_lookup_variable(), to handle the fixed and variable
> > ranges respectively.
> > 
> > The patch also updates the function headers to clarify the
> > return values and output argument.  It updates comments to
> > clarify that the repeating is necessary to handle overlaps
> > with the default type, since overlaps with multiple entries
> > alone can be handled without such repeating.
> > 
> > There is no functional change in this patch.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/kernel/cpu/mtrr/generic.c |  137 +++++++++++++++++++++++-------------
> >  1 file changed, 86 insertions(+), 51 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index 8bd1298..3652e2b 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -102,55 +102,69 @@ static int check_type_overlap(u8 *prev, u8 *curr)
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Error/Semi-error returns:
> > - * MTRR_TYPE_INVALID - when MTRR is not enabled
> > - * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
> > - *		corresponds only to [start:*partial_end].
> > - *		Caller has to lookup again for [*partial_end:end].
> > +/**
> > + * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
> > + *
> > + * MTRR fixed entries are divided into the following ways:
> > + *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
> > + *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
> > + *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
> 
> No need for those - simply a pointer to either the SDM or APM manuals'
> section suffices as they both describe it good.

Ingo asked me to describe this info here in his review...

> > + *
> > + * Return Values:
> > + * MTRR_TYPE_(type)  - Matched memory type
> > + * MTRR_TYPE_INVALID - Unmatched or fixed entries are disabled
> >   */
> > -static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> > +static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
> > +{
> > +	int idx;
> > +
> > +	if (start >= 0x100000)
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (!(mtrr_state.have_fixed) ||
> > +	    !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
> > +		idx = 0;
> > +		idx += (start >> 16);
> > +		return mtrr_state.fixed_ranges[idx];
> > +
> > +	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
> > +		idx = 1 * 8;
> > +		idx += ((start - 0x80000) >> 14);
> > +		return mtrr_state.fixed_ranges[idx];
> > +	}
> > +
> > +	/* 0xC0000 - 0xFFFFF */
> > +	idx = 3 * 8;
> > +	idx += ((start - 0xC0000) >> 12);
> > +	return mtrr_state.fixed_ranges[idx];
> > +}
> > +
> > +/**
> > + * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
> > + *
> > + * Return Value:
> > + * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
> > + *
> > + * Output Argument:
> > + * repeat - Set to 1 when [start:end] spanned across MTRR range and type
> > + *	    returned corresponds only to [start:*partial_end].  Caller has
> > + *	    to lookup again for [*partial_end:end].
> > + */
> > +static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> > +				    int *repeat)
> >  {
> >  	int i;
> >  	u64 base, mask;
> >  	u8 prev_match, curr_match;
> >  
> >  	*repeat = 0;
> > -	if (!mtrr_state_set)
> > -		return MTRR_TYPE_INVALID;
> > -
> > -	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> > -		return MTRR_TYPE_INVALID;
> >  
> >  	/* Make end inclusive end, instead of exclusive */
> >  	end--;
> >  
> > -	/* Look in fixed ranges. Just return the type as per start */
> > -	if ((start < 0x100000) &&
> > -	    (mtrr_state.have_fixed) &&
> > -	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
> > -		int idx;
> > -
> > -		if (start < 0x80000) {
> > -			idx = 0;
> > -			idx += (start >> 16);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		} else if (start < 0xC0000) {
> > -			idx = 1 * 8;
> > -			idx += ((start - 0x80000) >> 14);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		} else {
> > -			idx = 3 * 8;
> > -			idx += ((start - 0xC0000) >> 12);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		}
> > -	}
> > -
> > -	/*
> > -	 * Look in variable ranges
> > -	 * Look of multiple ranges matching this address and pick type
> > -	 * as per MTRR precedence
> > -	 */
> >  	prev_match = MTRR_TYPE_INVALID;
> >  	for (i = 0; i < num_var_ranges; ++i) {
> >  		unsigned short start_state, end_state, inclusive;
> > @@ -179,7 +193,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			 * Return the type for first region and a pointer to
> >  			 * the start of second region so that caller will
> >  			 * lookup again on the second region.
> > -			 * Note: This way we handle multiple overlaps as well.
> > +			 * Note: This way we handle overlaps with multiple
> > +			 * entries and the default type properly.
> >  			 */
> >  			if (start_state)
> >  				*partial_end = base + get_mtrr_size(mask);
> > @@ -208,21 +223,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			return curr_match;
> >  	}
> >  
> > -	if (mtrr_tom2) {
> > -		if (start >= (1ULL<<32) && (end < mtrr_tom2))
> > -			return MTRR_TYPE_WRBACK;
> > -	}
> > -
> >  	if (prev_match != MTRR_TYPE_INVALID)
> >  		return prev_match;
> >  
> >  	return mtrr_state.def_type;
> >  }
> >  
> > -/*
> > - * Returns the effective MTRR type for the region
> > - * Error return:
> > - * MTRR_TYPE_INVALID - when MTRR is not enabled
> > +/**
> > + * mtrr_type_lookup - look up memory type in MTRR
> > + *
> > + * Return Values:
> > + * MTRR_TYPE_(type)  - The effective MTRR type for the region
> > + * MTRR_TYPE_INVALID - MTRR is disabled
> >   */
> >  u8 mtrr_type_lookup(u64 start, u64 end)
> >  {
> > @@ -230,22 +242,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
> >  	int repeat;
> >  	u64 partial_end;
> >  
> > -	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
> > +	if (!mtrr_state_set)
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	/*
> > +	 * Look up the fixed ranges first, which take priority over
> > +	 * the variable ranges.
> > +	 */
> > +	type = mtrr_type_lookup_fixed(start, end);
> > +	if (type != MTRR_TYPE_INVALID)
> > +		return type;
> 
> Huh, why are we not looking at start?
> 
> I mean, fixed MTRRs cover the first 1MB so we can simply do:
> 
>         if ((start < 0x100000) &&
>             (mtrr_state.have_fixed) &&
>             (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> 		return mtrr_type_lookup_fixed(start, end);

mtrr_type_lookup_fixed() checks the above conditions at entry, and
returns immediately with MTRR_TYPE_INVALID when they are not met.  I
think it is safer to keep these checks in mtrr_type_lookup_fixed() in
case it gains multiple callers.
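
As a simplified userspace model of that arrangement (not the kernel
code itself): the callee guards itself and the caller simply falls
through on INVALID.  struct mtrr_model and the model_*() names are
made up for this sketch, and the variable-range lookup is replaced by
a plain default-type return:

```c
#include <assert.h>
#include <stdint.h>

#define TYPE_INVALID	0xFF	/* stands in for MTRR_TYPE_INVALID */
#define FLAG_FE		0x01	/* stands in for MTRR_STATE_MTRR_FIXED_ENABLED */
#define FLAG_E		0x02	/* stands in for MTRR_STATE_MTRR_ENABLED */

struct mtrr_model {		/* hypothetical, trimmed-down mtrr_state */
	uint8_t enabled;
	int have_fixed;
	uint8_t fixed_ranges[88];
	uint8_t def_type;
};

/* The callee checks its own preconditions, so every caller is covered. */
static uint8_t model_lookup_fixed(const struct mtrr_model *m, uint64_t start)
{
	if (start >= 0x100000)
		return TYPE_INVALID;
	if (!m->have_fixed || !(m->enabled & FLAG_FE))
		return TYPE_INVALID;

	if (start < 0x80000)
		return m->fixed_ranges[start >> 16];
	if (start < 0xC0000)
		return m->fixed_ranges[8 + ((start - 0x80000) >> 14)];
	return m->fixed_ranges[24 + ((start - 0xC0000) >> 12)];
}

static uint8_t model_lookup(const struct mtrr_model *m, uint64_t start)
{
	uint8_t type;

	assert(m);
	if (!(m->enabled & FLAG_E))
		return TYPE_INVALID;	/* all MTRRs disabled */

	type = model_lookup_fixed(m, start);
	if (type != TYPE_INVALID)
		return type;

	return m->def_type;	/* placeholder for the variable-range lookup */
}
```

The caller stays free of the 1MB and FE checks, at the cost of one
extra call for addresses above 1MB.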

> and for all the other ranges we would do the variable lookup:
> 
> 	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
> 	...
> 
> ?
> 
> Although I don't know what the code is supposed to do when a region
> > starts in the fixed range and overlaps its end, i.e., something like
> that:
> 
> 	[ start ... 0x100000 ... end ]
> 
> The current code would return a fixed range index and that would be not
> really correct.
> 
> OTOH, this has been like this forever so maybe we don't care...

Right, and there is more.  The original code had the comment "Just
return the type as per start", which I see I accidentally removed.  The
code only returns the type of the start address, but the fixed ranges
have multiple entries with different types, so a given range may
overlap with multiple fixed entries.  I will restore the comment in the
function header to clarify this limitation.
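
For illustration only, a standalone sketch of what a lookup that does
not have this limitation could look like.  It is not what the patch
does; fixed_type_of_range() and the FIXED_MIXED sentinel are invented
for the example, and the range is assumed to lie entirely below 1MB:

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_MIXED	0xFE	/* hypothetical sentinel, not a kernel constant */

/* Index of the fixed entry covering addr (addr must be below 1MB). */
static int fixed_idx(uint64_t addr)
{
	if (addr < 0x80000)
		return (int)(addr >> 16);
	if (addr < 0xC0000)
		return 8 + (int)((addr - 0x80000) >> 14);
	return 24 + (int)((addr - 0xC0000) >> 12);
}

/*
 * Return the common type of all fixed entries covering [start, end),
 * or FIXED_MIXED when the entries disagree.
 */
static uint8_t fixed_type_of_range(const uint8_t fixed[88],
				   uint64_t start, uint64_t end)
{
	int first, last, i;

	assert(start < end && end <= 0x100000);

	first = fixed_idx(start);
	last = fixed_idx(end - 1);	/* end is exclusive */
	for (i = first + 1; i <= last; i++)
		if (fixed[i] != fixed[first])
			return FIXED_MIXED;

	return fixed[first];
}
```

The existing code corresponds to returning fixed[first] without
walking the remaining entries.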

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
@ 2015-05-06 16:00       ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-06 16:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, 2015-05-06 at 15:41 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:40PM -0600, Toshi Kani wrote:
> > MTRRs contain fixed and variable entries.  mtrr_type_lookup()
> > may repeatedly call __mtrr_type_lookup() to handle a request
> > that overlaps with variable entries.  However,
> > __mtrr_type_lookup() also handles the fixed entries, which
> > do not have to be repeated.  Therefore, this patch creates
> > separate functions, mtrr_type_lookup_fixed() and
> > mtrr_type_lookup_variable(), to handle the fixed and variable
> > ranges respectively.
> > 
> > The patch also updates the function headers to clarify the
> > return values and output argument.  It updates comments to
> > clarify that the repeating is necessary to handle overlaps
> > with the default type, since overlaps with multiple entries
> > alone can be handled without such repeating.
> > 
> > There is no functional change in this patch.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/kernel/cpu/mtrr/generic.c |  137 +++++++++++++++++++++++-------------
> >  1 file changed, 86 insertions(+), 51 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
> > index 8bd1298..3652e2b 100644
> > --- a/arch/x86/kernel/cpu/mtrr/generic.c
> > +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> > @@ -102,55 +102,69 @@ static int check_type_overlap(u8 *prev, u8 *curr)
> >  	return 0;
> >  }
> >  
> > -/*
> > - * Error/Semi-error returns:
> > - * MTRR_TYPE_INVALID - when MTRR is not enabled
> > - * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
> > - *		corresponds only to [start:*partial_end].
> > - *		Caller has to lookup again for [*partial_end:end].
> > +/**
> > + * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
> > + *
> > + * MTRR fixed entries are divided into the following ways:
> > + *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
> > + *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
> > + *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
> 
> No need for those - simply a pointer to either the SDM or APM manuals'
> section suffices as they both describe it good.

Ingo asked me to describe this info here in his review...

> > + *
> > + * Return Values:
> > + * MTRR_TYPE_(type)  - Matched memory type
> > + * MTRR_TYPE_INVALID - Unmatched or fixed entries are disabled
> >   */
> > -static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> > +static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
> > +{
> > +	int idx;
> > +
> > +	if (start >= 0x100000)
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (!(mtrr_state.have_fixed) ||
> > +	    !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
> > +		idx = 0;
> > +		idx += (start >> 16);
> > +		return mtrr_state.fixed_ranges[idx];
> > +
> > +	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
> > +		idx = 1 * 8;
> > +		idx += ((start - 0x80000) >> 14);
> > +		return mtrr_state.fixed_ranges[idx];
> > +	}
> > +
> > +	/* 0xC0000 - 0xFFFFF */
> > +	idx = 3 * 8;
> > +	idx += ((start - 0xC0000) >> 12);
> > +	return mtrr_state.fixed_ranges[idx];
> > +}
> > +
> > +/**
> > + * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
> > + *
> > + * Return Value:
> > + * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
> > + *
> > + * Output Argument:
> > + * repeat - Set to 1 when [start:end] spanned across MTRR range and type
> > + *	    returned corresponds only to [start:*partial_end].  Caller has
> > + *	    to lookup again for [*partial_end:end].
> > + */
> > +static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> > +				    int *repeat)
> >  {
> >  	int i;
> >  	u64 base, mask;
> >  	u8 prev_match, curr_match;
> >  
> >  	*repeat = 0;
> > -	if (!mtrr_state_set)
> > -		return MTRR_TYPE_INVALID;
> > -
> > -	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> > -		return MTRR_TYPE_INVALID;
> >  
> >  	/* Make end inclusive end, instead of exclusive */
> >  	end--;
> >  
> > -	/* Look in fixed ranges. Just return the type as per start */
> > -	if ((start < 0x100000) &&
> > -	    (mtrr_state.have_fixed) &&
> > -	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
> > -		int idx;
> > -
> > -		if (start < 0x80000) {
> > -			idx = 0;
> > -			idx += (start >> 16);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		} else if (start < 0xC0000) {
> > -			idx = 1 * 8;
> > -			idx += ((start - 0x80000) >> 14);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		} else {
> > -			idx = 3 * 8;
> > -			idx += ((start - 0xC0000) >> 12);
> > -			return mtrr_state.fixed_ranges[idx];
> > -		}
> > -	}
> > -
> > -	/*
> > -	 * Look in variable ranges
> > -	 * Look of multiple ranges matching this address and pick type
> > -	 * as per MTRR precedence
> > -	 */
> >  	prev_match = MTRR_TYPE_INVALID;
> >  	for (i = 0; i < num_var_ranges; ++i) {
> >  		unsigned short start_state, end_state, inclusive;
> > @@ -179,7 +193,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			 * Return the type for first region and a pointer to
> >  			 * the start of second region so that caller will
> >  			 * lookup again on the second region.
> > -			 * Note: This way we handle multiple overlaps as well.
> > +			 * Note: This way we handle overlaps with multiple
> > +			 * entries and the default type properly.
> >  			 */
> >  			if (start_state)
> >  				*partial_end = base + get_mtrr_size(mask);
> > @@ -208,21 +223,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
> >  			return curr_match;
> >  	}
> >  
> > -	if (mtrr_tom2) {
> > -		if (start >= (1ULL<<32) && (end < mtrr_tom2))
> > -			return MTRR_TYPE_WRBACK;
> > -	}
> > -
> >  	if (prev_match != MTRR_TYPE_INVALID)
> >  		return prev_match;
> >  
> >  	return mtrr_state.def_type;
> >  }
> >  
> > -/*
> > - * Returns the effective MTRR type for the region
> > - * Error return:
> > - * MTRR_TYPE_INVALID - when MTRR is not enabled
> > +/**
> > + * mtrr_type_lookup - look up memory type in MTRR
> > + *
> > + * Return Values:
> > + * MTRR_TYPE_(type)  - The effective MTRR type for the region
> > + * MTRR_TYPE_INVALID - MTRR is disabled
> >   */
> >  u8 mtrr_type_lookup(u64 start, u64 end)
> >  {
> > @@ -230,22 +242,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
> >  	int repeat;
> >  	u64 partial_end;
> >  
> > -	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
> > +	if (!mtrr_state_set)
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> > +		return MTRR_TYPE_INVALID;
> > +
> > +	/*
> > +	 * Look up the fixed ranges first, which take priority over
> > +	 * the variable ranges.
> > +	 */
> > +	type = mtrr_type_lookup_fixed(start, end);
> > +	if (type != MTRR_TYPE_INVALID)
> > +		return type;
> 
> Huh, why are we not looking at start?
> 
> I mean, fixed MTRRs cover the first 1MB so we can simply do:
> 
>         if ((start < 0x100000) &&
>             (mtrr_state.have_fixed) &&
>             (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> 		return mtrr_type_lookup_fixed(start, end);

mtrr_type_lookup_fixed() checks the above conditions at entry and
returns MTRR_TYPE_INVALID immediately if they are not met.  I think it
is safer to keep these checks in mtrr_type_lookup_fixed() in case it
gains multiple callers.

> and for all the other ranges we would do the variable lookup:
> 
> 	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
> 	...
> 
> ?
> 
> Although I don't know what the code is supposed to do when a region
> starts in the fixed range and extends past its end, i.e., something like
> this:
> 
> 	[ start ... 0x100000 ... end ]
> 
> The current code would return a fixed range index and that would not
> really be correct.
> 
> OTOH, this has been like this forever so maybe we don't care...

Right, and there is more.  The original code had the comment "Just return
the type as per start", which I noticed I had accidentally removed: the
code only returns the type of the start address.  The fixed ranges have
multiple entries with different types, so a given range may overlap
multiple fixed entries.  I will restore the comment in the function
header to clarify this limitation.
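
To make the limitation concrete, here is a minimal user-space sketch (not
the kernel code).  The helper names fixed_range_index() and
spans_multiple_fixed_entries() are hypothetical; the index arithmetic
mirrors the shifts in the patch, and the 8/16/64 entry split is what that
arithmetic implies.  'end' is assumed to be exclusive and greater than
'start'.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_RANGE_LIMIT 0x100000ULL	/* fixed MTRRs cover the first 1MB */

/*
 * Hypothetical helper: map an address below 1MB to its index in the
 * fixed-range table, or return -1 for addresses at or above 1MB.
 */
static int fixed_range_index(uint64_t addr)
{
	if (addr >= FIXED_RANGE_LIMIT)
		return -1;
	if (addr < 0x80000)				/* 8 entries of 64K */
		return (int)(addr >> 16);
	if (addr < 0xC0000)				/* 16 entries of 16K */
		return 1 * 8 + (int)((addr - 0x80000) >> 14);
	return 3 * 8 + (int)((addr - 0xC0000) >> 12);	/* 64 entries of 4K */
}

/*
 * Hypothetical check: returns 1 when [start, end) begins in the fixed
 * range and either touches more than one fixed entry or runs past 1MB.
 * A lookup keyed only on 'start' cannot describe such a range.
 */
static int spans_multiple_fixed_entries(uint64_t start, uint64_t end)
{
	int first, last;

	assert(end > start);

	first = fixed_range_index(start);
	if (first < 0)
		return 0;	/* not in the fixed range at all */

	last = fixed_range_index(end - 1);
	return first != last;
}
```

For example, [0xA0000, 0xC0000) starts in entry 16 and ends in entry 23,
so a lookup that only consults 'start' reports the type of entry 16 for
the whole range.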

Thanks,
-Toshi


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: tools: Consolidate types.h
@ 2015-05-06 16:54 Oleg Nesterov
  2015-05-06 17:17 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Oleg Nesterov @ 2015-05-06 16:54 UTC (permalink / raw)
  To: Borislav Petkov, Rusty Russell, Jiri Olsa; +Cc: linux-kernel

Hi,

I can't build the kernel after "git pull",

	In file included from /usr/include/asm/types.h:4,
			 from ./tools/include/linux/types.h:9,
			 from ./include/uapi/linux/elf.h:4,
			 from arch/x86/vdso/vdso2c.c:66:
	./include/uapi/asm-generic/int-ll64.h:11:29: error: asm/bitsperlong.h: No such file or directory

I am not 100% sure but it seems that this was broken by
d944c4eebcf4c0d5e5d9728fec110cbf0047ad7f "tools: Consolidate types.h"

Don't we need the patch below? Or should I finally update my (very old)
distro which doesn't have /usr/include/asm/bitsperlong.h ?

Oleg.


diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 275a3a8..e970320 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: tools: Consolidate types.h
  2015-05-06 16:54 tools: Consolidate types.h Oleg Nesterov
@ 2015-05-06 17:17 ` Borislav Petkov
  2015-05-06 17:30   ` Oleg Nesterov
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 17:17 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Rusty Russell, Jiri Olsa, linux-kernel

On Wed, May 06, 2015 at 06:54:00PM +0200, Oleg Nesterov wrote:
> Hi,
> 
> I can't build the kernel after "git pull",

You mean, you can't build perf tool...?

> 
> 	In file included from /usr/include/asm/types.h:4,
> 			 from ./tools/include/linux/types.h:9,
> 			 from ./include/uapi/linux/elf.h:4,
> 			 from arch/x86/vdso/vdso2c.c:66:
> 	./include/uapi/asm-generic/int-ll64.h:11:29: error: asm/bitsperlong.h: No such file or directory
> 
> I am not 100% sure but it seems that this was broken by
> d944c4eebcf4c0d5e5d9728fec110cbf0047ad7f "tools: Consolidate types.h"
> 
> Don't we need the patch below? Or should I finally update my (very old)
> distro which doesn't have /usr/include/asm/bitsperlong.h ?
> 
> Oleg.
> 
> 
> diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
> index 275a3a8..e970320 100644
> --- a/arch/x86/vdso/Makefile
> +++ b/arch/x86/vdso/Makefile
> @@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
>  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>  	$(call if_changed,vdso)
>  
> -HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
> +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi

Do you have kernel-headers installed on your distro? That's
basically those uapi headers packaged separately. There's also "make
headers_install" which should probably do that (haven't tried it
though).

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: tools: Consolidate types.h
  2015-05-06 17:17 ` Borislav Petkov
@ 2015-05-06 17:30   ` Oleg Nesterov
  2015-05-06 17:37     ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Oleg Nesterov @ 2015-05-06 17:30 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Rusty Russell, Jiri Olsa, linux-kernel

On 05/06, Borislav Petkov wrote:
>
> On Wed, May 06, 2015 at 06:54:00PM +0200, Oleg Nesterov wrote:
> > Hi,
> >
> > I can't build the kernel after "git pull",
>
> You mean, you can't build perf tool...?

No, make bzImage fails, it can't compile arch/x86/vdso/vdso2c

>
> >
> > 	In file included from /usr/include/asm/types.h:4,
> > 			 from ./tools/include/linux/types.h:9,
> > 			 from ./include/uapi/linux/elf.h:4,
> > 			 from arch/x86/vdso/vdso2c.c:66:
> > 	./include/uapi/asm-generic/int-ll64.h:11:29: error: asm/bitsperlong.h: No such file or directory
> >
> > I am not 100% sure but it seems that this was broken by
> > d944c4eebcf4c0d5e5d9728fec110cbf0047ad7f "tools: Consolidate types.h"
> >
> > Don't we need the patch below? Or should I finally update my (very old)
> > distro which doesn't have /usr/include/asm/bitsperlong.h ?
> >
> > Oleg.
> >
> >
> > diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
> > index 275a3a8..e970320 100644
> > --- a/arch/x86/vdso/Makefile
> > +++ b/arch/x86/vdso/Makefile
> > @@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> >  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
> >  	$(call if_changed,vdso)
> >
> > -HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
> > +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
>
> Do you have kernel-headers installed on your distro?

I have no idea ;) but I guess they were installed many years ago.

> That's
> basically those uapi headers packaged separately. There's also "make
> headers_install" which should probably do that (haven't tried it
> though).

Perhaps. But still, if HOST_EXTRACFLAGS has -I$(srctree)/include/uapi, why
doesn't it also add arch/x86/include/uapi? This doesn't look consistent in
any case.

Oleg.


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: tools: Consolidate types.h
  2015-05-06 17:30   ` Oleg Nesterov
@ 2015-05-06 17:37     ` Borislav Petkov
  2015-05-07  2:53       ` Andy Lutomirski
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 17:37 UTC (permalink / raw)
  To: Oleg Nesterov, Andy Lutomirski; +Cc: Rusty Russell, Jiri Olsa, linux-kernel

On Wed, May 06, 2015 at 07:30:35PM +0200, Oleg Nesterov wrote:
> On 05/06, Borislav Petkov wrote:
> >
> > On Wed, May 06, 2015 at 06:54:00PM +0200, Oleg Nesterov wrote:
> > > Hi,
> > >
> > > I can't build the kernel after "git pull",
> >
> > You mean, you can't build perf tool...?
> 
> No, make bzImage fails, it can't compile arch/x86/vdso/vdso2c

Wow, so this commit is a year old and this is the first time I see it
causing a failure. You must have a really ooold distro :-)

> > > diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
> > > index 275a3a8..e970320 100644
> > > --- a/arch/x86/vdso/Makefile
> > > +++ b/arch/x86/vdso/Makefile
> > > @@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
> > >  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
> > >  	$(call if_changed,vdso)
> > >
> > > -HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
> > > +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
> >
> > Do you have kernel-headers installed on your distro?
> 
> I have no idea ;) but I guess they were installed. many years ago.
> 
> > That's
> > basically those uapi headers packaged separately. There's also "make
> > headers_install" which should probably do that (haven't tried it
> > though).
> 
> Perhaps. but still, if HOST_EXTRACFLAGS has -I$(srctree)/include/uapi, why
> it doesn't add arch/x86/include/uapi? This doesn't look consistent in any
> case.

Yeah, I guess it wouldn't hurt. Andy, see quoted hunk above ^^.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-06 15:23       ` Toshi Kani
@ 2015-05-06 22:39         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 22:39 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, May 06, 2015 at 09:23:31AM -0600, Toshi Kani wrote:
> I have a question.  Those bits define the 'enabled' bit field in struct
> mtrr_state_type, which is defined in this header.  Is it OK to move only
> those definitions to another header?

I think we shouldn't expose stuff to userspace if we don't have to
because then we're stuck with it. Userspace has managed so far without
those defines so I don't see why we should export them now.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-06 16:00       ` Toshi Kani
@ 2015-05-06 22:49         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-06 22:49 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, May 06, 2015 at 10:00:30AM -0600, Toshi Kani wrote:
> Ingo asked me to describe this info here in his review...

Ok.

> mtrr_type_lookup_fixed() checks the above conditions at entry, and
> returns immediately with TYPE_INVALID.  I think it is safer to have such
> checks in mtrr_type_lookup_fixed() in case there will be multiple
> callers.

This is not what I mean - I mean to call mtrr_type_lookup_fixed() based
on @start and not unconditionally, like you do.

And there most likely won't be multiple callers because we're phasing
out MTRR use.

And even if there are, they better look at how this function is being
called before calling it. Which I seriously doubt - it is a static
function which you *just* came up with.

> Right, and there is more.  As the original code had comment "Just return
> the type as per start", which I noticed that I had accidentally removed,
> the code only returns the type of the start address.  The fixed ranges
> have multiple entries with different types.  Hence, a given range may
> overlap with multiple fixed entries.  I will restore the comment in the
> function header to clarify this limitation.

Ok, let's clean up this function first and then consider fixing other
possible bugs which haven't been fixed since forever. Again, we might
not even need to address them because we won't be using MTRRs once we
switch to PAT completely, which is what Luis is working on.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-06 22:39         ` Borislav Petkov
@ 2015-05-06 23:08           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-06 23:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Thu, 2015-05-07 at 00:39 +0200, Borislav Petkov wrote:
> On Wed, May 06, 2015 at 09:23:31AM -0600, Toshi Kani wrote:
> > I have a question.  Those bits define the bit field of enabled in struct
> > mtrr_state_type, which is defined in this header.  Is it OK to only move
> > those definitions to other header?
> 
> I think we shouldn't expose stuff to userspace if we don't have to
> because then we're stuck with it. Userspace has managed so far without
> those defines so I don't see why we should export them now.

OK, I will move those bit definitions to arch/x86/include/asm/mtrr.h.

Thanks for the clarification,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-06 22:49         ` Borislav Petkov
@ 2015-05-06 23:42           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-06 23:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Thu, 2015-05-07 at 00:49 +0200, Borislav Petkov wrote:
> On Wed, May 06, 2015 at 10:00:30AM -0600, Toshi Kani wrote:
> > Ingo asked me to describe this info here in his review...
> 
> Ok.
> 
> > mtrr_type_lookup_fixed() checks the above conditions at entry, and
> > returns immediately with TYPE_INVALID.  I think it is safer to have such
> > checks in mtrr_type_lookup_fixed() in case there will be multiple
> > callers.
> 
> This is not what I mean - I mean to call mtrr_type_lookup_fixed() based
> on @start and not unconditionally, like you do.
> 
> And there most likely won't be multiple callers because we're phasing
> out MTRR use.
> 
> And even if there are, they better look at how this function is being
> called before calling it. Which I seriously doubt - it is a static
> function which you *just* came up with.

Well, creating mtrr_type_lookup_fixed() is one of the comments I had in
the previous code review.  Anyway, let me make sure I understand your
comment correctly.  Do the following changes look right to you?

1) Make the caller responsible for the condition checks.

        if ((start < 0x100000) &&
            (mtrr_state.have_fixed) &&
            (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
                return mtrr_type_lookup_fixed(start, end);

2) Delete the mtrr_state checks in mtrr_type_lookup_fixed(), as they
are done by the caller.  Keep the '(start >= 0x100000)' check to
ensure that the code handles the range [0xC0000 - 0xFFFFF] correctly.

static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
{
        int idx;

        if (start >= 0x100000)
                 return MTRR_TYPE_INVALID;
 
-       if (!(mtrr_state.have_fixed) ||
-           !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-               return MTRR_TYPE_INVALID;
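
Put together, changes 1) and 2) could look roughly like the following
user-space sketch.  It is only an illustration of the split being
discussed: the sketch_* names, the struct layout and the constant values
are stand-ins invented here, not the kernel's definitions, and
'variable_type' is a placeholder for the result of the variable-range
lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values and state for illustration; not the kernel's definitions. */
#define SKETCH_TYPE_INVALID	0xff
#define SKETCH_FIXED_ENABLED	0x01
#define SKETCH_NUM_FIXED	88	/* 8 x 64K + 16 x 16K + 64 x 4K */

struct sketch_mtrr_state {
	uint8_t fixed_ranges[SKETCH_NUM_FIXED];
	uint8_t enabled;
	uint8_t have_fixed;
};

static struct sketch_mtrr_state sketch_state;

/* Callee: only the address bound is checked; the type is picked by 'start'. */
static uint8_t sketch_lookup_fixed(uint64_t start)
{
	int idx;

	if (start >= 0x100000)
		return SKETCH_TYPE_INVALID;

	if (start < 0x80000)
		idx = (int)(start >> 16);
	else if (start < 0xC0000)
		idx = 1 * 8 + (int)((start - 0x80000) >> 14);
	else
		idx = 3 * 8 + (int)((start - 0xC0000) >> 12);

	assert(idx < SKETCH_NUM_FIXED);
	return sketch_state.fixed_ranges[idx];
}

/*
 * Caller: owns the state checks.  'variable_type' is a placeholder for
 * whatever the variable-range lookup would have returned.
 */
static uint8_t sketch_lookup(uint64_t start, uint8_t variable_type)
{
	if ((start < 0x100000) &&
	    (sketch_state.have_fixed) &&
	    (sketch_state.enabled & SKETCH_FIXED_ENABLED))
		return sketch_lookup_fixed(start);

	return variable_type;
}
```

With this shape the fixed-range helper is never entered when fixed MTRRs
are disabled, and the remaining bound check keeps the final 4K-entry
branch from indexing past the table if a caller passes an address at or
above 1MB.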


> > Right, and there is more.  As the original code had comment "Just return
> > the type as per start", which I noticed that I had accidentally removed,
> > the code only returns the type of the start address.  The fixed ranges
> > have multiple entries with different types.  Hence, a given range may
> > overlap with multiple fixed entries.  I will restore the comment in the
> > function header to clarify this limitation.
> 
> Ok, let's cleanup this function first and then consider fixing other
> possible bugs which haven't been fixed since forever. Again, we might
> not even need to address them because we won't be using MTRRs once we
> switch to PAT completely, which is what Luis is working on.

Right.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: tools: Consolidate types.h
  2015-05-06 17:37     ` Borislav Petkov
@ 2015-05-07  2:53       ` Andy Lutomirski
  2015-05-07 16:58         ` [PATCH 0/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS Oleg Nesterov
  0 siblings, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-07  2:53 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Oleg Nesterov, Rusty Russell, Jiri Olsa, linux-kernel

On Wed, May 6, 2015 at 10:37 AM, Borislav Petkov <bp@suse.de> wrote:
> On Wed, May 06, 2015 at 07:30:35PM +0200, Oleg Nesterov wrote:
>> On 05/06, Borislav Petkov wrote:
>> >
>> > On Wed, May 06, 2015 at 06:54:00PM +0200, Oleg Nesterov wrote:
>> > > Hi,
>> > >
>> > > I can't build the kernel after "git pull",
>> >
>> > You mean, you can't build perf tool...?
>>
>> No, make bzImage fails, it can't compile arch/x86/vdso/vdso2c
>
> Wow, so this commit is a year old and this is the first time I see it
> causing a failure. You must have a really ooold distro :-)
>
>> > > diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
>> > > index 275a3a8..e970320 100644
>> > > --- a/arch/x86/vdso/Makefile
>> > > +++ b/arch/x86/vdso/Makefile
>> > > @@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
>> > >  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>> > >   $(call if_changed,vdso)
>> > >
>> > > -HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
>> > > +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
>> >
>> > Do you have kernel-headers installed on your distro?
>>
>> I have no idea ;) but I guess they were installed. many years ago.
>>
>> > That's
>> > basically those uapi headers packaged separately. There's also "make
>> > headers_install" which should probably do that (haven't tried it
>> > though).
>>
>> Perhaps. but still, if HOST_EXTRACFLAGS has -I$(srctree)/include/uapi, why
>> it doesn't add arch/x86/include/uapi? This doesn't look consistent in any
>> case.
>
> Yeah, I guess it wouldn't hurt. Andy, see quoted hunk above ^^.

I'd be fine with adding the extra -I.  Want to send a patch?  I'll get
to it eventually if you don't beat me to it.

--Andy

>
> --
> Regards/Gruss,
>     Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* RE: [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
  2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
  2015-05-04 14:58   ` Borislav Petkov
@ 2015-05-07  3:36   ` Elliott, Robert (Server Storage)
  2015-05-14 15:55     ` Luis R. Rodriguez
  1 sibling, 1 reply; 710+ messages in thread
From: Elliott, Robert (Server Storage) @ 2015-05-07  3:36 UTC (permalink / raw)
  To: Luis R. Rodriguez, bp, mingo, tglx, hpa, plagnioj,
	tomi.valkeinen, daniel.vetter, airlied
  Cc: dledford, awalls, syrjala, luto, mst, cocci, linux-kernel,
	Luis R. Rodriguez, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, x86

> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Luis R. Rodriguez
> Sent: Thursday, April 30, 2015 3:25 PM
> Subject: [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
> 
...
> -			printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
> -				" for [mem %#010Lx-%#010Lx], got %s\n",
> -				current->comm, current->pid,
> -				cattr_name(want_pcm),
> -				(unsigned long long)paddr,
> -				(unsigned long long)(paddr + size - 1),
> -				cattr_name(pcm));
> +			pr_err("%s:%d map pfn expected mapping type %s"
> +			       " for [mem %#010Lx-%#010Lx], got %s\n",

Since the patch joins some other print format strings split across 
lines (which checkpatch allows), you might want to join this one too.

...
> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
...
>  failure:
> -	printk(KERN_INFO "%s:%d conflicting memory types "
> +	pr_info("%s:%d conflicting memory types "
>  		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
>  		end, cattr_name(found_type), cattr_name(match->type));

and that one.
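
As an illustration of the joined form, here is a small user-space sketch.
format_conflict() is a hypothetical stand-in that uses snprintf() in place
of pr_info(), and %llx replaces the kernel's %Lx; the point is only that
the format string sits on one line, so the full message text can be found
with a single grep.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the pr_info() call above.  The
 * format string is deliberately left unsplit even though the line is long.
 */
static int format_conflict(char *buf, size_t len, const char *comm, int pid,
			   unsigned long long start, unsigned long long end,
			   const char *found, const char *wanted)
{
	assert(buf && len > 0);

	return snprintf(buf, len, "%s:%d conflicting memory types %llx-%llx %s<->%s\n",
			comm, pid, start, end, found, wanted);
}
```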



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-06 23:42           ` Toshi Kani
@ 2015-05-07  7:52             ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-07  7:52 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Wed, May 06, 2015 at 05:42:10PM -0600, Toshi Kani wrote:
> Well, creating mtrr_type_lookup_fixed() is one of the comments I had in
> the previous code review.  Anyway, let me make sure if I understand your
> comment correctly.  Do the following changes look right to you?
> 
> 1) Change the caller responsible for the condition checks.
> 
>         if ((start < 0x100000) &&
>             (mtrr_state.have_fixed) &&
>             (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
>                 return mtrr_type_lookup_fixed(start, end);
> 
> 2) Delete the checks with mtrr_state in mtrr_type_lookup_fixed() as they
> are done by the caller.  Keep the check with '(start >= 0x100000)' to
> assure that the code handles the range [0xC0000 - 0xFFFFF] correctly.

That is a good defensive measure.

> static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
> {
>         int idx;
> 
>         if (start >= 0x100000)
>                  return MTRR_TYPE_INVALID;
>  
> -       if (!(mtrr_state.have_fixed) ||
> -           !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> -               return MTRR_TYPE_INVALID;

Yeah, that's what I mean.
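
To make the split concrete, here is a minimal userspace sketch of the arrangement agreed on above: the caller checks the MTRR state, and the helper keeps only the defensive address check. The struct, the lookup_type() wrapper and the single fixed_type field are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions discussed above. */
#define MTRR_TYPE_INVALID		0xff
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
#define FIXED_RANGE_END			0x100000ULL /* fixed MTRRs cover the first 1 MiB */

struct mtrr_state_model {
	bool have_fixed;
	uint8_t enabled;
	uint8_t fixed_type; /* simplification: one type for the whole fixed area */
};

/* Helper: only the defensive address check remains here. */
static uint8_t lookup_fixed(const struct mtrr_state_model *s, uint64_t start)
{
	if (start >= FIXED_RANGE_END)
		return MTRR_TYPE_INVALID;

	return s->fixed_type;
}

/* Caller: responsible for the state checks before using the helper. */
static uint8_t lookup_type(const struct mtrr_state_model *s, uint64_t start)
{
	if ((start < FIXED_RANGE_END) &&
	    s->have_fixed &&
	    (s->enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
		return lookup_fixed(s, start);

	/* The real code would continue with the variable ranges here. */
	return MTRR_TYPE_INVALID;
}
```

With this shape the helper never consults the state flags, so a second caller would have to repeat the state checks itself.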

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-07  7:52             ` Borislav Petkov
@ 2015-05-07 13:45               ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-07 13:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Thu, 2015-05-07 at 09:52 +0200, Borislav Petkov wrote:
> On Wed, May 06, 2015 at 05:42:10PM -0600, Toshi Kani wrote:
> > Well, creating mtrr_type_lookup_fixed() is one of the comments I had in
> > the previous code review.  Anyway, let me make sure if I understand your
> > comment correctly.  Do the following changes look right to you?
> > 
> > 1) Change the caller responsible for the condition checks.
> > 
> >         if ((start < 0x100000) &&
> >             (mtrr_state.have_fixed) &&
> >             (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> >                 return mtrr_type_lookup_fixed(start, end);
> > 
> > 2) Delete the checks with mtrr_state in mtrr_type_lookup_fixed() as they
> > are done by the caller.  Keep the check with '(start >= 0x100000)' to
> > assure that the code handles the range [0xC0000 - 0xFFFFF] correctly.
> 
> That is a good defensive measure.
> 
> > static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
> > {
> >         int idx;
> > 
> >         if (start >= 0x100000)
> >                  return MTRR_TYPE_INVALID;
> >  
> > -       if (!(mtrr_state.have_fixed) ||
> > -           !(mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> > -               return MTRR_TYPE_INVALID;
> 
> Yeah, that's what I mean.

Thanks for the clarification! Will change accordingly.
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH 0/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS
  2015-05-07  2:53       ` Andy Lutomirski
@ 2015-05-07 16:58         ` Oleg Nesterov
  2015-05-07 16:58           ` [PATCH 1/1] " Oleg Nesterov
  0 siblings, 1 reply; 710+ messages in thread
From: Oleg Nesterov @ 2015-05-07 16:58 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Borislav Petkov, Rusty Russell, Jiri Olsa, linux-kernel

On 05/06, Andy Lutomirski wrote:
>
> I'd be fine with adding the extra -I.  Want to send a patch?  I'll get
> to it eventually if you don't beat me to it.

Sure. But please feel free to ignore, this is really minor and I can
add a couple of .h files in my /usr/include. That said, "looks more
consistent" is true in any case imo, so please see the trivial 1/1.

Oleg.


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH 1/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS
  2015-05-07 16:58         ` [PATCH 0/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS Oleg Nesterov
@ 2015-05-07 16:58           ` Oleg Nesterov
  2015-05-07 19:46             ` Andy Lutomirski
  2015-05-11 12:44             ` [tip:x86/urgent] x86/vdso: Fix 'make bzImage' on older distros tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 710+ messages in thread
From: Oleg Nesterov @ 2015-05-07 16:58 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Borislav Petkov, Rusty Russell, Jiri Olsa, linux-kernel

Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along with
include/uapi.

This looks more consistent, and this fixes "make bzImage" on my old
distro which doesn't have asm/bitsperlong.h in /usr/include/.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/vdso/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 275a3a8..e970320 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH 1/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS
  2015-05-07 16:58           ` [PATCH 1/1] " Oleg Nesterov
@ 2015-05-07 19:46             ` Andy Lutomirski
  2015-05-07 21:55               ` Borislav Petkov
  2015-05-11 12:44             ` [tip:x86/urgent] x86/vdso: Fix 'make bzImage' on older distros tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-07 19:46 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar, X86 ML
  Cc: Borislav Petkov, Rusty Russell, Jiri Olsa, linux-kernel

On Thu, May 7, 2015 at 9:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along with
> include/uapi.
>
> This looks more consistent, and this fixes "make bzImage" on my old
> distro which doesn't have asm/bitsperlong.h in /usr/include/.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Fixes: 6f121e548f83 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
Acked-by: Andy Lutomirski <luto@kernel.org>

Ingo, despite Oleg's rather late discovery of this, it's a regression
fix.  If you think it's appropriate for x86/urgent, please apply it.
Otherwise I'll queue it up.

--Andy

> ---
>  arch/x86/vdso/Makefile |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
> index 275a3a8..e970320 100644
> --- a/arch/x86/vdso/Makefile
> +++ b/arch/x86/vdso/Makefile
> @@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
>  $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
>         $(call if_changed,vdso)
>
> -HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
> +HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
>  hostprogs-y                    += vdso2c
>
>  quiet_cmd_vdso2c = VDSO2C  $@
> --
> 1.5.5.1
>
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH 1/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS
  2015-05-07 19:46             ` Andy Lutomirski
@ 2015-05-07 21:55               ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-07 21:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Ingo Molnar, X86 ML, Rusty Russell, Jiri Olsa,
	linux-kernel

On Thu, May 07, 2015 at 12:46:33PM -0700, Andy Lutomirski wrote:
> On Thu, May 7, 2015 at 9:58 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along with
> > include/uapi.
> >
> > This looks more consistent, and this fixes "make bzImage" on my old
> > distro which doesn't have asm/bitsperlong.h in /usr/include/.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> 
> Fixes: 6f121e548f83 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 
> Ingo, despite Oleg's rather late discovery of this, it's a regression
> fix.  If you think it's appropriate for x86/urgent, please apply it.
> Otherwise I'll queue it up.

I took it and tagged it for stable. It is a build fix after all.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-03-24 22:08   ` Toshi Kani
@ 2015-05-09  9:08     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-09  9:08 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, Mar 24, 2015 at 04:08:41PM -0600, Toshi Kani wrote:
> This patch adds an additional argument, 'uniform', to
> mtrr_type_lookup(), which returns 1 when a given range is
> covered uniformly by MTRRs, i.e. the range is fully covered
> by a single MTRR entry or the default type.
> 
> pud_set_huge() and pmd_set_huge() are changed to check the
> new 'uniform' flag to see if it is safe to create a huge page
> mapping to the range.  This allows them to create a huge page
> mapping to a range covered by a single MTRR entry of any
> memory type.  It also detects a non-optimal request properly.
> They continue to check with the WB type since the WB type has
> no effect even if a request spans multiple MTRR entries.
> 
> pmd_set_huge() logs a warning message to a non-optimal request
> so that driver writers will be aware of such a case.  Drivers
> should make a mapping request aligned to a single MTRR entry
> when the range is covered by MTRRs.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/include/asm/mtrr.h        |    5 +++--
>  arch/x86/kernel/cpu/mtrr/generic.c |   35 +++++++++++++++++++++++++++--------
>  arch/x86/mm/pat.c                  |    4 ++--
>  arch/x86/mm/pgtable.c              |   25 +++++++++++++++----------
>  4 files changed, 47 insertions(+), 22 deletions(-)

...

> @@ -235,13 +240,19 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
>   * Return Values:
>   * MTRR_TYPE_(type)  - The effective MTRR type for the region
>   * MTRR_TYPE_INVALID - MTRR is disabled
> + *
> + * Output Argument:
> + * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
> + *	     is fully covered by a single MTRR entry or the default type.

I'd call this "single_mtrr". "uniform" could also mean that the resulting
type is uniform, i.e. of the same type but spanning multiple MTRRs.

>   */
> -u8 mtrr_type_lookup(u64 start, u64 end)
> +u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
>  {
> -	u8 type, prev_type;
> +	u8 type, prev_type, is_uniform, dummy;
>  	int repeat;
>  	u64 partial_end;
>  
> +	*uniform = 1;
> +

You're setting it here...

>  	if (!mtrr_state_set)
>  		return MTRR_TYPE_INVALID;

... but if you return here, you would've changed the thing uniform
points to needlessly as you're returning an error.

> @@ -253,14 +264,17 @@ u8 mtrr_type_lookup(u64 start, u64 end)
>  	 * the variable ranges.
>  	 */
>  	type = mtrr_type_lookup_fixed(start, end);
> -	if (type != MTRR_TYPE_INVALID)
> +	if (type != MTRR_TYPE_INVALID) {
> +		*uniform = 0;
>  		return type;
> +	}
>  
>  	/*
>  	 * Look up the variable ranges.  Look of multiple ranges matching
>  	 * this address and pick type as per MTRR precedence.
>  	 */
> -	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
> +	type = mtrr_type_lookup_variable(start, end, &partial_end,
> +					 &repeat, &is_uniform);
>  
>  	/*
>  	 * Common path is with repeat = 0.
> @@ -271,16 +285,21 @@ u8 mtrr_type_lookup(u64 start, u64 end)
>  	while (repeat) {
>  		prev_type = type;
>  		start = partial_end;
> +		is_uniform = 0;

So I think it would be better if you added an out: label where you do
exit from the function and set return values there.

So something like that, I'm pasting the whole function here so that you
can follow better:

u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
{
        u8 type, prev_type, is_uniform = 1, dummy;
        int repeat;
        u64 partial_end;

        if (!mtrr_state_set)
                return MTRR_TYPE_INVALID;

        if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
                return MTRR_TYPE_INVALID;

        /*
         * Look up the fixed ranges first, which take priority over
         * the variable ranges.
         */
        type = mtrr_type_lookup_fixed(start, end);
        if (type != MTRR_TYPE_INVALID) {
                is_uniform = 0;
                goto out;
        }

        /*
         * Look up the variable ranges.  Look of multiple ranges matching
         * this address and pick type as per MTRR precedence.
         */
        type = mtrr_type_lookup_variable(start, end, &partial_end,
                                         &repeat, &is_uniform);

        /*
         * Common path is with repeat = 0.
         * However, we can have cases where [start:end] spans across some
         * MTRR ranges and/or the default type.  Do repeated lookups for
         * that case here.
         */
        while (repeat) {
                prev_type = type;
                start = partial_end;
                is_uniform = 0;

                type = mtrr_type_lookup_variable(start, end, &partial_end,
                                                 &repeat, &dummy);

                if (check_type_overlap(&prev_type, &type))
                        goto out;

        }

        if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
                type = MTRR_TYPE_WRBACK;

out:
        *uniform = is_uniform;
        return type;
}
---

This way you're setting the uniform pointer in a single location and you're
working with the local variable inside the function.

Much easier to follow.
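
The same single-exit pattern can be tried outside the kernel. The sketch below is only a toy model: the range table, the type values and range_type_lookup() are hypothetical, and the table is assumed to be sorted and non-overlapping. It tracks uniformity in a local variable and writes the output pointer in exactly one place.

```c
#include <stddef.h>
#include <stdint.h>

#define TYPE_INVALID	0xff /* stand-in for MTRR_TYPE_INVALID */
#define TYPE_DEFAULT	0x00 /* stand-in for the default memory type */

struct range {
	uint64_t base, end; /* covers [base, end) */
	uint8_t type;
};

/* Hypothetical table; assumed sorted and non-overlapping. */
static const struct range table[] = {
	{ 0x00000000ULL, 0x40000000ULL, 6 },
	{ 0x40000000ULL, 0x80000000ULL, 1 },
};

/*
 * Returns the type in effect at 'start'.  *uniform is written only at
 * the out: label, so the early error return leaves it untouched and the
 * caller must not read it when TYPE_INVALID comes back.
 */
static uint8_t range_type_lookup(uint64_t start, uint64_t end, int enabled,
				 uint8_t *uniform)
{
	uint8_t type = TYPE_DEFAULT, is_uniform = 1;
	size_t i;

	if (!enabled)
		return TYPE_INVALID;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct range *r = &table[i];

		if (end <= r->base || start >= r->end)
			continue; /* no overlap with this entry */

		if (start >= r->base && end <= r->end) {
			type = r->type; /* fully inside one entry */
			goto out;
		}

		/* Partial overlap: the request straddles an entry boundary. */
		is_uniform = 0;
		if (start >= r->base)
			type = r->type;
		goto out;
	}

out:
	*uniform = is_uniform;
	return type;
}
```

One consequence of this shape is that a caller has to check the return value before looking at the flag, or initialise the flag itself, because the error path never stores to it.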

> +
>  		type = mtrr_type_lookup_variable(start, end, &partial_end,
> -						 &repeat);
> +						 &repeat, &dummy);
>  
> -		if (check_type_overlap(&prev_type, &type))
> +		if (check_type_overlap(&prev_type, &type)) {
> +			*uniform = 0;
>  			return type;
> +		}
>  	}
>  
>  	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
>  		return MTRR_TYPE_WRBACK;
>  
> +	*uniform = is_uniform;
>  	return type;
>  }
>  
> diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
> index 35af677..372ad42 100644
> --- a/arch/x86/mm/pat.c
> +++ b/arch/x86/mm/pat.c
> @@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
>  	 * request is for WB.
>  	 */
>  	if (req_type == _PAGE_CACHE_MODE_WB) {
> -		u8 mtrr_type;
> +		u8 mtrr_type, uniform;
>  
> -		mtrr_type = mtrr_type_lookup(start, end);
> +		mtrr_type = mtrr_type_lookup(start, end, &uniform);
>  		if (mtrr_type != MTRR_TYPE_WRBACK)
>  			return _PAGE_CACHE_MODE_UC_MINUS;
>  
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index cfca4cf..3d6edea 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -567,17 +567,18 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
>   * pud_set_huge - setup kernel PUD mapping
>   *
>   * MTRR can override PAT memory types with 4KB granularity.  Therefore,
> - * it does not set up a huge page when the range is covered by a non-WB
> - * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
> + * it only sets up a huge page when the range is mapped uniformly by MTRR
> + * (i.e. the range is fully covered by a single MTRR entry or the default
> + * type) or the MTRR memory type is WB.
>   *
>   * Return 1 on success, and 0 when no PUD was set.
>   */
>  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  {
> -	u8 mtrr;
> +	u8 mtrr, uniform;
>  
> -	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> +	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> +	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK))
>  		return 0;
>  
>  	prot = pgprot_4k_2_large(prot);
> @@ -593,18 +594,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>   * pmd_set_huge - setup kernel PMD mapping
>   *
>   * MTRR can override PAT memory types with 4KB granularity.  Therefore,
> - * it does not set up a huge page when the range is covered by a non-WB
> - * type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are disabled.
> + * it only sets up a huge page when the range is mapped uniformly by MTRR
> + * (i.e. the range is fully covered by a single MTRR entry or the default
> + * type) or the MTRR memory type is WB.
>   *
>   * Return 1 on success, and 0 when no PMD was set.
>   */
>  int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>  {
> -	u8 mtrr;
> +	u8 mtrr, uniform;
>  
> -	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
> -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> +	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> +	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK)) {
> +		pr_warn("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
> +				addr, addr + PMD_SIZE);
>  		return 0;

So this returns 0, i.e. failure already. Why do we even have to warn?
Caller already knows it failed.

And this warning would flood dmesg needlessly.
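
For illustration, the check from the hunk above reduces to a small predicate once the message is dropped. This is a userspace sketch with a hypothetical function name and a stand-in constant, not the kernel code:

```c
#include <stdint.h>

#define TYPE_WRBACK 6 /* stand-in for MTRR_TYPE_WRBACK */

/*
 * Returns 1 when a huge mapping may be set up, 0 otherwise.
 * Nothing is logged; the return value alone reports the refusal.
 */
static int huge_mapping_allowed(uint8_t mtrr_type, uint8_t uniform)
{
	if (!uniform && mtrr_type != TYPE_WRBACK)
		return 0;

	return 1;
}
```

A caller that gets 0 back can fall back to smaller mappings, which is all the information it needs.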

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [0/8] tip queue 2015-05-11
@ 2015-05-11  8:15 Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs Borislav Petkov
                   ` (7 more replies)
  0 siblings, 8 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Hi Ingo,

8 patches this time.

2015-05-11/
├── cleanups
│   └── 0001-x86-kaslr-Fix-typo-in-KASLR_FLAG-documentation.patch
├── cpu
│   └── 0001-x86-alternatives-Switch-AMD-F15h-and-later-to-the-P6.patch
├── microcode
│   └── 0001-x86-cpu-microcode-Zap-changelog.patch
├── mm
│   ├── 0001-x86-mm-Do-not-flush-last-cacheline-twice-in-clflush_.patch
│   ├── 0002-x86-mm-Add-kerneldoc-comments-for-pcommit_sfence.patch
│   ├── 0003-x86-MTRR-Remove-wrong-address-check-in-__mtrr_type_l.patch
│   └── 0004-x86-mm-Add-ioremap_uc-helper-to-map-memory-uncacheab.patch
└── urgent
    └── 0001-x86-vdso-Add-arch-x86-include-uapi-include-path-to-H.patch

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:44   ` [tip:x86/asm] " tip-bot for Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/cpu/microcode: Zap changelog Borislav Petkov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

Software optimization guides for both F15h and F16h cite the P6 NOPs
as the optimal ones. A microbenchmark confirms that even older families
do better with the single-insn NOPs, so switch to them for the
alternatives.
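
As a rough illustration of the selection this patch adds, here is a
user-space sketch. The enum names and pick_ideal_nops() are hypothetical
stand-ins, not kernel identifiers. Only the new AMD case and the 64-bit
default are modeled; the handling of other vendors is left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's vendor IDs and NOP tables. */
enum vendor { VENDOR_AMD, VENDOR_OTHER };
enum nop_table { NOPS_K8, NOPS_P6 };

/*
 * Mirrors the new case in arch_init_ideal_nops(): AMD families above
 * 0xf get the P6 NOPs, everything else falls through to the default,
 * which is k8_nops on a 64-bit build.
 */
static enum nop_table pick_ideal_nops(enum vendor v, unsigned int family)
{
	if (v == VENDOR_AMD && family > 0xf)
		return NOPS_P6;

	return NOPS_K8;
}

/* Small usage example: F15h switches, K8 (family 0xf) does not. */
void pick_ideal_nops_example(void)
{
	assert(pick_ideal_nops(VENDOR_AMD, 0x15) == NOPS_P6);
	assert(pick_ideal_nops(VENDOR_AMD, 0xf) == NOPS_K8);
}
```

Note that the check is on the family number only, so F10h and F14h take
the P6 path as well, which matches the measurements below.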

The cycle counts below include the loop overhead of the measurement,
but that overhead is the same in all runs.

F10h, revE:
-----------
Running NOP tests, 1000 NOPs x 1000000 repetitions

K8:
                      90     288.212282 cycles
                   66 90     288.220840 cycles
                66 66 90     288.219447 cycles
             66 66 66 90     288.223204 cycles
          66 66 90 66 90     571.393424 cycles
       66 66 90 66 66 90     571.374919 cycles
    66 66 66 90 66 66 90     572.249281 cycles
 66 66 66 90 66 66 66 90     571.388651 cycles

P6:
                      90     288.214193 cycles
                   66 90     288.225550 cycles
                0f 1f 00     288.224441 cycles
             0f 1f 40 00     288.225030 cycles
          0f 1f 44 00 00     288.233558 cycles
       66 0f 1f 44 00 00     324.792342 cycles
    0f 1f 80 00 00 00 00     325.657462 cycles
 0f 1f 84 00 00 00 00 00     430.246643 cycles

F14h:
----
Running NOP tests, 1000 NOPs x 1000000 repetitions

K8:
                      90     510.404890 cycles
                   66 90     510.432117 cycles
                66 66 90     510.561858 cycles
             66 66 66 90     510.541865 cycles
          66 66 90 66 90    1014.192782 cycles
       66 66 90 66 66 90    1014.226546 cycles
    66 66 66 90 66 66 90    1014.334299 cycles
 66 66 66 90 66 66 66 90    1014.381205 cycles

P6:
                      90     510.436710 cycles
                   66 90     510.448229 cycles
                0f 1f 00     510.545100 cycles
             0f 1f 40 00     510.502792 cycles
          0f 1f 44 00 00     510.589517 cycles
       66 0f 1f 44 00 00     510.611462 cycles
    0f 1f 80 00 00 00 00     511.166794 cycles
 0f 1f 84 00 00 00 00 00     511.651641 cycles

F15h:
-----
Running NOP tests, 1000 NOPs x 1000000 repetitions

K8:
                      90     243.128396 cycles
                   66 90     243.129883 cycles
                66 66 90     243.131631 cycles
             66 66 66 90     242.499324 cycles
          66 66 90 66 90     481.829083 cycles
       66 66 90 66 66 90     481.884413 cycles
    66 66 66 90 66 66 90     481.851446 cycles
 66 66 66 90 66 66 66 90     481.409220 cycles

P6:
                      90     243.127026 cycles
                   66 90     243.130711 cycles
                0f 1f 00     243.122747 cycles
             0f 1f 40 00     242.497617 cycles
          0f 1f 44 00 00     245.354461 cycles
       66 0f 1f 44 00 00     361.930417 cycles
    0f 1f 80 00 00 00 00     362.844944 cycles
 0f 1f 84 00 00 00 00 00     480.514948 cycles

F16h:
-----
Running NOP tests, 1000 NOPs x 1000000 repetitions

K8:
                      90     507.793298 cycles
                   66 90     507.789636 cycles
                66 66 90     507.826490 cycles
             66 66 66 90     507.859075 cycles
          66 66 90 66 90    1008.663129 cycles
       66 66 90 66 66 90    1008.696259 cycles
    66 66 66 90 66 66 90    1008.692517 cycles
 66 66 66 90 66 66 66 90    1008.755399 cycles

P6:
                      90     507.795232 cycles
                   66 90     507.794761 cycles
                0f 1f 00     507.834901 cycles
             0f 1f 40 00     507.822629 cycles
          0f 1f 44 00 00     507.838493 cycles
       66 0f 1f 44 00 00     507.908597 cycles
    0f 1f 80 00 00 00 00     507.946417 cycles
 0f 1f 84 00 00 00 00 00     507.954960 cycles

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
---
 arch/x86/kernel/alternative.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index aef653193160..b0932c4341b3 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -227,6 +227,15 @@ void __init arch_init_ideal_nops(void)
 #endif
 		}
 		break;
+
+	case X86_VENDOR_AMD:
+		if (boot_cpu_data.x86 > 0xf) {
+			ideal_nops = p6_nops;
+			return;
+		}
+
+		/* fall through */
+
 	default:
 #ifdef CONFIG_X86_64
 		ideal_nops = k8_nops;
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH] x86/cpu/microcode: Zap changelog
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:45   ` [tip:x86/microcode] " tip-bot for Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/kaslr: Fix typo in KASLR_FLAG documentation Borislav Petkov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Borislav Petkov <bp@suse.de>

It is useless at best and git history has it all detailed anyway. Update
copyright while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/microcode/core.c  | 76 +++++------------------------------
 arch/x86/kernel/cpu/microcode/intel.c | 75 ++++------------------------------
 2 files changed, 16 insertions(+), 135 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 36a83617eb21..6236a54a63f4 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -1,74 +1,16 @@
 /*
- *	Intel CPU Microcode Update Driver for Linux
+ * CPU Microcode Update Driver for Linux
  *
- *	Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
- *		      2006	Shaohua Li <shaohua.li@intel.com>
+ * Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
+ *	      2006	Shaohua Li <shaohua.li@intel.com>
+ *	      2013-2015	Borislav Petkov <bp@alien8.de>
  *
- *	This driver allows to upgrade microcode on Intel processors
- *	belonging to IA-32 family - PentiumPro, Pentium II,
- *	Pentium III, Xeon, Pentium 4, etc.
+ * This driver allows to upgrade microcode on x86 processors.
  *
- *	Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture
- *	Software Developer's Manual
- *	Order Number 253668 or free download from:
- *
- *	http://developer.intel.com/Assets/PDF/manual/253668.pdf	
- *
- *	For more information, go to http://www.urbanmyth.org/microcode
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- *	1.0	16 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Initial release.
- *	1.01	18 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added read() support + cleanups.
- *	1.02	21 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added 'device trimming' support. open(O_WRONLY) zeroes
- *		and frees the saved copy of applied microcode.
- *	1.03	29 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Made to use devfs (/dev/cpu/microcode) + cleanups.
- *	1.04	06 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Added misc device support (now uses both devfs and misc).
- *		Added MICROCODE_IOCFREE ioctl to clear memory.
- *	1.05	09 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Messages for error cases (non Intel & no suitable microcode).
- *	1.06	03 Aug 2000, Tigran Aivazian <tigran@veritas.com>
- *		Removed ->release(). Removed exclusive open and status bitmap.
- *		Added microcode_rwsem to serialize read()/write()/ioctl().
- *		Removed global kernel lock usage.
- *	1.07	07 Sep 2000, Tigran Aivazian <tigran@veritas.com>
- *		Write 0 to 0x8B msr and then cpuid before reading revision,
- *		so that it works even if there were no update done by the
- *		BIOS. Otherwise, reading from 0x8B gives junk (which happened
- *		to be 0 on my machine which is why it worked even when I
- *		disabled update by the BIOS)
- *		Thanks to Eric W. Biederman <ebiederman@lnxi.com> for the fix.
- *	1.08	11 Dec 2000, Richard Schaal <richard.schaal@intel.com> and
- *			     Tigran Aivazian <tigran@veritas.com>
- *		Intel Pentium 4 processor support and bugfixes.
- *	1.09	30 Oct 2001, Tigran Aivazian <tigran@veritas.com>
- *		Bugfix for HT (Hyper-Threading) enabled processors
- *		whereby processor resources are shared by all logical processors
- *		in a single CPU package.
- *	1.10	28 Feb 2002 Asit K Mallick <asit.k.mallick@intel.com> and
- *		Tigran Aivazian <tigran@veritas.com>,
- *		Serialize updates as required on HT processors due to
- *		speculative nature of implementation.
- *	1.11	22 Mar 2002 Tigran Aivazian <tigran@veritas.com>
- *		Fix the panic when writing zero-length microcode chunk.
- *	1.12	29 Sep 2003 Nitin Kamble <nitin.a.kamble@intel.com>,
- *		Jun Nakajima <jun.nakajima@intel.com>
- *		Support for the microcode updates in the new format.
- *	1.13	10 Oct 2003 Tigran Aivazian <tigran@veritas.com>
- *		Removed ->read() method and obsoleted MICROCODE_IOCFREE ioctl
- *		because we no longer hold a copy of applied microcode
- *		in kernel memory.
- *	1.14	25 Jun 2004 Tigran Aivazian <tigran@veritas.com>
- *		Fix sigmatch() macro to handle old CPUs with pf == 0.
- *		Thanks to Stuart Swales for pointing out this bug.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index a41beadb3db9..e20d4e58cd89 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -1,74 +1,13 @@
 /*
- *	Intel CPU Microcode Update Driver for Linux
+ * Intel CPU Microcode Update Driver for Linux
  *
- *	Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
- *		      2006	Shaohua Li <shaohua.li@intel.com>
+ * Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
+ *		 2006 Shaohua Li <shaohua.li@intel.com>
  *
- *	This driver allows to upgrade microcode on Intel processors
- *	belonging to IA-32 family - PentiumPro, Pentium II,
- *	Pentium III, Xeon, Pentium 4, etc.
- *
- *	Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture
- *	Software Developer's Manual
- *	Order Number 253668 or free download from:
- *
- *	http://developer.intel.com/Assets/PDF/manual/253668.pdf	
- *
- *	For more information, go to http://www.urbanmyth.org/microcode
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- *	1.0	16 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Initial release.
- *	1.01	18 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added read() support + cleanups.
- *	1.02	21 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added 'device trimming' support. open(O_WRONLY) zeroes
- *		and frees the saved copy of applied microcode.
- *	1.03	29 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Made to use devfs (/dev/cpu/microcode) + cleanups.
- *	1.04	06 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Added misc device support (now uses both devfs and misc).
- *		Added MICROCODE_IOCFREE ioctl to clear memory.
- *	1.05	09 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Messages for error cases (non Intel & no suitable microcode).
- *	1.06	03 Aug 2000, Tigran Aivazian <tigran@veritas.com>
- *		Removed ->release(). Removed exclusive open and status bitmap.
- *		Added microcode_rwsem to serialize read()/write()/ioctl().
- *		Removed global kernel lock usage.
- *	1.07	07 Sep 2000, Tigran Aivazian <tigran@veritas.com>
- *		Write 0 to 0x8B msr and then cpuid before reading revision,
- *		so that it works even if there were no update done by the
- *		BIOS. Otherwise, reading from 0x8B gives junk (which happened
- *		to be 0 on my machine which is why it worked even when I
- *		disabled update by the BIOS)
- *		Thanks to Eric W. Biederman <ebiederman@lnxi.com> for the fix.
- *	1.08	11 Dec 2000, Richard Schaal <richard.schaal@intel.com> and
- *			     Tigran Aivazian <tigran@veritas.com>
- *		Intel Pentium 4 processor support and bugfixes.
- *	1.09	30 Oct 2001, Tigran Aivazian <tigran@veritas.com>
- *		Bugfix for HT (Hyper-Threading) enabled processors
- *		whereby processor resources are shared by all logical processors
- *		in a single CPU package.
- *	1.10	28 Feb 2002 Asit K Mallick <asit.k.mallick@intel.com> and
- *		Tigran Aivazian <tigran@veritas.com>,
- *		Serialize updates as required on HT processors due to
- *		speculative nature of implementation.
- *	1.11	22 Mar 2002 Tigran Aivazian <tigran@veritas.com>
- *		Fix the panic when writing zero-length microcode chunk.
- *	1.12	29 Sep 2003 Nitin Kamble <nitin.a.kamble@intel.com>,
- *		Jun Nakajima <jun.nakajima@intel.com>
- *		Support for the microcode updates in the new format.
- *	1.13	10 Oct 2003 Tigran Aivazian <tigran@veritas.com>
- *		Removed ->read() method and obsoleted MICROCODE_IOCFREE ioctl
- *		because we no longer hold a copy of applied microcode
- *		in kernel memory.
- *	1.14	25 Jun 2004 Tigran Aivazian <tigran@veritas.com>
- *		Fix sigmatch() macro to handle old CPUs with pf == 0.
- *		Thanks to Stuart Swales for pointing out this bug.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH] x86/kaslr: Fix typo in KASLR_FLAG documentation
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs Borislav Petkov
  2015-05-11  8:15 ` [PATCH] x86/cpu/microcode: Zap changelog Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:45   ` [tip:x86/boot] x86/kaslr: Fix typo in the " tip-bot for Miroslav Benes
  2015-05-11  8:15 ` [PATCH 1/5] x86/mm: Do not flush last cacheline twice in clflush_cache_range() Borislav Petkov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Miroslav Benes <mbenes@suse.cz>

Documentation/x86/boot.txt labels the bit in boot_params.hdr.loadflags
as ALSR_FLAG while it should be KASLR_FLAG.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: corbet@lwn.net
Link: http://lkml.kernel.org/r/1429011324-7170-1-git-send-email-mbenes@suse.cz
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 Documentation/x86/boot.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 88b85899d309..69e139791868 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -406,7 +406,7 @@ Protocol:	2.00+
 	- If 0, the protected-mode code is loaded at 0x10000.
 	- If 1, the protected-mode code is loaded at 0x100000.
 
-  Bit 1 (kernel internal): ALSR_FLAG
+  Bit 1 (kernel internal): KASLR_FLAG
 	- Used internally by the compressed kernel to communicate
 	  KASLR status to kernel proper.
 	  If 1, KASLR enabled.
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 1/5] x86/mm: Do not flush last cacheline twice in clflush_cache_range()
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
                   ` (2 preceding siblings ...)
  2015-05-11  8:15 ` [PATCH] x86/kaslr: Fix typo in KASLR_FLAG documentation Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:45   ` [tip:x86/mm] " tip-bot for Ross Zwisler
  2015-05-11  8:15 ` [PATCH] x86/vdso: Add arch/x86/include/uapi include path to HOST_EXTRACFLAGS Borislav Petkov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Ross Zwisler <ross.zwisler@linux.intel.com>

The current algorithm used in clflush_cache_range() can cause the last
cache line of the buffer to be flushed twice. Fix that algorithm so that
each cache line will only be flushed once.
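
To make the double flush concrete, the following user-space sketch
counts how many flushes each loop would issue instead of executing
CLFLUSHOPT. LINE_SIZE is a hypothetical stand-in for
boot_cpu_data.x86_clflush_size, and both function names are invented for
the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data.x86_clflush_size. */
#define LINE_SIZE 64u

/* Old loop: step from vaddr, then always flush the line of the last byte. */
static unsigned int count_flushes_old(uintptr_t vaddr, size_t size)
{
	uintptr_t vend;
	unsigned int flushes = 0;

	assert(size > 0);
	vend = vaddr + size - 1;

	for (; vaddr < vend; vaddr += LINE_SIZE)
		flushes++;
	flushes++;	/* the unconditional final clflushopt(vend) */

	return flushes;
}

/* New loop: align down to a line boundary, one flush per line touched. */
static unsigned int count_flushes_new(uintptr_t vaddr, size_t size)
{
	uintptr_t mask = LINE_SIZE - 1;
	uintptr_t vend = vaddr + size;
	uintptr_t p;
	unsigned int flushes = 0;

	for (p = vaddr & ~mask; p < vend; p += LINE_SIZE)
		flushes++;

	return flushes;
}
```

For a line-aligned 64-byte buffer the old loop issues two flushes for a
single cache line, while the new loop issues one.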

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1430259192-18802-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pageattr.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 89af288ec674..338e507f95b8 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,16 +129,15 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	void *vend = vaddr + size - 1;
+	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	char *vend = (char *)vaddr + size;
+	char *p;
 
 	mb();
 
-	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
-		clflushopt(vaddr);
-	/*
-	 * Flush any possible final partial cacheline:
-	 */
-	clflushopt(vend);
+	for (p = (char *)((unsigned long)vaddr & ~clflush_mask);
+	     p < vend; p += boot_cpu_data.x86_clflush_size)
+		clflushopt(p);
 
 	mb();
 }
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH] x86/vdso: Add arch/x86/include/uapi include path to HOST_EXTRACFLAGS
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
                   ` (3 preceding siblings ...)
  2015-05-11  8:15 ` [PATCH 1/5] x86/mm: Do not flush last cacheline twice in clflush_cache_range() Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11  8:15 ` [PATCH 2/5] x86/mm: Add kerneldoc comments for pcommit_sfence() Borislav Petkov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Oleg Nesterov <oleg@redhat.com>

Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along with
include/uapi.

This looks more consistent, and this fixes "make bzImage" on my old
distro which doesn't have asm/bitsperlong.h in /usr/include/.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jiri Olsa <jolsa@kernel.org>
Fixes: 6f121e548f83 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
Link: http://lkml.kernel.org/r/20150507165835.GB18652@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 275a3a8b78af..e97032069f88 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 2/5] x86/mm: Add kerneldoc comments for pcommit_sfence()
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
                   ` (4 preceding siblings ...)
  2015-05-11  8:15 ` [PATCH] x86/vdso: Add arch/x86/include/uapi include path to HOST_EXTRACFLAGS Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:45   ` [tip:x86/mm] " tip-bot for Ross Zwisler
  2015-05-11  8:15 ` [PATCH 3/5] x86/MTRR: Remove wrong address check in __mtrr_type_lookup() Borislav Petkov
  2015-05-11  8:15 ` [PATCH 4/5] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) Borislav Petkov
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Ross Zwisler <ross.zwisler@linux.intel.com>

Add kerneldoc comments for pcommit_sfence() describing the purpose of
the PCOMMIT instruction and demonstrating its usage with an example.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1430261196-2401-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/special_insns.h | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeb4666e0c0a..2270e41b32fd 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -215,6 +215,44 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+/**
+ * pcommit_sfence() - persistent commit and fence
+ *
+ * The PCOMMIT instruction ensures that data that has been flushed from the
+ * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
+ * memory and is durable on the DIMM.  The primary use case for this is
+ * persistent memory.
+ *
+ * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
+ * with appropriate fencing.
+ *
+ * Example:
+ * void flush_and_commit_buffer(void *vaddr, unsigned int size)
+ * {
+ *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+ *         void *vend = vaddr + size;
+ *         void *p;
+ *
+ *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+ *              p < vend; p += boot_cpu_data.x86_clflush_size)
+ *                 clwb(p);
+ *
+ *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
+ *         // MFENCE via mb() also works
+ *         wmb();
+ *
+ *         // PCOMMIT and the required SFENCE for ordering
+ *         pcommit_sfence();
+ * }
+ *
+ * After this function completes the data pointed to by 'vaddr' has been
+ * accepted to memory and will be durable if the 'vaddr' points to persistent
+ * memory.
+ *
+ * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
+ * things we include both the PCOMMIT and the required SFENCE in the
+ * alternatives generated by pcommit_sfence().
+ */
 static inline void pcommit_sfence(void)
 {
 	alternative(ASM_NOP7,
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 3/5] x86/MTRR: Remove wrong address check in __mtrr_type_lookup()
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
                   ` (5 preceding siblings ...)
  2015-05-11  8:15 ` [PATCH 2/5] x86/mm: Add kerneldoc comments for pcommit_sfence() Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:46     ` tip-bot for Toshi Kani
  2015-05-11  8:15 ` [PATCH 4/5] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) Borislav Petkov
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: Toshi Kani <toshi.kani@hp.com>

__mtrr_type_lookup() checks MTRR fixed ranges when mtrr_state.have_fixed
is set and start is less than 0x100000. However, the
'else if (start < 0x1000000)' in the code checks against a wrong
address, as it has an extra zero in it. The code still runs correctly,
though, because start is already known to be less than 0x100000 at that
point, which makes the check meaningless.

This patch replaces the wrong address check with a plain 'else' with no
condition.
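
The index calculation can be sketched in user space as below.
fixed_range_idx() is a hypothetical name, and the first branch (64K
ranges below 0x80000) is an assumption that is not visible in this
patch's hunk; the other two branches follow the hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into a fixed-range type array for an address below 1 MiB.
 * The caller is expected to have checked start < 0x100000 already,
 * which is why the last branch needs no condition of its own.
 */
static unsigned int fixed_range_idx(uint64_t start)
{
	assert(start < 0x100000);

	if (start < 0x80000)		/* 8 ranges of 64K (assumed) */
		return 0 * 8 + (unsigned int)(start >> 16);
	else if (start < 0xC0000)	/* 16 ranges of 16K */
		return 1 * 8 + (unsigned int)((start - 0x80000) >> 14);
	else				/* 64 ranges of 4K */
		return 3 * 8 + (unsigned int)((start - 0xC0000) >> 12);
}
```

Under these assumptions the last address below 1 MiB, 0xFFFFF, maps to
index 87, so the three branches cover 88 entries in total.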

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo@redhat.com
Link: http://lkml.kernel.org/r/1427234921-19737-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b3c6ba..5b239679cfc9 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -137,7 +137,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			idx = 1 * 8;
 			idx += ((start - 0x80000) >> 14);
 			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0x1000000) {
+		} else {
 			idx = 3 * 8;
 			idx += ((start - 0xC0000) >> 12);
 			return mtrr_state.fixed_ranges[idx];
-- 
2.3.5


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 4/5] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-)
  2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
                   ` (6 preceding siblings ...)
  2015-05-11  8:15 ` [PATCH 3/5] x86/MTRR: Remove wrong address check in __mtrr_type_lookup() Borislav Petkov
@ 2015-05-11  8:15 ` Borislav Petkov
  2015-05-11 12:46   ` [tip:x86/mm] " tip-bot for Luis R. Rodriguez
  7 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11  8:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

ioremap_nocache() currently uses UC- by default. Our goal is to
eventually make UC the default. Linux maps UC- to PCD=1, PWT=0 page
attributes on non-PAT systems. Linux maps UC to PCD=1, PWT=1 page
attributes on non-PAT systems. On non-PAT and PAT systems a WC MTRR has
different effects on pages with either of these attributes. In order
to help with a smooth transition it's best to enable use of UC (PCD=1,
PWT=1) on a region, as that ensures a WC MTRR will have no effect on
that region. This however requires a way to declare a region as UC,
and we currently do not have one.

WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.

WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.

A flip of the default ioremap_nocache() behaviour from UC- to UC can
therefore regress a memory region from effective memory type WC to UC
if MTRRs are used. Use of MTRRs should be phased out, and in the best
case only arch_phys_wc_add() use will remain. Even if this happens,
arch_phys_wc_add() will still have an effect on non-PAT systems, and
changes to default ioremap_nocache() behaviour could regress drivers.

Now, ideally we'd use ioremap_nocache() on the regions in which we'd
need uncacheable memory types and avoid any MTRRs on those regions.
There are however some restrictions on MTRR use, such as the
requirement that the base and size of variable-sized MTRRs be powers of
two, which could mean having to use a WC MTRR over a large area which
includes a region in which write-combining effects are undesirable.
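
A small user-space sketch of that sizing problem follows. It models
only the size constraint and assumes the base is already suitably
aligned; round_up_pow2(), wc_mtrr_spill() and the 12 MiB example are
hypothetical and not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two that is >= x; x must be non-zero and <= 2^63. */
static uint64_t round_up_pow2(uint64_t x)
{
	uint64_t p = 1;

	assert(x != 0 && x <= (UINT64_C(1) << 63));
	while (p < x)
		p <<= 1;

	return p;
}

/* Bytes a WC MTRR would cover beyond the region that actually wants WC. */
static uint64_t wc_mtrr_spill(uint64_t wc_region_size)
{
	return round_up_pow2(wc_region_size) - wc_region_size;
}
```

With a 12 MiB write-combined area, the covering MTRR would have to be
16 MiB, so 4 MiB of neighbouring address space (for example control
registers) would also fall under the WC MTRR.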

Add ioremap_uc() both to help with phasing out MTRR use and to provide
a way to blacklist small regions where WC is undesirable, in devices
with mixed regions whose size forces the use of large WC MTRRs. Use of
ioremap_uc() helps phase out MTRR use by avoiding regressions with an
eventual flip of the default behaviour of ioremap_nocache() from UC- to
UC.

Drivers working with WC MTRRs can use the below table to review and
consider the use of ioremap*() and similar helpers to ensure appropriate
behaviour long term even if default ioremap_nocache() behaviour changes
from UC- to UC.

Although ioremap_uc() is being added we leave set_memory_uc() to use UC-
as only initial memory type setup is required to be able to accommodate
existing device drivers and phase out MTRR use. It should also be
clarified that set_memory_uc() cannot be used with IO memory: even
though its use will not return any errors, it really has no effect.

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------
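
For review purposes, the table above can be restated as a lookup, as in
the user-space sketch below. The enum and function names are
hypothetical (they only mirror the _PAGE_CACHE_MODE_* names), and the
values are copied from the table rather than derived from hardware
rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the _PAGE_CACHE_MODE_* values in the table. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_COUNT };
enum mem_type { TYPE_WC, TYPE_UC };

/*
 * Effective memory type for a page that lies under a WC MTRR.
 * Column 0 is the non-PAT case, column 1 the PAT case, as in the table.
 */
static enum mem_type effective_type_under_wc_mtrr(enum cache_mode mode,
						  bool pat_enabled)
{
	static const enum mem_type table[MODE_COUNT][2] = {
		[MODE_WB]       = { TYPE_WC, TYPE_WC },
		[MODE_WC]       = { TYPE_WC, TYPE_WC },
		[MODE_UC_MINUS] = { TYPE_WC, TYPE_WC },
		[MODE_UC]       = { TYPE_UC, TYPE_UC },
	};

	assert(mode >= MODE_WB && mode < MODE_COUNT);
	return table[mode][pat_enabled ? 1 : 0];
}
```

Only the strong UC row stays uncached under a WC MTRR, which is the
property ioremap_uc() is meant to give drivers.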

[ hpa: this requires communication with driver writers ]

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: x86@kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-2-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/io.h |  1 +
 arch/x86/mm/ioremap.c     | 36 +++++++++++++++++++++++++++++++++++-
 arch/x86/mm/pageattr.c    |  3 +++
 include/asm-generic/io.h  |  8 ++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93704d3..4afc05ffa566 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -177,6 +177,7 @@ static inline unsigned int isa_virt_to_bus(volatile void *address)
  * look at pci_iomap().
  */
 extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
 				unsigned long prot_val);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 5ead4d6cf3a7..fc08431a387b 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -237,7 +237,8 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
-	 * UC MINUS.
+	 * UC MINUS. Drivers that are certain they need or can already
+	 * be converted over to strong UC can use ioremap_uc().
 	 */
 	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
@@ -247,6 +248,39 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 EXPORT_SYMBOL(ioremap_nocache);
 
 /**
+ * ioremap_uc     -   map bus memory into CPU space as strongly uncachable
+ * @phys_addr:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_uc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked with a strong
+ * preference as completely uncachable on the CPU when possible. For non-PAT
+ * systems this ends up setting page-attribute flags PCD=1, PWT=1. For PAT
+ * systems this will set the PAT entry for the pages as strong UC.  This call
+ * will honor existing caching rules from things like the PCI bus. Note that
+ * there are other caches and buffers on many busses. In particular driver
+ * authors should read up on PCI writes.
+ *
+ * It's useful if some control registers are in such an area and
+ * write combining or read caching is not desirable:
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
+{
+	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
+
+	return __ioremap_caller(phys_addr, size, pcm,
+				__builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(ioremap_uc);
+
+/**
  * ioremap_wc	-	map memory into CPU space write combined
  * @phys_addr:	bus address of the memory
  * @size:	size of the resource to map
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 338e507f95b8..d35148acdc05 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1467,6 +1467,9 @@ int _set_memory_uc(unsigned long addr, int numpages)
 {
 	/*
 	 * for now UC MINUS. see comments in ioremap_nocache()
+	 * If you really need strong UC use ioremap_uc(), but note
+	 * that you cannot override IO areas with set_memory_*() as
+	 * these helpers cannot work with IO memory.
 	 */
 	return change_page_attr_set(&addr, numpages,
 				    cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS),
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 9db042304df3..90ccba7f9f9a 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -769,6 +769,14 @@ static inline void __iomem *ioremap_nocache(phys_addr_t offset, size_t size)
 }
 #endif
 
+#ifndef ioremap_uc
+#define ioremap_uc ioremap_uc
+static inline void __iomem *ioremap_uc(phys_addr_t offset, size_t size)
+{
+	return ioremap_nocache(offset, size);
+}
+#endif
+
 #ifndef ioremap_wc
 #define ioremap_wc ioremap_wc
 static inline void __iomem *ioremap_wc(phys_addr_t offset, size_t size)
-- 
2.3.5
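
The asm-generic hunk above uses the "#ifndef name / #define name name"
idiom so that an architecture can provide its own ioremap_uc() while
every other architecture falls back to ioremap_nocache(). A minimal
userspace sketch of that idiom follows. The names map_nocache(),
map_uc(), enum cache_mode and ARCH_HAS_STRONG_UC are hypothetical
stand-ins for illustration, not kernel API:

```c
/* Hypothetical cache modes, standing in for UC- and strong UC. */
enum cache_mode { MODE_UC_MINUS, MODE_UC };

/* Stand-in for ioremap_nocache(): always yields the weaker UC- mode. */
static enum cache_mode map_nocache(void)
{
	return MODE_UC_MINUS;
}

/*
 * An "architecture" that has a real strong-UC mapping defines both the
 * function and a macro of the same name (ARCH_HAS_STRONG_UC is an
 * assumed build switch for this sketch only).
 */
#ifdef ARCH_HAS_STRONG_UC
#define map_uc map_uc
static enum cache_mode map_uc(void)
{
	return MODE_UC;
}
#endif

/* Generic fallback: only compiled when nobody defined map_uc above. */
#ifndef map_uc
#define map_uc map_uc
static inline enum cache_mode map_uc(void)
{
	return map_nocache();
}
#endif
```

Built without -DARCH_HAS_STRONG_UC, map_uc() returns MODE_UC_MINUS, so
callers can use the new name everywhere and only get the stronger
behaviour where an architecture implements it.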


* [tip:x86/urgent] x86/vdso: Fix 'make bzImage' on older distros
  2015-05-07 16:58           ` [PATCH 1/1] " Oleg Nesterov
  2015-05-07 19:46             ` Andy Lutomirski
@ 2015-05-11 12:44             ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 710+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-05-11 12:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, luto, stable, tglx, linux-kernel, jolsa, dvlasenk, brgerst,
	bp, luto, mingo, hpa, oleg, peterz, rusty, torvalds

Commit-ID:  ef7254a595912b026d80a4116b8c4cd5b79d9c62
Gitweb:     http://git.kernel.org/tip/ef7254a595912b026d80a4116b8c4cd5b79d9c62
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 11 May 2015 10:15:50 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:25:02 +0200

x86/vdso: Fix 'make bzImage' on older distros

Change HOST_EXTRACFLAGS to include arch/x86/include/uapi along
with include/uapi.

This looks more consistent, and this fixes "make bzImage" on my
old distro which doesn't have asm/bitsperlong.h in /usr/include/.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6f121e548f83 ("x86, vdso: Reimplement vdso.so preparation in build-time C")
Link: http://lkml.kernel.org/r/1431332153-18566-6-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150507165835.GB18652@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/vdso/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/vdso/Makefile b/arch/x86/vdso/Makefile
index 275a3a8..e970320 100644
--- a/arch/x86/vdso/Makefile
+++ b/arch/x86/vdso/Makefile
@@ -51,7 +51,7 @@ VDSO_LDFLAGS_vdso.lds = -m64 -Wl,-soname=linux-vdso.so.1 \
 $(obj)/vdso64.so.dbg: $(src)/vdso.lds $(vobjs) FORCE
 	$(call if_changed,vdso)
 
-HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi
+HOST_EXTRACFLAGS += -I$(srctree)/tools/include -I$(srctree)/include/uapi -I$(srctree)/arch/x86/include/uapi
 hostprogs-y			+= vdso2c
 
 quiet_cmd_vdso2c = VDSO2C  $@

* [tip:x86/asm] x86/alternatives: Switch AMD F15h and later to the P6 NOPs
  2015-05-11  8:15 ` [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs Borislav Petkov
@ 2015-05-11 12:44   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-11 12:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, aravind.gopalakrishnan, peterz, dvlasenk, brgerst, bp,
	torvalds, tglx, mingo, linux-kernel, luto, bp

Commit-ID:  f21262b8e092a770e39fbd405cc18a0247c3af68
Gitweb:     http://git.kernel.org/tip/f21262b8e092a770e39fbd405cc18a0247c3af68
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 11 May 2015 10:15:46 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:26:05 +0200

x86/alternatives: Switch AMD F15h and later to the P6 NOPs

The software optimization guides for both F15h and F16h cite the
P6 NOPs as the optimal ones. A microbenchmark confirms that even
older families do better with the single-instruction NOPs, so
switch to them for the alternatives.

The cycle counts below include the loop overhead of the
measurement, but that overhead is the same for all runs.

	F10h, revE:
	-----------
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     288.212282 cycles
			   66 90     288.220840 cycles
			66 66 90     288.219447 cycles
		     66 66 66 90     288.223204 cycles
		  66 66 90 66 90     571.393424 cycles
	       66 66 90 66 66 90     571.374919 cycles
	    66 66 66 90 66 66 90     572.249281 cycles
	 66 66 66 90 66 66 66 90     571.388651 cycles

	P6:
			      90     288.214193 cycles
			   66 90     288.225550 cycles
			0f 1f 00     288.224441 cycles
		     0f 1f 40 00     288.225030 cycles
		  0f 1f 44 00 00     288.233558 cycles
	       66 0f 1f 44 00 00     324.792342 cycles
	    0f 1f 80 00 00 00 00     325.657462 cycles
	 0f 1f 84 00 00 00 00 00     430.246643 cycles

	F14h:
	-----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     510.404890 cycles
			   66 90     510.432117 cycles
			66 66 90     510.561858 cycles
		     66 66 66 90     510.541865 cycles
		  66 66 90 66 90    1014.192782 cycles
	       66 66 90 66 66 90    1014.226546 cycles
	    66 66 66 90 66 66 90    1014.334299 cycles
	 66 66 66 90 66 66 66 90    1014.381205 cycles

	P6:
			      90     510.436710 cycles
			   66 90     510.448229 cycles
			0f 1f 00     510.545100 cycles
		     0f 1f 40 00     510.502792 cycles
		  0f 1f 44 00 00     510.589517 cycles
	       66 0f 1f 44 00 00     510.611462 cycles
	    0f 1f 80 00 00 00 00     511.166794 cycles
	 0f 1f 84 00 00 00 00 00     511.651641 cycles

	F15h:
	-----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     243.128396 cycles
			   66 90     243.129883 cycles
			66 66 90     243.131631 cycles
		     66 66 66 90     242.499324 cycles
		  66 66 90 66 90     481.829083 cycles
	       66 66 90 66 66 90     481.884413 cycles
	    66 66 66 90 66 66 90     481.851446 cycles
	 66 66 66 90 66 66 66 90     481.409220 cycles

	P6:
			      90     243.127026 cycles
			   66 90     243.130711 cycles
			0f 1f 00     243.122747 cycles
		     0f 1f 40 00     242.497617 cycles
		  0f 1f 44 00 00     245.354461 cycles
	       66 0f 1f 44 00 00     361.930417 cycles
	    0f 1f 80 00 00 00 00     362.844944 cycles
	 0f 1f 84 00 00 00 00 00     480.514948 cycles

	F16h:
	-----
	Running NOP tests, 1000 NOPs x 1000000 repetitions

	K8:
			      90     507.793298 cycles
			   66 90     507.789636 cycles
			66 66 90     507.826490 cycles
		     66 66 66 90     507.859075 cycles
		  66 66 90 66 90    1008.663129 cycles
	       66 66 90 66 66 90    1008.696259 cycles
	    66 66 66 90 66 66 90    1008.692517 cycles
	 66 66 66 90 66 66 66 90    1008.755399 cycles

	P6:
			      90     507.795232 cycles
			   66 90     507.794761 cycles
			0f 1f 00     507.834901 cycles
		     0f 1f 40 00     507.822629 cycles
		  0f 1f 44 00 00     507.838493 cycles
	       66 0f 1f 44 00 00     507.908597 cycles
	    0f 1f 80 00 00 00 00     507.946417 cycles
	 0f 1f 84 00 00 00 00 00     507.954960 cycles
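
The selection this patch adds can be sketched in plain C as below.
This is only an illustration: enum vendor, enum nop_table and
pick_nops() are hypothetical names, and the sketch models just the AMD
case and the 64-bit default branch, not the other vendor cases in
arch_init_ideal_nops():

```c
/* Hypothetical stand-ins for the vendor IDs and NOP tables. */
enum vendor { VENDOR_AMD, VENDOR_OTHER };
enum nop_table { NOPS_K8, NOPS_P6 };

/*
 * AMD families newer than 0xf (K8) get the single-instruction P6 NOPs;
 * everything else handled here keeps the K8 NOPs, as the 64-bit
 * default branch does.
 */
static enum nop_table pick_nops(enum vendor v, unsigned int family)
{
	switch (v) {
	case VENDOR_AMD:
		if (family > 0xf)
			return NOPS_P6;
		/* fall through */
	default:
		return NOPS_K8;
	}
}
```

With this check, pick_nops(VENDOR_AMD, 0x15) yields NOPS_P6 while
pick_nops(VENDOR_AMD, 0xf) still yields NOPS_K8.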

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/alternative.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index aef6531..b0932c4 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -227,6 +227,15 @@ void __init arch_init_ideal_nops(void)
 #endif
 		}
 		break;
+
+	case X86_VENDOR_AMD:
+		if (boot_cpu_data.x86 > 0xf) {
+			ideal_nops = p6_nops;
+			return;
+		}
+
+		/* fall through */
+
 	default:
 #ifdef CONFIG_X86_64
 		ideal_nops = k8_nops;

* [tip:x86/microcode] x86/cpu/microcode: Zap changelog
  2015-05-11  8:15 ` [PATCH] x86/cpu/microcode: Zap changelog Borislav Petkov
@ 2015-05-11 12:45   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-11 12:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, peterz, hpa, mingo, tglx, dvlasenk, bp, luto, linux-kernel,
	brgerst, torvalds

Commit-ID:  6b44e72a1c45d1a4e903af75611235a2d6ea25e3
Gitweb:     http://git.kernel.org/tip/6b44e72a1c45d1a4e903af75611235a2d6ea25e3
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 11 May 2015 10:15:47 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:27:09 +0200

x86/cpu/microcode: Zap changelog

It is useless at best and git history has it all detailed
anyway. Update copyright while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431332153-18566-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/microcode/core.c  | 76 +++++------------------------------
 arch/x86/kernel/cpu/microcode/intel.c | 75 ++++------------------------------
 2 files changed, 16 insertions(+), 135 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 36a8361..6236a54 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -1,74 +1,16 @@
 /*
- *	Intel CPU Microcode Update Driver for Linux
+ * CPU Microcode Update Driver for Linux
  *
- *	Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
- *		      2006	Shaohua Li <shaohua.li@intel.com>
+ * Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
+ *	      2006	Shaohua Li <shaohua.li@intel.com>
+ *	      2013-2015	Borislav Petkov <bp@alien8.de>
  *
- *	This driver allows to upgrade microcode on Intel processors
- *	belonging to IA-32 family - PentiumPro, Pentium II,
- *	Pentium III, Xeon, Pentium 4, etc.
+ * This driver allows to upgrade microcode on x86 processors.
  *
- *	Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture
- *	Software Developer's Manual
- *	Order Number 253668 or free download from:
- *
- *	http://developer.intel.com/Assets/PDF/manual/253668.pdf	
- *
- *	For more information, go to http://www.urbanmyth.org/microcode
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- *	1.0	16 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Initial release.
- *	1.01	18 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added read() support + cleanups.
- *	1.02	21 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added 'device trimming' support. open(O_WRONLY) zeroes
- *		and frees the saved copy of applied microcode.
- *	1.03	29 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Made to use devfs (/dev/cpu/microcode) + cleanups.
- *	1.04	06 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Added misc device support (now uses both devfs and misc).
- *		Added MICROCODE_IOCFREE ioctl to clear memory.
- *	1.05	09 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Messages for error cases (non Intel & no suitable microcode).
- *	1.06	03 Aug 2000, Tigran Aivazian <tigran@veritas.com>
- *		Removed ->release(). Removed exclusive open and status bitmap.
- *		Added microcode_rwsem to serialize read()/write()/ioctl().
- *		Removed global kernel lock usage.
- *	1.07	07 Sep 2000, Tigran Aivazian <tigran@veritas.com>
- *		Write 0 to 0x8B msr and then cpuid before reading revision,
- *		so that it works even if there were no update done by the
- *		BIOS. Otherwise, reading from 0x8B gives junk (which happened
- *		to be 0 on my machine which is why it worked even when I
- *		disabled update by the BIOS)
- *		Thanks to Eric W. Biederman <ebiederman@lnxi.com> for the fix.
- *	1.08	11 Dec 2000, Richard Schaal <richard.schaal@intel.com> and
- *			     Tigran Aivazian <tigran@veritas.com>
- *		Intel Pentium 4 processor support and bugfixes.
- *	1.09	30 Oct 2001, Tigran Aivazian <tigran@veritas.com>
- *		Bugfix for HT (Hyper-Threading) enabled processors
- *		whereby processor resources are shared by all logical processors
- *		in a single CPU package.
- *	1.10	28 Feb 2002 Asit K Mallick <asit.k.mallick@intel.com> and
- *		Tigran Aivazian <tigran@veritas.com>,
- *		Serialize updates as required on HT processors due to
- *		speculative nature of implementation.
- *	1.11	22 Mar 2002 Tigran Aivazian <tigran@veritas.com>
- *		Fix the panic when writing zero-length microcode chunk.
- *	1.12	29 Sep 2003 Nitin Kamble <nitin.a.kamble@intel.com>,
- *		Jun Nakajima <jun.nakajima@intel.com>
- *		Support for the microcode updates in the new format.
- *	1.13	10 Oct 2003 Tigran Aivazian <tigran@veritas.com>
- *		Removed ->read() method and obsoleted MICROCODE_IOCFREE ioctl
- *		because we no longer hold a copy of applied microcode
- *		in kernel memory.
- *	1.14	25 Jun 2004 Tigran Aivazian <tigran@veritas.com>
- *		Fix sigmatch() macro to handle old CPUs with pf == 0.
- *		Thanks to Stuart Swales for pointing out this bug.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index a41bead..e20d4e5 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -1,74 +1,13 @@
 /*
- *	Intel CPU Microcode Update Driver for Linux
+ * Intel CPU Microcode Update Driver for Linux
  *
- *	Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
- *		      2006	Shaohua Li <shaohua.li@intel.com>
+ * Copyright (C) 2000-2006 Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
+ *		 2006 Shaohua Li <shaohua.li@intel.com>
  *
- *	This driver allows to upgrade microcode on Intel processors
- *	belonging to IA-32 family - PentiumPro, Pentium II,
- *	Pentium III, Xeon, Pentium 4, etc.
- *
- *	Reference: Section 8.11 of Volume 3a, IA-32 Intel? Architecture
- *	Software Developer's Manual
- *	Order Number 253668 or free download from:
- *
- *	http://developer.intel.com/Assets/PDF/manual/253668.pdf	
- *
- *	For more information, go to http://www.urbanmyth.org/microcode
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- *	1.0	16 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Initial release.
- *	1.01	18 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added read() support + cleanups.
- *	1.02	21 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Added 'device trimming' support. open(O_WRONLY) zeroes
- *		and frees the saved copy of applied microcode.
- *	1.03	29 Feb 2000, Tigran Aivazian <tigran@sco.com>
- *		Made to use devfs (/dev/cpu/microcode) + cleanups.
- *	1.04	06 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Added misc device support (now uses both devfs and misc).
- *		Added MICROCODE_IOCFREE ioctl to clear memory.
- *	1.05	09 Jun 2000, Simon Trimmer <simon@veritas.com>
- *		Messages for error cases (non Intel & no suitable microcode).
- *	1.06	03 Aug 2000, Tigran Aivazian <tigran@veritas.com>
- *		Removed ->release(). Removed exclusive open and status bitmap.
- *		Added microcode_rwsem to serialize read()/write()/ioctl().
- *		Removed global kernel lock usage.
- *	1.07	07 Sep 2000, Tigran Aivazian <tigran@veritas.com>
- *		Write 0 to 0x8B msr and then cpuid before reading revision,
- *		so that it works even if there were no update done by the
- *		BIOS. Otherwise, reading from 0x8B gives junk (which happened
- *		to be 0 on my machine which is why it worked even when I
- *		disabled update by the BIOS)
- *		Thanks to Eric W. Biederman <ebiederman@lnxi.com> for the fix.
- *	1.08	11 Dec 2000, Richard Schaal <richard.schaal@intel.com> and
- *			     Tigran Aivazian <tigran@veritas.com>
- *		Intel Pentium 4 processor support and bugfixes.
- *	1.09	30 Oct 2001, Tigran Aivazian <tigran@veritas.com>
- *		Bugfix for HT (Hyper-Threading) enabled processors
- *		whereby processor resources are shared by all logical processors
- *		in a single CPU package.
- *	1.10	28 Feb 2002 Asit K Mallick <asit.k.mallick@intel.com> and
- *		Tigran Aivazian <tigran@veritas.com>,
- *		Serialize updates as required on HT processors due to
- *		speculative nature of implementation.
- *	1.11	22 Mar 2002 Tigran Aivazian <tigran@veritas.com>
- *		Fix the panic when writing zero-length microcode chunk.
- *	1.12	29 Sep 2003 Nitin Kamble <nitin.a.kamble@intel.com>,
- *		Jun Nakajima <jun.nakajima@intel.com>
- *		Support for the microcode updates in the new format.
- *	1.13	10 Oct 2003 Tigran Aivazian <tigran@veritas.com>
- *		Removed ->read() method and obsoleted MICROCODE_IOCFREE ioctl
- *		because we no longer hold a copy of applied microcode
- *		in kernel memory.
- *	1.14	25 Jun 2004 Tigran Aivazian <tigran@veritas.com>
- *		Fix sigmatch() macro to handle old CPUs with pf == 0.
- *		Thanks to Stuart Swales for pointing out this bug.
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

* [tip:x86/boot] x86/kaslr: Fix typo in the KASLR_FLAG documentation
  2015-05-11  8:15 ` [PATCH] x86/kaslr: Fix typo in KASLR_FLAG documentation Borislav Petkov
@ 2015-05-11 12:45   ` tip-bot for Miroslav Benes
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Miroslav Benes @ 2015-05-11 12:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, dvlasenk, brgerst, jkosina, hpa, mbenes, corbet, tglx,
	mingo, linux-kernel, torvalds, bp, bp, peterz

Commit-ID:  d4bd441532b81fe2be1706e7f9dbbe8b5a364bcf
Gitweb:     http://git.kernel.org/tip/d4bd441532b81fe2be1706e7f9dbbe8b5a364bcf
Author:     Miroslav Benes <mbenes@suse.cz>
AuthorDate: Mon, 11 May 2015 10:15:48 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:28:57 +0200

x86/kaslr: Fix typo in the KASLR_FLAG documentation

Documentation/x86/boot.txt labels the bit in
boot_params.hdr.loadflags as ALSR_FLAG while it should be
KASLR_FLAG.

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429011324-7170-1-git-send-email-mbenes@suse.cz
Link: http://lkml.kernel.org/r/1431332153-18566-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/boot.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 88b8589..69e1397 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -406,7 +406,7 @@ Protocol:	2.00+
 	- If 0, the protected-mode code is loaded at 0x10000.
 	- If 1, the protected-mode code is loaded at 0x100000.
 
-  Bit 1 (kernel internal): ALSR_FLAG
+  Bit 1 (kernel internal): KASLR_FLAG
 	- Used internally by the compressed kernel to communicate
 	  KASLR status to kernel proper.
 	  If 1, KASLR enabled.

* [tip:x86/mm] x86/mm: Do not flush last cacheline twice in clflush_cache_range()
  2015-05-11  8:15 ` [PATCH 1/5] x86/mm: Do not flush last cacheline twice in clflush_cache_range() Borislav Petkov
@ 2015-05-11 12:45   ` tip-bot for Ross Zwisler
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Ross Zwisler @ 2015-05-11 12:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, linux-kernel, brgerst, bp, mcgrof, bp, toshi.kani, hpa,
	akpm, ross.zwisler, tglx, peterz, dvlasenk, mingo, torvalds

Commit-ID:  6c434d6176c0cb42847c33245189667d645db7bf
Gitweb:     http://git.kernel.org/tip/6c434d6176c0cb42847c33245189667d645db7bf
Author:     Ross Zwisler <ross.zwisler@linux.intel.com>
AuthorDate: Mon, 11 May 2015 10:15:49 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:38:44 +0200

x86/mm: Do not flush last cacheline twice in clflush_cache_range()

The current algorithm used in clflush_cache_range() can cause
the last cache line of the buffer to be flushed twice. Fix that
algorithm so that each cache line will only be flushed once.
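
To make the difference concrete, here is a small userspace sketch that
only counts how many flush instructions each loop would issue. It does
not execute CLFLUSHOPT. LINE is an assumed fixed cache line size of 64
bytes (the kernel reads boot_cpu_data.x86_clflush_size instead), the
function names are hypothetical, and size is assumed to be non-zero:

```c
#include <stdint.h>

/* Assumed cache line size for this sketch. */
#define LINE ((uintptr_t)64)

/* Old loop: step from vaddr, then always flush the line of the last byte. */
static unsigned int old_flush_count(uintptr_t vaddr, unsigned int size)
{
	uintptr_t vend = vaddr + size - 1;
	unsigned int n = 0;

	for (; vaddr < vend; vaddr += LINE)
		n++;			/* clflushopt(vaddr) */

	return n + 1;			/* clflushopt(vend) */
}

/* New loop: align the start down to a line boundary, one flush per line. */
static unsigned int new_flush_count(uintptr_t vaddr, unsigned int size)
{
	uintptr_t mask = LINE - 1;
	uintptr_t vend = vaddr + size;
	unsigned int n = 0;
	uintptr_t p;

	for (p = vaddr & ~mask; p < vend; p += LINE)
		n++;			/* clflushopt(p) */

	return n;
}

/* Number of distinct cache lines covered by [vaddr, vaddr + size). */
static unsigned int lines_touched(uintptr_t vaddr, unsigned int size)
{
	return (unsigned int)((vaddr + size - 1) / LINE - vaddr / LINE) + 1;
}
```

For a line-aligned 64-byte buffer the old loop issues two flushes for
one cache line, while the new loop issues exactly one flush per line
touched.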

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1430259192-18802-1-git-send-email-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/1431332153-18566-5-git-send-email-bp@alien8.de
[ Changed it to 'void *' to simplify the type conversions. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/pageattr.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 89af288..5ddd900 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -129,16 +129,15 @@ within(unsigned long addr, unsigned long start, unsigned long end)
  */
 void clflush_cache_range(void *vaddr, unsigned int size)
 {
-	void *vend = vaddr + size - 1;
+	unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+	void *vend = vaddr + size;
+	void *p;
 
 	mb();
 
-	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
-		clflushopt(vaddr);
-	/*
-	 * Flush any possible final partial cacheline:
-	 */
-	clflushopt(vend);
+	for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+	     p < vend; p += boot_cpu_data.x86_clflush_size)
+		clflushopt(p);
 
 	mb();
 }

* [tip:x86/mm] x86/mm: Add kerneldoc comments for pcommit_sfence()
  2015-05-11  8:15 ` [PATCH 2/5] x86/mm: Add kerneldoc comments for pcommit_sfence() Borislav Petkov
@ 2015-05-11 12:45   ` tip-bot for Ross Zwisler
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Ross Zwisler @ 2015-05-11 12:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: akpm, tglx, linux-kernel, mcgrof, bp, torvalds, hpa, bp,
	toshi.kani, dvlasenk, mingo, peterz, luto, h.peter.anvin,
	brgerst, ross.zwisler

Commit-ID:  ca7d9b795e6bc78c80a1771ada867994fabcfc01
Gitweb:     http://git.kernel.org/tip/ca7d9b795e6bc78c80a1771ada867994fabcfc01
Author:     Ross Zwisler <ross.zwisler@linux.intel.com>
AuthorDate: Mon, 11 May 2015 10:15:51 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:38:44 +0200

x86/mm: Add kerneldoc comments for pcommit_sfence()

Add kerneldoc comments for pcommit_sfence() describing the
purpose of the PCOMMIT instruction and demonstrating its usage
with an example.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H Peter Anvin <h.peter.anvin@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1430261196-2401-1-git-send-email-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/1431332153-18566-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/special_insns.h | 38 ++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index aeb4666e..2270e41 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -215,6 +215,44 @@ static inline void clwb(volatile void *__p)
 		: [pax] "a" (p));
 }
 
+/**
+ * pcommit_sfence() - persistent commit and fence
+ *
+ * The PCOMMIT instruction ensures that data that has been flushed from the
+ * processor's cache hierarchy with CLWB, CLFLUSHOPT or CLFLUSH is accepted to
+ * memory and is durable on the DIMM.  The primary use case for this is
+ * persistent memory.
+ *
+ * This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH and PCOMMIT
+ * with appropriate fencing.
+ *
+ * Example:
+ * void flush_and_commit_buffer(void *vaddr, unsigned int size)
+ * {
+ *         unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
+ *         void *vend = vaddr + size;
+ *         void *p;
+ *
+ *         for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
+ *              p < vend; p += boot_cpu_data.x86_clflush_size)
+ *                 clwb(p);
+ *
+ *         // SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes
+ *         // MFENCE via mb() also works
+ *         wmb();
+ *
+ *         // PCOMMIT and the required SFENCE for ordering
+ *         pcommit_sfence();
+ * }
+ *
+ * After this function completes the data pointed to by 'vaddr' has been
+ * accepted to memory and will be durable if the 'vaddr' points to persistent
+ * memory.
+ *
+ * PCOMMIT must always be ordered by an MFENCE or SFENCE, so to help simplify
+ * things we include both the PCOMMIT and the required SFENCE in the
+ * alternatives generated by pcommit_sfence().
+ */
 static inline void pcommit_sfence(void)
 {
 	alternative(ASM_NOP7,

* [tip:x86/mm] x86/mm/mtrr: Remove incorrect address check in __mtrr_type_lookup()
  2015-05-11  8:15 ` [PATCH 3/5] x86/MTRR: Remove wrong address check in __mtrr_type_lookup() Borislav Petkov
@ 2015-05-11 12:46     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-11 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, linux-kernel, bp, toshi.kani, hpa, peterz, akpm,
	dvlasenk, mcgrof, torvalds, bp, linux-mm, tglx, luto, mingo

Commit-ID:  cd2f6a5a4704a359635eb34919317052e6a96ba7
Gitweb:     http://git.kernel.org/tip/cd2f6a5a4704a359635eb34919317052e6a96ba7
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Mon, 11 May 2015 10:15:52 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:38:44 +0200

x86/mm/mtrr: Remove incorrect address check in __mtrr_type_lookup()

__mtrr_type_lookup() checks MTRR fixed ranges when mtrr_state.have_fixed
is set and start is less than 0x100000.

However, the 'else if (start < 0x1000000)' in the code compares
against an incorrect address: the constant has an extra zero.

The code still runs correctly, though, because the enclosing
condition already guarantees start < 0x100000, which makes this
check always true.

This patch replaces the incorrect address check with 'else' with no
condition.
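
The index computation can be sketched as a standalone function to show
why a plain 'else' suffices. fixed_range_index() is a hypothetical
helper, not the kernel function. Its first branch (below 0x80000, in
64 KiB steps) is not part of this diff and is assumed here from the
architectural layout of the fixed-range MTRRs: 8 x 64 KiB, 16 x 16 KiB
and 64 x 4 KiB entries:

```c
#include <stdint.h>

/*
 * Return the index into an 88-entry fixed-range type table for a
 * physical address, or -1 if the address is not covered by the
 * fixed ranges (at or above 1 MiB).
 */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;

	if (start < 0x80000)			/* 8 x 64 KiB  */
		return 0 * 8 + (int)(start >> 16);
	else if (start < 0xC0000)		/* 16 x 16 KiB */
		return 1 * 8 + (int)((start - 0x80000) >> 14);
	else					/* 64 x 4 KiB, up to 0xFFFFF */
		return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

Once the outer check has rejected everything at or above 0x100000, the
last branch can only see addresses from 0xC0000 to 0xFFFFF, so no
further comparison is needed there.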

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1427234921-19737-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1431332153-18566-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7d74f7b..5b23967 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -137,7 +137,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			idx = 1 * 8;
 			idx += ((start - 0x80000) >> 14);
 			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0x1000000) {
+		} else {
 			idx = 3 * 8;
 			idx += ((start - 0xC0000) >> 12);
 			return mtrr_state.fixed_ranges[idx];
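
The index arithmetic in the hunk above can be sketched in user space as
follows. This is only an illustrative model, not the kernel function:
fixed_range_index() is a hypothetical name, and the first branch (64 KiB
granularity below 0x80000) is an assumption, since that part of
__mtrr_type_lookup() is not shown in the hunk. It shows why the last
branch needs no bound of its own once start < 0x100000 is known.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into a model of mtrr_state.fixed_ranges[] for an address below
 * 1 MiB. Assumed layout: 8 entries of 64 KiB for 0x00000-0x7FFFF,
 * 16 entries of 16 KiB for 0x80000-0xBFFFF and 64 entries of 4 KiB
 * for 0xC0000-0xFFFFF.
 */
int fixed_range_index(uint64_t start)
{
	/* The caller only gets here when start < 0x100000. */
	assert(start < 0x100000);

	if (start < 0x80000)
		return (int)(start >> 16);	/* assumed first branch */
	if (start < 0xC0000)
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	/* No further bound is needed: start < 0x100000 already holds. */
	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

With this layout the three branches cover indexes 0-7, 8-23 and 24-87
without gaps, so a plain 'else' is sufficient for the last one.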

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-)
  2015-05-11  8:15 ` [PATCH 4/5] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) Borislav Petkov
@ 2015-05-11 12:46   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-11 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, mgorman, sbsiddha, mcgrof, dbueso, travis, jgross,
	adaplas, airlied, bp, peterz, plagnioj, tglx, torvalds, bhelgaas,
	dvlasenk, linux-kernel, vbabka, treding, tomi.valkeinen, syrjala,
	mingo, luto, will.deacon, toshi.kani, bp, daniel.vetter, hpa

Commit-ID:  e4b6be33c28923d8cde53023e0888b1c5d1a9027
Gitweb:     http://git.kernel.org/tip/e4b6be33c28923d8cde53023e0888b1c5d1a9027
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Mon, 11 May 2015 10:15:53 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 11 May 2015 10:38:45 +0200

x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-)

ioremap_nocache() currently uses UC- by default. Our goal is to
eventually make UC the default. Linux maps UC- to PCD=1, PWT=0
page attributes on non-PAT systems. Linux maps UC to PCD=1,
PWT=1 page attributes on non-PAT systems. On non-PAT and PAT
systems a WC MTRR has different effects on pages with either of
these attributes. To help with a smooth transition it is
best to enable use of UC (PCD=1, PWT=1) on a region, as that
ensures a WC MTRR will have no effect on the region. This however
requires a way to declare a region as UC, and we currently do
not have one.

  WC MTRR on non-PAT system with PCD=1, PWT=0 (UC-) yields WC.
  WC MTRR on non-PAT system with PCD=1, PWT=1 (UC)  yields UC.

  WC MTRR on PAT system with PCD=1, PWT=0 (UC-) yields WC.
  WC MTRR on PAT system with PCD=1, PWT=1 (UC)  yields UC.

A flip of the default ioremap_nocache() behaviour from UC- to UC
can therefore regress a memory region from effective memory type
WC to UC if MTRRs are used. Use of MTRRs should be phased out
and in the best case only arch_phys_wc_add() use will remain.
Even if this happens, arch_phys_wc_add() will still have an
effect on non-PAT systems, and changes to default
ioremap_nocache() behaviour could regress drivers.

Now, ideally we'd use ioremap_nocache() on the regions in which
we'd need uncachable memory types and avoid any MTRRs on those
regions. There are however some restrictions on MTRR use, such
as the requirement that the base and size of variable-sized
MTRRs be powers of two, which could mean having to use a WC
MTRR over a large area which includes a region in which
write-combining effects are undesirable.

Add ioremap_uc() to help both with phasing out MTRR use and with
providing a way to blacklist small regions where WC is undesirable
in devices with mixed regions whose sizes force the use of large
WC MTRRs. Use of ioremap_uc() helps phase out MTRR use by
avoiding regressions with an eventual flip of the default behaviour
of ioremap_nocache() from UC- to UC.

Drivers working with WC MTRRs can use the below table to review
and consider the use of ioremap*() and similar helpers to ensure
appropriate behaviour long term even if default
ioremap_nocache() behaviour changes from UC- to UC.

Although ioremap_uc() is being added we leave set_memory_uc() to
use UC- as only initial memory type setup is required to be able
to accommodate existing device drivers and phase out MTRR use.
It should also be clarified that set_memory_uc() cannot be used
with IO memory: its use will not return any errors, but it
really has no effect.

  ----------------------------------------------------------------------
  MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
  ----------------------------------------------------------------------
                                                    Non-PAT |  PAT
       PAT
       |PCD
       ||PWT
       |||
  WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
  WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
  WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   WC
  WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
  ----------------------------------------------------------------------
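
The rows of the table can be encoded in a small user-space model, shown
here only as a sketch. All names (pcm_model, memtype_model,
effective_type_under_wc_mtrr) are hypothetical and merely follow the
_PAGE_CACHE_MODE_* rows. Because the table lists the same effective
type in its non-PAT and PAT columns (the non-PAT WC and UC- rows carry
an asterisk there), the model takes no PAT flag.

```c
#include <assert.h>

/* Page cache modes named after the _PAGE_CACHE_MODE_* rows of the table. */
enum pcm_model { PCM_WB, PCM_WC, PCM_UC_MINUS, PCM_UC };

/* Effective memory types that appear in the table. */
enum memtype_model { MEMTYPE_WC, MEMTYPE_UC };

/*
 * Effective memory type of a page lying under a WC MTRR, per the table:
 * only strong UC overrides the WC MTRR, every other mode ends up WC.
 */
enum memtype_model effective_type_under_wc_mtrr(enum pcm_model pcm)
{
	return pcm == PCM_UC ? MEMTYPE_UC : MEMTYPE_WC;
}

/* Spot-check the two rows that matter for the UC- to UC flip. */
void check_wc_mtrr_rows(void)
{
	assert(effective_type_under_wc_mtrr(PCM_UC_MINUS) == MEMTYPE_WC);
	assert(effective_type_under_wc_mtrr(PCM_UC) == MEMTYPE_UC);
}
```

Read this way, a region mapped UC- under a WC MTRR is effectively WC
today and would become UC after the flip, which is the regression the
text above describes.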

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-2-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1431332153-18566-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/io.h |  1 +
 arch/x86/mm/ioremap.c     | 36 +++++++++++++++++++++++++++++++++++-
 arch/x86/mm/pageattr.c    |  3 +++
 include/asm-generic/io.h  |  8 ++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 34a5b93..4afc05f 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -177,6 +177,7 @@ static inline unsigned int isa_virt_to_bus(volatile void *address)
  * look at pci_iomap().
  */
 extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_uc(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size,
 				unsigned long prot_val);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 70e7444..a493bb8 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -237,7 +237,8 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
-	 * UC MINUS.
+	 * UC MINUS. Drivers that are certain they need or can already
+	 * be converted over to strong UC can use ioremap_uc().
 	 */
 	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
@@ -247,6 +248,39 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 EXPORT_SYMBOL(ioremap_nocache);
 
 /**
+ * ioremap_uc     -   map bus memory into CPU space as strongly uncachable
+ * @phys_addr:    bus address of the memory
+ * @size:      size of the resource to map
+ *
+ * ioremap_uc performs a platform specific sequence of operations to
+ * make bus memory CPU accessible via the readb/readw/readl/writeb/
+ * writew/writel functions and the other mmio helpers. The returned
+ * address is not guaranteed to be usable directly as a virtual
+ * address.
+ *
+ * This version of ioremap ensures that the memory is marked with a strong
+ * preference as completely uncachable on the CPU when possible. For non-PAT
+ * systems this ends up setting page-attribute flags PCD=1, PWT=1. For PAT
+ * systems this will set the PAT entry for the pages as strong UC.  This call
+ * will honor existing caching rules from things like the PCI bus. Note that
+ * there are other caches and buffers on many busses. In particular driver
+ * authors should read up on PCI writes.
+ *
+ * It's useful if some control registers are in such an area and
+ * write combining or read caching is not desirable:
+ *
+ * Must be freed with iounmap.
+ */
+void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
+{
+	enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
+
+	return __ioremap_caller(phys_addr, size, pcm,
+				__builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(ioremap_uc);
+
+/**
  * ioremap_wc	-	map memory into CPU space write combined
  * @phys_addr:	bus address of the memory
  * @size:	size of the resource to map
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 5ddd900..c77abd7 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1467,6 +1467,9 @@ int _set_memory_uc(unsigned long addr, int numpages)
 {
 	/*
 	 * for now UC MINUS. see comments in ioremap_nocache()
+	 * If you really need strong UC use ioremap_uc(), but note
+	 * that you cannot override IO areas with set_memory_*() as
+	 * these helpers cannot work with IO memory.
 	 */
 	return change_page_attr_set(&addr, numpages,
 				    cachemode2pgprot(_PAGE_CACHE_MODE_UC_MINUS),
diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 9db0423..90ccba7 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -769,6 +769,14 @@ static inline void __iomem *ioremap_nocache(phys_addr_t offset, size_t size)
 }
 #endif
 
+#ifndef ioremap_uc
+#define ioremap_uc ioremap_uc
+static inline void __iomem *ioremap_uc(phys_addr_t offset, size_t size)
+{
+	return ioremap_nocache(offset, size);
+}
+#endif
+
 #ifndef ioremap_wc
 #define ioremap_wc ioremap_wc
 static inline void __iomem *ioremap_wc(phys_addr_t offset, size_t size)

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-09  9:08     ` Borislav Petkov
@ 2015-05-11 19:25       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-11 19:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Sat, 2015-05-09 at 11:08 +0200, Borislav Petkov wrote:
> On Tue, Mar 24, 2015 at 04:08:41PM -0600, Toshi Kani wrote:
 :
> > @@ -235,13 +240,19 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> >   * Return Values:
> >   * MTRR_TYPE_(type)  - The effective MTRR type for the region
> >   * MTRR_TYPE_INVALID - MTRR is disabled
> > + *
> > + * Output Argument:
> > + * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
> > + *	     is fully covered by a single MTRR entry or the default type.
> 
> I'd call this "single_mtrr". "uniform" could also mean that the resulting
> type is uniform, i.e. of the same type but spanning multiple MTRRs.

Actually, that is the intent of "uniform": the same type spanning
multiple MTRRs should set "uniform" to 1.  The patch does not check
for such a case, for simplicity: we do not need to maximize
performance with MTRRs for every corner case, since they are legacy
and their use is expected to be phased out.  It makes sure that a type
conflict with MTRRs is detected so that huge page mappings are made
safely.

Also, in most of the cases, "uniform" is set to 1 because there is no
MTRR entry that covers the range, i.e. the default type.


> >   */
> > -u8 mtrr_type_lookup(u64 start, u64 end)
> > +u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
> >  {
> > -	u8 type, prev_type;
> > +	u8 type, prev_type, is_uniform, dummy;
> >  	int repeat;
> >  	u64 partial_end;
> >  
> > +	*uniform = 1;
> > +
> 
> You're setting it here...
> 
> >  	if (!mtrr_state_set)
> >  		return MTRR_TYPE_INVALID;
> 
> ... but if you return here, you would've changed the thing uniform
> points to needlessly as you're returning an error.

We need to set "uniform" to 1 when MTRRs are disabled since there is no
type conflict with MTRRs. 


> > @@ -253,14 +264,17 @@ u8 mtrr_type_lookup(u64 start, u64 end)
> >  	 * the variable ranges.
> >  	 */
> >  	type = mtrr_type_lookup_fixed(start, end);
> > -	if (type != MTRR_TYPE_INVALID)
> > +	if (type != MTRR_TYPE_INVALID) {
> > +		*uniform = 0;
> >  		return type;
> > +	}
> >  
> >  	/*
> >  	 * Look up the variable ranges.  Look of multiple ranges matching
> >  	 * this address and pick type as per MTRR precedence.
> >  	 */
> > -	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
> > +	type = mtrr_type_lookup_variable(start, end, &partial_end,
> > +					 &repeat, &is_uniform);
> >  
> >  	/*
> >  	 * Common path is with repeat = 0.
> > @@ -271,16 +285,21 @@ u8 mtrr_type_lookup(u64 start, u64 end)
> >  	while (repeat) {
> >  		prev_type = type;
> >  		start = partial_end;
> > +		is_uniform = 0;
> 
> So I think it would be better if you added an out: label where you do
> exit from the function and set return values there.
> 
> So something like that, I'm pasting the whole function here so that you
> can follow better:
 :
> 
> This way you're setting the uniform pointer in a single location and you're
> working with the local variable inside the function.
> 
> Much easier to follow.

With the label, the above check will be:

	if (!mtrr_state_set) {
		is_uniform = 1;
		type = MTRR_TYPE_INVALID;
		goto out;
	}

I can follow your suggestion of using the label if you still prefer
using it.
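
For reference, the single-exit shape under discussion can be sketched
in user space like this. It is a simplified model and not the patch
itself: struct mtrr_model and mtrr_type_lookup_model() are hypothetical,
only one variable range is modelled, 'end' is treated as exclusive, and
the MTRR_TYPE_* values are assumed to match the kernel's constants.

```c
#include <assert.h>
#include <stdint.h>

/* Values intended to mirror the kernel's MTRR type constants (assumption). */
#define MTRR_TYPE_WRCOMB	1
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xFF

/* Hypothetical model: MTRRs on/off plus one variable range [base, end). */
struct mtrr_model {
	int state_set;
	uint64_t base, end;
	uint8_t type;		/* type of the single modelled range */
	uint8_t def_type;	/* default type outside the range */
};

uint8_t mtrr_type_lookup_model(const struct mtrr_model *m,
			       uint64_t start, uint64_t end, uint8_t *uniform)
{
	uint8_t type, is_uniform = 1;

	assert(start < end);

	if (!m->state_set) {
		/* MTRRs disabled: nothing can conflict, so stay uniform. */
		type = MTRR_TYPE_INVALID;
		goto out;
	}

	if (end <= m->base || start >= m->end) {
		type = m->def_type;	/* fully uncovered */
	} else if (start >= m->base && end <= m->end) {
		type = m->type;		/* fully covered by one entry */
	} else {
		/*
		 * Partial overlap. A real lookup would combine the types;
		 * the model just reports the entry's type.
		 */
		type = m->type;
		is_uniform = 0;
	}

out:
	*uniform = is_uniform;	/* the only write through the pointer */
	return type;
}
```

The function works on the local is_uniform throughout and stores to
*uniform in one place, which is the property the label is meant to give.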


> >   */
> >  int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
> >  {
> > -	u8 mtrr;
> > +	u8 mtrr, uniform;
> >  
> > -	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
> > -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> > +	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> > +	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK)) {
> > +		pr_warn("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
> > +				addr, addr + PMD_SIZE);
> >  		return 0;
> 
> So this returns 0, i.e. failure already. Why do we even have to warn?
> Caller already knows it failed.
> 
> And this warning would flood dmesg needlessly.

The warning was suggested by reviewers in the previous review so that
driver writers will notice the issue.  Returning 0 here will lead
ioremap() to use 4KB mappings, but does not cause ioremap() to fail.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-11 19:25       ` Toshi Kani
@ 2015-05-11 20:18         ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11 20:18 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Mon, May 11, 2015 at 01:25:16PM -0600, Toshi Kani wrote:
> > > @@ -235,13 +240,19 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> > >   * Return Values:
> > >   * MTRR_TYPE_(type)  - The effective MTRR type for the region
> > >   * MTRR_TYPE_INVALID - MTRR is disabled
> > > + *
> > > + * Output Argument:
> > > + * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
> > > + *	     is fully covered by a single MTRR entry or the default type.
> > 
> > I'd call this "single_mtrr". "uniform" could also mean that the resulting
> > type is uniform, i.e. of the same type but spanning multiple MTRRs.
> 
> Actually, that is the intent of "uniform" and the same type but spanning
> multiple MTRRs should set "uniform" to 1.  The patch does not check such

So why does it say "is fully covered by a single MTRR entry or the
default type." - the stress being on *single*

You need to make up your mind.

> We need to set "uniform" to 1 when MTRRs are disabled since there is no
> type conflict with MTRRs.

No, this is wrong.

When we return an *error*, "uniform" should be *undefined* because MTRRs
are disabled and callers should be checking whether it returned an error
first and only *then* look at uniform.

> The warning was suggested by reviewers in the previous review so that
> driver writers will notice the issue.

No, we don't flood dmesg so that driver writers notice stuff. We better
fix the callers.

> Returning 0 here will lead
> ioremap() to use 4KB mappings, but does not cause ioremap() to fail.

I guess a pr_warn_once() would be better then. Flooding dmesg with
error messages which the user can't really do anything about doesn't
bring us anything.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-11 20:18         ` Borislav Petkov
@ 2015-05-11 20:38           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-11 20:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Mon, 2015-05-11 at 22:18 +0200, Borislav Petkov wrote:
> On Mon, May 11, 2015 at 01:25:16PM -0600, Toshi Kani wrote:
> > > > @@ -235,13 +240,19 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
> > > >   * Return Values:
> > > >   * MTRR_TYPE_(type)  - The effective MTRR type for the region
> > > >   * MTRR_TYPE_INVALID - MTRR is disabled
> > > > + *
> > > > + * Output Argument:
> > > > + * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
> > > > + *	     is fully covered by a single MTRR entry or the default type.
> > > 
> > > I'd call this "single_mtrr". "uniform" could also mean that the resulting
> > > type is uniform, i.e. of the same type but spanning multiple MTRRs.
> > 
> > Actually, that is the intent of "uniform" and the same type but spanning
> > multiple MTRRs should set "uniform" to 1.  The patch does not check such
> 
> So why does it say "is fully covered by a single MTRR entry or the
> default type." - the stress being on *single*
> 
> You need to make up your mind.

I will clarify the comment as follows.
===
uniform - Set to 1 when the region is not covered with multiple memory
types by MTRRs.  It is set for any return value.

NOTE: The current code sets 'uniform' to 1 when the region is fully
covered by a single MTRR entry or fully uncovered.  For simplicity, it
does not detect the uniform case where the region is covered by the
same type but spans multiple MTRR entries.
===

> > We need to set "uniform" to 1 when MTRRs are disabled since there is no
> > type conflict with MTRRs.
> 
> No, this is wrong.
> 
> When we return an *error*, "uniform" should be *undefined* because MTRRs
> are disabled and callers should be checking whether it returned an error
> first and only *then* look at uniform.

Having MTRRs disabled is not an error case, as it can be a normal
configuration on some platforms / BIOS setups.  I clarified in the
comment above that uniform is set for any return value.


> > The warning was suggested by reviewers in the previous review so that
> > driver writers will notice the issue.
> 
> No, we don't flood dmesg so that driver writers notice stuff. We better
> fix the callers.
> 
> > Returning 0 here will lead
> > ioremap() to use 4KB mappings, but does not cause ioremap() to fail.
> 
> I guess a pr_warn_once() should be better then. Flooding dmesg with
> error messages for which the user can't really do anything about doesn't
> bring us anything.

OK, I will change it to pr_warn_once().
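
A user-space sketch of the resulting behaviour, assuming the check from
the quoted pmd_set_huge() hunk stays as it is: warn_once_model() is a
hypothetical stand-in for pr_warn_once(), huge_mapping_allowed() and
warnings_printed exist only for this example, and MTRR_TYPE_WRBACK is
assumed to match the kernel constant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MTRR_TYPE_WRBACK 6	/* assumed to match the kernel constant */

int warnings_printed;		/* counter for the example only */

/* User-space stand-in for pr_warn_once(): prints on first use only. */
#define warn_once_model(...)					\
	do {							\
		static int warned;				\
		if (!warned) {					\
			warned = 1;				\
			warnings_printed++;			\
			fprintf(stderr, __VA_ARGS__);		\
		}						\
	} while (0)

/*
 * Returns 1 if a huge mapping may be used, 0 if the caller should fall
 * back to 4KB mappings. Mirrors the condition in the quoted hunk.
 */
int huge_mapping_allowed(uint8_t mtrr, uint8_t uniform,
			 unsigned long long addr, unsigned long long size)
{
	if (!uniform && mtrr != MTRR_TYPE_WRBACK) {
		warn_once_model("requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
				addr, addr + size);
		return 0;
	}
	return 1;
}
```

Every non-uniform request still returns 0 and falls back to 4KB
mappings, but only the first one reaches the log.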

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-11 20:38           ` Toshi Kani
@ 2015-05-11 21:42             ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-11 21:42 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Mon, May 11, 2015 at 02:38:46PM -0600, Toshi Kani wrote:
> MTRRs disabled is not an error case as it could be a normal
> configuration on some platforms / BIOS setups.

Normal how? PAT-only systems? Examples please...

> I clarified it in the above comment that uniform is set for any return
> value.

Hell no!

u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
{

	...

        *uniform = 1;

        if (!mtrr_state_set)
                return MTRR_TYPE_INVALID;

        if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
                return MTRR_TYPE_INVALID;


This is wrong and the fact that I still need to persuade you about it
says a lot.

If you want to be able to state that a type is uniform even if MTRRs are
disabled, you need to define another retval which means exactly that.

Or add an inline function called mtrr_enabled() and call it in the
mtrr_type_lookup() callers.

Or whatever.

I don't want any confusing states with two return types and people
having to figure out what it exactly means and digging into the code
and scratching heads WTF is that supposed to mean.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-11 21:42             ` Borislav Petkov
@ 2015-05-11 22:09               ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-11 22:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Mon, 2015-05-11 at 23:42 +0200, Borislav Petkov wrote:
> On Mon, May 11, 2015 at 02:38:46PM -0600, Toshi Kani wrote:
> > MTRRs disabled is not an error case as it could be a normal
> > configuration on some platforms / BIOS setups.
> 
> Normal how? PAT-only systems? Examples please...

The BIOS initializes and enables MTRRs at POST.  While most (if not all)
BIOSes do so today, I do not think the x86 architecture requires the BIOS
to enable them.

Here is a quote from Intel SDM:
===
11.11.5 MTRR Initialization

On a hardware reset, the P6 and more recent processors clear the valid
flags in variable-range MTRRs and clear the E flag in the
IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the MTRRs
are undefined.

Prior to initializing the MTRRs, software (normally the system BIOS)
must initialize all fixed-range and variable-range MTRR register fields
to 0. Software can then initialize the MTRRs according to known types of
memory, including memory on devices that it auto-configures.
Initialization is expected to occur prior to booting the operating
system.
===
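
To make the E flag concrete, here is a minimal sketch that decodes a raw
IA32_MTRR_DEF_TYPE value.  The bit positions (default type in bits 7:0,
FE in bit 10, E in bit 11) follow the SDM's description of this MSR.  The
struct and function names are made up for illustration and are not kernel
APIs.

```c
#include <stdint.h>

/* Bit layout of IA32_MTRR_DEF_TYPE as described in the Intel SDM. */
#define MTRR_DEF_TYPE_TYPE_MASK	0xffULL		/* bits 7:0, default memory type */
#define MTRR_DEF_TYPE_FE	(1ULL << 10)	/* fixed-range MTRRs enable */
#define MTRR_DEF_TYPE_E		(1ULL << 11)	/* MTRR enable */

/* Hypothetical decoded view of the MSR, for illustration only. */
struct mtrr_def_type {
	uint8_t def_type;	/* default memory type */
	int fixed_enabled;	/* 1 if fixed-range MTRRs are in effect */
	int enabled;		/* 1 if MTRRs are enabled at all */
};

static struct mtrr_def_type decode_def_type(uint64_t msr)
{
	struct mtrr_def_type d;

	d.def_type = (uint8_t)(msr & MTRR_DEF_TYPE_TYPE_MASK);
	d.enabled = !!(msr & MTRR_DEF_TYPE_E);
	/* FE only takes effect when E is also set. */
	d.fixed_enabled = d.enabled && (msr & MTRR_DEF_TYPE_FE);
	return d;
}
```

With E clear, as after a hardware reset, decode_def_type() reports
enabled == 0 whatever the other bits hold.  That is the state the
mtrr_type_lookup() snippet quoted in this thread maps to
MTRR_TYPE_INVALID.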

> > I clarified it in the above comment that uniform is set for any return
> > value.
> 
> Hell no!
> 
> u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
> {
> 
> 	...
> 
>         *uniform = 1;
> 
>         if (!mtrr_state_set)
>                 return MTRR_TYPE_INVALID;
> 
>         if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
>                 return MTRR_TYPE_INVALID;
> 
> 
> This is wrong and the fact that I still need to persuade you about it
> says a lot.
> 
> If you want to be able to state that a type is uniform even if MTRRs are
> disabled, you need to define another retval which means exactly that.

There may not be any type conflict with MTRR_TYPE_INVALID. 

> Or add an inline function called mtrr_enabled() and call it in the
> mtrr_type_lookup() callers.
> 
> Or whatever.
> 
> I don't want any confusing states with two return types and people
> having to figure out what it exactly means and digging into the code
> and scratching heads WTF is that supposed to mean.

I will change the caller to check MTRR_TYPE_INVALID, and treat it as a
uniform case.

Thanks,
-Toshi




^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-11 22:09               ` Toshi Kani
@ 2015-05-12  7:28                 ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-12  7:28 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Mon, May 11, 2015 at 04:09:39PM -0600, Toshi Kani wrote:
> There may not be any type conflict with MTRR_TYPE_INVALID.

Because...?

Let me guess: you cannot change this function to return a signed value
which is the type when positive and an error when negative?

> I will change the caller to check MTRR_TYPE_INVALID, and treat it as a
> uniform case.

That would be, of course, also wrong.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-12  7:28                 ` Borislav Petkov
@ 2015-05-12 14:30                   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-12 14:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-12 at 09:28 +0200, Borislav Petkov wrote:
> On Mon, May 11, 2015 at 04:09:39PM -0600, Toshi Kani wrote:
> > There may not be any type conflict with MTRR_TYPE_INVALID.
> 
> Because...?

Because you cannot have a memory type conflict with MTRRs when MTRRs are
disabled.  mtrr_type_lookup() returns MTRR_TYPE_INVALID when MTRRs are
disabled.  This is stated in the comments of mtrr_type_lookup() and the
MTRR_TYPE_INVALID definition itself.

The BIOS can disable MTRRs, or a VM may choose not to implement them.  The
OS needs to handle this as a valid configuration; it is not an error
case.

> Let me guess: you cannot change this function to return a signed value
> which is the type when positive and an error when negative?

No, that is not the reason. 

> > I will change the caller to check MTRR_TYPE_INVALID, and treat it as a
> > uniform case.
> 
> That would be, of course, also wrong.

I am confused... In your previous comments, you mentioned that:

| If you want to be able to state that a type is uniform even if MTRRs
| are disabled, you need to define another retval which means exactly
| that.

There cannot be a type conflict when MTRRs are disabled, so there is no
point in defining a new return value.

| Or add an inline function called mtrr_enabled() and call it in the
| mtrr_type_lookup() callers.

MTRR_TYPE_INVALID means MTRRs disabled.  So, the caller checking with
this value is the same as checking with mtrr_enabled() you suggested.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-12 14:30                   ` Toshi Kani
@ 2015-05-12 16:31                     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-12 16:31 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, May 12, 2015 at 08:30:30AM -0600, Toshi Kani wrote:
> MTRR_TYPE_INVALID means MTRRs disabled.  So, the caller checking with
> this value is the same as checking with mtrr_enabled() you suggested.

So then you don't have to set *uniform = 1 on entry to
mtrr_type_lookup(). And change the retval test

	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK))

to
	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) && (mtrr != MTRR_TYPE_WRBACK))

You can put the MTRR_TYPE_INVALID first so that it shortcuts.

You need the distinction between MTRRs *disabled* and an MTRR region
being {non-,}uniform.

If MTRRs are disabled, uniform doesn't *mean* *anything* because it is
undefined. When MTRRs are disabled, the range is *not* covered by MTRRs
because, well, them MTRRs are disabled.

And it might be fine for *your* use case to set *uniform even when MTRRs
are disabled but it might matter in the future. So we better design it
correct from the beginning.
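
A minimal user-space sketch of the caller-side test above.  The
MTRR_TYPE_* values are assumed to mirror the kernel's encoding, with
MTRR_TYPE_INVALID as the "MTRRs disabled" sentinel this series defines,
and huge_map_allowed() is a hypothetical helper, not a function from the
patch.

```c
#include <stdint.h>

/* Assumed to match the kernel's MTRR type encoding. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xff	/* MTRRs are disabled */

/*
 * Hypothetical helper: 'mtrr' and 'uniform' stand for the return value and
 * the output argument of mtrr_type_lookup().  Returns 1 when a huge
 * mapping may be used and 0 when the caller must fall back to 4KB pages.
 */
static int huge_map_allowed(uint8_t mtrr, uint8_t uniform)
{
	/*
	 * MTRR_TYPE_INVALID is tested first, so 'uniform' is never
	 * consulted when MTRRs are disabled and may be left undefined.
	 */
	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
	    (mtrr != MTRR_TYPE_WRBACK))
		return 0;

	return 1;
}
```

With MTRRs disabled the helper allows the huge mapping regardless of what
'uniform' contains, which is the distinction between "disabled" and
"{non-,}uniform" asked for here.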

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-12 16:31                     ` Borislav Petkov
@ 2015-05-12 16:57                       ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-12 16:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle

On Tue, 2015-05-12 at 18:31 +0200, Borislav Petkov wrote:
> On Tue, May 12, 2015 at 08:30:30AM -0600, Toshi Kani wrote:
> > MTRR_TYPE_INVALID means MTRRs disabled.  So, the caller checking with
> > this value is the same as checking with mtrr_enabled() you suggested.
> 
> So then you don't have to set *uniform = 1 on entry to
> mtrr_type_lookup(). And change the retval test
> 
> 	if ((!uniform) && (mtrr != MTRR_TYPE_WRBACK))
> 
> to
> 	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) && (mtrr != MTRR_TYPE_WRBACK))

Yes, that's what I was thinking as well.  Will do.

> You can put the MTRR_TYPE_INVALID first so that it shortcuts.
> 
> You need the distinction between MTRRs *disabled* and an MTRR region
> being {non-,}uniform.
> 
> If MTRRs are disabled, uniform doesn't *mean* *anything* because it is
> undefined. When MTRRs are disabled, the range is *not* covered by MTRRs
> because, well, them MTRRs are disabled.
> 
> And it might be fine for *your* use case to set *uniform even when MTRRs
> are disabled but it might matter in the future. So we better design it
> correct from the beginning.

I think it is a matter of how "uniform" is defined, but your point is
taken and I will change it accordingly.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
  2015-05-07  3:36   ` Elliott, Robert (Server Storage)
@ 2015-05-14 15:55     ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-14 15:55 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), bp
  Cc: Luis R. Rodriguez, mingo, tglx, hpa, plagnioj, tomi.valkeinen,
	daniel.vetter, airlied, dledford, awalls, syrjala, luto, mst,
	cocci, linux-kernel, Juergen Gross, Daniel Vetter, Dave Airlie,
	Bjorn Helgaas, x86

On Thu, May 07, 2015 at 03:36:15AM +0000, Elliott, Robert (Server Storage) wrote:
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Luis R. Rodriguez
> > Sent: Thursday, April 30, 2015 3:25 PM
> > Subject: [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends
> > 
> ...
> > -			printk(KERN_ERR "%s:%d map pfn expected mapping
> > type %s"
> > -				" for [mem %#010Lx-%#010Lx], got %s\n",
> > -				current->comm, current->pid,
> > -				cattr_name(want_pcm),
> > -				(unsigned long long)paddr,
> > -				(unsigned long long)(paddr + size - 1),
> > -				cattr_name(pcm));
> > +			pr_err("%s:%d map pfn expected mapping type %s"
> > +			       " for [mem %#010Lx-%#010Lx], got %s\n",
> 
> Since the patch joins some other print format strings split across 
> lines (which checkpatch allows), you might want to join this one too.
> 
> ...
> > diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
> ...
> >  failure:
> > -	printk(KERN_INFO "%s:%d conflicting memory types "
> > +	pr_info("%s:%d conflicting memory types "
> >  		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
> >  		end, cattr_name(found_type), cattr_name(match->type));
> 
> and that one.

I have adjusted this.

Boris, would you like a v6 re-spin of this series?  Or just this patch, or
anything else? FWIW, since I keep having to redo and rebase patches, and
the entire kill-mtrr series is large with many parts, I've set up a tree
with all pending MTRR changes.  The kill-mtrr-v5-20150514 can be used:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git

When needed I'll just fetch linux-next and rebase --onto that day's
origin/master.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v5 0/6] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping
@ 2015-05-15 18:23 ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle, mcgrof

This patchset enhances MTRR checks for the kernel huge I/O mapping.

The following functional changes are made in patch 6/6.
 - Allow pud_set_huge() and pmd_set_huge() to create a huge page mapping
   when the range is covered by a single MTRR entry of any memory type.
 - Log a pr_warn_once() message when a specified PMD map range spans more
   than a single MTRR entry.  Drivers should make a mapping request aligned
   to a single MTRR entry when the range is covered by MTRRs.
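
The "single MTRR entry" rule above can be sketched in user space as
follows.  This is a simplified illustration, not the code in patch 6/6:
struct var_mtrr and lookup_range() are hypothetical names, the entries
are assumed not to overlap each other, and fixed-range MTRRs and type
precedence are ignored.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one variable-range MTRR entry. */
struct var_mtrr {
	uint64_t base;	/* first byte covered */
	uint64_t size;	/* length in bytes */
	uint8_t  type;	/* memory type of the entry */
};

/*
 * Look up [start, end) against n entries that are assumed not to overlap
 * each other.  Sets *uniform to 1 when the range lies entirely inside one
 * entry or touches no entry at all, and to 0 on a partial overlap.
 */
static uint8_t lookup_range(const struct var_mtrr *m, int n, uint8_t def_type,
			    uint64_t start, uint64_t end, uint8_t *uniform)
{
	int i;

	*uniform = 1;
	for (i = 0; i < n; i++) {
		uint64_t m_start = m[i].base;
		uint64_t m_end = m[i].base + m[i].size;

		if (end <= m_start || start >= m_end)
			continue;		/* no overlap with this entry */
		if (start < m_start || end > m_end)
			*uniform = 0;		/* range sticks out of the entry */
		return m[i].type;		/* type of the first entry hit */
	}
	return def_type;			/* fully uncovered: default type */
}
```

A range that straddles an entry boundary comes back with uniform == 0,
which is the case where a driver would need to align its mapping request
to a single entry.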

Patch 1/6 simplifies the condition of HAVE_ARCH_HUGE_VMAP in Kconfig.
Patches 2/6 - 5/6 are bug fixes and cleanups to mtrr_type_lookup().

The patchset is based on the tip tree.
---
v5:
 - Separate the Kconfig change and reorder/squash the patchset. (Borislav
   Petkov)
 - Update logs, comments and functional structures. (Borislav Petkov)
 - Move MTRR_STATE_MTRR_XXX definitions to kernel asm/mtrr.h.  (Borislav
   Petkov)
 - Change mtrr_type_lookup() not to set 'uniform' in case of MTRR_TYPE_INVALID.
   (Borislav Petkov)
 - Remove a patch accepted in the tip tree from the series.

v4:
 - Update the change logs of patchset. (Ingo Molnar)
 - Add patch 3/7 to split the wrong-address fix into a separate patch.
   (Ingo Molnar)
 - Add patch 5/7 to define MTRR_TYPE_INVALID. (Ingo Molnar)
 - Update patch 6/7 to document MTRR fixed ranges. (Ingo Molnar)

v3:
 - Add patch 3/5 to fix a bug in MTRR state checks.
 - Update patch 4/5 to create separate functions for the fixed and
   variable entries. (Ingo Molnar)

v2:
 - Update change logs and comments per review comments.
   (Ingo Molnar)
 - Add patch 3/4 to clean up mtrr_type_lookup(). (Ingo Molnar)

---
Toshi Kani (6):
 1/6 mm, x86: Simplify conditions of HAVE_ARCH_HUGE_VMAP
 2/6 mtrr, x86: Fix MTRR lookup to handle inclusive entry
 3/6 mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
 4/6 mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup()
 5/6 mtrr, x86: Clean up mtrr_type_lookup()
 6/6 mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping

---
 arch/x86/Kconfig                   |   2 +-
 arch/x86/include/asm/mtrr.h        |  10 +-
 arch/x86/include/uapi/asm/mtrr.h   |   8 +-
 arch/x86/kernel/cpu/mtrr/cleanup.c |   3 +-
 arch/x86/kernel/cpu/mtrr/generic.c | 200 ++++++++++++++++++++++++-------------
 arch/x86/mm/pat.c                  |   4 +-
 arch/x86/mm/pgtable.c              |  59 ++++++++---
 7 files changed, 194 insertions(+), 92 deletions(-)


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v5 1/6] mm, x86: Simplify conditions of HAVE_ARCH_HUGE_VMAP
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

Simplify the conditions to select HAVE_ARCH_HUGE_VMAP
in arch/x86/Kconfig since X86_PAE depends on X86_32.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
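Note (not part of the patch): the simplification relies only on the
stated dependency of X86_PAE on X86_32. A minimal sketch of that
reasoning follows; the helper names are hypothetical and the booleans
merely stand in for the Kconfig symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: X86_64 || (X86_32 && X86_PAE) */
static bool huge_vmap_old(bool x86_64, bool x86_32, bool pae)
{
	/* Kconfig never produces PAE without X86_32. */
	assert(!pae || x86_32);
	return x86_64 || (x86_32 && pae);
}

/* New condition: X86_64 || X86_PAE */
static bool huge_vmap_new(bool x86_64, bool pae)
{
	return x86_64 || pae;
}

/*
 * Returns true if both conditions agree on every configuration in
 * which X86_PAE implies X86_32.
 */
static bool conditions_equivalent(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool x86_64 = bits & 1;
		bool x86_32 = bits & 2;
		bool pae = bits & 4;

		if (pae && !x86_32)
			continue;	/* excluded by the dependency */
		if (huge_vmap_old(x86_64, x86_32, pae) !=
		    huge_vmap_new(x86_64, pae))
			return false;
	}
	return true;
}
```

Whenever PAE is set, X86_32 is set as well, so the inner "X86_32 &&"
never changes the result.
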
 arch/x86/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8fec044..73a4d03 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 2/6] mtrr, x86: Fix MTRR lookup to handle inclusive entry
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

When an MTRR entry is entirely contained in a requested range, i.e.
the start and end of the request both lie outside the MTRR entry
range but the requested range covers the MTRR entry entirely,
__mtrr_type_lookup() ignores the entry because both
start_state and end_state are set to zero.

This bug can cause the following issues:
1) reserve_memtype() tracks the effective memory type when the
   requested type is WB (e.g. /dev/mem blindly uses WB). Failing
   to track the effective type causes a subsequent request
   to map the same range with the effective type to fail.
2) pud_set_huge() and pmd_set_huge() check whether a requested range
   overlaps with any MTRR. Failing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
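Note (not part of the patch): a minimal sketch of the overlap test,
assuming a simplified MTRR described only by a base and a mask. The
struct and function names are hypothetical; the three expressions
mirror the ones in the diff below, with 'end' already made inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one variable-range MTRR. */
struct var_mtrr {
	uint64_t base;	/* first address covered by the MTRR */
	uint64_t mask;	/* address bits that are compared against base */
};

/* Pre-patch test: misses a request that fully contains the MTRR. */
static bool needs_split_old(const struct var_mtrr *m,
			    uint64_t start, uint64_t end)
{
	bool start_state = (start & m->mask) == (m->base & m->mask);
	bool end_state = (end & m->mask) == (m->base & m->mask);

	return start_state != end_state;
}

/* Post-patch test: also reports the inclusive case. */
static bool needs_split(const struct var_mtrr *m,
			uint64_t start, uint64_t end)
{
	bool start_state = (start & m->mask) == (m->base & m->mask);
	bool end_state = (end & m->mask) == (m->base & m->mask);
	bool inclusive = (start < m->base) && (end > m->base);

	return (start_state != end_state) || inclusive;
}
```

For a 1MiB MTRR at 0x200000, a request for 0x100000-0x3fffff has both
start_state and end_state clear, so only the 'inclusive' flag reports
the overlap.
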
 arch/x86/kernel/cpu/mtrr/generic.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 5b23967..e202d26 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
-		unsigned short start_state, end_state;
+		unsigned short start_state, end_state, inclusive;
 
 		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
 			continue;
@@ -166,19 +166,27 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 		start_state = ((start & mask) == (base & mask));
 		end_state = ((end & mask) == (base & mask));
+		inclusive = ((start < base) && (end > base));
 
-		if (start_state != end_state) {
+		if ((start_state != end_state) || inclusive) {
 			/*
 			 * We have start:end spanning across an MTRR.
-			 * We split the region into
-			 * either
-			 * (start:mtrr_end) (mtrr_end:end)
-			 * or
-			 * (start:mtrr_start) (mtrr_start:end)
+			 * We split the region into either
+			 *
+			 * - start_state:1
+			 * (start:mtrr_end)(mtrr_end:end)
+			 * - end_state:1
+			 * (start:mtrr_start)(mtrr_start:end)
+			 * - inclusive:1
+			 * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
+			 *
 			 * depending on kind of overlap.
-			 * Return the type for first region and a pointer to
-			 * the start of second region so that caller will
-			 * lookup again on the second region.
+			 *
+			 * Return the type of the first region and a pointer
+			 * to the start of next region so that caller will be
+			 * advised to lookup again after having adjusted start
+			 * and end.
+			 *
 			 * Note: This way we handle multiple overlaps as well.
 			 */
 			if (start_state)

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 3/6] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:
 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no effect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

The MTRR state checks in __mtrr_type_lookup() do not match the
SDM.  Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking into the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled.  However, the default type
   is also disabled when the E flag is clear.  Fix to remove
   the code as this case is handled as MTRR disabled with
   the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
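Note (not part of the patch): a minimal sketch of the flag semantics
listed above. The two macro values are the ones this patch adds; the
predicate names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit fields for 'enabled', as defined by this patch. */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

/* All MTRRs, including the default type, are off when E is clear. */
static bool mtrrs_enabled(uint8_t enabled)
{
	return enabled & MTRR_STATE_MTRR_ENABLED;
}

/* Variable ranges follow the E flag alone. */
static bool variable_ranges_enabled(uint8_t enabled)
{
	return mtrrs_enabled(enabled);
}

/* Fixed ranges need both E and FE; FE alone has no effect. */
static bool fixed_ranges_enabled(uint8_t enabled)
{
	return mtrrs_enabled(enabled) &&
	       (enabled & MTRR_STATE_MTRR_FIXED_ENABLED);
}
```

With only FE set, all three predicates return false, which is the
case the old "!mtrr_state.enabled" check treated as enabled.
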
 arch/x86/include/asm/mtrr.h        |    4 ++++
 arch/x86/kernel/cpu/mtrr/cleanup.c |    3 ++-
 arch/x86/kernel/cpu/mtrr/generic.c |   15 ++++++++-------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..ef92794 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
 				 _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry32)
 #endif /* CONFIG_COMPAT */
 
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
+#define MTRR_STATE_MTRR_ENABLED		0x02
+
 #endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..70d7c93 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
 			continue;
 		base = range_state[i].base_pfn;
 		if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
-		    (mtrr_state.enabled & 1)) {
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 			/* Var MTRR contains UC entry below 1M? Skip it: */
 			printk(BIOS_BUG_MSG, i);
 			if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26..b0599db 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	if (!mtrr_state_set)
 		return 0xFF;
 
-	if (!mtrr_state.enabled)
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
 		return 0xFF;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state.have_fixed && (start < 0x100000)) {
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state.enabled & 2))
-		return mtrr_state.def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
 		 mtrr_attrib_to_str(mtrr_state.def_type));
 	if (mtrr_state.have_fixed) {
 		pr_debug("MTRR fixed ranges %sabled:\n",
-			 mtrr_state.enabled & 1 ? "en" : "dis");
+			((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+			 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+			 "en" : "dis");
 		print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
 		for (i = 0; i < 2; ++i)
 			print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
 		print_fixed_last();
 	}
 	pr_debug("MTRR variable ranges %sabled:\n",
-		 mtrr_state.enabled & 2 ? "en" : "dis");
+		 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
 	high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
 
 	for (i = 0; i < num_var_ranges; ++i) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 4/6] mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup()
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

mtrr_type_lookup() returns 0xFF when it cannot return a valid
MTRR memory type since MTRRs are disabled.  This patch defines
MTRR_TYPE_INVALID to clarify the meaning of this value, and
documents its usage.

Document the return values of the kernel virtual address (KVA) mapping
functions pud_set_huge(), pmd_set_huge(), pud_clear_huge() and
pmd_clear_huge().

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
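Note (not part of the patch): a minimal sketch of how a caller is
expected to treat MTRR_TYPE_INVALID, modelled on the check that
pud_set_huge() and pmd_set_huge() perform as of this patch. The
helper name is hypothetical; the type values match uapi/asm/mtrr.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRCOMB	1
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xff	/* MTRRs are disabled */

/*
 * Hypothetical helper: may a huge page be used for a range whose
 * MTRR lookup returned 'mtrr'?  A huge mapping is allowed when the
 * range is WB, or when MTRRs are disabled and thus cannot override
 * the PAT memory type.
 */
static bool huge_map_allowed(uint8_t mtrr)
{
	return mtrr == MTRR_TYPE_WRBACK || mtrr == MTRR_TYPE_INVALID;
}
```

This is the negation of the condition under which the two functions
return 0.
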
 arch/x86/include/asm/mtrr.h        |    2 +-
 arch/x86/include/uapi/asm/mtrr.h   |    8 ++++++-
 arch/x86/kernel/cpu/mtrr/generic.c |   14 ++++++------
 arch/x86/mm/pgtable.c              |   42 +++++++++++++++++++++++++++---------
 4 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index ef92794..bb03a54 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
 	/*
 	 * Return no-MTRRs:
 	 */
-	return 0xff;
+	return MTRR_TYPE_INVALID;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb65..7528dcf 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
 #define MTRRIOC_GET_PAGE_ENTRY   _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
 #define MTRRIOC_KILL_PAGE_ENTRY  _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry)
 
-/*  These are the region types  */
+/* MTRR memory types, which are defined in SDM */
 #define MTRR_TYPE_UNCACHABLE 0
 #define MTRR_TYPE_WRCOMB     1
 /*#define MTRR_TYPE_         2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
 #define MTRR_TYPE_WRBACK     6
 #define MTRR_NUM_TYPES       7
 
+/*
+ * Invalid MTRR memory type.  mtrr_type_lookup() returns this value when
+ * MTRRs are disabled.  Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID    0xff
 
 #endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b0599db..7b1491c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 
 /*
  * Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
  *		corresponds only to [start:*partial_end].
  *		Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	*repeat = 0;
 	if (!mtrr_state_set)
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	prev_match = 0xFF;
+	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
 
@@ -206,7 +206,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			continue;
 
 		curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
-		if (prev_match == 0xFF) {
+		if (prev_match == MTRR_TYPE_INVALID) {
 			prev_match = curr_match;
 			continue;
 		}
@@ -220,7 +220,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return MTRR_TYPE_WRBACK;
 	}
 
-	if (prev_match != 0xFF)
+	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
@@ -229,7 +229,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 /*
  * Returns the effective MTRR type for the region
  * Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c..c30f981 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,22 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 }
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -584,16 +590,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -605,6 +617,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
 int pud_clear_huge(pud_t *pud)
 {
 	if (pud_large(*pud)) {
@@ -615,6 +632,11 @@ int pud_clear_huge(pud_t *pud)
 	return 0;
 }
 
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PMD map is found).
+ */
 int pmd_clear_huge(pmd_t *pmd)
 {
 	if (pmd_large(*pmd)) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 5/6] mtrr, x86: Clean up mtrr_type_lookup()
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

MTRRs contain fixed and variable entries.  mtrr_type_lookup()
may repeatedly call __mtrr_type_lookup() to handle a request
that overlaps with variable entries.  However,
__mtrr_type_lookup() also handles the fixed entries, which
do not have to be repeated.  Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.
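
As an illustration of the fixed-range arithmetic that the new
mtrr_type_lookup_fixed() isolates, here is a minimal standalone sketch in
plain C.  It is not part of the patch: fixed_range_index() and
FIXED_RANGE_ENTRIES are hypothetical names, and the 88-entry table size is
an assumption derived from the 8 + 16 + 64 sub-ranges listed in the new
function header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table size: 8 + 16 + 64 sub-ranges (hypothetical constant). */
#define FIXED_RANGE_ENTRIES 88

/*
 * Hypothetical helper: map a physical address to its slot in a
 * fixed-range type table, or return -1 above the 1 MiB window.
 */
static int fixed_range_index(uint64_t start)
{
	int idx;

	if (start >= 0x100000)
		return -1;

	if (start < 0x80000)		/* eight 64 KiB sub-ranges */
		idx = (int)(start >> 16);
	else if (start < 0xC0000)	/* sixteen 16 KiB sub-ranges */
		idx = 8 + (int)((start - 0x80000) >> 14);
	else				/* sixty-four 4 KiB sub-ranges */
		idx = 24 + (int)((start - 0xC0000) >> 12);

	assert(idx >= 0 && idx < FIXED_RANGE_ENTRIES);
	return idx;
}
```

For example, address 0xA0000 falls in the 16KB group and maps to slot
8 + (0x20000 >> 14) = 16.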

The patch also updates the function headers to clarify the
return values and output argument.  It updates comments to
clarify that the repeating is necessary to handle overlaps
with the default type, since overlaps with multiple entries
alone can be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/kernel/cpu/mtrr/generic.c |  136 +++++++++++++++++++++++-------------
 1 file changed, 85 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c..c7d5245 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,67 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided in the following way:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	if (start < 0x80000) {		/* 0x0 - 0x7FFFF */
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+
+	} else if (start < 0xC0000) {	/* 0x80000 - 0xBFFFF */
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
+ */
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -186,7 +198,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * advised to lookup again after having adjusted start
 			 * and end.
 			 *
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +228,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -237,22 +247,46 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return mtrr_type_lookup_fixed(start, end);
+
+	/*
+	 * Look up the variable ranges.  Look of multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-15 18:23 ` Toshi Kani
@ 2015-05-15 18:23   ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

This patch adds an additional argument, 'uniform', to
mtrr_type_lookup(), which returns 1 when a given range is
covered uniformly by MTRRs, i.e. the range is fully covered
by a single MTRR entry or the default type.
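
The 'uniform' condition can be pictured with a simplified model.  The
sketch below is an illustration under stated assumptions, not the kernel
code: struct var_range, example_ranges and range_is_uniform() are
hypothetical, variable entries are modelled as plain [base, base + size)
intervals instead of base/mask pairs, and the fixed ranges are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one variable MTRR entry. */
struct var_range {
	uint64_t base;
	uint64_t size;
	uint8_t  type;	/* memory type; not needed for the uniformity test */
};

/* Hypothetical table: entry 1 lies entirely inside entry 0. */
static const struct var_range example_ranges[] = {
	{ 0x10000000ULL, 0x10000000ULL, 1 },	/* 256 MiB at 256 MiB */
	{ 0x18000000ULL, 0x08000000ULL, 0 },	/* 128 MiB at 384 MiB */
};

/*
 * Return 1 when [start, end) is fully covered by exactly one entry or
 * touches no entry at all (default type), and 0 otherwise.
 */
static int range_is_uniform(const struct var_range *r, size_t n,
			    uint64_t start, uint64_t end)
{
	size_t i;
	int matches = 0;

	assert(start < end);

	for (i = 0; i < n; i++) {
		uint64_t r_end = r[i].base + r[i].size;

		if (!(start < r_end && r[i].base < end))
			continue;	/* no overlap with this entry */

		if (!(r[i].base <= start && end <= r_end))
			return 0;	/* request straddles an entry boundary */

		if (++matches > 1)
			return 0;	/* covered by more than one entry */
	}

	return 1;
}
```

In this model a 2 MiB request at 0x10000000 is uniform, while the same
request at 0x18000000 is not, because two entries cover it.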

pud_set_huge() and pmd_set_huge() are changed to check the
new 'uniform' flag to see if it is safe to create a huge page
mapping to the range.  This allows them to create a huge page
mapping to a range covered by a single MTRR entry of any
memory type.  It also detects a non-optimal request properly.
They continue to check with the WB type since the WB type has
no effect even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request
so that driver writers will be aware of such a case.  Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.
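
The resulting decision in pud_set_huge() and pmd_set_huge() can be read
as a small predicate.  The sketch below restates the condition from the
diff in standalone C; huge_mapping_allowed() is a hypothetical name, and
the two constants are repeated here on the assumption that they match
the x86 MTRR headers (WB = 6, invalid = 0xFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's definitions; repeated for a standalone build. */
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xFF

/*
 * Hypothetical helper: true when a huge page mapping may be created.
 * This is the negation of the rejection test in the patch:
 *   (mtrr != MTRR_TYPE_INVALID) && (!uniform) && (mtrr != MTRR_TYPE_WRBACK)
 */
static bool huge_mapping_allowed(uint8_t mtrr_type, uint8_t uniform)
{
	if (mtrr_type == MTRR_TYPE_INVALID)	/* MTRR is disabled */
		return true;

	if (uniform)				/* one entry or default type */
		return true;

	return mtrr_type == MTRR_TYPE_WRBACK;	/* WB does not override PAT */
}
```

Any one of the three conditions is sufficient, so a non-uniform range is
rejected only when MTRRs are enabled and the effective type is not WB.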

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/mtrr.h        |    4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c |   37 ++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |    4 ++--
 arch/x86/mm/pgtable.c              |   33 ++++++++++++++++++++------------
 4 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a54..a31759e 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c7d5245..7d347ac 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -146,19 +146,22 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -213,6 +216,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -224,6 +228,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -240,10 +245,14 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -259,14 +268,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -277,16 +290,20 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
+		is_uniform = 0;
+
 		type = mtrr_type_lookup_variable(start, end, &partial_end,
-						 &repeat);
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af677..372ad42 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f981..3fa0eb9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -567,18 +567,21 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
  * pud_set_huge - setup kernel PUD mapping
  *
  * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * this function only sets up a huge page in the following conditions.
+ *  - MTRR is disabled.
+ *  - The range is mapped uniformly by MTRR, i.e. the range is fully covered
+ *    by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -594,19 +597,25 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  * pmd_set_huge - setup kernel PMD mapping
  *
  * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * this function only sets up a huge page in the following conditions.
+ *  - MTRR is disabled.
+ *  - The range is mapped uniformly by MTRR, i.e. the range is fully covered
+ *    by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
+				addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
@ 2015-05-15 18:23   ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-15 18:23 UTC (permalink / raw)
  To: bp, akpm, hpa, tglx, mingo
  Cc: linux-mm, x86, linux-kernel, dave.hansen, Elliott, pebolle,
	mcgrof, Toshi Kani

This patch adds an additional argument, 'uniform', to
mtrr_type_lookup(), which returns 1 when a given range is
covered uniformly by MTRRs, i.e. the range is fully covered
by a single MTRR entry or the default type.

pud_set_huge() and pmd_set_huge() are changed to check the
new 'uniform' flag to see if it is safe to create a huge page
mapping to the range.  This allows them to create a huge page
mapping to a range covered by a single MTRR entry of any
memory type.  It also detects a non-optimal request properly.
They continue to check with the WB type since the WB type has
no effect even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request
so that driver writers will be aware of such a case.  Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 arch/x86/include/asm/mtrr.h        |    4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c |   37 ++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |    4 ++--
 arch/x86/mm/pgtable.c              |   33 ++++++++++++++++++++------------
 4 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a54..a31759e 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index c7d5245..7d347ac 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -146,19 +146,22 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -213,6 +216,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -224,6 +228,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -240,10 +245,14 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when MTRR covers the region uniformly, i.e. the region
+ *	     is fully covered by a single MTRR entry or the default type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -259,14 +268,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -277,16 +290,20 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
+		is_uniform = 0;
+
 		type = mtrr_type_lookup_variable(start, end, &partial_end,
-						 &repeat);
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af677..372ad42 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f981..3fa0eb9 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -567,18 +567,21 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
  * pud_set_huge - setup kernel PUD mapping
  *
  * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * this function only sets up a huge page in the following conditions.
+ *  - MTRR is disabled.
+ *  - The range is mapped uniformly by MTRR, i.e. the range is fully covered
+ *    by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -594,19 +597,25 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
  * pmd_set_huge - setup kernel PMD mapping
  *
  * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * this function only sets up a huge page in the following conditions.
+ *  - MTRR is disabled.
+ *  - The range is mapped uniformly by MTRR, i.e. the range is fully covered
+ *    by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
+				addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 1/6] mm, x86: Simplify conditions of HAVE_ARCH_HUGE_VMAP
  2015-05-15 18:23   ` Toshi Kani
  (?)
@ 2015-05-17  8:30   ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-17  8:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Fri, May 15, 2015 at 12:23:52PM -0600, Toshi Kani wrote:
> Simplify the conditions to select HAVE_ARCH_HUGE_VMAP
> in arch/x86/Kconfig since X86_PAE depends on X86_32.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/Kconfig |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8fec044..73a4d03 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -100,7 +100,7 @@ config X86
>  	select IRQ_FORCED_THREADING
>  	select HAVE_BPF_JIT if X86_64
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> -	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
> +	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
>  	select ARCH_HAS_SG_CHAIN
>  	select CLKEVT_I8253
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-15 18:23   ` Toshi Kani
  (?)
@ 2015-05-18 13:33   ` Borislav Petkov
  2015-05-18 17:22       ` Toshi Kani
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-18 13:33 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Fri, May 15, 2015 at 12:23:57PM -0600, Toshi Kani wrote:
> This patch adds an additional argument, 'uniform', to
> mtrr_type_lookup(), which returns 1 when a given range is
> covered uniformly by MTRRs, i.e. the range is fully covered
> by a single MTRR entry or the default type.
> 
> pud_set_huge() and pmd_set_huge() are changed to check the
> new 'uniform' flag to see if it is safe to create a huge page
> mapping to the range.  This allows them to create a huge page
> mapping to a range covered by a single MTRR entry of any
> memory type.  It also detects a non-optimal request properly.
> They continue to check with the WB type since the WB type has
> no effect even if a request spans multiple MTRR entries.
> 
> pmd_set_huge() logs a warning message to a non-optimal request
> so that driver writers will be aware of such a case.  Drivers
> should make a mapping request aligned to a single MTRR entry
> when the range is covered by MTRRs.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  arch/x86/include/asm/mtrr.h        |    4 ++--
>  arch/x86/kernel/cpu/mtrr/generic.c |   37 ++++++++++++++++++++++++++----------
>  arch/x86/mm/pat.c                  |    4 ++--
>  arch/x86/mm/pgtable.c              |   33 ++++++++++++++++++++------------
>  4 files changed, 52 insertions(+), 26 deletions(-)

...

>  int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>  {
> -	u8 mtrr;
> +	u8 mtrr, uniform;
>  
> -	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
> -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> +	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> +	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> +	    (mtrr != MTRR_TYPE_WRBACK)) {
> +		pr_warn_once("pmd_set_huge: requesting [mem %#010llx-%#010llx], which spans more than a single MTRR entry\n",
> +				addr, addr + PMD_SIZE);
>  		return 0;
> +	}

All applied, I reformatted the comments in this last one a bit and made
the warning message hopefully a bit more descriptive:

---
From: Toshi Kani <toshi.kani@hp.com>
Date: Fri, 15 May 2015 12:23:57 -0600
Subject: [PATCH] x86/mm: Enhance MTRR checks in kernel mapping helpers

This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets
set to 1 when a given range is covered uniformly by MTRRs, i.e. the
range is fully covered by a single MTRR entry or the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to
see if it is safe to create a huge page mapping in the range.

This allows them to create a huge page mapping in a range covered by
a single MTRR entry of any memory type. It also detects a non-optimal
request properly. They continue to allow the WB type, since a WB MTRR has
no effect on the resulting memory type even if a request spans multiple
MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request so that
driver writers will be aware of such a case. Drivers should make a
mapping request aligned to a single MTRR entry when the range is covered
by MTRRs.
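
The resulting decision can be modeled outside the kernel as a small
predicate. This is only a user-space sketch under stated assumptions, not
the kernel code: huge_mapping_allowed() is a hypothetical helper, and the
constants are copied here so that the example compiles on its own.

```c
#include <stdint.h>

/* Values mirrored from the kernel's MTRR type definitions for this sketch. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xff

/*
 * Hypothetical helper (not part of the patch): returns 1 when a huge
 * mapping may be set up for a range, given the type and the 'uniform'
 * flag reported by mtrr_type_lookup().  A huge mapping is allowed when
 * any one of these holds:
 *  - MTRRs are disabled (MTRR_TYPE_INVALID),
 *  - the range is covered uniformly,
 *  - the effective MTRR type is WB.
 */
int huge_mapping_allowed(uint8_t mtrr, uint8_t uniform)
{
	return mtrr == MTRR_TYPE_INVALID || uniform ||
	       mtrr == MTRR_TYPE_WRBACK;
}
```

This is the negation of the rejection test in pud_set_huge() and
pmd_set_huge(): a non-uniform range with a non-WB type is the only case
that falls back to smaller mappings.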

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
[ Realign comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mtrr.h        |  4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c | 40 +++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |  4 ++--
 arch/x86/mm/pgtable.c              | 41 +++++++++++++++++++++++++-------------
 4 files changed, 61 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a547c1ab..a31759e1edd9 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c49eea..f782d9b62cb3 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive instead of exclusive */
 	end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+		is_uniform = 0;
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..372ad422c2c3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f9819786b..f1894daa79ee 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,24 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 /**
  * pud_set_huge - setup kernel PUD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
+ * this function sets up a huge page only if any of the following
+ * conditions are met:
+ *
+ *  - MTRRs are disabled.
+ *  - The range is mapped uniformly by an MTRR, i.e. the range is
+ *    fully covered by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -593,20 +598,28 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 /**
  * pmd_set_huge - setup kernel PMD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
+ * this function sets up a huge page only if any of the following
+ * conditions are met:
+ *
+ *  - MTRRs are disabled.
+ *  - The range is mapped uniformly by an MTRR, i.e. the range is
+ *    fully covered by a single MTRR entry or the default type.
+ *  - The MTRR memory type is WB.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+			     __func__, addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 0/3] Compile-time stack frame pointer validation
@ 2015-05-18 16:34 Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
                   ` (3 more replies)
  0 siblings, 4 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel

In discussions around the live kernel patching consistency model RFC
[1], Peter and Ingo correctly pointed out that stack traces aren't
reliable.  And as Ingo said, there's no "strong force" which ensures we
can rely on them.

So I've been thinking about how to fix that.  My goal is to eventually
make stack traces reliable.  Or at the very least, to be able to detect
at runtime when a given stack trace *might* be unreliable.  But improved
stack traces would broadly benefit the entire kernel, regardless of the
outcome of the live kernel patching consistency model discussions.

This patch set is just the first in a series of proposed stack trace
reliability improvements.  Future proposals will include runtime stack
reliability checking, as well as compile-time and runtime DWARF
validations.

As far as I can tell, there are two main obstacles which prevent frame
pointer based stack traces from being reliable:

1) Missing frame pointer logic: currently, most assembly functions don't
   set up the frame pointer.

2) Interrupts: if a function is interrupted before it can save and set
   up the frame pointer, its caller won't show up in the stack trace.

This patch set aims to remove the first obstacle by enforcing that all
asm functions honor CONFIG_FRAME_POINTER.  This is done with a new
stackvalidate host tool which is automatically run for every compiled .S
file and which validates that every asm function does the proper frame
pointer setup.

Also, to make sure somebody didn't forget to annotate their callable asm
code as a function, flag an error for any return instructions which are
hiding outside of a function.  In almost all cases, return instructions
are part of callable functions and should be annotated as such so that
we can validate their frame pointer usage.  A whitelist mechanism exists
for those few return instructions which are not actually in callable
code.
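
To illustrate the rule for stray returns, here is a rough model of the
check. It is a sketch under assumptions, not the stackvalidate
implementation: struct toy_range, the whitelist array and both function
names are hypothetical. Only the four return opcodes are taken from
arch_is_return_insn() in patch 1.

```c
#include <stddef.h>

/* Hypothetical description of one annotated function: [start, end). */
struct toy_range {
	unsigned long start, end;
};

/* Same opcode bytes as arch_is_return_insn() in patch 1. */
int toy_is_return_opcode(unsigned char op)
{
	switch (op) {
	case 0xc2: case 0xc3: /* ret near */
	case 0xca: case 0xcb: /* ret far */
		return 1;
	}
	return 0;
}

static int toy_in_function(const struct toy_range *funcs, size_t nfuncs,
			   unsigned long addr)
{
	size_t i;

	for (i = 0; i < nfuncs; i++)
		if (addr >= funcs[i].start && addr < funcs[i].end)
			return 1;
	return 0;
}

static int toy_whitelisted(const unsigned long *wl, size_t nwl,
			   unsigned long addr)
{
	size_t i;

	for (i = 0; i < nwl; i++)
		if (wl[i] == addr)
			return 1;
	return 0;
}

/*
 * Count return instructions that are neither inside an annotated
 * function nor explicitly whitelisted.  'rets' holds the addresses of
 * all return instructions found in a section.
 */
int toy_count_stray_returns(const unsigned long *rets, size_t nrets,
			    const struct toy_range *funcs, size_t nfuncs,
			    const unsigned long *wl, size_t nwl)
{
	size_t i;
	int stray = 0;

	for (i = 0; i < nrets; i++)
		if (!toy_in_function(funcs, nfuncs, rets[i]) &&
		    !toy_whitelisted(wl, nwl, rets[i]))
			stray++;
	return stray;
}
```

A non-zero count corresponds to the case where callable asm code was
probably not annotated as a function.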

The tool currently only supports x86_64.  It *almost* supports x86_32, but the
stackvalidate code doesn't yet know how to deal with 32-bit REL
relocations for the return whitelists.  I tried to make the code generic
so that support for other architectures can be plugged in pretty easily.

As a first step, all reported non-compliances result in warnings.  Right
now I'm seeing 200+ warnings.  Once we get them all cleaned up, we can
change the warnings to build errors so the asm code can stay clean.

The patches are based on linux-next.  Patch 1 adds the stackvalidate
host tool.  Patch 2 is a cleanup which makes the push/pop CFI macros
arch-independent, in preparation for patch 3.  Patch 3 adds some helper
macros for asm functions so that they can comply with stackvalidate.

[1] http://lkml.kernel.org/r/cover.1423499826.git.jpoimboe@redhat.com

v4:
- Changed the default to CONFIG_STACK_VALIDATION=n, until all the asm
  code can get cleaned up.
- Fixed a stackvalidate error path exit code issue found by Michal
  Marek.

v3:
- Added a patch to make the push/pop CFI macros arch-independent, as
  suggested by H. Peter Anvin

v2:
- Fixed memory leaks reported by Petr Mladek

Josh Poimboeuf (3):
  x86, stackvalidate: Compile-time stack frame pointer validation
  x86: Make push/pop CFI macros arch-independent
  x86, stackvalidate: Add asm frame pointer setup macros

 MAINTAINERS                           |   6 +
 arch/Kconfig                          |   3 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/Makefile                     |   6 +-
 arch/x86/ia32/ia32entry.S             |  60 +++---
 arch/x86/include/asm/calling.h        |  28 +--
 arch/x86/include/asm/dwarf2.h         |  92 ++++-----
 arch/x86/include/asm/frame.h          |   4 +-
 arch/x86/include/asm/func.h           |  82 ++++++++
 arch/x86/kernel/entry_32.S            | 214 ++++++++++-----------
 arch/x86/kernel/entry_64.S            |  96 +++++-----
 arch/x86/lib/atomic64_386_32.S        |   4 +-
 arch/x86/lib/atomic64_cx8_32.S        |  40 ++--
 arch/x86/lib/checksum_32.S            |  42 ++--
 arch/x86/lib/cmpxchg16b_emu.S         |   6 +-
 arch/x86/lib/cmpxchg8b_emu.S          |   6 +-
 arch/x86/lib/msr-reg.S                |  34 ++--
 arch/x86/lib/rwsem.S                  |  40 ++--
 arch/x86/lib/thunk_32.S               |  12 +-
 arch/x86/lib/thunk_64.S               |  36 ++--
 lib/Kconfig.debug                     |  11 ++
 scripts/Makefile                      |   1 +
 scripts/Makefile.build                |  22 ++-
 scripts/stackvalidate/Makefile        |  17 ++
 scripts/stackvalidate/arch-x86.c      | 134 +++++++++++++
 scripts/stackvalidate/arch.h          |  10 +
 scripts/stackvalidate/elf.c           | 352 ++++++++++++++++++++++++++++++++++
 scripts/stackvalidate/elf.h           |  56 ++++++
 scripts/stackvalidate/list.h          | 217 +++++++++++++++++++++
 scripts/stackvalidate/stackvalidate.c | 226 ++++++++++++++++++++++
 30 files changed, 1484 insertions(+), 374 deletions(-)
 create mode 100644 arch/x86/include/asm/func.h
 create mode 100644 scripts/stackvalidate/Makefile
 create mode 100644 scripts/stackvalidate/arch-x86.c
 create mode 100644 scripts/stackvalidate/arch.h
 create mode 100644 scripts/stackvalidate/elf.c
 create mode 100644 scripts/stackvalidate/elf.h
 create mode 100644 scripts/stackvalidate/list.h
 create mode 100644 scripts/stackvalidate/stackvalidate.c

-- 
2.1.0


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH v4 1/3] x86, stackvalidate: Compile-time stack frame pointer validation
  2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent Josh Poimboeuf
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel

Frame pointer based stack traces aren't always reliable.  One big reason
is that most asm functions don't set up the frame pointer.

Fix that by enforcing that all asm functions honor CONFIG_FRAME_POINTER.
This is done with a new stackvalidate host tool which is automatically
run for every compiled .S file and which validates that every asm
function does the proper frame pointer setup.
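
As a rough illustration of what "proper frame pointer setup" means here,
the sketch below applies the same counting rule as
arch_validate_function() further down in this patch, but to a raw byte
buffer. It is a simplified toy under stated assumptions: the real tool
uses the kernel's x86 instruction decoder, whereas toy_decode() is a
hypothetical stand-in that only recognizes a few fixed x86_64 encodings
and treats everything else as a bad instruction.

```c
#include <stddef.h>

enum toy_insn {
	TOY_BAD, TOY_PUSH_BP, TOY_MOV_SP_BP, TOY_POP_BP,
	TOY_LEAVE, TOY_RET, TOY_NOP,
};

/* Hypothetical mini-decoder: sets *len and classifies one instruction. */
static enum toy_insn toy_decode(const unsigned char *p, size_t left,
				size_t *len)
{
	*len = 0;
	if (left >= 3 && p[0] == 0x48 && p[1] == 0x89 && p[2] == 0xe5) {
		*len = 3;			/* mov %rsp, %rbp */
		return TOY_MOV_SP_BP;
	}
	if (left < 1)
		return TOY_BAD;

	*len = 1;
	switch (p[0]) {
	case 0x55: return TOY_PUSH_BP;		/* push %rbp */
	case 0x5d: return TOY_POP_BP;		/* pop %rbp */
	case 0xc9: return TOY_LEAVE;		/* leave */
	case 0xc3: return TOY_RET;		/* ret */
	case 0x90: return TOY_NOP;		/* nop */
	}
	return TOY_BAD;
}

/*
 * Returns -1 for an instruction the toy decoder does not know, 1 when
 * the frame pointer logic is missing, and 0 when validation succeeds.
 */
int toy_validate_function(const unsigned char *code, size_t size)
{
	int push = 0, mov = 0, pop = 0, ret = 0;
	size_t off, len;

	for (off = 0; off < size; off += len) {
		switch (toy_decode(code + off, size - off, &len)) {
		case TOY_PUSH_BP:	push++;	break;
		case TOY_MOV_SP_BP:	mov++;	break;
		case TOY_POP_BP:
		case TOY_LEAVE:		pop++;	break;
		case TOY_RET:		ret++;	break;
		case TOY_NOP:			break;
		default:
			return -1;
		}
	}

	/* Exactly one prologue, and every ret paired with a pop/leave. */
	if (push != 1 || mov != 1 || !pop || !ret || pop != ret)
		return 1;

	return 0;
}
```

A function consisting of push %rbp, mov %rsp,%rbp, a body, pop %rbp and
ret passes; a bare ret with no prologue is reported as missing frame
pointer logic.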

Also, to make sure somebody didn't forget to annotate their callable asm code
as a function, flag an error for any return instructions which are hiding
outside of a function.  In almost all cases, return instructions are part of
callable functions and should be annotated as such so that we can validate
their frame pointer usage.  A whitelist mechanism exists for those few return
instructions which are not actually in callable code.

The tool currently only supports x86_64.  It *almost* supports x86_32, but the
stackvalidate code doesn't yet know how to deal with 32-bit REL
relocations for the return whitelists.  I tried to make the code generic
so that support for other architectures can be plugged in pretty easily.

As a first step, CONFIG_STACK_VALIDATION is disabled by default, and all
reported non-compliances result in warnings.  Right now I'm seeing 200+
warnings.  Once we get them all cleaned up, we can change the default to
CONFIG_STACK_VALIDATION=y and change the warnings to build errors so the
asm code can stay clean.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Michal Marek <mmarek@suse.cz>
---
 MAINTAINERS                           |   6 +
 arch/Kconfig                          |   3 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/Makefile                     |   6 +-
 lib/Kconfig.debug                     |  11 ++
 scripts/Makefile                      |   1 +
 scripts/Makefile.build                |  22 ++-
 scripts/stackvalidate/Makefile        |  17 ++
 scripts/stackvalidate/arch-x86.c      | 134 +++++++++++++
 scripts/stackvalidate/arch.h          |  10 +
 scripts/stackvalidate/elf.c           | 352 ++++++++++++++++++++++++++++++++++
 scripts/stackvalidate/elf.h           |  56 ++++++
 scripts/stackvalidate/list.h          | 217 +++++++++++++++++++++
 scripts/stackvalidate/stackvalidate.c | 226 ++++++++++++++++++++++
 14 files changed, 1059 insertions(+), 3 deletions(-)
 create mode 100644 scripts/stackvalidate/Makefile
 create mode 100644 scripts/stackvalidate/arch-x86.c
 create mode 100644 scripts/stackvalidate/arch.h
 create mode 100644 scripts/stackvalidate/elf.c
 create mode 100644 scripts/stackvalidate/elf.h
 create mode 100644 scripts/stackvalidate/list.h
 create mode 100644 scripts/stackvalidate/stackvalidate.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 78ea7b6..6d700bf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9451,6 +9451,12 @@ L:	stable@vger.kernel.org
 S:	Supported
 F:	Documentation/stable_kernel_rules.txt
 
+STACK VALIDATION
+M:	Josh Poimboeuf <jpoimboe@redhat.com>
+S:	Supported
+F:	scripts/stackvalidate/
+F:	arch/x86/include/asm/func.h
+
 STAGING SUBSYSTEM
 M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
diff --git a/arch/Kconfig b/arch/Kconfig
index bec6666..a5c3f50 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -506,6 +506,9 @@ config HAVE_COPY_THREAD_TLS
 	  normal C parameter passing, rather than extracting the syscall
 	  argument from pt_regs.
 
+config HAVE_STACK_VALIDATION
+	bool
+
 #
 # ABI hall of shame
 #
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c92fdcc..d60a2378a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -146,6 +146,7 @@ config X86
 	select ACPI_LEGACY_TABLES_LOOKUP if ACPI
 	select X86_FEATURE_NAMES if PROC_FS
 	select SRCU
+	select HAVE_STACK_VALIDATION if FRAME_POINTER && X86_64
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 57996ee..c5598a0 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -180,9 +180,13 @@ KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
 KBUILD_CFLAGS += $(mflags-y)
 KBUILD_AFLAGS += $(mflags-y)
 
-archscripts: scripts_basic
+archscripts: scripts_basic $(objtree)/arch/x86/lib/inat-tables.c
 	$(Q)$(MAKE) $(build)=arch/x86/tools relocs
 
+# this file is needed early by scripts/stackvalidate
+$(objtree)/arch/x86/lib/inat-tables.c:
+	$(Q)$(MAKE) $(build)=arch/x86/lib $@
+
 ###
 # Syscall table generation
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eb3997b..7bfaf80 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -332,6 +332,17 @@ config FRAME_POINTER
 	  larger and slower, but it gives very useful debugging information
 	  in case of kernel bugs. (precise oopses/stacktraces/warnings)
 
+
+config STACK_VALIDATION
+	bool "Enable kernel stack validation"
+	depends on HAVE_STACK_VALIDATION
+	default n
+	help
+	  Add compile-time validations which help make kernel stack traces more
+	  reliable.  This includes checks to ensure that assembly functions
+	  save, update and restore the frame pointer or the back chain pointer.
+
+
 config DEBUG_FORCE_WEAK_PER_CPU
 	bool "Force weak per-cpu definitions"
 	depends on DEBUG_KERNEL
diff --git a/scripts/Makefile b/scripts/Makefile
index 2016a64..c882a91 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -37,6 +37,7 @@ subdir-y                     += mod
 subdir-$(CONFIG_SECURITY_SELINUX) += selinux
 subdir-$(CONFIG_DTC)         += dtc
 subdir-$(CONFIG_GDB_SCRIPTS) += gdb
+subdir-$(CONFIG_STACK_VALIDATION) += stackvalidate
 
 # Let clean descend into subdirs
 subdir-	+= basic kconfig package
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 01df30a..3b05833 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -253,6 +253,24 @@ define rule_cc_o_c
 	mv -f $(dot-target).tmp $(dot-target).cmd
 endef
 
+ifdef CONFIG_STACK_VALIDATION
+stackvalidate = $(objtree)/scripts/stackvalidate/stackvalidate
+cmd_stackvalidate =							  \
+	case $(@) in							  \
+		arch/x86/purgatory/*) ;;				  \
+		*) $(stackvalidate) "$(@)"; \
+	esac;
+endif
+
+define rule_as_o_S
+	$(call echo-cmd,as_o_S) $(cmd_as_o_S);				  \
+	$(cmd_stackvalidate)						  \
+	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,as_o_S)' >    \
+	                                              $(dot-target).tmp;  \
+	rm -f $(depfile);						  \
+	mv -f $(dot-target).tmp $(dot-target).cmd
+endef
+
 # Built-in and composite module parts
 $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
 	$(call cmd,force_checksrc)
@@ -290,8 +308,8 @@ $(obj)/%.s: $(src)/%.S FORCE
 quiet_cmd_as_o_S = AS $(quiet_modtag)  $@
 cmd_as_o_S       = $(CC) $(a_flags) -c -o $@ $<
 
-$(obj)/%.o: $(src)/%.S FORCE
-	$(call if_changed_dep,as_o_S)
+$(obj)/%.o: $(src)/%.S $(stackvalidate) FORCE
+	$(call if_changed_rule,as_o_S)
 
 targets += $(real-objs-y) $(real-objs-m) $(lib-y)
 targets += $(extra-y) $(MAKECMDGOALS) $(always)
diff --git a/scripts/stackvalidate/Makefile b/scripts/stackvalidate/Makefile
new file mode 100644
index 0000000..6027ec4
--- /dev/null
+++ b/scripts/stackvalidate/Makefile
@@ -0,0 +1,17 @@
+hostprogs-y := stackvalidate
+always := $(hostprogs-y)
+
+stackvalidate-objs := stackvalidate.o elf.o
+
+HOSTCFLAGS += -Werror
+HOSTLOADLIBES_stackvalidate := -lelf
+
+ifdef CONFIG_X86
+
+stackvalidate-objs += arch-x86.o
+
+HOSTCFLAGS_arch-x86.o := -I$(objtree)/arch/x86/lib/ -I$(srctree)/arch/x86/include/ -I$(srctree)/arch/x86/lib/
+
+$(obj)/arch-x86.o: $(srctree)/arch/x86/lib/insn.c $(srctree)/arch/x86/lib/inat.c $(srctree)/arch/x86/include/asm/inat_types.h $(srctree)/arch/x86/include/asm/inat.h $(srctree)/arch/x86/include/asm/insn.h $(objtree)/arch/x86/lib/inat-tables.c
+
+endif
diff --git a/scripts/stackvalidate/arch-x86.c b/scripts/stackvalidate/arch-x86.c
new file mode 100644
index 0000000..fbc0756
--- /dev/null
+++ b/scripts/stackvalidate/arch-x86.c
@@ -0,0 +1,134 @@
+#include <stdio.h>
+
+#define unlikely(cond) (cond)
+#include <asm/insn.h>
+#include <inat.c>
+#include <insn.c>
+
+#include "elf.h"
+#include "arch.h"
+
+static int is_x86_64(struct elf *elf)
+{
+	switch (elf->ehdr.e_machine) {
+	case EM_X86_64:
+		return 1;
+	case EM_386:
+		return 0;
+	default:
+		WARN("unexpected ELF machine type %d", elf->ehdr.e_machine);
+		return -1;
+	}
+}
+
+/*
+ * arch_validate_function() - Ensures the given asm function saves, sets up,
+ * and restores the frame pointer.
+ *
+ * The frame pointer prologue/epilogue should look something like:
+ *
+ *   push %rbp
+ *   mov %rsp, %rbp
+ *   [ function body ]
+ *   pop %rbp
+ *   ret
+ *
+ * Return value:
+ *   -1: bad instruction
+ *    1: missing frame pointer logic
+ *    0: validation succeeded
+ */
+int arch_validate_function(struct elf *elf, struct symbol *func)
+{
+	struct insn insn;
+	unsigned long addr, length;
+	int push, mov, pop, ret, x86_64;
+
+	push = mov = pop = ret = 0;
+
+	x86_64 = is_x86_64(elf);
+	if (x86_64 == -1)
+		return -1;
+
+	for (addr = func->start; addr < func->end; addr += length) {
+		insn_init(&insn, (void *)addr, func->end - addr, x86_64);
+		insn_get_length(&insn);
+		length = insn.length;
+		insn_get_opcode(&insn);
+		if (!length || !insn.opcode.got) {
+			WARN("%s+0x%lx: bad instruction", func->name,
+			     addr - func->start);
+			return -1;
+		}
+
+		switch (insn.opcode.bytes[0]) {
+		case 0x55:
+			if (!insn.rex_prefix.nbytes)
+				/* push bp */
+				push++;
+			break;
+		case 0x5d:
+			if (!insn.rex_prefix.nbytes)
+				/* pop bp */
+				pop++;
+			break;
+		case 0xc9: /* leave */
+			pop++;
+			break;
+		case 0x89:
+			insn_get_modrm(&insn);
+			if (insn.modrm.bytes[0] == 0xe5)
+				/* mov sp, bp */
+				mov++;
+			break;
+		case 0xc3: /* ret */
+			ret++;
+			break;
+		}
+	}
+
+	if (push != 1 || mov != 1 || !pop || !ret || pop != ret) {
+		WARN("%s() is missing frame pointer logic.  Please use FUNC_ENTER.",
+		     func->name);
+		return 1;
+	}
+
+	return 0;
+}
+
+/*
+ * arch_is_return_insn() - Determines whether the instruction at the given
+ * address is a return instruction.  Also returns the instruction length in
+ * *len.
+ *
+ * Return value:
+ *   -1: bad instruction
+ *    0: no, it's not a return instruction
+ *    1: yes, it's a return instruction
+ */
+int arch_is_return_insn(struct elf *elf, unsigned long addr,
+			unsigned int maxlen, unsigned int *len)
+{
+	struct insn insn;
+	int x86_64;
+
+	x86_64 = is_x86_64(elf);
+	if (x86_64 == -1)
+		return -1;
+
+	insn_init(&insn, (void *)addr, maxlen, x86_64);
+	insn_get_length(&insn);
+	insn_get_opcode(&insn);
+	if (!insn.length || !insn.opcode.got)
+		return -1;
+
+	*len = insn.length;
+
+	switch (insn.opcode.bytes[0]) {
+	case 0xc2: case 0xc3: /* ret near */
+	case 0xca: case 0xcb: /* ret far */
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/scripts/stackvalidate/arch.h b/scripts/stackvalidate/arch.h
new file mode 100644
index 0000000..3b91b1c
--- /dev/null
+++ b/scripts/stackvalidate/arch.h
@@ -0,0 +1,10 @@
+#ifndef _ARCH_H_
+#define _ARCH_H_
+
+#include "elf.h"
+
+int arch_validate_function(struct elf *elf, struct symbol *func);
+int arch_is_return_insn(struct elf *elf, unsigned long addr,
+			unsigned int maxlen, unsigned int *len);
+
+#endif /* _ARCH_H_ */
diff --git a/scripts/stackvalidate/elf.c b/scripts/stackvalidate/elf.c
new file mode 100644
index 0000000..a1419a5
--- /dev/null
+++ b/scripts/stackvalidate/elf.c
@@ -0,0 +1,352 @@
+/*
+ * elf.c - ELF access library
+ *
+ * Adapted from kpatch (https://github.com/dynup/kpatch):
+ * Copyright (C) 2013-2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ * Copyright (C) 2014 Seth Jennings <sjenning@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "elf.h"
+
+struct section *elf_find_section_by_name(struct elf *elf, const char *name)
+{
+	struct section *sec;
+
+	list_for_each_entry(sec, &elf->sections, list)
+		if (!strcmp(sec->name, name))
+			return sec;
+
+	return NULL;
+}
+
+static struct section *elf_find_section_by_index(struct elf *elf,
+						 unsigned int index)
+{
+	struct section *sec;
+
+	list_for_each_entry(sec, &elf->sections, list)
+		if (sec->index == index)
+			return sec;
+
+	return NULL;
+}
+
+static struct symbol *elf_find_symbol_by_index(struct elf *elf,
+					       unsigned int index)
+{
+	struct section *sec;
+	struct symbol *sym;
+
+	list_for_each_entry(sec, &elf->sections, list)
+		list_for_each_entry(sym, &sec->symbols, list)
+			if (sym->index == index)
+				return sym;
+
+	return NULL;
+}
+
+static int elf_read_sections(struct elf *elf)
+{
+	Elf_Scn *s = NULL;
+	struct section *sec;
+	size_t shstrndx, sections_nr;
+	size_t i;
+
+	if (elf_getshdrnum(elf->elf, &sections_nr)) {
+		perror("elf_getshdrnum");
+		return -1;
+	}
+
+	if (elf_getshdrstrndx(elf->elf, &shstrndx)) {
+		perror("elf_getshdrstrndx");
+		return -1;
+	}
+
+	for (i = 0; i < sections_nr; i++) {
+		sec = malloc(sizeof(*sec));
+		if (!sec) {
+			perror("malloc");
+			return -1;
+		}
+		memset(sec, 0, sizeof(*sec));
+
+		INIT_LIST_HEAD(&sec->symbols);
+		INIT_LIST_HEAD(&sec->relas);
+
+		list_add_tail(&sec->list, &elf->sections);
+
+		s = elf_getscn(elf->elf, i);
+		if (!s) {
+			perror("elf_getscn");
+			return -1;
+		}
+
+		sec->index = elf_ndxscn(s);
+
+		if (!gelf_getshdr(s, &sec->sh)) {
+			perror("gelf_getshdr");
+			return -1;
+		}
+
+		sec->name = elf_strptr(elf->elf, shstrndx, sec->sh.sh_name);
+		if (!sec->name) {
+			perror("elf_strptr");
+			return -1;
+		}
+
+		sec->data = elf_getdata(s, NULL);
+		if (!sec->data) {
+			perror("elf_getdata");
+			return -1;
+		}
+
+		if (sec->data->d_off != 0 ||
+		    sec->data->d_size != sec->sh.sh_size) {
+			WARN("unexpected data attributes for %s", sec->name);
+			return -1;
+		}
+
+		sec->start = (unsigned long)sec->data->d_buf;
+		sec->end = sec->start + sec->data->d_size;
+	}
+
+	/* sanity check, one more call to elf_nextscn() should return NULL */
+	if (elf_nextscn(elf->elf, s)) {
+		WARN("section entry mismatch");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int elf_read_symbols(struct elf *elf)
+{
+	struct section *symtab;
+	struct symbol *sym;
+	struct list_head *entry, *tmp;
+	int symbols_nr, i;
+
+	symtab = elf_find_section_by_name(elf, ".symtab");
+	if (!symtab) {
+		WARN("missing symbol table");
+		return -1;
+	}
+
+	symbols_nr = symtab->sh.sh_size / symtab->sh.sh_entsize;
+
+	for (i = 0; i < symbols_nr; i++) {
+		sym = malloc(sizeof(*sym));
+		if (!sym) {
+			perror("malloc");
+			return -1;
+		}
+		memset(sym, 0, sizeof(*sym));
+
+		sym->index = i;
+
+		if (!gelf_getsym(symtab->data, i, &sym->sym)) {
+			perror("gelf_getsym");
+			goto err;
+		}
+
+		sym->name = elf_strptr(elf->elf, symtab->sh.sh_link,
+				       sym->sym.st_name);
+		if (!sym->name) {
+			perror("elf_strptr");
+			goto err;
+		}
+
+		sym->type = GELF_ST_TYPE(sym->sym.st_info);
+		sym->bind = GELF_ST_BIND(sym->sym.st_info);
+
+		if (sym->sym.st_shndx > SHN_UNDEF &&
+		    sym->sym.st_shndx < SHN_LORESERVE) {
+			sym->sec = elf_find_section_by_index(elf,
+							     sym->sym.st_shndx);
+			if (!sym->sec) {
+				WARN("couldn't find section for symbol %s",
+				     sym->name);
+				goto err;
+			}
+			if (sym->type == STT_SECTION)
+				sym->name = sym->sec->name;
+		} else
+			sym->sec = elf_find_section_by_index(elf, 0);
+
+		sym->start = sym->sec->start + sym->sym.st_value;
+		sym->end = sym->start + sym->sym.st_size;
+
+		/* sorted insert into a per-section list */
+		entry = &sym->sec->symbols;
+		list_for_each_prev(tmp, &sym->sec->symbols) {
+			struct symbol *s;
+
+			s = list_entry(tmp, struct symbol, list);
+
+			if (sym->start > s->start) {
+				entry = tmp;
+				break;
+			}
+
+			if (sym->start == s->start && sym->end >= s->end) {
+				entry = tmp;
+				break;
+			}
+		}
+		list_add(&sym->list, entry);
+	}
+
+	return 0;
+
+err:
+	free(sym);
+	return -1;
+}
+
+static int elf_read_relas(struct elf *elf)
+{
+	struct section *sec;
+	struct rela *rela;
+	int i;
+	unsigned int symndx;
+
+	list_for_each_entry(sec, &elf->sections, list) {
+		if (sec->sh.sh_type != SHT_RELA)
+			continue;
+
+		sec->base = elf_find_section_by_name(elf, sec->name + 5);
+		if (!sec->base) {
+			WARN("can't find base section for rela section %s",
+			     sec->name);
+			return -1;
+		}
+
+		sec->base->rela = sec;
+
+		for (i = 0; i < sec->sh.sh_size / sec->sh.sh_entsize; i++) {
+			rela = malloc(sizeof(*rela));
+			if (!rela) {
+				perror("malloc");
+				return -1;
+			}
+			memset(rela, 0, sizeof(*rela));
+
+			list_add_tail(&rela->list, &sec->relas);
+
+			if (!gelf_getrela(sec->data, i, &rela->rela)) {
+				perror("gelf_getrela");
+				return -1;
+			}
+
+			rela->type = GELF_R_TYPE(rela->rela.r_info);
+			rela->addend = rela->rela.r_addend;
+			rela->offset = rela->rela.r_offset;
+			symndx = GELF_R_SYM(rela->rela.r_info);
+			rela->sym = elf_find_symbol_by_index(elf, symndx);
+			if (!rela->sym) {
+				WARN("can't find rela entry symbol %d for %s",
+				     symndx, sec->name);
+				return -1;
+			}
+		}
+	}
+
+	return 0;
+}
+
+struct elf *elf_open(const char *name)
+{
+	struct elf *elf;
+
+	elf_version(EV_CURRENT);
+
+	elf = malloc(sizeof(*elf));
+	if (!elf) {
+		perror("malloc");
+		return NULL;
+	}
+	memset(elf, 0, sizeof(*elf));
+
+	INIT_LIST_HEAD(&elf->sections);
+
+	elf->name = strdup(name);
+	if (!elf->name) {
+		perror("strdup");
+		goto err;
+	}
+
+	elf->fd = open(name, O_RDONLY);
+	if (elf->fd == -1) {
+		perror("open");
+		goto err;
+	}
+
+	elf->elf = elf_begin(elf->fd, ELF_C_READ_MMAP, NULL);
+	if (!elf->elf) {
+		perror("elf_begin");
+		goto err;
+	}
+
+	if (!gelf_getehdr(elf->elf, &elf->ehdr)) {
+		perror("gelf_getehdr");
+		goto err;
+	}
+
+	if (elf_read_sections(elf))
+		goto err;
+
+	if (elf_read_symbols(elf))
+		goto err;
+
+	if (elf_read_relas(elf))
+		goto err;
+
+	return elf;
+
+err:
+	elf_close(elf);
+	return NULL;
+}
+
+void elf_close(struct elf *elf)
+{
+	struct section *sec, *tmpsec;
+	struct symbol *sym, *tmpsym;
+
+	list_for_each_entry_safe(sec, tmpsec, &elf->sections, list) {
+		list_for_each_entry_safe(sym, tmpsym, &sec->symbols, list) {
+			list_del(&sym->list);
+			free(sym);
+		}
+		list_del(&sec->list);
+		free(sec);
+	}
+	if (elf->name)
+		free(elf->name);
+	if (elf->fd > 0)
+		close(elf->fd);
+	if (elf->elf)
+		elf_end(elf->elf);
+	free(elf);
+}
diff --git a/scripts/stackvalidate/elf.h b/scripts/stackvalidate/elf.h
new file mode 100644
index 0000000..db5d5fa
--- /dev/null
+++ b/scripts/stackvalidate/elf.h
@@ -0,0 +1,56 @@
+#ifndef _ELF_H_
+#define _ELF_H_
+
+#include <gelf.h>
+#include "list.h"
+
+#define WARN(format, ...) \
+	fprintf(stderr, \
+		"%s: " format "\n", \
+		elf->name, ##__VA_ARGS__)
+
+struct section {
+	struct list_head list;
+	GElf_Shdr sh;
+	struct list_head symbols;
+	struct list_head relas;
+	struct section *base, *rela;
+	Elf_Data *data;
+	char *name;
+	int index;
+	unsigned long start, end;
+};
+
+struct symbol {
+	struct list_head list;
+	GElf_Sym sym;
+	struct section *sec;
+	char *name;
+	int index;
+	unsigned char bind, type;
+	unsigned long start, end;
+};
+
+struct rela {
+	struct list_head list;
+	GElf_Rela rela;
+	struct symbol *sym;
+	unsigned int type;
+	int offset;
+	int addend;
+};
+
+struct elf {
+	Elf *elf;
+	GElf_Ehdr ehdr;
+	int fd;
+	char *name;
+	struct list_head sections;
+};
+
+
+struct elf *elf_open(const char *name);
+struct section *elf_find_section_by_name(struct elf *elf, const char *name);
+void elf_close(struct elf *elf);
+
+#endif /* _ELF_H_ */
diff --git a/scripts/stackvalidate/list.h b/scripts/stackvalidate/list.h
new file mode 100644
index 0000000..25716b5
--- /dev/null
+++ b/scripts/stackvalidate/list.h
@@ -0,0 +1,217 @@
+#ifndef _LIST_H
+#define _LIST_H
+
+#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
+
+#define container_of(ptr, type, member) ({ \
+	const typeof(((type *)0)->member) *__mptr = (ptr); \
+	(type *)((char *)__mptr - offsetof(type, member)); })
+
+#define LIST_POISON1 ((void *) 0x00100100)
+#define LIST_POISON2 ((void *) 0x00200200)
+
+struct list_head {
+	struct list_head *next, *prev;
+};
+
+#define LIST_HEAD_INIT(name) { &(name), &(name) }
+
+#define LIST_HEAD(name) \
+	struct list_head name = LIST_HEAD_INIT(name)
+
+static inline void INIT_LIST_HEAD(struct list_head *list)
+{
+	list->next = list;
+	list->prev = list;
+}
+
+static inline void __list_add(struct list_head *new,
+			      struct list_head *prev,
+			      struct list_head *next)
+{
+	next->prev = new;
+	new->next = next;
+	new->prev = prev;
+	prev->next = new;
+}
+
+static inline void list_add(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head, head->next);
+}
+
+static inline void list_add_tail(struct list_head *new, struct list_head *head)
+{
+	__list_add(new, head->prev, head);
+}
+
+static inline void __list_del(struct list_head *prev, struct list_head *next)
+{
+	next->prev = prev;
+	prev->next = next;
+}
+
+static inline void __list_del_entry(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+}
+
+static inline void list_del(struct list_head *entry)
+{
+	__list_del(entry->prev, entry->next);
+	entry->next = LIST_POISON1;
+	entry->prev = LIST_POISON2;
+}
+
+static inline void list_replace(struct list_head *old,
+				struct list_head *new)
+{
+	new->next = old->next;
+	new->next->prev = new;
+	new->prev = old->prev;
+	new->prev->next = new;
+}
+
+static inline void list_replace_init(struct list_head *old,
+					struct list_head *new)
+{
+	list_replace(old, new);
+	INIT_LIST_HEAD(old);
+}
+
+static inline void list_del_init(struct list_head *entry)
+{
+	__list_del_entry(entry);
+	INIT_LIST_HEAD(entry);
+}
+
+static inline void list_move(struct list_head *list, struct list_head *head)
+{
+	__list_del_entry(list);
+	list_add(list, head);
+}
+
+static inline void list_move_tail(struct list_head *list,
+				  struct list_head *head)
+{
+	__list_del_entry(list);
+	list_add_tail(list, head);
+}
+
+static inline int list_is_last(const struct list_head *list,
+				const struct list_head *head)
+{
+	return list->next == head;
+}
+
+static inline int list_empty(const struct list_head *head)
+{
+	return head->next == head;
+}
+
+static inline int list_empty_careful(const struct list_head *head)
+{
+	struct list_head *next = head->next;
+
+	return (next == head) && (next == head->prev);
+}
+
+static inline void list_rotate_left(struct list_head *head)
+{
+	struct list_head *first;
+
+	if (!list_empty(head)) {
+		first = head->next;
+		list_move_tail(first, head);
+	}
+}
+
+static inline int list_is_singular(const struct list_head *head)
+{
+	return !list_empty(head) && (head->next == head->prev);
+}
+
+#define list_entry(ptr, type, member) \
+	container_of(ptr, type, member)
+
+#define list_first_entry(ptr, type, member) \
+	list_entry((ptr)->next, type, member)
+
+#define list_last_entry(ptr, type, member) \
+	list_entry((ptr)->prev, type, member)
+
+#define list_first_entry_or_null(ptr, type, member) \
+	(!list_empty(ptr) ? list_first_entry(ptr, type, member) : NULL)
+
+#define list_next_entry(pos, member) \
+	list_entry((pos)->member.next, typeof(*(pos)), member)
+
+#define list_prev_entry(pos, member) \
+	list_entry((pos)->member.prev, typeof(*(pos)), member)
+
+#define list_for_each(pos, head) \
+	for (pos = (head)->next; pos != (head); pos = pos->next)
+
+#define list_for_each_prev(pos, head) \
+	for (pos = (head)->prev; pos != (head); pos = pos->prev)
+
+#define list_for_each_safe(pos, n, head) \
+	for (pos = (head)->next, n = pos->next; pos != (head); \
+		pos = n, n = pos->next)
+
+#define list_for_each_prev_safe(pos, n, head) \
+	for (pos = (head)->prev, n = pos->prev; \
+	     pos != (head); \
+	     pos = n, n = pos->prev)
+
+#define list_for_each_entry(pos, head, member)				\
+	for (pos = list_first_entry(head, typeof(*pos), member);	\
+	     &pos->member != (head);					\
+	     pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_reverse(pos, head, member)			\
+	for (pos = list_last_entry(head, typeof(*pos), member);		\
+	     &pos->member != (head);					\
+	     pos = list_prev_entry(pos, member))
+
+#define list_prepare_entry(pos, head, member) \
+	((pos) ? : list_entry(head, typeof(*pos), member))
+
+#define list_for_each_entry_continue(pos, head, member)			\
+	for (pos = list_next_entry(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_continue_reverse(pos, head, member)		\
+	for (pos = list_prev_entry(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = list_prev_entry(pos, member))
+
+#define list_for_each_entry_from(pos, head, member)			\
+	for (; &pos->member != (head);					\
+	     pos = list_next_entry(pos, member))
+
+#define list_for_each_entry_safe(pos, n, head, member)			\
+	for (pos = list_first_entry(head, typeof(*pos), member),	\
+		n = list_next_entry(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_continue(pos, n, head, member)		\
+	for (pos = list_next_entry(pos, member),			\
+		n = list_next_entry(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_from(pos, n, head, member)		\
+	for (n = list_next_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_next_entry(n, member))
+
+#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
+	for (pos = list_last_entry(head, typeof(*pos), member),		\
+		n = list_prev_entry(pos, member);			\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))
+
+#endif /* _LIST_H */
diff --git a/scripts/stackvalidate/stackvalidate.c b/scripts/stackvalidate/stackvalidate.c
new file mode 100644
index 0000000..07f1110
--- /dev/null
+++ b/scripts/stackvalidate/stackvalidate.c
@@ -0,0 +1,226 @@
+/*
+ * stackvalidate.c
+ *
+ * Copyright (C) 2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This tool automatically runs for every compiled .S file and validates that
+ * every asm function does the proper frame pointer setup.
+ *
+ * Also, to make sure somebody didn't forget to annotate their callable asm
+ * code as a function (e.g. via the FUNC_ENTER/FUNC_RETURN macros), it flags an
+ * error for any return instructions which are hiding outside of a function.
+ * In almost all cases, return instructions are part of callable functions and
+ * should be annotated as such so that we can validate their frame pointer
+ * usage.
+ *
+ * Whitelist mechanisms exist (RET_NOVALIDATE and FILE_NOVALIDATE) for those
+ * few return instructions which are not actually in callable code.
+ */
+
+#include <argp.h>
+#include <stdbool.h>
+
+#include "elf.h"
+#include "arch.h"
+
+int warnings;
+
+struct args {
+	char *args[1];
+};
+static const char args_doc[] = "file.o";
+static struct argp_option options[] = {
+	{0},
+};
+static error_t parse_opt(int key, char *arg, struct argp_state *state)
+{
+	/* Get the input argument from argp_parse, which we
+	   know is a pointer to our args structure. */
+	struct args *args = state->input;
+
+	switch (key) {
+	case ARGP_KEY_ARG:
+		if (state->arg_num >= 1)
+			/* Too many arguments. */
+			argp_usage(state);
+		args->args[state->arg_num] = arg;
+		break;
+	case ARGP_KEY_END:
+		if (state->arg_num < 1)
+			/* Not enough arguments. */
+			argp_usage(state);
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+	return 0;
+}
+static struct argp argp = { options, parse_opt, args_doc, 0 };
+
+/*
+ * Check for the RET_NOVALIDATE macro.
+ */
+static bool is_ret_whitelisted(struct elf *elf, struct section *sec,
+			       unsigned long offset)
+{
+	struct section *wlsec;
+	struct rela *rela;
+
+	wlsec = elf_find_section_by_name(elf,
+					 ".rela__stackvalidate_whitelist_ret");
+	if (!wlsec)
+		return false;
+
+	list_for_each_entry(rela, &wlsec->relas, list)
+		if (rela->sym->type == STT_SECTION &&
+		    rela->sym->index == sec->index && rela->addend == offset)
+			return true;
+
+	return false;
+}
+
+/*
+ * Check for the FILE_NOVALIDATE macro.
+ */
+static bool is_file_whitelisted(struct elf *elf)
+{
+	if (elf_find_section_by_name(elf, "__stackvalidate_whitelist_file"))
+		return true;
+
+	return false;
+}
+
+/*
+ * For the given collection of instructions which are outside an STT_FUNC
+ * function, ensure there are no non-whitelisted return instructions.
+ */
+static int validate_nonfunction(struct elf *elf, struct section *sec,
+				unsigned long start, unsigned long end)
+{
+	unsigned long addr;
+	unsigned int len;
+	int ret, warnings = 0;
+
+	for (addr = start; addr < end; addr += len) {
+		ret = arch_is_return_insn(elf, addr, end - addr, &len);
+		if (ret == -1)
+			return -1;
+
+		if (ret && !is_ret_whitelisted(elf, sec, addr - sec->start)) {
+			WARN("return instruction outside of a function at %s+0x%lx.  Please use FUNC_ENTER.",
+			     sec->name, addr - sec->start);
+			warnings++;
+		}
+	}
+
+	return warnings;
+}
+
+/*
+ * For the given section, ensure that:
+ *
+ * 1) all STT_FUNC functions do the proper frame pointer setup; and
+ * 2) any other instructions outside of STT_FUNC aren't return instructions
+ *    (unless they're annotated with the RET_NOVALIDATE macro).
+ */
+static int validate_section(struct elf *elf, struct section *sec)
+{
+	struct symbol *func, *last_func;
+	struct symbol null_func = {};
+	int ret, warnings = 0;
+
+	if (!(sec->sh.sh_flags & SHF_EXECINSTR))
+		return 0;
+
+	if (list_empty(&sec->symbols)) {
+		WARN("%s: no symbols", sec->name);
+		return -1;
+	}
+
+	last_func = &null_func;
+	last_func->start = last_func->end = sec->start;
+	list_for_each_entry(func, &sec->symbols, list) {
+		if (func->type != STT_FUNC)
+			continue;
+
+		if (func->start != last_func->start &&
+		    func->end != last_func->end &&
+		    func->start < last_func->end) {
+			WARN("overlapping functions %s and %s",
+			     last_func->name, func->name);
+			warnings++;
+		}
+
+		if (func->start > last_func->end) {
+			ret = validate_nonfunction(elf, sec, last_func->end,
+						   func->start);
+			if (ret < 0)
+				return -1;
+
+			warnings += ret;
+		}
+
+		ret = arch_validate_function(elf, func);
+		if (ret < 0)
+			return -1;
+
+		warnings += ret;
+
+		last_func = func;
+	}
+
+	if (last_func->end < sec->end) {
+		ret = validate_nonfunction(elf, sec, last_func->end, sec->end);
+		if (ret < 0)
+			return -1;
+
+		warnings += ret;
+	}
+
+	return warnings;
+}
+
+int main(int argc, char *argv[])
+{
+	struct args args;
+	struct elf *elf;
+	struct section *sec;
+	int ret, warnings = 0;
+
+	argp_parse(&argp, argc, argv, 0, 0, &args);
+
+	elf = elf_open(args.args[0]);
+	if (!elf) {
+		fprintf(stderr, "error reading elf file %s\n", args.args[0]);
+		return 1;
+	}
+
+	if (is_file_whitelisted(elf))
+		return 0;
+
+	list_for_each_entry(sec, &elf->sections, list) {
+		ret = validate_section(elf, sec);
+		if (ret < 0)
+			return 1;
+
+		warnings += ret;
+	}
+
+	/* ignore warnings for now until we get all the asm code cleaned up */
+	return 0;
+}
-- 
2.1.0



* [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent
  2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros Josh Poimboeuf
  2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
  3 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel

The separate push{lq}_cfi and pop{lq}_cfi macros aren't needed.  Push
and pop only come in one size per architecture, so the trailing 'q' or
'l' characters are redundant, and awkward to use in arch-independent
code.

Replace the push/pop CFI macros with architecture-independent versions:
push_cfi, pop_cfi, etc.

This change is purely cosmetic, with no resulting object code changes.
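
As a rough illustration of why one macro per operation suffices, the CFA bookkeeping done by push_cfi and pop_cfi can be modelled in a few lines of C. This is only a sketch: struct cfa_model, model_push_cfi() and model_pop_cfi() are hypothetical names, and the word field stands in for STACK_WORD_SIZE (4 on 32-bit, 8 on 64-bit).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the CFA offset tracked by the CFI annotations. */
struct cfa_model {
	long offset;	/* running CFA offset in bytes */
	long word;	/* stands in for STACK_WORD_SIZE: 4 or 8 */
};

/* Models push_cfi: the push is followed by CFI_ADJUST_CFA_OFFSET +word. */
static void model_push_cfi(struct cfa_model *m)
{
	assert(m != NULL);
	m->offset += m->word;
}

/* Models pop_cfi: the pop is followed by CFI_ADJUST_CFA_OFFSET -word. */
static void model_pop_cfi(struct cfa_model *m)
{
	assert(m != NULL);
	m->offset -= m->word;
}
```

The same two helpers serve both word sizes; only the value of word differs, which mirrors how the size suffix drops out of the macro names.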

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/ia32/ia32entry.S      |  60 ++++++------
 arch/x86/include/asm/calling.h |  28 +++---
 arch/x86/include/asm/dwarf2.h  |  92 ++++++------------
 arch/x86/include/asm/frame.h   |   4 +-
 arch/x86/kernel/entry_32.S     | 214 ++++++++++++++++++++---------------------
 arch/x86/kernel/entry_64.S     |  96 +++++++++---------
 arch/x86/lib/atomic64_386_32.S |   4 +-
 arch/x86/lib/atomic64_cx8_32.S |  40 ++++----
 arch/x86/lib/checksum_32.S     |  42 ++++----
 arch/x86/lib/cmpxchg16b_emu.S  |   6 +-
 arch/x86/lib/cmpxchg8b_emu.S   |   6 +-
 arch/x86/lib/msr-reg.S         |  34 +++----
 arch/x86/lib/rwsem.S           |  40 ++++----
 arch/x86/lib/thunk_32.S        |  12 +--
 arch/x86/lib/thunk_64.S        |  36 +++----
 15 files changed, 343 insertions(+), 371 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 83e4ed2..7259bc9 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -124,19 +124,19 @@ ENTRY(ia32_sysenter_target)
 	CFI_REGISTER rip,r10
 
 	/* Construct struct pt_regs on stack */
-	pushq_cfi	$__USER32_DS		/* pt_regs->ss */
-	pushq_cfi	%rbp			/* pt_regs->sp */
+	push_cfi	$__USER32_DS		/* pt_regs->ss */
+	push_cfi	%rbp			/* pt_regs->sp */
 	CFI_REL_OFFSET	rsp,0
-	pushfq_cfi				/* pt_regs->flags */
-	pushq_cfi	$__USER32_CS		/* pt_regs->cs */
-	pushq_cfi	%r10 /* pt_regs->ip = thread_info->sysenter_return */
+	pushf_cfi				/* pt_regs->flags */
+	push_cfi	$__USER32_CS		/* pt_regs->cs */
+	push_cfi	%r10 /* pt_regs->ip = thread_info->sysenter_return */
 	CFI_REL_OFFSET	rip,0
-	pushq_cfi_reg	rax			/* pt_regs->orig_ax */
-	pushq_cfi_reg	rdi			/* pt_regs->di */
-	pushq_cfi_reg	rsi			/* pt_regs->si */
-	pushq_cfi_reg	rdx			/* pt_regs->dx */
-	pushq_cfi_reg	rcx			/* pt_regs->cx */
-	pushq_cfi	$-ENOSYS		/* pt_regs->ax */
+	push_cfi_reg	rax			/* pt_regs->orig_ax */
+	push_cfi_reg	rdi			/* pt_regs->di */
+	push_cfi_reg	rsi			/* pt_regs->si */
+	push_cfi_reg	rdx			/* pt_regs->dx */
+	push_cfi_reg	rcx			/* pt_regs->cx */
+	push_cfi	$-ENOSYS		/* pt_regs->ax */
 	cld
 	sub	$(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
 	CFI_ADJUST_CFA_OFFSET 10*8
@@ -282,8 +282,8 @@ sysexit_audit:
 #endif
 
 sysenter_fix_flags:
-	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
-	popfq_cfi
+	push_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
+	popf_cfi
 	jmp sysenter_flags_fixed
 
 sysenter_tracesys:
@@ -353,20 +353,20 @@ ENTRY(ia32_cstar_target)
 	movl	%eax,%eax
 
 	/* Construct struct pt_regs on stack */
-	pushq_cfi	$__USER32_DS		/* pt_regs->ss */
-	pushq_cfi	%r8			/* pt_regs->sp */
+	push_cfi	$__USER32_DS		/* pt_regs->ss */
+	push_cfi	%r8			/* pt_regs->sp */
 	CFI_REL_OFFSET rsp,0
-	pushq_cfi	%r11			/* pt_regs->flags */
-	pushq_cfi	$__USER32_CS		/* pt_regs->cs */
-	pushq_cfi	%rcx			/* pt_regs->ip */
+	push_cfi	%r11			/* pt_regs->flags */
+	push_cfi	$__USER32_CS		/* pt_regs->cs */
+	push_cfi	%rcx			/* pt_regs->ip */
 	CFI_REL_OFFSET rip,0
-	pushq_cfi_reg	rax			/* pt_regs->orig_ax */
-	pushq_cfi_reg	rdi			/* pt_regs->di */
-	pushq_cfi_reg	rsi			/* pt_regs->si */
-	pushq_cfi_reg	rdx			/* pt_regs->dx */
-	pushq_cfi_reg	rbp			/* pt_regs->cx */
+	push_cfi_reg	rax			/* pt_regs->orig_ax */
+	push_cfi_reg	rdi			/* pt_regs->di */
+	push_cfi_reg	rsi			/* pt_regs->si */
+	push_cfi_reg	rdx			/* pt_regs->dx */
+	push_cfi_reg	rbp			/* pt_regs->cx */
 	movl	%ebp,%ecx
-	pushq_cfi	$-ENOSYS		/* pt_regs->ax */
+	push_cfi	$-ENOSYS		/* pt_regs->ax */
 	sub	$(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
 	CFI_ADJUST_CFA_OFFSET 10*8
 
@@ -506,12 +506,12 @@ ENTRY(ia32_syscall)
 	movl	%eax,%eax
 
 	/* Construct struct pt_regs on stack (iret frame is already on stack) */
-	pushq_cfi_reg	rax			/* pt_regs->orig_ax */
-	pushq_cfi_reg	rdi			/* pt_regs->di */
-	pushq_cfi_reg	rsi			/* pt_regs->si */
-	pushq_cfi_reg	rdx			/* pt_regs->dx */
-	pushq_cfi_reg	rcx			/* pt_regs->cx */
-	pushq_cfi	$-ENOSYS		/* pt_regs->ax */
+	push_cfi_reg	rax			/* pt_regs->orig_ax */
+	push_cfi_reg	rdi			/* pt_regs->di */
+	push_cfi_reg	rsi			/* pt_regs->si */
+	push_cfi_reg	rdx			/* pt_regs->dx */
+	push_cfi_reg	rcx			/* pt_regs->cx */
+	push_cfi	$-ENOSYS		/* pt_regs->ax */
 	cld
 	sub	$(10*8),%rsp /* pt_regs->r8-11,bp,bx,r12-15 not saved */
 	CFI_ADJUST_CFA_OFFSET 10*8
diff --git a/arch/x86/include/asm/calling.h b/arch/x86/include/asm/calling.h
index 1c8b50e..4abc60f 100644
--- a/arch/x86/include/asm/calling.h
+++ b/arch/x86/include/asm/calling.h
@@ -224,23 +224,23 @@ For 32-bit we have the following conventions - kernel is built with
  */
 
 	.macro SAVE_ALL
-	pushl_cfi_reg eax
-	pushl_cfi_reg ebp
-	pushl_cfi_reg edi
-	pushl_cfi_reg esi
-	pushl_cfi_reg edx
-	pushl_cfi_reg ecx
-	pushl_cfi_reg ebx
+	push_cfi_reg eax
+	push_cfi_reg ebp
+	push_cfi_reg edi
+	push_cfi_reg esi
+	push_cfi_reg edx
+	push_cfi_reg ecx
+	push_cfi_reg ebx
 	.endm
 
 	.macro RESTORE_ALL
-	popl_cfi_reg ebx
-	popl_cfi_reg ecx
-	popl_cfi_reg edx
-	popl_cfi_reg esi
-	popl_cfi_reg edi
-	popl_cfi_reg ebp
-	popl_cfi_reg eax
+	pop_cfi_reg ebx
+	pop_cfi_reg ecx
+	pop_cfi_reg edx
+	pop_cfi_reg esi
+	pop_cfi_reg edi
+	pop_cfi_reg ebp
+	pop_cfi_reg eax
 	.endm
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index de1cdaf..5af7e15 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -5,6 +5,8 @@
 #warning "asm/dwarf2.h should be only included in pure assembly files"
 #endif
 
+#include <asm/asm.h>
+
 /*
  * Macros for dwarf2 CFI unwind table entries.
  * See "as.info" for details on these pseudo ops. Unfortunately
@@ -80,79 +82,39 @@
  * what you're doing if you use them.
  */
 #ifdef __ASSEMBLY__
-#ifdef CONFIG_X86_64
-	.macro pushq_cfi reg
-	pushq \reg
-	CFI_ADJUST_CFA_OFFSET 8
-	.endm
-
-	.macro pushq_cfi_reg reg
-	pushq %\reg
-	CFI_ADJUST_CFA_OFFSET 8
-	CFI_REL_OFFSET \reg, 0
-	.endm
 
-	.macro popq_cfi reg
-	popq \reg
-	CFI_ADJUST_CFA_OFFSET -8
-	.endm
-
-	.macro popq_cfi_reg reg
-	popq %\reg
-	CFI_ADJUST_CFA_OFFSET -8
-	CFI_RESTORE \reg
-	.endm
+#define STACK_WORD_SIZE __ASM_SEL(4,8)
 
-	.macro pushfq_cfi
-	pushfq
-	CFI_ADJUST_CFA_OFFSET 8
+	.macro push_cfi reg
+	push \reg
+	CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
 	.endm
 
-	.macro popfq_cfi
-	popfq
-	CFI_ADJUST_CFA_OFFSET -8
-	.endm
-
-	.macro movq_cfi reg offset=0
-	movq %\reg, \offset(%rsp)
-	CFI_REL_OFFSET \reg, \offset
-	.endm
-
-	.macro movq_cfi_restore offset reg
-	movq \offset(%rsp), %\reg
-	CFI_RESTORE \reg
-	.endm
-#else /*!CONFIG_X86_64*/
-	.macro pushl_cfi reg
-	pushl \reg
-	CFI_ADJUST_CFA_OFFSET 4
-	.endm
-
-	.macro pushl_cfi_reg reg
-	pushl %\reg
-	CFI_ADJUST_CFA_OFFSET 4
+	.macro push_cfi_reg reg
+	push %\reg
+	CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
 	CFI_REL_OFFSET \reg, 0
 	.endm
 
-	.macro popl_cfi reg
-	popl \reg
-	CFI_ADJUST_CFA_OFFSET -4
+	.macro pop_cfi reg
+	pop \reg
+	CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
 	.endm
 
-	.macro popl_cfi_reg reg
-	popl %\reg
-	CFI_ADJUST_CFA_OFFSET -4
+	.macro pop_cfi_reg reg
+	pop %\reg
+	CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
 	CFI_RESTORE \reg
 	.endm
 
-	.macro pushfl_cfi
-	pushfl
-	CFI_ADJUST_CFA_OFFSET 4
+	.macro pushf_cfi
+	pushf
+	CFI_ADJUST_CFA_OFFSET STACK_WORD_SIZE
 	.endm
 
-	.macro popfl_cfi
-	popfl
-	CFI_ADJUST_CFA_OFFSET -4
+	.macro popf_cfi
+	popf
+	CFI_ADJUST_CFA_OFFSET -STACK_WORD_SIZE
 	.endm
 
 	.macro movl_cfi reg offset=0
@@ -164,7 +126,17 @@
 	movl \offset(%esp), %\reg
 	CFI_RESTORE \reg
 	.endm
-#endif /*!CONFIG_X86_64*/
+
+	.macro movq_cfi reg offset=0
+	movq %\reg, \offset(%rsp)
+	CFI_REL_OFFSET \reg, \offset
+	.endm
+
+	.macro movq_cfi_restore offset reg
+	movq \offset(%rsp), %\reg
+	CFI_RESTORE \reg
+	.endm
+
 #endif /*__ASSEMBLY__*/
 
 #endif /* _ASM_X86_DWARF2_H */
diff --git a/arch/x86/include/asm/frame.h b/arch/x86/include/asm/frame.h
index 3b629f4..325e4e8 100644
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -8,12 +8,12 @@
    frame pointer later */
 #ifdef CONFIG_FRAME_POINTER
 	.macro FRAME
-	__ASM_SIZE(push,_cfi)	%__ASM_REG(bp)
+	push_cfi		%__ASM_REG(bp)
 	CFI_REL_OFFSET		__ASM_REG(bp), 0
 	__ASM_SIZE(mov)		%__ASM_REG(sp), %__ASM_REG(bp)
 	.endm
 	.macro ENDFRAME
-	__ASM_SIZE(pop,_cfi)	%__ASM_REG(bp)
+	pop_cfi			%__ASM_REG(bp)
 	CFI_RESTORE		__ASM_REG(bp)
 	.endm
 #else
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 1c30976..7e88181 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -113,7 +113,7 @@
 
  /* unfortunately push/pop can't be no-op */
 .macro PUSH_GS
-	pushl_cfi $0
+	push_cfi $0
 .endm
 .macro POP_GS pop=0
 	addl $(4 + \pop), %esp
@@ -137,12 +137,12 @@
 #else	/* CONFIG_X86_32_LAZY_GS */
 
 .macro PUSH_GS
-	pushl_cfi %gs
+	push_cfi %gs
 	/*CFI_REL_OFFSET gs, 0*/
 .endm
 
 .macro POP_GS pop=0
-98:	popl_cfi %gs
+98:	pop_cfi %gs
 	/*CFI_RESTORE gs*/
   .if \pop <> 0
 	add $\pop, %esp
@@ -186,25 +186,25 @@
 .macro SAVE_ALL
 	cld
 	PUSH_GS
-	pushl_cfi %fs
+	push_cfi %fs
 	/*CFI_REL_OFFSET fs, 0;*/
-	pushl_cfi %es
+	push_cfi %es
 	/*CFI_REL_OFFSET es, 0;*/
-	pushl_cfi %ds
+	push_cfi %ds
 	/*CFI_REL_OFFSET ds, 0;*/
-	pushl_cfi %eax
+	push_cfi %eax
 	CFI_REL_OFFSET eax, 0
-	pushl_cfi %ebp
+	push_cfi %ebp
 	CFI_REL_OFFSET ebp, 0
-	pushl_cfi %edi
+	push_cfi %edi
 	CFI_REL_OFFSET edi, 0
-	pushl_cfi %esi
+	push_cfi %esi
 	CFI_REL_OFFSET esi, 0
-	pushl_cfi %edx
+	push_cfi %edx
 	CFI_REL_OFFSET edx, 0
-	pushl_cfi %ecx
+	push_cfi %ecx
 	CFI_REL_OFFSET ecx, 0
-	pushl_cfi %ebx
+	push_cfi %ebx
 	CFI_REL_OFFSET ebx, 0
 	movl $(__USER_DS), %edx
 	movl %edx, %ds
@@ -215,29 +215,29 @@
 .endm
 
 .macro RESTORE_INT_REGS
-	popl_cfi %ebx
+	pop_cfi %ebx
 	CFI_RESTORE ebx
-	popl_cfi %ecx
+	pop_cfi %ecx
 	CFI_RESTORE ecx
-	popl_cfi %edx
+	pop_cfi %edx
 	CFI_RESTORE edx
-	popl_cfi %esi
+	pop_cfi %esi
 	CFI_RESTORE esi
-	popl_cfi %edi
+	pop_cfi %edi
 	CFI_RESTORE edi
-	popl_cfi %ebp
+	pop_cfi %ebp
 	CFI_RESTORE ebp
-	popl_cfi %eax
+	pop_cfi %eax
 	CFI_RESTORE eax
 .endm
 
 .macro RESTORE_REGS pop=0
 	RESTORE_INT_REGS
-1:	popl_cfi %ds
+1:	pop_cfi %ds
 	/*CFI_RESTORE ds;*/
-2:	popl_cfi %es
+2:	pop_cfi %es
 	/*CFI_RESTORE es;*/
-3:	popl_cfi %fs
+3:	pop_cfi %fs
 	/*CFI_RESTORE fs;*/
 	POP_GS \pop
 .pushsection .fixup, "ax"
@@ -289,24 +289,24 @@
 
 ENTRY(ret_from_fork)
 	CFI_STARTPROC
-	pushl_cfi %eax
+	push_cfi %eax
 	call schedule_tail
 	GET_THREAD_INFO(%ebp)
-	popl_cfi %eax
-	pushl_cfi $0x0202		# Reset kernel eflags
-	popfl_cfi
+	pop_cfi %eax
+	push_cfi $0x0202		# Reset kernel eflags
+	popf_cfi
 	jmp syscall_exit
 	CFI_ENDPROC
 END(ret_from_fork)
 
 ENTRY(ret_from_kernel_thread)
 	CFI_STARTPROC
-	pushl_cfi %eax
+	push_cfi %eax
 	call schedule_tail
 	GET_THREAD_INFO(%ebp)
-	popl_cfi %eax
-	pushl_cfi $0x0202		# Reset kernel eflags
-	popfl_cfi
+	pop_cfi %eax
+	push_cfi $0x0202		# Reset kernel eflags
+	popf_cfi
 	movl PT_EBP(%esp),%eax
 	call *PT_EBX(%esp)
 	movl $0,PT_EAX(%esp)
@@ -385,13 +385,13 @@ sysenter_past_esp:
 	 * enough kernel state to call TRACE_IRQS_OFF can be called - but
 	 * we immediately enable interrupts at that point anyway.
 	 */
-	pushl_cfi $__USER_DS
+	push_cfi $__USER_DS
 	/*CFI_REL_OFFSET ss, 0*/
-	pushl_cfi %ebp
+	push_cfi %ebp
 	CFI_REL_OFFSET esp, 0
-	pushfl_cfi
+	pushf_cfi
 	orl $X86_EFLAGS_IF, (%esp)
-	pushl_cfi $__USER_CS
+	push_cfi $__USER_CS
 	/*CFI_REL_OFFSET cs, 0*/
 	/*
 	 * Push current_thread_info()->sysenter_return to the stack.
@@ -401,10 +401,10 @@ sysenter_past_esp:
 	 * TOP_OF_KERNEL_STACK_PADDING takes us to the top of the stack;
 	 * and THREAD_SIZE takes us to the bottom.
 	 */
-	pushl_cfi ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
+	push_cfi ((TI_sysenter_return) - THREAD_SIZE + TOP_OF_KERNEL_STACK_PADDING + 4*4)(%esp)
 	CFI_REL_OFFSET eip, 0
 
-	pushl_cfi %eax
+	push_cfi %eax
 	SAVE_ALL
 	ENABLE_INTERRUPTS(CLBR_NONE)
 
@@ -453,11 +453,11 @@ sysenter_audit:
 	/* movl PT_EAX(%esp), %eax	already set, syscall number: 1st arg to audit */
 	movl PT_EBX(%esp), %edx		/* ebx/a0: 2nd arg to audit */
 	/* movl PT_ECX(%esp), %ecx	already set, a1: 3nd arg to audit */
-	pushl_cfi PT_ESI(%esp)		/* a3: 5th arg */
-	pushl_cfi PT_EDX+4(%esp)	/* a2: 4th arg */
+	push_cfi PT_ESI(%esp)		/* a3: 5th arg */
+	push_cfi PT_EDX+4(%esp)	/* a2: 4th arg */
 	call __audit_syscall_entry
-	popl_cfi %ecx /* get that remapped edx off the stack */
-	popl_cfi %ecx /* get that remapped esi off the stack */
+	pop_cfi %ecx /* get that remapped edx off the stack */
+	pop_cfi %ecx /* get that remapped esi off the stack */
 	movl PT_EAX(%esp),%eax		/* reload syscall number */
 	jmp sysenter_do_call
 
@@ -493,7 +493,7 @@ ENDPROC(ia32_sysenter_target)
 ENTRY(system_call)
 	RING0_INT_FRAME			# can't unwind into user space anyway
 	ASM_CLAC
-	pushl_cfi %eax			# save orig_eax
+	push_cfi %eax			# save orig_eax
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
 					# system call tracing in operation / emulation
@@ -577,8 +577,8 @@ ldt_ss:
 	shr $16, %edx
 	mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
 	mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
-	pushl_cfi $__ESPFIX_SS
-	pushl_cfi %eax			/* new kernel esp */
+	push_cfi $__ESPFIX_SS
+	push_cfi %eax			/* new kernel esp */
 	/* Disable interrupts, but do not irqtrace this section: we
 	 * will soon execute iret and the tracer was already set to
 	 * the irqstate after the iret */
@@ -634,9 +634,9 @@ work_notifysig:				# deal with pending signals and
 #ifdef CONFIG_VM86
 	ALIGN
 work_notifysig_v86:
-	pushl_cfi %ecx			# save ti_flags for do_notify_resume
+	push_cfi %ecx			# save ti_flags for do_notify_resume
 	call save_v86_state		# %eax contains pt_regs pointer
-	popl_cfi %ecx
+	pop_cfi %ecx
 	movl %eax, %esp
 	jmp 1b
 #endif
@@ -701,8 +701,8 @@ END(sysenter_badsys)
 	mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */
 	shl $16, %eax
 	addl %esp, %eax			/* the adjusted stack pointer */
-	pushl_cfi $__KERNEL_DS
-	pushl_cfi %eax
+	push_cfi $__KERNEL_DS
+	push_cfi %eax
 	lss (%esp), %esp		/* switch to the normal stack segment */
 	CFI_ADJUST_CFA_OFFSET -8
 #endif
@@ -731,7 +731,7 @@ ENTRY(irq_entries_start)
 	RING0_INT_FRAME
     vector=FIRST_EXTERNAL_VECTOR
     .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
-	pushl_cfi $(~vector+0x80)	/* Note: always in signed byte range */
+	push_cfi $(~vector+0x80)	/* Note: always in signed byte range */
     vector=vector+1
 	jmp	common_interrupt
 	CFI_ADJUST_CFA_OFFSET -4
@@ -759,7 +759,7 @@ ENDPROC(common_interrupt)
 ENTRY(name)				\
 	RING0_INT_FRAME;		\
 	ASM_CLAC;			\
-	pushl_cfi $~(nr);		\
+	push_cfi $~(nr);		\
 	SAVE_ALL;			\
 	TRACE_IRQS_OFF			\
 	movl %esp,%eax;			\
@@ -786,8 +786,8 @@ ENDPROC(name)
 ENTRY(coprocessor_error)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_coprocessor_error
+	push_cfi $0
+	push_cfi $do_coprocessor_error
 	jmp error_code
 	CFI_ENDPROC
 END(coprocessor_error)
@@ -795,14 +795,14 @@ END(coprocessor_error)
 ENTRY(simd_coprocessor_error)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
+	push_cfi $0
 #ifdef CONFIG_X86_INVD_BUG
 	/* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
-	ALTERNATIVE "pushl_cfi $do_general_protection",	\
+	ALTERNATIVE "push_cfi $do_general_protection",	\
 		    "pushl $do_simd_coprocessor_error", \
 		    X86_FEATURE_XMM
 #else
-	pushl_cfi $do_simd_coprocessor_error
+	push_cfi $do_simd_coprocessor_error
 #endif
 	jmp error_code
 	CFI_ENDPROC
@@ -811,8 +811,8 @@ END(simd_coprocessor_error)
 ENTRY(device_not_available)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $-1			# mark this as an int
-	pushl_cfi $do_device_not_available
+	push_cfi $-1			# mark this as an int
+	push_cfi $do_device_not_available
 	jmp error_code
 	CFI_ENDPROC
 END(device_not_available)
@@ -832,8 +832,8 @@ END(native_irq_enable_sysexit)
 ENTRY(overflow)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_overflow
+	push_cfi $0
+	push_cfi $do_overflow
 	jmp error_code
 	CFI_ENDPROC
 END(overflow)
@@ -841,8 +841,8 @@ END(overflow)
 ENTRY(bounds)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_bounds
+	push_cfi $0
+	push_cfi $do_bounds
 	jmp error_code
 	CFI_ENDPROC
 END(bounds)
@@ -850,8 +850,8 @@ END(bounds)
 ENTRY(invalid_op)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_invalid_op
+	push_cfi $0
+	push_cfi $do_invalid_op
 	jmp error_code
 	CFI_ENDPROC
 END(invalid_op)
@@ -859,8 +859,8 @@ END(invalid_op)
 ENTRY(coprocessor_segment_overrun)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_coprocessor_segment_overrun
+	push_cfi $0
+	push_cfi $do_coprocessor_segment_overrun
 	jmp error_code
 	CFI_ENDPROC
 END(coprocessor_segment_overrun)
@@ -868,7 +868,7 @@ END(coprocessor_segment_overrun)
 ENTRY(invalid_TSS)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_invalid_TSS
+	push_cfi $do_invalid_TSS
 	jmp error_code
 	CFI_ENDPROC
 END(invalid_TSS)
@@ -876,7 +876,7 @@ END(invalid_TSS)
 ENTRY(segment_not_present)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_segment_not_present
+	push_cfi $do_segment_not_present
 	jmp error_code
 	CFI_ENDPROC
 END(segment_not_present)
@@ -884,7 +884,7 @@ END(segment_not_present)
 ENTRY(stack_segment)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_stack_segment
+	push_cfi $do_stack_segment
 	jmp error_code
 	CFI_ENDPROC
 END(stack_segment)
@@ -892,7 +892,7 @@ END(stack_segment)
 ENTRY(alignment_check)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_alignment_check
+	push_cfi $do_alignment_check
 	jmp error_code
 	CFI_ENDPROC
 END(alignment_check)
@@ -900,8 +900,8 @@ END(alignment_check)
 ENTRY(divide_error)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0			# no error code
-	pushl_cfi $do_divide_error
+	push_cfi $0			# no error code
+	push_cfi $do_divide_error
 	jmp error_code
 	CFI_ENDPROC
 END(divide_error)
@@ -910,8 +910,8 @@ END(divide_error)
 ENTRY(machine_check)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi machine_check_vector
+	push_cfi $0
+	push_cfi machine_check_vector
 	jmp error_code
 	CFI_ENDPROC
 END(machine_check)
@@ -920,8 +920,8 @@ END(machine_check)
 ENTRY(spurious_interrupt_bug)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $0
-	pushl_cfi $do_spurious_interrupt_bug
+	push_cfi $0
+	push_cfi $do_spurious_interrupt_bug
 	jmp error_code
 	CFI_ENDPROC
 END(spurious_interrupt_bug)
@@ -938,7 +938,7 @@ ENTRY(xen_sysenter_target)
 
 ENTRY(xen_hypervisor_callback)
 	CFI_STARTPROC
-	pushl_cfi $-1 /* orig_ax = -1 => not a system call */
+	push_cfi $-1 /* orig_ax = -1 => not a system call */
 	SAVE_ALL
 	TRACE_IRQS_OFF
 
@@ -977,7 +977,7 @@ ENDPROC(xen_hypervisor_callback)
 # We distinguish between categories by maintaining a status value in EAX.
 ENTRY(xen_failsafe_callback)
 	CFI_STARTPROC
-	pushl_cfi %eax
+	push_cfi %eax
 	movl $1,%eax
 1:	mov 4(%esp),%ds
 2:	mov 8(%esp),%es
@@ -986,12 +986,12 @@ ENTRY(xen_failsafe_callback)
 	/* EAX == 0 => Category 1 (Bad segment)
 	   EAX != 0 => Category 2 (Bad IRET) */
 	testl %eax,%eax
-	popl_cfi %eax
+	pop_cfi %eax
 	lea 16(%esp),%esp
 	CFI_ADJUST_CFA_OFFSET -16
 	jz 5f
 	jmp iret_exc
-5:	pushl_cfi $-1 /* orig_ax = -1 => not a system call */
+5:	push_cfi $-1 /* orig_ax = -1 => not a system call */
 	SAVE_ALL
 	jmp ret_from_exception
 	CFI_ENDPROC
@@ -1197,7 +1197,7 @@ return_to_handler:
 ENTRY(trace_page_fault)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $trace_do_page_fault
+	push_cfi $trace_do_page_fault
 	jmp error_code
 	CFI_ENDPROC
 END(trace_page_fault)
@@ -1206,23 +1206,23 @@ END(trace_page_fault)
 ENTRY(page_fault)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_page_fault
+	push_cfi $do_page_fault
 	ALIGN
 error_code:
 	/* the function address is in %gs's slot on the stack */
-	pushl_cfi %fs
+	push_cfi %fs
 	/*CFI_REL_OFFSET fs, 0*/
-	pushl_cfi %es
+	push_cfi %es
 	/*CFI_REL_OFFSET es, 0*/
-	pushl_cfi %ds
+	push_cfi %ds
 	/*CFI_REL_OFFSET ds, 0*/
-	pushl_cfi_reg eax
-	pushl_cfi_reg ebp
-	pushl_cfi_reg edi
-	pushl_cfi_reg esi
-	pushl_cfi_reg edx
-	pushl_cfi_reg ecx
-	pushl_cfi_reg ebx
+	push_cfi_reg eax
+	push_cfi_reg ebp
+	push_cfi_reg edi
+	push_cfi_reg esi
+	push_cfi_reg edx
+	push_cfi_reg ecx
+	push_cfi_reg ebx
 	cld
 	movl $(__KERNEL_PERCPU), %ecx
 	movl %ecx, %fs
@@ -1263,9 +1263,9 @@ END(page_fault)
 	movl TSS_sysenter_sp0 + \offset(%esp), %esp
 	CFI_DEF_CFA esp, 0
 	CFI_UNDEFINED eip
-	pushfl_cfi
-	pushl_cfi $__KERNEL_CS
-	pushl_cfi $sysenter_past_esp
+	pushf_cfi
+	push_cfi $__KERNEL_CS
+	push_cfi $sysenter_past_esp
 	CFI_REL_OFFSET eip, 0
 .endm
 
@@ -1276,7 +1276,7 @@ ENTRY(debug)
 	jne debug_stack_correct
 	FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
 debug_stack_correct:
-	pushl_cfi $-1			# mark this as an int
+	push_cfi $-1			# mark this as an int
 	SAVE_ALL
 	TRACE_IRQS_OFF
 	xorl %edx,%edx			# error code 0
@@ -1298,28 +1298,28 @@ ENTRY(nmi)
 	RING0_INT_FRAME
 	ASM_CLAC
 #ifdef CONFIG_X86_ESPFIX32
-	pushl_cfi %eax
+	push_cfi %eax
 	movl %ss, %eax
 	cmpw $__ESPFIX_SS, %ax
-	popl_cfi %eax
+	pop_cfi %eax
 	je nmi_espfix_stack
 #endif
 	cmpl $ia32_sysenter_target,(%esp)
 	je nmi_stack_fixup
-	pushl_cfi %eax
+	push_cfi %eax
 	movl %esp,%eax
 	/* Do not access memory above the end of our stack page,
 	 * it might not exist.
 	 */
 	andl $(THREAD_SIZE-1),%eax
 	cmpl $(THREAD_SIZE-20),%eax
-	popl_cfi %eax
+	pop_cfi %eax
 	jae nmi_stack_correct
 	cmpl $ia32_sysenter_target,12(%esp)
 	je nmi_debug_stack_check
 nmi_stack_correct:
 	/* We have a RING0_INT_FRAME here */
-	pushl_cfi %eax
+	push_cfi %eax
 	SAVE_ALL
 	xorl %edx,%edx		# zero error code
 	movl %esp,%eax		# pt_regs pointer
@@ -1349,14 +1349,14 @@ nmi_espfix_stack:
 	 *
 	 * create the pointer to lss back
 	 */
-	pushl_cfi %ss
-	pushl_cfi %esp
+	push_cfi %ss
+	push_cfi %esp
 	addl $4, (%esp)
 	/* copy the iret frame of 12 bytes */
 	.rept 3
-	pushl_cfi 16(%esp)
+	push_cfi 16(%esp)
 	.endr
-	pushl_cfi %eax
+	push_cfi %eax
 	SAVE_ALL
 	FIXUP_ESPFIX_STACK		# %eax == %esp
 	xorl %edx,%edx			# zero error code
@@ -1372,7 +1372,7 @@ END(nmi)
 ENTRY(int3)
 	RING0_INT_FRAME
 	ASM_CLAC
-	pushl_cfi $-1			# mark this as an int
+	push_cfi $-1			# mark this as an int
 	SAVE_ALL
 	TRACE_IRQS_OFF
 	xorl %edx,%edx		# zero error code
@@ -1384,7 +1384,7 @@ END(int3)
 
 ENTRY(general_protection)
 	RING0_EC_FRAME
-	pushl_cfi $do_general_protection
+	push_cfi $do_general_protection
 	jmp error_code
 	CFI_ENDPROC
 END(general_protection)
@@ -1393,7 +1393,7 @@ END(general_protection)
 ENTRY(async_page_fault)
 	RING0_EC_FRAME
 	ASM_CLAC
-	pushl_cfi $do_async_page_fault
+	push_cfi $do_async_page_fault
 	jmp error_code
 	CFI_ENDPROC
 END(async_page_fault)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 4e0ed47..3f2c4b2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -219,8 +219,8 @@ GLOBAL(system_call_after_swapgs)
 	movq	PER_CPU_VAR(cpu_current_top_of_stack),%rsp
 
 	/* Construct struct pt_regs on stack */
-	pushq_cfi $__USER_DS			/* pt_regs->ss */
-	pushq_cfi PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
+	push_cfi $__USER_DS			/* pt_regs->ss */
+	push_cfi PER_CPU_VAR(rsp_scratch)	/* pt_regs->sp */
 	/*
 	 * Re-enable interrupts.
 	 * We use 'rsp_scratch' as a scratch space, hence irq-off block above
@@ -229,20 +229,20 @@ GLOBAL(system_call_after_swapgs)
 	 * with using rsp_scratch:
 	 */
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	pushq_cfi	%r11			/* pt_regs->flags */
-	pushq_cfi	$__USER_CS		/* pt_regs->cs */
-	pushq_cfi	%rcx			/* pt_regs->ip */
+	push_cfi	%r11			/* pt_regs->flags */
+	push_cfi	$__USER_CS		/* pt_regs->cs */
+	push_cfi	%rcx			/* pt_regs->ip */
 	CFI_REL_OFFSET rip,0
-	pushq_cfi_reg	rax			/* pt_regs->orig_ax */
-	pushq_cfi_reg	rdi			/* pt_regs->di */
-	pushq_cfi_reg	rsi			/* pt_regs->si */
-	pushq_cfi_reg	rdx			/* pt_regs->dx */
-	pushq_cfi_reg	rcx			/* pt_regs->cx */
-	pushq_cfi	$-ENOSYS		/* pt_regs->ax */
-	pushq_cfi_reg	r8			/* pt_regs->r8 */
-	pushq_cfi_reg	r9			/* pt_regs->r9 */
-	pushq_cfi_reg	r10			/* pt_regs->r10 */
-	pushq_cfi_reg	r11			/* pt_regs->r11 */
+	push_cfi_reg	rax			/* pt_regs->orig_ax */
+	push_cfi_reg	rdi			/* pt_regs->di */
+	push_cfi_reg	rsi			/* pt_regs->si */
+	push_cfi_reg	rdx			/* pt_regs->dx */
+	push_cfi_reg	rcx			/* pt_regs->cx */
+	push_cfi	$-ENOSYS		/* pt_regs->ax */
+	push_cfi_reg	r8			/* pt_regs->r8 */
+	push_cfi_reg	r9			/* pt_regs->r9 */
+	push_cfi_reg	r10			/* pt_regs->r10 */
+	push_cfi_reg	r11			/* pt_regs->r11 */
 	sub	$(6*8),%rsp /* pt_regs->bp,bx,r12-15 not saved */
 	CFI_ADJUST_CFA_OFFSET 6*8
 
@@ -374,9 +374,9 @@ int_careful:
 	jnc  int_very_careful
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	pushq_cfi %rdi
+	push_cfi %rdi
 	SCHEDULE_USER
-	popq_cfi %rdi
+	pop_cfi %rdi
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
 	jmp int_with_check
@@ -389,10 +389,10 @@ int_very_careful:
 	/* Check for syscall exit trace */
 	testl $_TIF_WORK_SYSCALL_EXIT,%edx
 	jz int_signal
-	pushq_cfi %rdi
+	push_cfi %rdi
 	leaq 8(%rsp),%rdi	# &ptregs -> arg1
 	call syscall_trace_leave
-	popq_cfi %rdi
+	pop_cfi %rdi
 	andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU),%edi
 	jmp int_restore_rest
 
@@ -603,8 +603,8 @@ ENTRY(ret_from_fork)
 
 	LOCK ; btr $TIF_FORK,TI_flags(%r8)
 
-	pushq_cfi $0x0002
-	popfq_cfi				# reset kernel eflags
+	push_cfi $0x0002
+	popf_cfi				# reset kernel eflags
 
 	call schedule_tail			# rdi: 'prev' task parameter
 
@@ -640,7 +640,7 @@ ENTRY(irq_entries_start)
 	INTR_FRAME
     vector=FIRST_EXTERNAL_VECTOR
     .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
-	pushq_cfi $(~vector+0x80)	/* Note: always in signed byte range */
+	push_cfi $(~vector+0x80)	/* Note: always in signed byte range */
     vector=vector+1
 	jmp	common_interrupt
 	CFI_ADJUST_CFA_OFFSET -8
@@ -807,8 +807,8 @@ native_irq_return_iret:
 
 #ifdef CONFIG_X86_ESPFIX64
 native_irq_return_ldt:
-	pushq_cfi %rax
-	pushq_cfi %rdi
+	push_cfi %rax
+	push_cfi %rdi
 	SWAPGS
 	movq PER_CPU_VAR(espfix_waddr),%rdi
 	movq %rax,(0*8)(%rdi)	/* RAX */
@@ -823,11 +823,11 @@ native_irq_return_ldt:
 	movq (5*8)(%rsp),%rax	/* RSP */
 	movq %rax,(4*8)(%rdi)
 	andl $0xffff0000,%eax
-	popq_cfi %rdi
+	pop_cfi %rdi
 	orq PER_CPU_VAR(espfix_stack),%rax
 	SWAPGS
 	movq %rax,%rsp
-	popq_cfi %rax
+	pop_cfi %rax
 	jmp native_irq_return_iret
 #endif
 
@@ -838,9 +838,9 @@ retint_careful:
 	jnc   retint_signal
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-	pushq_cfi %rdi
+	push_cfi %rdi
 	SCHEDULE_USER
-	popq_cfi %rdi
+	pop_cfi %rdi
 	GET_THREAD_INFO(%rcx)
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
@@ -872,7 +872,7 @@ END(common_interrupt)
 ENTRY(\sym)
 	INTR_FRAME
 	ASM_CLAC
-	pushq_cfi $~(\num)
+	push_cfi $~(\num)
 .Lcommon_\sym:
 	interrupt \do_sym
 	jmp ret_from_intr
@@ -974,7 +974,7 @@ ENTRY(\sym)
 	PARAVIRT_ADJUST_EXCEPTION_FRAME
 
 	.ifeq \has_error_code
-	pushq_cfi $-1			/* ORIG_RAX: no syscall to restart */
+	push_cfi $-1			/* ORIG_RAX: no syscall to restart */
 	.endif
 
 	ALLOC_PT_GPREGS_ON_STACK
@@ -1091,14 +1091,14 @@ idtentry simd_coprocessor_error do_simd_coprocessor_error has_error_code=0
 	/* edi:  new selector */
 ENTRY(native_load_gs_index)
 	CFI_STARTPROC
-	pushfq_cfi
+	pushf_cfi
 	DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
 	SWAPGS
 gs_change:
 	movl %edi,%gs
 2:	mfence		/* workaround */
 	SWAPGS
-	popfq_cfi
+	popf_cfi
 	ret
 	CFI_ENDPROC
 END(native_load_gs_index)
@@ -1116,7 +1116,7 @@ bad_gs:
 /* Call softirq on interrupt stack. Interrupts are off. */
 ENTRY(do_softirq_own_stack)
 	CFI_STARTPROC
-	pushq_cfi %rbp
+	push_cfi %rbp
 	CFI_REL_OFFSET rbp,0
 	mov  %rsp,%rbp
 	CFI_DEF_CFA_REGISTER rbp
@@ -1215,9 +1215,9 @@ ENTRY(xen_failsafe_callback)
 	CFI_RESTORE r11
 	addq $0x30,%rsp
 	CFI_ADJUST_CFA_OFFSET -0x30
-	pushq_cfi $0	/* RIP */
-	pushq_cfi %r11
-	pushq_cfi %rcx
+	push_cfi $0	/* RIP */
+	push_cfi %r11
+	push_cfi %rcx
 	jmp general_protection
 	CFI_RESTORE_STATE
 1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
@@ -1227,7 +1227,7 @@ ENTRY(xen_failsafe_callback)
 	CFI_RESTORE r11
 	addq $0x30,%rsp
 	CFI_ADJUST_CFA_OFFSET -0x30
-	pushq_cfi $-1 /* orig_ax = -1 => not a system call */
+	push_cfi $-1 /* orig_ax = -1 => not a system call */
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
@@ -1422,7 +1422,7 @@ ENTRY(nmi)
 	 */
 
 	/* Use %rdx as our temp variable throughout */
-	pushq_cfi %rdx
+	push_cfi %rdx
 	CFI_REL_OFFSET rdx, 0
 
 	/*
@@ -1478,18 +1478,18 @@ nested_nmi:
 	movq %rdx, %rsp
 	CFI_ADJUST_CFA_OFFSET 1*8
 	leaq -10*8(%rsp), %rdx
-	pushq_cfi $__KERNEL_DS
-	pushq_cfi %rdx
-	pushfq_cfi
-	pushq_cfi $__KERNEL_CS
-	pushq_cfi $repeat_nmi
+	push_cfi $__KERNEL_DS
+	push_cfi %rdx
+	pushf_cfi
+	push_cfi $__KERNEL_CS
+	push_cfi $repeat_nmi
 
 	/* Put stack back */
 	addq $(6*8), %rsp
 	CFI_ADJUST_CFA_OFFSET -6*8
 
 nested_nmi_out:
-	popq_cfi %rdx
+	pop_cfi %rdx
 	CFI_RESTORE rdx
 
 	/* No need to check faults here */
@@ -1537,7 +1537,7 @@ first_nmi:
 	CFI_RESTORE rdx
 
 	/* Set the NMI executing variable on the stack. */
-	pushq_cfi $1
+	push_cfi $1
 
 	/*
 	 * Leave room for the "copied" frame
@@ -1547,7 +1547,7 @@ first_nmi:
 
 	/* Copy the stack frame to the Saved frame */
 	.rept 5
-	pushq_cfi 11*8(%rsp)
+	push_cfi 11*8(%rsp)
 	.endr
 	CFI_DEF_CFA_OFFSET 5*8
 
@@ -1574,7 +1574,7 @@ repeat_nmi:
 	addq $(10*8), %rsp
 	CFI_ADJUST_CFA_OFFSET -10*8
 	.rept 5
-	pushq_cfi -6*8(%rsp)
+	push_cfi -6*8(%rsp)
 	.endr
 	subq $(5*8), %rsp
 	CFI_DEF_CFA_OFFSET 5*8
@@ -1585,7 +1585,7 @@ end_repeat_nmi:
 	 * NMI if the first NMI took an exception and reset our iret stack
 	 * so that we repeat another NMI.
 	 */
-	pushq_cfi $-1		/* ORIG_RAX: no syscall to restart */
+	push_cfi $-1		/* ORIG_RAX: no syscall to restart */
 	ALLOC_PT_GPREGS_ON_STACK
 
 	/*
diff --git a/arch/x86/lib/atomic64_386_32.S b/arch/x86/lib/atomic64_386_32.S
index 00933d5..aa17c69 100644
--- a/arch/x86/lib/atomic64_386_32.S
+++ b/arch/x86/lib/atomic64_386_32.S
@@ -15,12 +15,12 @@
 
 /* if you want SMP support, implement these with real spinlocks */
 .macro LOCK reg
-	pushfl_cfi
+	pushf_cfi
 	cli
 .endm
 
 .macro UNLOCK reg
-	popfl_cfi
+	popf_cfi
 .endm
 
 #define BEGIN(op) \
diff --git a/arch/x86/lib/atomic64_cx8_32.S b/arch/x86/lib/atomic64_cx8_32.S
index 082a851..c5dd086 100644
--- a/arch/x86/lib/atomic64_cx8_32.S
+++ b/arch/x86/lib/atomic64_cx8_32.S
@@ -57,10 +57,10 @@ ENDPROC(atomic64_xchg_cx8)
 .macro addsub_return func ins insc
 ENTRY(atomic64_\func\()_return_cx8)
 	CFI_STARTPROC
-	pushl_cfi_reg ebp
-	pushl_cfi_reg ebx
-	pushl_cfi_reg esi
-	pushl_cfi_reg edi
+	push_cfi_reg ebp
+	push_cfi_reg ebx
+	push_cfi_reg esi
+	push_cfi_reg edi
 
 	movl %eax, %esi
 	movl %edx, %edi
@@ -79,10 +79,10 @@ ENTRY(atomic64_\func\()_return_cx8)
 10:
 	movl %ebx, %eax
 	movl %ecx, %edx
-	popl_cfi_reg edi
-	popl_cfi_reg esi
-	popl_cfi_reg ebx
-	popl_cfi_reg ebp
+	pop_cfi_reg edi
+	pop_cfi_reg esi
+	pop_cfi_reg ebx
+	pop_cfi_reg ebp
 	ret
 	CFI_ENDPROC
 ENDPROC(atomic64_\func\()_return_cx8)
@@ -94,7 +94,7 @@ addsub_return sub sub sbb
 .macro incdec_return func ins insc
 ENTRY(atomic64_\func\()_return_cx8)
 	CFI_STARTPROC
-	pushl_cfi_reg ebx
+	push_cfi_reg ebx
 
 	read64 %esi
 1:
@@ -109,7 +109,7 @@ ENTRY(atomic64_\func\()_return_cx8)
 10:
 	movl %ebx, %eax
 	movl %ecx, %edx
-	popl_cfi_reg ebx
+	pop_cfi_reg ebx
 	ret
 	CFI_ENDPROC
 ENDPROC(atomic64_\func\()_return_cx8)
@@ -120,7 +120,7 @@ incdec_return dec sub sbb
 
 ENTRY(atomic64_dec_if_positive_cx8)
 	CFI_STARTPROC
-	pushl_cfi_reg ebx
+	push_cfi_reg ebx
 
 	read64 %esi
 1:
@@ -136,18 +136,18 @@ ENTRY(atomic64_dec_if_positive_cx8)
 2:
 	movl %ebx, %eax
 	movl %ecx, %edx
-	popl_cfi_reg ebx
+	pop_cfi_reg ebx
 	ret
 	CFI_ENDPROC
 ENDPROC(atomic64_dec_if_positive_cx8)
 
 ENTRY(atomic64_add_unless_cx8)
 	CFI_STARTPROC
-	pushl_cfi_reg ebp
-	pushl_cfi_reg ebx
+	push_cfi_reg ebp
+	push_cfi_reg ebx
 /* these just push these two parameters on the stack */
-	pushl_cfi_reg edi
-	pushl_cfi_reg ecx
+	push_cfi_reg edi
+	push_cfi_reg ecx
 
 	movl %eax, %ebp
 	movl %edx, %edi
@@ -169,8 +169,8 @@ ENTRY(atomic64_add_unless_cx8)
 3:
 	addl $8, %esp
 	CFI_ADJUST_CFA_OFFSET -8
-	popl_cfi_reg ebx
-	popl_cfi_reg ebp
+	pop_cfi_reg ebx
+	pop_cfi_reg ebp
 	ret
 4:
 	cmpl %edx, 4(%esp)
@@ -182,7 +182,7 @@ ENDPROC(atomic64_add_unless_cx8)
 
 ENTRY(atomic64_inc_not_zero_cx8)
 	CFI_STARTPROC
-	pushl_cfi_reg ebx
+	push_cfi_reg ebx
 
 	read64 %esi
 1:
@@ -199,7 +199,7 @@ ENTRY(atomic64_inc_not_zero_cx8)
 
 	movl $1, %eax
 3:
-	popl_cfi_reg ebx
+	pop_cfi_reg ebx
 	ret
 	CFI_ENDPROC
 ENDPROC(atomic64_inc_not_zero_cx8)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 9bc944a..42c1f9f 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -51,8 +51,8 @@ unsigned int csum_partial(const unsigned char * buff, int len, unsigned int sum)
 	   */		
 ENTRY(csum_partial)
 	CFI_STARTPROC
-	pushl_cfi_reg esi
-	pushl_cfi_reg ebx
+	push_cfi_reg esi
+	push_cfi_reg ebx
 	movl 20(%esp),%eax	# Function arg: unsigned int sum
 	movl 16(%esp),%ecx	# Function arg: int len
 	movl 12(%esp),%esi	# Function arg: unsigned char *buff
@@ -129,8 +129,8 @@ ENTRY(csum_partial)
 	jz 8f
 	roll $8, %eax
 8:
-	popl_cfi_reg ebx
-	popl_cfi_reg esi
+	pop_cfi_reg ebx
+	pop_cfi_reg esi
 	ret
 	CFI_ENDPROC
 ENDPROC(csum_partial)
@@ -141,8 +141,8 @@ ENDPROC(csum_partial)
 
 ENTRY(csum_partial)
 	CFI_STARTPROC
-	pushl_cfi_reg esi
-	pushl_cfi_reg ebx
+	push_cfi_reg esi
+	push_cfi_reg ebx
 	movl 20(%esp),%eax	# Function arg: unsigned int sum
 	movl 16(%esp),%ecx	# Function arg: int len
 	movl 12(%esp),%esi	# Function arg:	const unsigned char *buf
@@ -249,8 +249,8 @@ ENTRY(csum_partial)
 	jz 90f
 	roll $8, %eax
 90: 
-	popl_cfi_reg ebx
-	popl_cfi_reg esi
+	pop_cfi_reg ebx
+	pop_cfi_reg esi
 	ret
 	CFI_ENDPROC
 ENDPROC(csum_partial)
@@ -290,9 +290,9 @@ ENTRY(csum_partial_copy_generic)
 	CFI_STARTPROC
 	subl  $4,%esp	
 	CFI_ADJUST_CFA_OFFSET 4
-	pushl_cfi_reg edi
-	pushl_cfi_reg esi
-	pushl_cfi_reg ebx
+	push_cfi_reg edi
+	push_cfi_reg esi
+	push_cfi_reg ebx
 	movl ARGBASE+16(%esp),%eax	# sum
 	movl ARGBASE+12(%esp),%ecx	# len
 	movl ARGBASE+4(%esp),%esi	# src
@@ -401,10 +401,10 @@ DST(	movb %cl, (%edi)	)
 
 .previous
 
-	popl_cfi_reg ebx
-	popl_cfi_reg esi
-	popl_cfi_reg edi
-	popl_cfi %ecx			# equivalent to addl $4,%esp
+	pop_cfi_reg ebx
+	pop_cfi_reg esi
+	pop_cfi_reg edi
+	pop_cfi %ecx			# equivalent to addl $4,%esp
 	ret	
 	CFI_ENDPROC
 ENDPROC(csum_partial_copy_generic)
@@ -427,9 +427,9 @@ ENDPROC(csum_partial_copy_generic)
 		
 ENTRY(csum_partial_copy_generic)
 	CFI_STARTPROC
-	pushl_cfi_reg ebx
-	pushl_cfi_reg edi
-	pushl_cfi_reg esi
+	push_cfi_reg ebx
+	push_cfi_reg edi
+	push_cfi_reg esi
 	movl ARGBASE+4(%esp),%esi	#src
 	movl ARGBASE+8(%esp),%edi	#dst	
 	movl ARGBASE+12(%esp),%ecx	#len
@@ -489,9 +489,9 @@ DST(	movb %dl, (%edi)         )
 	jmp  7b			
 .previous				
 
-	popl_cfi_reg esi
-	popl_cfi_reg edi
-	popl_cfi_reg ebx
+	pop_cfi_reg esi
+	pop_cfi_reg edi
+	pop_cfi_reg ebx
 	ret
 	CFI_ENDPROC
 ENDPROC(csum_partial_copy_generic)
diff --git a/arch/x86/lib/cmpxchg16b_emu.S b/arch/x86/lib/cmpxchg16b_emu.S
index 40a1725..b18f317 100644
--- a/arch/x86/lib/cmpxchg16b_emu.S
+++ b/arch/x86/lib/cmpxchg16b_emu.S
@@ -32,7 +32,7 @@ CFI_STARTPROC
 # *atomic* on a single cpu (as provided by the this_cpu_xx class of
 # macros).
 #
-	pushfq_cfi
+	pushf_cfi
 	cli
 
 	cmpq PER_CPU_VAR((%rsi)), %rax
@@ -44,13 +44,13 @@ CFI_STARTPROC
 	movq %rcx, PER_CPU_VAR(8(%rsi))
 
 	CFI_REMEMBER_STATE
-	popfq_cfi
+	popf_cfi
 	mov $1, %al
 	ret
 
 	CFI_RESTORE_STATE
 .Lnot_same:
-	popfq_cfi
+	popf_cfi
 	xor %al,%al
 	ret
 
diff --git a/arch/x86/lib/cmpxchg8b_emu.S b/arch/x86/lib/cmpxchg8b_emu.S
index b4807fce..a4862d0 100644
--- a/arch/x86/lib/cmpxchg8b_emu.S
+++ b/arch/x86/lib/cmpxchg8b_emu.S
@@ -27,7 +27,7 @@ CFI_STARTPROC
 # set the whole ZF thing (caller will just compare
 # eax:edx with the expected value)
 #
-	pushfl_cfi
+	pushf_cfi
 	cli
 
 	cmpl  (%esi), %eax
@@ -39,7 +39,7 @@ CFI_STARTPROC
 	movl %ecx, 4(%esi)
 
 	CFI_REMEMBER_STATE
-	popfl_cfi
+	popf_cfi
 	ret
 
 	CFI_RESTORE_STATE
@@ -48,7 +48,7 @@ CFI_STARTPROC
 .Lhalf_same:
 	movl 4(%esi), %edx
 
-	popfl_cfi
+	popf_cfi
 	ret
 
 CFI_ENDPROC
diff --git a/arch/x86/lib/msr-reg.S b/arch/x86/lib/msr-reg.S
index 3ca5218..046a560 100644
--- a/arch/x86/lib/msr-reg.S
+++ b/arch/x86/lib/msr-reg.S
@@ -14,8 +14,8 @@
 .macro op_safe_regs op
 ENTRY(\op\()_safe_regs)
 	CFI_STARTPROC
-	pushq_cfi_reg rbx
-	pushq_cfi_reg rbp
+	push_cfi_reg rbx
+	push_cfi_reg rbp
 	movq	%rdi, %r10	/* Save pointer */
 	xorl	%r11d, %r11d	/* Return value */
 	movl    (%rdi), %eax
@@ -35,8 +35,8 @@ ENTRY(\op\()_safe_regs)
 	movl    %ebp, 20(%r10)
 	movl    %esi, 24(%r10)
 	movl    %edi, 28(%r10)
-	popq_cfi_reg rbp
-	popq_cfi_reg rbx
+	pop_cfi_reg rbp
+	pop_cfi_reg rbx
 	ret
 3:
 	CFI_RESTORE_STATE
@@ -53,12 +53,12 @@ ENDPROC(\op\()_safe_regs)
 .macro op_safe_regs op
 ENTRY(\op\()_safe_regs)
 	CFI_STARTPROC
-	pushl_cfi_reg ebx
-	pushl_cfi_reg ebp
-	pushl_cfi_reg esi
-	pushl_cfi_reg edi
-	pushl_cfi $0              /* Return value */
-	pushl_cfi %eax
+	push_cfi_reg ebx
+	push_cfi_reg ebp
+	push_cfi_reg esi
+	push_cfi_reg edi
+	push_cfi $0              /* Return value */
+	push_cfi %eax
 	movl    4(%eax), %ecx
 	movl    8(%eax), %edx
 	movl    12(%eax), %ebx
@@ -68,9 +68,9 @@ ENTRY(\op\()_safe_regs)
 	movl    (%eax), %eax
 	CFI_REMEMBER_STATE
 1:	\op
-2:	pushl_cfi %eax
+2:	push_cfi %eax
 	movl    4(%esp), %eax
-	popl_cfi (%eax)
+	pop_cfi (%eax)
 	addl    $4, %esp
 	CFI_ADJUST_CFA_OFFSET -4
 	movl    %ecx, 4(%eax)
@@ -79,11 +79,11 @@ ENTRY(\op\()_safe_regs)
 	movl    %ebp, 20(%eax)
 	movl    %esi, 24(%eax)
 	movl    %edi, 28(%eax)
-	popl_cfi %eax
-	popl_cfi_reg edi
-	popl_cfi_reg esi
-	popl_cfi_reg ebp
-	popl_cfi_reg ebx
+	pop_cfi %eax
+	pop_cfi_reg edi
+	pop_cfi_reg esi
+	pop_cfi_reg ebp
+	pop_cfi_reg ebx
 	ret
 3:
 	CFI_RESTORE_STATE
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
index 2322abe..c630a80 100644
--- a/arch/x86/lib/rwsem.S
+++ b/arch/x86/lib/rwsem.S
@@ -34,10 +34,10 @@
  */
 
 #define save_common_regs \
-	pushl_cfi_reg ecx
+	push_cfi_reg ecx
 
 #define restore_common_regs \
-	popl_cfi_reg ecx
+	pop_cfi_reg ecx
 
 	/* Avoid uglifying the argument copying x86-64 needs to do. */
 	.macro movq src, dst
@@ -64,22 +64,22 @@
  */
 
 #define save_common_regs \
-	pushq_cfi_reg rdi; \
-	pushq_cfi_reg rsi; \
-	pushq_cfi_reg rcx; \
-	pushq_cfi_reg r8;  \
-	pushq_cfi_reg r9;  \
-	pushq_cfi_reg r10; \
-	pushq_cfi_reg r11
+	push_cfi_reg rdi; \
+	push_cfi_reg rsi; \
+	push_cfi_reg rcx; \
+	push_cfi_reg r8;  \
+	push_cfi_reg r9;  \
+	push_cfi_reg r10; \
+	push_cfi_reg r11
 
 #define restore_common_regs \
-	popq_cfi_reg r11; \
-	popq_cfi_reg r10; \
-	popq_cfi_reg r9; \
-	popq_cfi_reg r8; \
-	popq_cfi_reg rcx; \
-	popq_cfi_reg rsi; \
-	popq_cfi_reg rdi
+	pop_cfi_reg r11; \
+	pop_cfi_reg r10; \
+	pop_cfi_reg r9; \
+	pop_cfi_reg r8; \
+	pop_cfi_reg rcx; \
+	pop_cfi_reg rsi; \
+	pop_cfi_reg rdi
 
 #endif
 
@@ -87,10 +87,10 @@
 ENTRY(call_rwsem_down_read_failed)
 	CFI_STARTPROC
 	save_common_regs
-	__ASM_SIZE(push,_cfi_reg) __ASM_REG(dx)
+	push_cfi_reg __ASM_REG(dx)
 	movq %rax,%rdi
 	call rwsem_down_read_failed
-	__ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx)
+	pop_cfi_reg __ASM_REG(dx)
 	restore_common_regs
 	ret
 	CFI_ENDPROC
@@ -122,10 +122,10 @@ ENDPROC(call_rwsem_wake)
 ENTRY(call_rwsem_downgrade_wake)
 	CFI_STARTPROC
 	save_common_regs
-	__ASM_SIZE(push,_cfi_reg) __ASM_REG(dx)
+	push_cfi_reg __ASM_REG(dx)
 	movq %rax,%rdi
 	call rwsem_downgrade_wake
-	__ASM_SIZE(pop,_cfi_reg) __ASM_REG(dx)
+	pop_cfi_reg __ASM_REG(dx)
 	restore_common_regs
 	ret
 	CFI_ENDPROC
diff --git a/arch/x86/lib/thunk_32.S b/arch/x86/lib/thunk_32.S
index 5eb7150..bb370de 100644
--- a/arch/x86/lib/thunk_32.S
+++ b/arch/x86/lib/thunk_32.S
@@ -13,9 +13,9 @@
 	.globl \name
 \name:
 	CFI_STARTPROC
-	pushl_cfi_reg eax
-	pushl_cfi_reg ecx
-	pushl_cfi_reg edx
+	push_cfi_reg eax
+	push_cfi_reg ecx
+	push_cfi_reg edx
 
 	.if \put_ret_addr_in_eax
 	/* Place EIP in the arg1 */
@@ -23,9 +23,9 @@
 	.endif
 
 	call \func
-	popl_cfi_reg edx
-	popl_cfi_reg ecx
-	popl_cfi_reg eax
+	pop_cfi_reg edx
+	pop_cfi_reg ecx
+	pop_cfi_reg eax
 	ret
 	CFI_ENDPROC
 	_ASM_NOKPROBE(\name)
diff --git a/arch/x86/lib/thunk_64.S b/arch/x86/lib/thunk_64.S
index f89ba4e9..39ad268 100644
--- a/arch/x86/lib/thunk_64.S
+++ b/arch/x86/lib/thunk_64.S
@@ -17,15 +17,15 @@
 	CFI_STARTPROC
 
 	/* this one pushes 9 elems, the next one would be %rIP */
-	pushq_cfi_reg rdi
-	pushq_cfi_reg rsi
-	pushq_cfi_reg rdx
-	pushq_cfi_reg rcx
-	pushq_cfi_reg rax
-	pushq_cfi_reg r8
-	pushq_cfi_reg r9
-	pushq_cfi_reg r10
-	pushq_cfi_reg r11
+	push_cfi_reg rdi
+	push_cfi_reg rsi
+	push_cfi_reg rdx
+	push_cfi_reg rcx
+	push_cfi_reg rax
+	push_cfi_reg r8
+	push_cfi_reg r9
+	push_cfi_reg r10
+	push_cfi_reg r11
 
 	.if \put_ret_addr_in_rdi
 	/* 9*8(%rsp) is return addr on stack */
@@ -60,15 +60,15 @@
 	CFI_STARTPROC
 	CFI_ADJUST_CFA_OFFSET 9*8
 restore:
-	popq_cfi_reg r11
-	popq_cfi_reg r10
-	popq_cfi_reg r9
-	popq_cfi_reg r8
-	popq_cfi_reg rax
-	popq_cfi_reg rcx
-	popq_cfi_reg rdx
-	popq_cfi_reg rsi
-	popq_cfi_reg rdi
+	pop_cfi_reg r11
+	pop_cfi_reg r10
+	pop_cfi_reg r9
+	pop_cfi_reg r8
+	pop_cfi_reg rax
+	pop_cfi_reg rcx
+	pop_cfi_reg rdx
+	pop_cfi_reg rsi
+	pop_cfi_reg rdi
 	ret
 	CFI_ENDPROC
 	_ASM_NOKPROBE(restore)
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros
  2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
  2015-05-18 16:34 ` [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent Josh Poimboeuf
@ 2015-05-18 16:34 ` Josh Poimboeuf
  2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
  3 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-18 16:34 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel

Add some helper macros for asm functions so that they can comply with
stackvalidate.

The FUNC_ENTER and FUNC_RETURN macros help asm functions save, set up,
and restore frame pointers.

The RET_NOVALIDATE and FILE_NOVALIDATE macros can be used to whitelist
the few locations which need a return instruction outside of a callable
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/func.h | 82 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 arch/x86/include/asm/func.h

diff --git a/arch/x86/include/asm/func.h b/arch/x86/include/asm/func.h
new file mode 100644
index 0000000..cd27ad4
--- /dev/null
+++ b/arch/x86/include/asm/func.h
@@ -0,0 +1,82 @@
+#ifndef _ASM_X86_FUNC_H
+#define _ASM_X86_FUNC_H
+
+#include <linux/linkage.h>
+#include <asm/dwarf2.h>
+#include <asm/asm.h>
+
+.macro FUNC_ENTER_NO_FP name
+	ENTRY(\name)
+	CFI_STARTPROC
+	CFI_DEF_CFA _ASM_SP, __ASM_SEL(4, 8)
+.endm
+
+.macro FUNC_RETURN_NO_FP name
+	CFI_DEF_CFA _ASM_SP, __ASM_SEL(4, 8)
+	ret
+	CFI_ENDPROC
+	ENDPROC(\name)
+.endm
+
+#ifdef CONFIG_FRAME_POINTER
+
+.macro FUNC_ENTER_FP name
+	FUNC_ENTER_NO_FP \name
+	push_cfi %_ASM_BP
+	CFI_REL_OFFSET _ASM_BP, 0
+	_ASM_MOV %_ASM_SP, %_ASM_BP
+	CFI_DEF_CFA_REGISTER _ASM_BP
+.endm
+
+.macro FUNC_RETURN_FP name
+	pop_cfi %_ASM_BP
+	CFI_RESTORE _ASM_BP
+	FUNC_RETURN_NO_FP \name
+.endm
+
+/*
+ * Every callable asm function should be bookended with FUNC_ENTER and
+ * FUNC_RETURN.  They do proper frame pointer and DWARF CFI setups in order to
+ * achieve more reliable stack traces.
+ *
+ * For the sake of simplicity and correct DWARF annotations, use of the macros
+ * requires that the return instruction comes at the end of the function.
+ */
+#define FUNC_ENTER(name) FUNC_ENTER_FP name
+#define FUNC_RETURN(name) FUNC_RETURN_FP name
+
+/*
+ * RET_NOVALIDATE tells the stack validation script to whitelist the return
+ * instruction immediately after the macro.  Only use it if you're completely
+ * sure you need a return instruction outside of a callable function.
+ * Otherwise, if the code can be called and you haven't annotated it with
+ * FUNC_ENTER/FUNC_RETURN, it will break stack trace reliability.
+ */
+.macro RET_NOVALIDATE
+	163:
+	.pushsection __stackvalidate_whitelist_ret, "ae"
+	_ASM_ALIGN
+	.long 163b - .
+	.popsection
+.endm
+
+/*
+ * FILE_NOVALIDATE is like RET_NOVALIDATE except it whitelists the entire file.
+ * Use with extreme caution or you will silently break stack traces.
+ */
+.macro FILE_NOVALIDATE
+	.pushsection __stackvalidate_whitelist_file, "ae"
+	.long 0
+	.popsection
+.endm
+
+#else /* !FRAME_POINTER */
+
+#define FUNC_ENTER(name) FUNC_ENTER_NO_FP name
+#define FUNC_RETURN(name) FUNC_RETURN_NO_FP name
+#define RET_NOVALIDATE
+#define FILE_NOVALIDATE
+
+#endif /* FRAME_POINTER */
+
+#endif /* _ASM_X86_FUNC_H */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 13:33   ` Borislav Petkov
@ 2015-05-18 17:22       ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-18 17:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, 2015-05-18 at 15:33 +0200, Borislav Petkov wrote:
> On Fri, May 15, 2015 at 12:23:57PM -0600, Toshi Kani wrote:
> > This patch adds an additional argument, 'uniform', to
> > mtrr_type_lookup(), which returns 1 when a given range is
> > covered uniformly by MTRRs, i.e. the range is fully covered
> > by a single MTRR entry or the default type.
> > 
> > pud_set_huge() and pmd_set_huge() are changed to check the
> > new 'uniform' flag to see if it is safe to create a huge page
> > mapping to the range.  This allows them to create a huge page
> > mapping to a range covered by a single MTRR entry of any
> > memory type.  It also detects a non-optimal request properly.
> > They continue to check with the WB type since the WB type has
> > no effect even if a request spans multiple MTRR entries.
> > 
> > pmd_set_huge() logs a warning message to a non-optimal request
> > so that driver writers will be aware of such a case.  Drivers
> > should make a mapping request aligned to a single MTRR entry
> > when the range is covered by MTRRs.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  arch/x86/include/asm/mtrr.h        |    4 ++--
> >  arch/x86/kernel/cpu/mtrr/generic.c |   37 ++++++++++++++++++++++++++----------
> >  arch/x86/mm/pat.c                  |    4 ++--
> >  arch/x86/mm/pgtable.c              |   33 ++++++++++++++++++++------------
> >  4 files changed, 52 insertions(+), 26 deletions(-)
 :
> 
> All applied, 

Great!

> I reformatted the comments in this last one a bit and made
> the warning message hopefully a bit more descriptive:

I have a few comments below.

> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index c30f9819786b..f1894daa79ee 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -566,19 +566,24 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
>  /**
>   * pud_set_huge - setup kernel PUD mapping
>   *
> - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> - * this function does not set up a huge page when the range is covered
> - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> - * disabled.
> + * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
> + * this function sets up a huge page only if all of the following
> + * conditions are met:

It should be "if any of the following condition is met".  Or, does NOT
setup if all of ...

> + *
> + *  - MTRRs are disabled.
> + *  - The range is mapped uniformly by an MTRR, i.e. the range is
> + *    fully covered by a single MTRR entry or the default type.
> + *  - The MTRR memory type is WB.
>   *
>   * Returns 1 on success and 0 on failure.
>   */
>  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  {
> -	u8 mtrr;
> +	u8 mtrr, uniform;
>  
> -	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
> -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> +	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> +	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> +	    (mtrr != MTRR_TYPE_WRBACK))
>  		return 0;
>  
>  	prot = pgprot_4k_2_large(prot);
> @@ -593,20 +598,28 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  /**
>   * pmd_set_huge - setup kernel PMD mapping
>   *
> - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> - * this function does not set up a huge page when the range is covered
> - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> - * disabled.
> + * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
> + * this function sets up a huge page only if all of the following
> + * conditions are met:

Ditto.

> + *
> + *  - MTRR is disabled.
> + *  - The range is mapped uniformly by an MTRR, i.e. the range is
> + *    fully covered by a single MTRR entry or the default type.
> + *  - The MTRR memory type is WB.
>   *
>   * Returns 1 on success and 0 on failure.
>   */
>  int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>  {
> -	u8 mtrr;
> +	u8 mtrr, uniform;
>  
> -	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
> -	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
> +	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> +	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> +	    (mtrr != MTRR_TYPE_WRBACK)) {
> +		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
> +			     __func__, addr, addr + PMD_SIZE);

This new message looks good.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 17:22       ` Toshi Kani
  (?)
@ 2015-05-18 19:01       ` Borislav Petkov
  2015-05-18 19:31           ` Toshi Kani
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-18 19:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, May 18, 2015 at 11:22:39AM -0600, Toshi Kani wrote:
> > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > index c30f9819786b..f1894daa79ee 100644
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -566,19 +566,24 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> >  /**
> >   * pud_set_huge - setup kernel PUD mapping
> >   *
> > - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> > - * this function does not set up a huge page when the range is covered
> > - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> > - * disabled.
> > + * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
> > + * this function sets up a huge page only if all of the following
> > + * conditions are met:
> 
> It should be "if any of the following condition is met".  Or, does NOT
> setup if all of ...
> 
> > + *
> > + *  - MTRRs are disabled.
> > + *  - The range is mapped uniformly by an MTRR, i.e. the range is
> > + *    fully covered by a single MTRR entry or the default type.
> > + *  - The MTRR memory type is WB.

Hmm, ok, so this is kinda like "any" but they also depend on each other.
So it is

If
	- MTRRs are disabled

	or

	- MTRRs are enabled and the range is completely covered by a single MTRR

	or

	 - MTRRs are enabled and the range is not completely covered by a
	 single MTRR but the memory type of the range is WB, even if covered by
	 multiple MTRRs.

Right?

So tell me this: why do we need to repeat that over those KVA helpers?
It's not like the callers can do anything about it, can they?

So maybe that comment - expanded into more detail - should be over
mtrr_type_lookup() only. That'll be better, methinks.

Hmm.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 19:01       ` Borislav Petkov
@ 2015-05-18 19:31           ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-18 19:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, 2015-05-18 at 21:01 +0200, Borislav Petkov wrote:
> On Mon, May 18, 2015 at 11:22:39AM -0600, Toshi Kani wrote:
> > > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> > > index c30f9819786b..f1894daa79ee 100644
> > > --- a/arch/x86/mm/pgtable.c
> > > +++ b/arch/x86/mm/pgtable.c
> > > @@ -566,19 +566,24 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> > >  /**
> > >   * pud_set_huge - setup kernel PUD mapping
> > >   *
> > > - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> > > - * this function does not set up a huge page when the range is covered
> > > - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> > > - * disabled.
> > > + * MTRRs can override PAT memory types with 4KiB granularity. Therefore,
> > > + * this function sets up a huge page only if all of the following
> > > + * conditions are met:
> > 
> > It should be "if any of the following condition is met".  Or, does NOT
> > setup if all of ...
> > 
> > > + *
> > > + *  - MTRRs are disabled.
> > > + *  - The range is mapped uniformly by an MTRR, i.e. the range is
> > > + *    fully covered by a single MTRR entry or the default type.
> > > + *  - The MTRR memory type is WB.
> 
> Hmm, ok, so this is kinda like "any" but they also depend on each other.
> So it is
> 
> If
> 	- MTRRs are disabled
> 
> 	or
> 
> 	- MTRRs are enabled and the range is completely covered by a single MTRR
> 
> 	or
> 
> 	 - MTRRs are enabled and the range is not completely covered by a
> 	 single MTRR but the memory type of the range is WB, even if covered by
> 	 multiple MTRRs.
> 
> Right?

Well, #2 and #3 are independent. That is, uniform can be set regardless
of a type value, and WB can be returned regardless of a uniform value.  

#1 is a new condition added per your comment that uniform no longer
covers the MTRR disabled case.  Yes, #2 and #3 depend on #1 being false.
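
To make the way the three conditions combine concrete, here is a minimal
sketch (not the patch itself).  The helper name huge_mapping_allowed() is
made up for this example, and the constants are local stand-ins for the
kernel's MTRR_TYPE_* definitions.  It mirrors the check in the quoted
pud_set_huge()/pmd_set_huge() hunks: the request is refused only when
MTRRs are enabled, the range is not uniform, and the type is not WB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel definitions (assumed values). */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRBACK     6
#define MTRR_TYPE_INVALID    0xff

/*
 * Hypothetical helper: returns true when a huge page mapping is safe.
 * 'mtrr' and 'uniform' are the two values a lookup over the candidate
 * range is assumed to report.
 */
static bool huge_mapping_allowed(uint8_t mtrr, uint8_t uniform)
{
	/* #1: MTRRs are disabled, so nothing can override the mapping. */
	if (mtrr == MTRR_TYPE_INVALID)
		return true;

	/* #2: one MTRR entry (or the default type) covers the whole range. */
	if (uniform)
		return true;

	/* #3: WB has no effect even if the range spans several entries. */
	if (mtrr == MTRR_TYPE_WRBACK)
		return true;

	return false;
}
```

With these stand-ins, a non-uniform UC range is the only combination that
is refused; every other combination falls into one of the three cases.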

> So tell me this: why do we need to repeat that over those KVA helpers?
> It's not like the callers can do anything about it, can they?
>
> So maybe that comment - expanded into more detail - should be over
> mtrr_type_lookup() only. That'll be better, methinks.

The caller is responsible for verifying the conditions that are safe to
create huge page.  So, I think the comments are needed here to state
such conditions.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 19:31           ` Toshi Kani
  (?)
@ 2015-05-18 20:01           ` Borislav Petkov
  2015-05-18 20:21               ` Toshi Kani
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-18 20:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, May 18, 2015 at 01:31:59PM -0600, Toshi Kani wrote:
> Well, #2 and #3 are independent. That is, uniform can be set regardless

Not #2 and #3 above - the original #2 and #3 ones. I've written them out
detailed to show what I mean.

> The caller is responsible for verifying the conditions that are safe to
> create huge page.

How is the caller ever going to be able to do anything about it?

Regardless, I'd prefer to not duplicate comments and rather put a short
sentence pointing the reader to the comments over mtrr_type_lookup()
where this all is being explained in detail.

I'll fix it up.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 20:01           ` Borislav Petkov
@ 2015-05-18 20:21               ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-18 20:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, 2015-05-18 at 22:01 +0200, Borislav Petkov wrote:
> On Mon, May 18, 2015 at 01:31:59PM -0600, Toshi Kani wrote:
> > Well, #2 and #3 are independent. That is, uniform can be set regardless
> 
> Not #2 and #3 above - the original #2 and #3 ones. I've written them out
> detailed to show what I mean.

The original #2 and #3 are set independently as well. Neither depends on
the other condition having a specific value.

> > The caller is responsible for verifying the conditions that are safe to
> > create huge page.
> 
> How is the caller ever going to be able to do anything about it?

The caller is the one who makes the condition checks necessary to create
a huge page mapping.  mtrr_type_lookup() only reports how the given range
relates to MTRRs.

> Regardless, I'd prefer to not duplicate comments and rather put a short
> sentence pointing the reader to the comments over mtrr_type_lookup()
> where this all is being explained in detail.
> 
> I'll fix it up.

I appreciate your help.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 20:21               ` Toshi Kani
  (?)
@ 2015-05-18 20:51               ` Borislav Petkov
  2015-05-18 21:53                   ` Toshi Kani
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-18 20:51 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, May 18, 2015 at 02:21:08PM -0600, Toshi Kani wrote:
> The caller is the one who makes the condition checks necessary to create
> a huge page mapping.

How? It would go and change MTRRs configuration and ranges and their
memory types so that a huge mapping succeeds?

Or go and try a different range?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 20:51               ` Borislav Petkov
@ 2015-05-18 21:53                   ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-18 21:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, 2015-05-18 at 22:51 +0200, Borislav Petkov wrote:
> On Mon, May 18, 2015 at 02:21:08PM -0600, Toshi Kani wrote:
> > The caller is the one who makes the condition checks necessary to create
> > a huge page mapping.
> 
> How? It would go and change MTRRs configuration and ranges and their
> memory types so that a huge mapping succeeds?
> 
> Or go and try a different range?

Try with a smaller page size.

The callers, pud_set_huge() and pmd_set_huge(), check if the given range
is safe with MTRRs for creating a huge page mapping.  If not, they fail
the request, which leads their callers, ioremap_pud_range() and
ioremap_pmd_range(), to retry with a smaller page size, i.e. 1GB -> 2MB
-> 4KB.  A 4KB mapping cannot straddle MTRR entries (hence no checking
is necessary), so it succeeds as before.
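
The fallback can be sketched roughly as follows.  This is a simplified
model, not the ioremap code: pick_mapping_size(), the mtrr_ok_fn callback
and example_mtrr_ok() are hypothetical names, the alignment and length
checks are folded into one place, and the example predicate assumes a
single MTRR boundary at an arbitrary address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical predicate: may [addr, addr + size) use a huge mapping? */
typedef bool (*mtrr_ok_fn)(uint64_t addr, uint64_t size);

/* Try 1GB, then 2MB; fall back to 4KB, which needs no MTRR check. */
static uint64_t pick_mapping_size(uint64_t addr, uint64_t len,
				  mtrr_ok_fn mtrr_ok)
{
	static const uint64_t huge_sizes[] = { SZ_1G, SZ_2M };
	size_t i;

	for (i = 0; i < sizeof(huge_sizes) / sizeof(huge_sizes[0]); i++) {
		uint64_t sz = huge_sizes[i];

		/* The range must be size-aligned, long enough, and MTRR-safe. */
		if ((addr & (sz - 1)) == 0 && len >= sz && mtrr_ok(addr, sz))
			return sz;
	}
	return SZ_4K;
}

/* Assumed layout for the example: one MTRR boundary at 1GB + 2MB. */
#define EXAMPLE_MTRR_BOUNDARY 0x40200000ULL

/* A range is treated as safe here if it does not cross the boundary. */
static bool example_mtrr_ok(uint64_t addr, uint64_t size)
{
	return addr + size <= EXAMPLE_MTRR_BOUNDARY ||
	       addr >= EXAMPLE_MTRR_BOUNDARY;
}
```

In this model a 1GB request at 0x40000000 crosses the assumed boundary,
so the 1GB attempt fails and the 2MB attempt is used instead.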

Thanks,
-Toshi





^ permalink raw reply	[flat|nested] 710+ messages in thread

* [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support
@ 2015-05-19  8:01 Huang Rui
  2015-05-19  8:01 ` [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
                   ` (4 more replies)
  0 siblings, 5 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-19  8:01 UTC (permalink / raw)
  To: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

Hi,

This patch set introduces support for new instructions on AMD Carrizo
(Family 15h, Model 60h-6fh). It adds an mwaitx idle function with a
configurable timer. The user can configure the idle method and timer
value via the idle kernel parameter.

For background discussion, please see:
http://marc.info/?l=linux-kernel&m=143202042530498&w=2
http://marc.info/?l=linux-kernel&m=143161327003541&w=2

These patches are rebased on tip/sched/core.

Thanks,
Rui

Huang Rui (4):
  x86, mwaitt: add monitorx and mwaitx instruction
  x86, mwaitt: introduce mwaitx idle with a configurable timer
  x86, mwaitt: add document to describe mwaitx
  x86, mwait: fix redundant comment

 Documentation/kernel-parameters.txt | 10 ++++-
 arch/x86/include/asm/cpufeature.h   |  1 +
 arch/x86/include/asm/mwait.h        | 27 +++++++++++++
 arch/x86/include/asm/processor.h    |  2 +-
 arch/x86/kernel/process.c           | 81 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 118 insertions(+), 3 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
@ 2015-05-19  8:01 ` Huang Rui
  2015-05-19 11:29   ` Borislav Petkov
  2015-05-19  8:01 ` [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Huang Rui
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 710+ messages in thread
From: Huang Rui @ 2015-05-19  8:01 UTC (permalink / raw)
  To: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

On AMD Carrizo processors (Family 15h, Model 60h-6fh), there is a new
feature called MWAITT (Mwait with a timer) as an extension of
Monitor/Mwait.

MWAITT, also known as MWAITX (MWAIT with extensions), has a configurable
timer that causes MWAITX to exit on expiration.

Compared with MONITOR/MWAIT, there are minor differences in opcode and
input parameters.

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks

The software P0 frequency is the same as the TSC frequency.

Max timeout = EBX/(TSC frequency)
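
As a worked example of the relation above, the following sketch converts
a wait time into an EBX value. It assumes the wait time is given in
nanoseconds and the TSC frequency in kHz; mwaitx_ns_to_cycles() and
MWAITX_EBX_MAX are hypothetical names used only for illustration, and
the result is clamped because EBX holds 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_EBX_MAX 0xffffffffULL	/* EBX[31:0] is a 32-bit field */

/*
 * cycles = ns * tsc_khz / 10^6
 * (ns * 10^-9 s) * (tsc_khz * 10^3 Hz) = ns * tsc_khz * 10^-6 cycles
 */
static uint32_t mwaitx_ns_to_cycles(uint64_t ns, uint64_t tsc_khz)
{
	uint64_t cycles;

	assert(tsc_khz != 0);

	/* Saturate instead of overflowing the 64-bit product. */
	if (ns > UINT64_MAX / tsc_khz)
		return (uint32_t)MWAITX_EBX_MAX;

	cycles = ns * tsc_khz / 1000000ULL;

	/* Clamp to what fits in EBX. */
	if (cycles > MWAITX_EBX_MAX)
		return (uint32_t)MWAITX_EBX_MAX;

	return (uint32_t)cycles;
}
```

With a 2 GHz TSC (tsc_khz = 2000000), a 100 ns wait corresponds to 200
cycles, and the largest EBX value corresponds to roughly 2.1 seconds.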

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/include/asm/mwait.h      | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 3d6606f..3ef1f6e 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -176,6 +176,7 @@
 #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
 #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
 #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
+#define X86_FEATURE_MWAITT	( 6*32+29) /* Mwait extension (MonitorX/MwaitX) */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 653dfa7..b91136f 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -23,6 +23,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
 		     :: "a" (eax), "c" (ecx), "d"(edx));
 }
 
+static inline void __monitorx(const void *eax, unsigned long ecx,
+			     unsigned long edx)
+{
+	/* "monitorx %eax, %ecx, %edx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfa;"
+		     :: "a" (eax), "c" (ecx), "d"(edx));
+}
+
 static inline void __mwait(unsigned long eax, unsigned long ecx)
 {
 	/* "mwait %eax, %ecx;" */
@@ -30,6 +38,14 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+static inline void __mwaitx(unsigned long eax, unsigned long ebx,
+		unsigned long ecx)
+{
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile(".byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
 	trace_hardirqs_on();
@@ -38,6 +54,15 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }
 
+static inline void __sti_mwaitx(unsigned long eax, unsigned long ebx,
+		unsigned long ecx)
+{
+	trace_hardirqs_on();
+	/* "mwaitx %eax, %ebx, %ecx;" */
+	asm volatile("sti; .byte 0x0f, 0x01, 0xfb;"
+		     :: "a" (eax), "b" (ebx), "c" (ecx));
+}
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-05-19  8:01 ` [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
@ 2015-05-19  8:01 ` Huang Rui
  2015-05-19 11:31   ` Borislav Petkov
  2015-05-21  1:34   ` Andy Lutomirski
  2015-05-19  8:01 ` [RFC PATCH 3/4] x86, mwaitt: add document to describe mwaitx Huang Rui
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-19  8:01 UTC (permalink / raw)
  To: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
The cpu core still consumes less power while waiting, and has faster exit
from waiting than "Halt". This patch implements an interface using the
kernel parameter "idle=" to configure mwaitx type and timer value.

If "idle=mwaitx", the timeout will be set as the maximum value
((2^32 - 1) * TSC cycle).
If "idle=mwaitx,100", the timeout will be set as 100ns.
If the processor doesn't support MWAITX, then halt is used.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/mwait.h     |  2 +
 arch/x86/include/asm/processor.h |  2 +-
 arch/x86/kernel/process.c        | 79 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index b91136f..c4e51e7 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -14,6 +14,8 @@
 #define CPUID5_ECX_INTERRUPT_BREAK	0x2
 
 #define MWAIT_ECX_INTERRUPT_BREAK	0x1
+#define MWAITX_ECX_TIMER_ENABLE		0x2
+#define MWAITX_EBX_WAIT_TIMEOUT		0xffffffff
 
 static inline void __monitor(const void *eax, unsigned long ecx,
 			     unsigned long edx)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 23ba676..0f60e94 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -733,7 +733,7 @@ extern unsigned long		boot_option_idle_override;
 extern bool			amd_e400_c1e_detected;
 
 enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
-			 IDLE_POLL};
+			 IDLE_POLL, IDLE_MWAITX};
 
 extern void enable_sep_cpu(void);
 extern int sysenter_setup(void);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6e338e3..9d68193 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -30,6 +30,7 @@
 #include <asm/debugreg.h>
 #include <asm/nmi.h>
 #include <asm/tlbflush.h>
+#include <asm/x86_init.h>
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
 EXPORT_SYMBOL(boot_option_idle_override);
 
 static void (*x86_idle)(void);
+static unsigned long idle_param;
 
 #ifndef CONFIG_SMP
 static inline void play_dead(void)
@@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 	return 1;
 }
 
+static int not_support_mwaitx(const struct cpuinfo_x86 *c)
+{
+	if (c->x86_vendor != X86_VENDOR_AMD)
+		return 1;
+
+	if (!cpu_has(c, X86_FEATURE_MWAITT))
+		return 1;
+
+	return 0;
+}
+
 /*
  * MONITOR/MWAIT with no hints, used for default default C1 state.
  * This invokes MWAIT with interrutps enabled and no flags,
@@ -470,12 +483,45 @@ static void mwait_idle(void)
 	__current_clr_polling();
 }
 
+/*
+ * AMD Excavator processors support the new MONITORX/MWAITX instructions.
+ * The function is similar to mwait but with a timer. On AMD platforms
+ * mwaitx does not let the core go into C1 state. This provides for a
+ * faster waiting exit speed. The user can configure the idle method and
+ * timer value via the idle kernel parameter.
+ */
+static void mwaitx_idle(void)
+{
+	unsigned long ebx, ecx;
+
+	ebx = idle_param;
+	ecx = MWAITX_ECX_TIMER_ENABLE;
+
+	if (!current_set_polling_and_test()) {
+		__monitorx((void *)&current_thread_info()->flags, 0, 0);
+		if (!need_resched())
+			__sti_mwaitx(0, ebx, ecx);
+		else
+			local_irq_enable();
+	} else {
+		local_irq_enable();
+	}
+	__current_clr_polling();
+}
+
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
 		pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
+
+	if (boot_option_idle_override == IDLE_MWAITX &&
+	    not_support_mwaitx(c)) {
+		pr_warn_once("WARNING: mwaitx not supported, using default idle support\n");
+		x86_idle = default_idle;
+	}
+
 	if (x86_idle || boot_option_idle_override == IDLE_POLL)
 		return;
 
@@ -499,6 +545,8 @@ void __init init_amd_e400_c1e_mask(void)
 
 static int __init idle_setup(char *str)
 {
+	unsigned long timeout, tsc_freq;
+
 	if (!str)
 		return -EINVAL;
 
@@ -524,6 +572,37 @@ static int __init idle_setup(char *str)
 		 * of boot_option_idle_override.
 		 */
 		boot_option_idle_override = IDLE_NOMWAIT;
+	} else if (!strncmp(str, "mwaitx", 6)) {
+		/*
+		 * If the boot option of "idle=mwaitx" is added, it means
+		 * that mwaitx will be enabled if current processor
+		 * supports it. If not supported, use default_idle.
+		 */
+		x86_idle = mwaitx_idle;
+		boot_option_idle_override = IDLE_MWAITX;
+		str += 6;
+		if (str && (str[0] == ',')) {
+			if (kstrtoul(str + 1, 0, &timeout)) {
+				pr_warn_once("WARNING: timer value should be numerical\n");
+				return -1;
+			}
+
+			tsc_freq = x86_platform.calibrate_tsc();
+			if (!tsc_freq) {
+				pr_warn_once("WARNING: can not calculate TSC khz\n");
+				return -1;
+			}
+
+			/*
+			 * TSC loops (EBX input) = Timer(nsec) *
+			 * TSC freq(khz) / 1000000
+			 */
+			timeout = timeout * tsc_freq;
+			do_div(timeout, 1000000);
+
+			idle_param = timeout;
+		} else
+			idle_param = MWAITX_EBX_WAIT_TIMEOUT;
 	} else
 		return -1;
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [RFC PATCH 3/4] x86, mwaitt: add document to describe mwaitx
  2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
  2015-05-19  8:01 ` [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
  2015-05-19  8:01 ` [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Huang Rui
@ 2015-05-19  8:01 ` Huang Rui
  2015-05-19  8:01 ` [RFC PATCH 4/4] x86, mwait: fix redundant comment Huang Rui
  2015-05-19  8:57 ` [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
  4 siblings, 0 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-19  8:01 UTC (permalink / raw)
  To: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

This patch adds a description in the kernel parameters documentation for
"idle=mwaitx".

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 Documentation/kernel-parameters.txt | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 61ab162..af88c5c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1363,7 +1363,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Claim all unknown PCI IDE storage controllers.
 
 	idle=		[X86]
-			Format: idle=poll, idle=halt, idle=nomwait
+			Format: idle=poll, idle=halt, idle=nomwait,
+				idle=mwaitx[,<timer value>]
 			Poll forces a polling idle loop that can slightly
 			improve the performance of waking up a idle CPU, but
 			will use a lot of power and make the system run hot.
@@ -1371,6 +1372,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			idle=halt: Halt is forced to be used for CPU idle.
 			In such case C2/C3 won't be used again.
 			idle=nomwait: Disable mwait for CPU C-states
+			idle=mwaitx[,<timer value>]: Enable mwaitx and
+			configure max waiting timeout value (usec) for CPU
+			idle. If <timer value> is not specified, the
+			maximum timeout value (ebx=0xffffffff * TSC cycle)
+			will be used.  If <timer value> is specified as
+			zero, the timer is disabled. Waiting exits via
+			other reasons, such as an IPI.
 
 	ignore_loglevel	[KNL]
 			Ignore loglevel setting - this will print /all/
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [RFC PATCH 4/4] x86, mwait: fix redundant comment
  2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
                   ` (2 preceding siblings ...)
  2015-05-19  8:01 ` [RFC PATCH 3/4] x86, mwaitt: add document to describe mwaitx Huang Rui
@ 2015-05-19  8:01 ` Huang Rui
  2015-05-19  9:40   ` Borislav Petkov
  2015-05-19  8:57 ` [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
  4 siblings, 1 reply; 710+ messages in thread
From: Huang Rui @ 2015-05-19  8:01 UTC (permalink / raw)
  To: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Huang Rui

This patch removes the duplicated word "default" from the comment.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/kernel/process.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9d68193..e3e12b6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -458,7 +458,7 @@ static int not_support_mwaitx(const struct cpuinfo_x86 *c)
 }
 
 /*
- * MONITOR/MWAIT with no hints, used for default default C1 state.
+ * MONITOR/MWAIT with no hints, used for default C1 state.
  * This invokes MWAIT with interrutps enabled and no flags,
  * which is backwards compatible with the original MWAIT implementation.
  */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support
  2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
                   ` (3 preceding siblings ...)
  2015-05-19  8:01 ` [RFC PATCH 4/4] x86, mwait: fix redundant comment Huang Rui
@ 2015-05-19  8:57 ` Borislav Petkov
  2015-05-19  9:44   ` Huang Rui
  4 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19  8:57 UTC (permalink / raw)
  To: Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 04:01:08PM +0800, Huang Rui wrote:
> Hi,
> 
> This patch set introduces a new instruction support on AMD Carrizo (Family
> 15h, Model 60h-6fh). It adds mwaitx idle function with a configurable
> timer. The user can configure the idle method and timer value via the idle
> kernel parameter.
> 
> Some discussions of the background, please see:
> http://marc.info/?l=linux-kernel&m=143202042530498&w=2
> http://marc.info/?l=linux-kernel&m=143161327003541&w=2
> 
> They are rebased on tip/sched/core.

Just a note for the future - please use tip/master to base your patches
on.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 4/4] x86, mwait: fix redundant comment
  2015-05-19  8:01 ` [RFC PATCH 4/4] x86, mwait: fix redundant comment Huang Rui
@ 2015-05-19  9:40   ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19  9:40 UTC (permalink / raw)
  To: Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 04:01:12PM +0800, Huang Rui wrote:
> This patch removes the redundant comment.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/kernel/process.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 9d68193..e3e12b6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -458,7 +458,7 @@ static int not_support_mwaitx(const struct cpuinfo_x86 *c)
>  }
>  
>  /*
> - * MONITOR/MWAIT with no hints, used for default default C1 state.
> + * MONITOR/MWAIT with no hints, used for default C1 state.
>   * This invokes MWAIT with interrutps enabled and no flags,
>   * which is backwards compatible with the original MWAIT implementation.
>   */

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support
  2015-05-19  8:57 ` [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
@ 2015-05-19  9:44   ` Huang Rui
  0 siblings, 0 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-19  9:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 10:57:06AM +0200, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 04:01:08PM +0800, Huang Rui wrote:
> > Hi,
> > 
> > This patch set introduces a new instruction support on AMD Carrizo (Family
> > 15h, Model 60h-6fh). It adds mwaitx idle function with a configurable
> > timer. The user can configure the idle method and timer value via the idle
> > kernel parameter.
> > 
> > Some discussions of the background, please see:
> > http://marc.info/?l=linux-kernel&m=143202042530498&w=2
> > http://marc.info/?l=linux-kernel&m=143161327003541&w=2
> > 
> > They are rebased on tip/sched/core.
> 
> Just a note for the future - please use tip/master to base your patches
> on.
> 

OK, I got it.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-05-19  8:01 ` [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
@ 2015-05-19 11:29   ` Borislav Petkov
  2015-05-21  8:54     ` Huang Rui
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 11:29 UTC (permalink / raw)
  To: Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 04:01:09PM +0800, Huang Rui wrote:
> On AMD Carrizo processors (Family 15h, Model 60h-6fh), there is a new
> feature called MWAITT (Mwait with a timer) as an extension of
> Monitor/Mwait.
> 
> MWAITT, another name is MWAITX (MWAIT with extensions), has a configurable
> timer that causes MWAITX to exit on expiration.
> 
> Compared with MONITOR/MWAIT, there are minor differences in opcode and
> input parameters.
> 
> MWAITX ECX[1]: enable timer if set
> MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks

What's the behavior if you set EBX to some value but don't enable the
timer with ECX? Normal MWAIT?

> The software P0 frequency is the same as the TSC frequency.
> 
> Max timeout = EBX/(TSC frequency)

That's max timeout in seconds then.

> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/include/asm/cpufeature.h |  1 +
>  arch/x86/include/asm/mwait.h      | 25 +++++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 3d6606f..3ef1f6e 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -176,6 +176,7 @@
>  #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
>  #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
>  #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
> +#define X86_FEATURE_MWAITT	( 6*32+29) /* Mwait extension (MonitorX/MwaitX) */
>
>  /*
>   * Auxiliary flags: Linux defined - For features scattered in various
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index 653dfa7..b91136f 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -23,6 +23,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
>  		     :: "a" (eax), "c" (ecx), "d"(edx));
>  }
>  
> +static inline void __monitorx(const void *eax, unsigned long ecx,
> +			     unsigned long edx)
> +{
> +	/* "monitorx %eax, %ecx, %edx;" */
> +	asm volatile(".byte 0x0f, 0x01, 0xfa;"

Ah ok, ModRM extension to secondary opcode 0x1. Simply filling out the
empty slots after SWAPGS, RDTSCP, ... :)

> +		     :: "a" (eax), "c" (ecx), "d"(edx));
> +}
> +
>  static inline void __mwait(unsigned long eax, unsigned long ecx)
>  {
>  	/* "mwait %eax, %ecx;" */
> @@ -30,6 +38,14 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
>  		     :: "a" (eax), "c" (ecx));
>  }
>  
> +static inline void __mwaitx(unsigned long eax, unsigned long ebx,
> +		unsigned long ecx)
> +{
> +	/* "mwaitx %eax, %ebx, %ecx;" */
> +	asm volatile(".byte 0x0f, 0x01, 0xfb;"
> +		     :: "a" (eax), "b" (ebx), "c" (ecx));
> +}
> +
>  static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
>  {
>  	trace_hardirqs_on();
> @@ -38,6 +54,15 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
>  		     :: "a" (eax), "c" (ecx));
>  }
>  
> +static inline void __sti_mwaitx(unsigned long eax, unsigned long ebx,
> +		unsigned long ecx)

Please align the argument on the new line to the opening brace:

static inline void __sti_mwaitx(unsigned long eax, unsigned long ebx,
				unsigned long ecx)

> +{
> +	trace_hardirqs_on();
> +	/* "mwaitx %eax, %ebx, %ecx;" */
> +	asm volatile("sti; .byte 0x0f, 0x01, 0xfb;"
> +		     :: "a" (eax), "b" (ebx), "c" (ecx));
> +}
> +
>  /*
>   * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
>   * which can obviate IPI to trigger checking of need_resched.
> -- 
> 2.1.0
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-19  8:01 ` [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Huang Rui
@ 2015-05-19 11:31   ` Borislav Petkov
  2015-05-20  8:55     ` Ingo Molnar
  2015-05-21 13:26     ` Huang Rui
  2015-05-21  1:34   ` Andy Lutomirski
  1 sibling, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 11:31 UTC (permalink / raw)
  To: Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
> 
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^32 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.

Ok, I see what you're trying here and I think this is not the optimal
approach.

So let me explain how I see it, you correct me if I'm wrong:

So we want to do MWAITX so that we can save ourselves the idle entry/exit
overhead of HLT, because MWAITX is reportedly faster.

Now, if we want to do that, we want to do it dynamically and adjust the
MWAITX sleep interval depending on the system, usage pattern, system
load and so on.

And for that we would need an adaptive scheme which approximates each
idle interval. Simply taking TSC before we enter idle and after we come
out would give us each idle residency duration and we can do some simple
math to approximate it.

Now, what would that bring us: faster wakeup times.

And here comes the 10^6 $ question: why are we doing all the fun?

I'm thinking we want to find a cutoff duration where for smaller
durations it is worth to do MWAITX and have faster entry/exit times and
for bigger durations we want to do HLT because it'll get into C1E and
give us higher power savings.

We don't want to do MWAITX too long because that'll burn more power
relative to HLT but we don't want to do HLT for shorter periods
because then the entry/exit costs dominate.
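
To make the idea above concrete, here is a rough sketch of such an
adaptive scheme in plain C. It is not proposed kernel code: the struct,
the function names and the 1/8 weighting are assumptions picked for
illustration, and the cutoff would have to be measured on real hardware.

```c
#include <assert.h>
#include <stdint.h>

enum idle_pick { PICK_MWAITX, PICK_HLT };

struct idle_predictor {
	uint64_t avg_cycles;	/* running estimate of idle residency (TSC cycles) */
	uint64_t cutoff_cycles;	/* below this, MWAITX is assumed to be worth it */
};

/*
 * Fold one measured idle period into the estimate. The TSC is sampled
 * before entering idle and after leaving it:
 *
 *	avg = avg - avg/8 + sample/8
 */
static void idle_predictor_update(struct idle_predictor *p,
				  uint64_t tsc_enter, uint64_t tsc_exit)
{
	uint64_t sample;

	assert(tsc_exit >= tsc_enter);
	sample = tsc_exit - tsc_enter;

	p->avg_cycles = p->avg_cycles - (p->avg_cycles >> 3) + (sample >> 3);
}

/* Short expected idle: MWAITX for fast exit. Long: HLT for power savings. */
static enum idle_pick idle_predictor_pick(const struct idle_predictor *p)
{
	return p->avg_cycles < p->cutoff_cycles ? PICK_MWAITX : PICK_HLT;
}
```

The idle loop would call idle_predictor_pick() before going idle and
idle_predictor_update() after waking up.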

Am I on the right track at least?

> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/include/asm/mwait.h     |  2 +
>  arch/x86/include/asm/processor.h |  2 +-
>  arch/x86/kernel/process.c        | 79 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> index b91136f..c4e51e7 100644
> --- a/arch/x86/include/asm/mwait.h
> +++ b/arch/x86/include/asm/mwait.h
> @@ -14,6 +14,8 @@
>  #define CPUID5_ECX_INTERRUPT_BREAK	0x2
>  
>  #define MWAIT_ECX_INTERRUPT_BREAK	0x1
> +#define MWAITX_ECX_TIMER_ENABLE		0x2

						Use BIT(1) here.

> +#define MWAITX_EBX_WAIT_TIMEOUT		0xffffffff
>  
>  static inline void __monitor(const void *eax, unsigned long ecx,
>  			     unsigned long edx)
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 23ba676..0f60e94 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -733,7 +733,7 @@ extern unsigned long		boot_option_idle_override;
>  extern bool			amd_e400_c1e_detected;
>  
>  enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
> -			 IDLE_POLL};
> +			 IDLE_POLL, IDLE_MWAITX};
>  
>  extern void enable_sep_cpu(void);
>  extern int sysenter_setup(void);
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 6e338e3..9d68193 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -30,6 +30,7 @@
>  #include <asm/debugreg.h>
>  #include <asm/nmi.h>
>  #include <asm/tlbflush.h>
> +#include <asm/x86_init.h>
>  
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
> @@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
>  EXPORT_SYMBOL(boot_option_idle_override);
>  
>  static void (*x86_idle)(void);
> +static unsigned long idle_param;
>  
>  #ifndef CONFIG_SMP
>  static inline void play_dead(void)
> @@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>  	return 1;
>  }
>  
> +static int not_support_mwaitx(const struct cpuinfo_x86 *c)
> +{
> +	if (c->x86_vendor != X86_VENDOR_AMD)
> +		return 1;
> +
> +	if (!cpu_has(c, X86_FEATURE_MWAITT))
> +		return 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * MONITOR/MWAIT with no hints, used for default default C1 state.
>   * This invokes MWAIT with interrutps enabled and no flags,
> @@ -470,12 +483,45 @@ static void mwait_idle(void)
>  	__current_clr_polling();
>  }
>  
> +/*
> + * AMD Excavator processors support the new MONITORX/MWAITX instructions.

No need for that especially when newer than XV processors start
supporting those too.

> + * The function is similar to mwait but with a timer. On AMD platforms
> + * mwaitx does not let the core go into C1 state. This provides for a
> + * faster waiting exit speed. The user can configure the idle method and
> + * timer value via the idle kernel parameter.
> + */

...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-18 21:53                   ` Toshi Kani
  (?)
@ 2015-05-19 11:44                   ` Borislav Petkov
  2015-05-19 13:23                     ` Borislav Petkov
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 11:44 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Mon, May 18, 2015 at 03:53:14PM -0600, Toshi Kani wrote:
> On Mon, 2015-05-18 at 22:51 +0200, Borislav Petkov wrote:
> > On Mon, May 18, 2015 at 02:21:08PM -0600, Toshi Kani wrote:
> > > The caller is the one who makes the condition checks necessary to create
> > > a huge page mapping.
> > 
> > How? It would go and change MTRRs configuration and ranges and their
> > memory types so that a huge mapping succeeds?
> > 
> > Or go and try a different range?
> 
> Try with a smaller page size.
> 
> The callers, pud_set_huge() and pmd_set_huge(), check if the given range
> is safe with MTRRs for creating a huge page mapping.  If not, they fail
> the request, which leads their callers, ioremap_pud_range() and
> ioremap_pmd_range(), to retry with a smaller page size, i.e. 1GB -> 2MB
> -> 4KB.  4KB may not have overlap with MTRRs (hence no checking is
> necessary), which will succeed as before.

Ok, now *this* should be in the form of a comment over the KVA helpers,
not the MTRR aspect. Callers of those functions would have to know that;
they shouldn't care about MTRR setup.

The MTRR aspect with the 3 conditions should be only over
mtrr_type_lookup().

I'll integrate it into the patch.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-19 11:44                   ` Borislav Petkov
@ 2015-05-19 13:23                     ` Borislav Petkov
  2015-05-19 13:47                         ` Toshi Kani
  2015-05-20 11:55                         ` Ingo Molnar
  0 siblings, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 13:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Tue, May 19, 2015 at 01:44:37PM +0200, Borislav Petkov wrote:
> > Try with a smaller page size.
> > 
> > The callers, pud_set_huge() and pmd_set_huge(), check if the given range
> > is safe with MTRRs for creating a huge page mapping.  If not, they fail
> > the request, which leads their callers, ioremap_pud_range() and
> > ioremap_pmd_range(), to retry with a smaller page size, i.e. 1GB -> 2MB
> > -> 4KB.  4KB may not have overlap with MTRRs (hence no checking is
> > necessary), which will succeed as before.

Scratch that, I think I have it now. And I even have a good feeling
about it :-)

---
From: Toshi Kani <toshi.kani@hp.com>
Date: Fri, 15 May 2015 12:23:57 -0600
Subject: [PATCH] x86/mm: Enhance MTRR checks in kernel mapping helpers

This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets
set to 1 when a given range is covered uniformly by MTRRs, i.e. the
range is fully covered by a single MTRR entry or the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to
see if it is safe to create a huge page mapping in the range.

This allows them to create a huge page mapping in a range covered by
a single MTRR entry of any memory type. It also detects a non-optimal
request properly. They continue to check with the WB type since it does
not effectively change the uniform mapping even if a request spans
multiple MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request so that
driver writers will be aware of such a case. Drivers should make a
mapping request aligned to a single MTRR entry when the range is covered
by MTRRs.
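
For illustration only, the meaning of 'uniform' can be modelled outside
the kernel as below. The struct and function names are hypothetical, the
entries are assumed not to overlap each other, and this is not the
actual lookup code in generic.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One variable-range entry covering [start, end). */
struct mtrr_entry {
	uint64_t start;
	uint64_t end;
};

/*
 * Return true when [start, end) is covered uniformly: it either lies
 * completely inside one entry, or it overlaps no entry at all and thus
 * gets the default type. A partial overlap with any entry is not uniform.
 */
static bool mtrr_covers_uniformly(const struct mtrr_entry *e, size_t n,
				  uint64_t start, uint64_t end)
{
	size_t i;

	assert(start < end);

	for (i = 0; i < n; i++) {
		bool overlaps = start < e[i].end && e[i].start < end;
		bool contained = e[i].start <= start && end <= e[i].end;

		if (overlaps && !contained)
			return false;
	}

	return true;
}
```

A 2MB request that straddles the edge of an entry is reported as not
uniform, which is the case where the huge mapping would be refused.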

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mtrr.h        |  4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |  4 ++--
 arch/x86/mm/pgtable.c              | 38 +++++++++++++++++++++++-------------
 4 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a547c1ab..a31759e1edd9 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c49eea..f782d9b62cb3 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive instead of exclusive */
 	end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+		is_uniform = 0;
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..372ad422c2c3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f9819786b..df2f8a587438 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 /**
  * pud_set_huge - setup kernel PUD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
+ * function sets up a huge page only if any of the following conditions are met:
+ *
+ * - MTRRs are disabled, or
+ *
+ * - MTRRs are enabled and the range is completely covered by a single MTRR, or
+ *
+ * - MTRRs are enabled and the range is not completely covered by a single MTRR
+ *   but the memory type of the range is WB, even if covered by multiple MTRRs.
+ *
+ * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
+ * page mapping attempt fails.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -593,20 +602,21 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 /**
  * pmd_set_huge - setup kernel PMD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * See text over pud_set_huge() above.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+			     __func__, addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 
-- 
2.3.5
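
The three conditions listed in the pud_set_huge() comment reduce to one
predicate on the two values that mtrr_type_lookup() hands back. Below is a
minimal user-space sketch of that predicate. The enum names and the helper
huge_mapping_allowed() are hypothetical stand-ins for illustration; only the
boolean condition itself is taken from the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's MTRR_TYPE_* constants. */
enum sketch_mtrr_type {
	SKETCH_MTRR_UC      = 0,	/* uncachable */
	SKETCH_MTRR_WB      = 6,	/* write-back */
	SKETCH_MTRR_INVALID = 0xff,	/* MTRRs are disabled */
};

/*
 * Same condition as in the patched pud_set_huge()/pmd_set_huge():
 * refuse the huge mapping only when MTRRs are enabled, the range is
 * not covered uniformly, and the effective type is not WB.
 */
static bool huge_mapping_allowed(uint8_t mtrr, uint8_t uniform)
{
	if (mtrr != SKETCH_MTRR_INVALID && !uniform && mtrr != SKETCH_MTRR_WB)
		return false;

	return true;
}
```

With this shape, a uniform UC range is accepted, a non-uniform WB range is
accepted, and only a non-uniform, non-WB range makes the caller fall back to
a smaller page size.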

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-19 13:23                     ` Borislav Petkov
@ 2015-05-19 13:47                         ` Toshi Kani
  2015-05-20 11:55                         ` Ingo Molnar
  1 sibling, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-19 13:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel, dave.hansen,
	Elliott, pebolle, mcgrof

On Tue, 2015-05-19 at 15:23 +0200, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 01:44:37PM +0200, Borislav Petkov wrote:
> > > Try with a smaller page size.
> > > 
> > > The callers, pud_set_huge() and pmd_set_huge(), check if the given range
> > > is safe with MTRRs for creating a huge page mapping.  If not, they fail
> > > the request, which leads their callers, ioremap_pud_range() and
> > > ioremap_pmd_range(), to retry with a smaller page size, i.e. 1GB -> 2MB
> > > -> 4KB.  4KB may not have overlap with MTRRs (hence no checking is
> > > necessary), which will succeed as before.
> 
> Scratch that, I think I have it now. And I even have a good feeling
> about it :-)

Looks good. Thanks for the update!
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
@ 2015-05-19 15:43 Prarit Bhargava
  2015-05-19 16:56 ` Borislav Petkov
                   ` (3 more replies)
  0 siblings, 4 replies; 710+ messages in thread
From: Prarit Bhargava @ 2015-05-19 15:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	x86, Andy Lutomirski, Borislav Petkov, Denys Vlasenko,
	Dave Hansen, Igor Mammedov, Fenghua Yu, Brian Gerst

When comparing 'model name' fields in /proc/cpuinfo it was noticed that
a simple test comparing the model name fields was failing.  After some
quick investigation it was noticed that the model name fields were actually
different -- processor 0's model name field had trailing white space removed,
while the other processors did not.

Another way of seeing this behaviour is to convert spaces into underscores
in the output of /proc/cpuinfo,

[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
______1_model_name      :_AMD_Opteron(TM)_Processor_6272
_____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows two different model name fields even though they should be the
same.

This occurs because the kernel calls strim() on cpu 0's x86_model_id field
to output a pretty message to the console in print_cpu_info(), and as a
result truncates the whitespace at the end of the x86_model_id field.

The x86_model_id field should be the same for the same processors.  This
patch uses string functions to remove both leading and trailing whitespace
in the x86_model_id field.  As a result the print_cpu_info() output looks
like

smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD platforms

[thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
_____64_model_name      :_AMD_Opteron(TM)_Processor_6272

and the functionality is correct on an Intel box:

[thetango@prarit2]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/kernel/cpu/common.c |   17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..9405c1e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
-	char *p, *q;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	c->x86_model_id[48] = 0;
 
 	/*
-	 * Intel chips right-justify this string for some dumb reason;
-	 * undo that brain damage:
+	 * Remove leading whitespace on Intel processors and trailing
+	 * whitespace on AMD processors.
 	 */
-	p = q = &c->x86_model_id[0];
-	while (*p == ' ')
-		p++;
-	if (p != q) {
-		while (*p)
-			*q++ = *p++;
-		while (q <= &c->x86_model_id[48])
-			*q++ = '\0';	/* Zero-pad the rest */
-	}
+	strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
@@ -1122,7 +1113,7 @@ void print_cpu_info(struct cpuinfo_x86 *c)
 		printk(KERN_CONT "%s ", vendor);
 
 	if (c->x86_model_id[0])
-		printk(KERN_CONT "%s", strim(c->x86_model_id));
+		printk(KERN_CONT "%s", c->x86_model_id);
 	else
 		printk(KERN_CONT "%d86", c->x86);
 
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 15:43 [PATCH] x86, cpuinfo x86_model_id whitespace cleanup Prarit Bhargava
@ 2015-05-19 16:56 ` Borislav Petkov
  2015-05-19 17:25 ` Brian Gerst
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 16:56 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Denys Vlasenko, Dave Hansen, Igor Mammedov,
	Fenghua Yu, Brian Gerst

On Tue, May 19, 2015 at 11:43:30AM -0400, Prarit Bhargava wrote:
> When comparing 'model name' fields in /proc/cpuinfo it was noticed that
> a simple test comparing the model name fields was failing.  After some
> quick investigation it was noticed that the model name fields were actually
> different -- processor 0's model name field had trailing white space removed,
> while the other processors did not.
> 
> Another way of seeing this behaviour is to convert spaces into underscores
> in the output of /proc/cpuinfo,
> 
> [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
> _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________
> 
> which shows two different model name fields even though they should be the
> same.
> 
> This occurs because the kernel calls strim() on cpu 0's x86_model_id field
> to output a pretty message to the console in print_cpu_info(), and as a
> result truncates the whitespace at the end of the x86_model_id field.
> 
> The x86_model_id field should be the same for the same processors.  This
> patch uses string functions to remove both leading and trailing whitespace
> in the x86_model_id field.  As a result the print_cpu_info() output looks
> like
> 
> smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)
> 
> and the x86_model_id field is correct on all processors on AMD platforms
> 
> [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> _____64_model_name      :_AMD_Opteron(TM)_Processor_6272
> 
> and the functionality is correct on an Intel box:
> 
> [thetango@prarit2]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/kernel/cpu/common.c |   17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 15:43 [PATCH] x86, cpuinfo x86_model_id whitespace cleanup Prarit Bhargava
  2015-05-19 16:56 ` Borislav Petkov
@ 2015-05-19 17:25 ` Brian Gerst
  2015-05-19 18:13   ` Borislav Petkov
  2015-05-20  6:34 ` Ingo Molnar
  2015-06-02  8:42 ` [tip:x86/cpu] x86/cpu: Trim model ID whitespace tip-bot for Borislav Petkov
  3 siblings, 1 reply; 710+ messages in thread
From: Brian Gerst @ 2015-05-19 17:25 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, the arch/x86 maintainers, Andy Lutomirski,
	Borislav Petkov, Denys Vlasenko, Dave Hansen, Igor Mammedov,
	Fenghua Yu

On Tue, May 19, 2015 at 11:43 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> When comparing 'model name' fields in /proc/cpuinfo it was noticed that
> a simple test comparing the model name fields was failing.  After some
> quick investigation it was noticed that the model name fields were actually
> different -- processor 0's model name field had trailing white space removed,
> while the other processors did not.
>
> Another way of seeing this behaviour is to convert spaces into underscores
> in the output of /proc/cpuinfo,
>
> [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
> _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________
>
> which shows two different model name fields even though they should be the
> same.
>
> This occurs because the kernel calls strim() on cpu 0's x86_model_id field
> to output a pretty message to the console in print_cpu_info(), and as a
> result truncates the whitespace at the end of the x86_model_id field.
>
> The x86_model_id field should be the same for the same processors.  This
> patch uses string functions to remove both leading and trailing whitespace
> in the x86_model_id field.  As a result the print_cpu_info() output looks
> like
>
> smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)
>
> and the x86_model_id field is correct on all processors on AMD platforms
>
> [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> _____64_model_name      :_AMD_Opteron(TM)_Processor_6272
>
> and the functionality is correct on an Intel box:
>
> [thetango@prarit2]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
> ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz
>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Denys Vlasenko <dvlasenk@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: Brian Gerst <brgerst@gmail.com>
> ---
>  arch/x86/kernel/cpu/common.c |   17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index a62cf04..9405c1e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>  static void get_model_name(struct cpuinfo_x86 *c)
>  {
>         unsigned int *v;
> -       char *p, *q;
>
>         if (c->extended_cpuid_level < 0x80000004)
>                 return;
> @@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
>         c->x86_model_id[48] = 0;
>
>         /*
> -        * Intel chips right-justify this string for some dumb reason;
> -        * undo that brain damage:
> +        * Remove leading whitespace on Intel processors and trailing
> +        * whitespace on AMD processors.
>          */
> -       p = q = &c->x86_model_id[0];
> -       while (*p == ' ')
> -               p++;
> -       if (p != q) {
> -               while (*p)
> -                       *q++ = *p++;
> -               while (q <= &c->x86_model_id[48])
> -                       *q++ = '\0';    /* Zero-pad the rest */
> -       }
> +       strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);
>  }

Using strlcpy in this manner could fail if it does larger than byte
copies and they overlap.  I would instead allocate a temp buffer on
the stack:

    unsigned char model_id[49];
    v = (unsigned int *)model_id;
   ...
    strlcpy(c->x86_model_id, strim(model_id), 48);
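
A minimal user-space sketch of the temporary-buffer idea, for illustration
only. The helper name copy_trimmed(), the MODEL_ID_LEN constant and the use of
plain libc calls in place of the kernel's strim()/strlcpy() are assumptions
made for this sketch. Because the trimming happens in a scratch buffer, the
final copy never has overlapping source and destination.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ID_LEN 48	/* hypothetical: payload size of the model ID field */

/*
 * Copy raw into dst with leading and trailing whitespace removed.
 * All trimming is done in a stack buffer, so dst may even alias raw.
 */
static void copy_trimmed(char *dst, size_t dst_size, const char *raw)
{
	char tmp[MODEL_ID_LEN + 1];
	char *p = tmp;
	size_t len;

	if (dst_size == 0)
		return;

	/* Bounded copy into the scratch buffer, always NUL-terminated. */
	strncpy(tmp, raw, MODEL_ID_LEN);
	tmp[MODEL_ID_LEN] = '\0';

	/* Skip leading whitespace. */
	while (isspace((unsigned char)*p))
		p++;

	/* Ignore trailing whitespace. */
	len = strlen(p);
	while (len > 0 && isspace((unsigned char)p[len - 1]))
		len--;

	/* Truncate to fit dst, leaving room for the terminator. */
	if (len >= dst_size)
		len = dst_size - 1;

	memcpy(dst, p, len);	/* tmp and dst are distinct objects */
	dst[len] = '\0';
}
```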

--
Brian Gerst

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 17:25 ` Brian Gerst
@ 2015-05-19 18:13   ` Borislav Petkov
  2015-05-19 18:44     ` Andy Lutomirski
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 18:13 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Prarit Bhargava, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, the arch/x86 maintainers,
	Andy Lutomirski, Denys Vlasenko, Dave Hansen, Igor Mammedov,
	Fenghua Yu

On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
> Using strlcpy in this manner could fail if it does larger than byte
> copies and they overlap.

Why?

AFAICT, strlcpy() calls memcpy() and memcpy should handle overlapping
buffers just fine.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 18:13   ` Borislav Petkov
@ 2015-05-19 18:44     ` Andy Lutomirski
  2015-05-19 19:22       ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-19 18:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Fenghua Yu, Dave Hansen, Thomas Gleixner,
	Denys Vlasenko, Ingo Molnar, Brian Gerst, H. Peter Anvin,
	Igor Mammedov, the arch/x86 maintainers, Prarit Bhargava

On May 19, 2015 11:13 AM, "Borislav Petkov" <bp@alien8.de> wrote:
>
> On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
> > Using strlcpy in this manner could fail if it does larger than byte
> > copies and they overlap.
>
> Why?
>
> AFAICT, strlcpy() calls memcpy() and memcpy should handle overlapping
> buffers just fine.

Are you thinking of memmove?

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 18:44     ` Andy Lutomirski
@ 2015-05-19 19:22       ` Borislav Petkov
  2015-05-19 20:16         ` Andy Lutomirski
  2015-05-27 17:18         ` H. Peter Anvin
  0 siblings, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Fenghua Yu, Dave Hansen, Thomas Gleixner,
	Denys Vlasenko, Ingo Molnar, Brian Gerst, H. Peter Anvin,
	Igor Mammedov, the arch/x86 maintainers, Prarit Bhargava

On Tue, May 19, 2015 at 11:44:41AM -0700, Andy Lutomirski wrote:
> On May 19, 2015 11:13 AM, "Borislav Petkov" <bp@alien8.de> wrote:
> >
> > On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
> > > Using strlcpy in this manner could fail if it does larger than byte
> > > copies and they overlap.
> >
> > Why?
> >
> > AFAICT, strlcpy() calls memcpy() and memcpy should handle overlapping
> > buffers just fine.
> 
> Are you thinking of memmove?

I guess I'm trying to find out why don't we have a BIG FAT WARNING over
memcpy saying not to use it with overlapping buffers and larger than
byte sizes. Or maybe this is something everyone, except me, just knows
and that's a "Doh, Boris, of course!".

Btw, can we still avoid using the temporary buffer and use strncpy()
instead? AFAICT, that does byte copies, from looking at the asm.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 19:22       ` Borislav Petkov
@ 2015-05-19 20:16         ` Andy Lutomirski
  2015-05-19 20:26           ` Joe Perches
  2015-05-19 20:31           ` Borislav Petkov
  2015-05-27 17:18         ` H. Peter Anvin
  1 sibling, 2 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-19 20:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Fenghua Yu, Dave Hansen, Thomas Gleixner,
	Denys Vlasenko, Ingo Molnar, Brian Gerst, H. Peter Anvin,
	Igor Mammedov, the arch/x86 maintainers, Prarit Bhargava

On Tue, May 19, 2015 at 12:22 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, May 19, 2015 at 11:44:41AM -0700, Andy Lutomirski wrote:
>> On May 19, 2015 11:13 AM, "Borislav Petkov" <bp@alien8.de> wrote:
>> >
>> > On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
>> > > Using strlcpy in this manner could fail if it does larger than byte
>> > > copies and they overlap.
>> >
>> > Why?
>> >
>> > AFAICT, strlcpy() calls memcpy() and memcpy should handle overlapping
>> > buffers just fine.
>>
>> Are you thinking of memmove?
>
> I guess I'm trying to find out why don't we have a BIG FAT WARNING over
> memcpy saying not to use it with overlapping buffers and larger than
> byte sizes. Or maybe this is something everyone, except me, just knows
> and that's a "Doh, Boris, of course!".
>
> Btw, can we still avoid using the temporary buffer and use strncpy()
> instead? AFAICT, that does byte copies, from looking at the asm.

It's not just chunk size; it's the direction.  If the dest starts
after the source but overlaps it and you copy forwards, then you can
clobber the end of the source before you read it.  memmove is
specifically intended to avoid this.

Would it be possible to just use memmove directly?
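
A minimal user-space sketch of what trimming in place with memmove could look
like. The helper name trim_in_place() is hypothetical, and isspace() stands in
for whatever whitespace test the kernel code would use. Unlike the loop being
removed by the patch, this sketch does not zero-pad the rest of the buffer.

```c
#include <ctype.h>
#include <string.h>

/*
 * Trim leading and trailing whitespace from a NUL-terminated string
 * in place. The trimmed text lives inside the same buffer it is moved
 * to, so memmove() is used rather than memcpy() or a str*cpy() variant.
 */
static void trim_in_place(char *s)
{
	char *start = s;
	size_t len;

	/* Skip leading whitespace (the right-justified case). */
	while (isspace((unsigned char)*start))
		start++;

	/* Ignore trailing whitespace (the space-padded case). */
	len = strlen(start);
	while (len > 0 && isspace((unsigned char)start[len - 1]))
		len--;

	/* Shift the trimmed text to the front; the regions may overlap. */
	memmove(s, start, len);
	s[len] = '\0';
}
```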

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 20:16         ` Andy Lutomirski
@ 2015-05-19 20:26           ` Joe Perches
  2015-05-19 20:28             ` Joe Perches
  2015-05-19 20:31           ` Borislav Petkov
  1 sibling, 1 reply; 710+ messages in thread
From: Joe Perches @ 2015-05-19 20:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, linux-kernel, Fenghua Yu, Dave Hansen,
	Thomas Gleixner, Denys Vlasenko, Ingo Molnar, Brian Gerst,
	H. Peter Anvin, Igor Mammedov, the arch/x86 maintainers,
	Prarit Bhargava

On Tue, 2015-05-19 at 13:16 -0700, Andy Lutomirski wrote:
> On Tue, May 19, 2015 at 12:22 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Tue, May 19, 2015 at 11:44:41AM -0700, Andy Lutomirski wrote:
> >> On May 19, 2015 11:13 AM, "Borislav Petkov" <bp@alien8.de> wrote:
> >> >
> >> > On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
> >> > > Using strlcpy in this manner could fail if it does larger than byte
> >> > > copies and they overlap.
> >> >
> >> > Why?

I think this is traditionally handled by specifying that
the strcpy strings may not overlap, so the suggested

+	strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);

isn't good code.

A temporary intermediate buffer is required.


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 20:26           ` Joe Perches
@ 2015-05-19 20:28             ` Joe Perches
  0 siblings, 0 replies; 710+ messages in thread
From: Joe Perches @ 2015-05-19 20:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, linux-kernel, Fenghua Yu, Dave Hansen,
	Thomas Gleixner, Denys Vlasenko, Ingo Molnar, Brian Gerst,
	H. Peter Anvin, Igor Mammedov, the arch/x86 maintainers,
	Prarit Bhargava

On Tue, 2015-05-19 at 13:26 -0700, Joe Perches wrote:
> On Tue, 2015-05-19 at 13:16 -0700, Andy Lutomirski wrote:
> > On Tue, May 19, 2015 at 12:22 PM, Borislav Petkov <bp@alien8.de> wrote:
> > > On Tue, May 19, 2015 at 11:44:41AM -0700, Andy Lutomirski wrote:
> > >> On May 19, 2015 11:13 AM, "Borislav Petkov" <bp@alien8.de> wrote:
> > >> >
> > >> > On Tue, May 19, 2015 at 01:25:59PM -0400, Brian Gerst wrote:
> > >> > > Using strlcpy in this manner could fail if it does larger than byte
> > >> > > copies and they overlap.
> > >> >
> > >> > Why?
> 
> I think this is traditionally handled by specifying that
> the strcpy strings may not overlap, so the suggested
> 
> +	strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);
> 
> isn't good code.
> 
> A temporary intermediate buffer is required.

Or memmove. (duh)



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 20:16         ` Andy Lutomirski
  2015-05-19 20:26           ` Joe Perches
@ 2015-05-19 20:31           ` Borislav Petkov
  2015-05-19 22:17             ` Prarit Bhargava
  1 sibling, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-19 20:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Fenghua Yu, Dave Hansen, Thomas Gleixner,
	Denys Vlasenko, Ingo Molnar, Brian Gerst, H. Peter Anvin,
	Igor Mammedov, the arch/x86 maintainers, Prarit Bhargava

On Tue, May 19, 2015 at 01:16:23PM -0700, Andy Lutomirski wrote:
> It's not just chunk size; it's the direction.  If the dest starts
> after the source but overlaps it and you copy forwards, then you can
> clobber the end of the source before you read it.  memmove is

Some things should simply be done solely in userspace :-)

> specifically intended to avoid this.
> 
> Would it be possible to just use memmove directly?

Yeah.

Prarit, care to send a v2?

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 20:31           ` Borislav Petkov
@ 2015-05-19 22:17             ` Prarit Bhargava
  0 siblings, 0 replies; 710+ messages in thread
From: Prarit Bhargava @ 2015-05-19 22:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, linux-kernel, Fenghua Yu, Dave Hansen,
	Thomas Gleixner, Denys Vlasenko, Ingo Molnar, Brian Gerst,
	H. Peter Anvin, Igor Mammedov, the arch/x86 maintainers



On 05/19/2015 04:31 PM, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 01:16:23PM -0700, Andy Lutomirski wrote:
>> It's not just chunk size; it's the direction.  If the dest starts
>> after the source but overlaps it and you copy forwards, then you can
>> clobber the end of the source before you read it.  memmove is
> 
> Some things should simply be done solely in userspace :-)
> 
>> specifically intended to avoid this.
>>
>> Would it be possible to just use memmove directly?
> 
> Yeah.
> 
> Prarit, care to send a v2?

Yep, I can do that.  It'll have to wait until I get into the office in the
morning so I can retest.

P.

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 15:43 [PATCH] x86, cpuinfo x86_model_id whitespace cleanup Prarit Bhargava
  2015-05-19 16:56 ` Borislav Petkov
  2015-05-19 17:25 ` Brian Gerst
@ 2015-05-20  6:34 ` Ingo Molnar
  2015-05-20 10:15   ` Prarit Bhargava
  2015-06-02  8:42 ` [tip:x86/cpu] x86/cpu: Trim model ID whitespace tip-bot for Borislav Petkov
  3 siblings, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20  6:34 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Dave Hansen,
	Igor Mammedov, Fenghua Yu, Brian Gerst


* Prarit Bhargava <prarit@redhat.com> wrote:

>  arch/x86/kernel/cpu/common.c |   17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)

So I saw this title:

   [PATCH] x86, cpuinfo x86_model_id whitespace cleanup

... and in an early morning deconcentrated state I skipped the 
changelog and was looking for a whitespace coding style cleanup:

> @@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
>  	c->x86_model_id[48] = 0;
>  
>  	/*
> -	 * Intel chips right-justify this string for some dumb reason;
> -	 * undo that brain damage:
> +	 * Remove leading whitespace on Intel processors and trailing
> +	 * whitespace on AMD processors.
>  	 */
> -	p = q = &c->x86_model_id[0];
> -	while (*p == ' ')
> -		p++;
> -	if (p != q) {
> -		while (*p)
> -			*q++ = *p++;
> -		while (q <= &c->x86_model_id[48])
> -			*q++ = '\0';	/* Zero-pad the rest */
> -	}
> +	strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);
>  }

Which this clearly isn't!

Fortunately before complaining about that I read the changelog as 
well, and realized that the 'whitespace cleanup' is done to 
/proc/cpuinfo ABI output.

Could you please make the title less ambiguous, so that sleepy kernel 
developers get the right idea what the patch looks like, from the 
title alone? Git shortlogs will vastly improve as well.

Something like:

   [PATCH] x86/cpu: Strip leading and trailing spaces from the /proc/cpuinfo CPU model field

... or so would work very well for me!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-19 11:31   ` Borislav Petkov
@ 2015-05-20  8:55     ` Ingo Molnar
  2015-05-20  9:12       ` Borislav Petkov
  2015-05-21 13:26     ` Huang Rui
  1 sibling, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20  8:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li


* Borislav Petkov <bp@suse.de> wrote:

> On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote:
> > MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> > The cpu core still consumes less power while waiting, and has faster exit
> > from waiting than "Halt". This patch implements an interface using the
> > kernel parameter "idle=" to configure mwaitx type and timer value.
> > 
> > If "idle=mwaitx", the timeout will be set as the maximum value
> > ((2^64 - 1) * TSC cycle).
> > If "idle=mwaitx,100", the timeout will be set as 100ns.
> > If the processor doesn't support MWAITX, then halt is used.

So what does the hardware do with the timeout value?

Does it use it to decide how 'deep' a sleep it will go into, i.e. 
larger timeouts cause longer entry and exit latencies?

Or some other purpose?

I suppose it's also the case that if an interrupt arrives _before_ the 
expected timeout then MWAITX will try to exit immediately, it won't 
wait until the timeout, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20  8:55     ` Ingo Molnar
@ 2015-05-20  9:12       ` Borislav Petkov
  2015-05-20 10:22         ` Ingo Molnar
  2015-05-21 14:15         ` Huang Rui
  0 siblings, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20  9:12 UTC (permalink / raw)
  To: Ingo Molnar, Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Wed, May 20, 2015 at 10:55:20AM +0200, Ingo Molnar wrote:
> Does it use it to decide how 'deep' a sleep it will go into, i.e. 
> larger timeouts cause longer entry and exit latencies?

That's what the HLT thing does. Cores go into C1 and then at some point
(hysteresis, etc) the whole core complex enters C1E.

The MWAIT* should be used for only shorter sleeps as it remains in C1.
IMHO, of course.

But the problem there is another: what happens if the timeout fires,
you wake up and see that you can remain idle? Do HLT? Do another MWAITX
round?

This means you have an additional unnecessary wakeup which costs.

> I suppose it's also the case that if an interrupt arrives _before_ the 
> expected timeout then MWAITX will try to exit immediately, it won't 
> wait until the timeout, right?

I'd assume so - I mean, it must, right.

BUT!, in talking to Andy about it last night on IRC, he pointed out
that when using acpi_idle, we never come to calling x86_idle() and from
looking quickly at cpuidle_idle_call(), that still might be the case as
we go to use_default only when there's an error with the cpuidle driver
or so.

So Rui, before you go and do more work on it, you should probably
analyze what cpuidle exactly does (if you haven't done so yet). And on
AMD we do use acpi_idle - at least on my F15h box that is the case:

$ grep . /sys/devices/system/cpu/cpuidle/current_*
/sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-20  6:34 ` Ingo Molnar
@ 2015-05-20 10:15   ` Prarit Bhargava
  0 siblings, 0 replies; 710+ messages in thread
From: Prarit Bhargava @ 2015-05-20 10:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Andy Lutomirski, Borislav Petkov, Denys Vlasenko, Dave Hansen,
	Igor Mammedov, Fenghua Yu, Brian Gerst



On 05/20/2015 02:34 AM, Ingo Molnar wrote:
> 
> * Prarit Bhargava <prarit@redhat.com> wrote:
> 
>>  arch/x86/kernel/cpu/common.c |   17 ++++-------------
>>  1 file changed, 4 insertions(+), 13 deletions(-)
> 
> So I saw this title:
> 
>    [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
> 
> ... and in an early morning deconcentrated state I skipped the 
> changelog and was looking for a whitespace coding style cleanup:
> 
>> @@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
>>  	c->x86_model_id[48] = 0;
>>  
>>  	/*
>> -	 * Intel chips right-justify this string for some dumb reason;
>> -	 * undo that brain damage:
>> +	 * Remove leading whitespace on Intel processors and trailing
>> +	 * whitespace on AMD processors.
>>  	 */
>> -	p = q = &c->x86_model_id[0];
>> -	while (*p == ' ')
>> -		p++;
>> -	if (p != q) {
>> -		while (*p)
>> -			*q++ = *p++;
>> -		while (q <= &c->x86_model_id[48])
>> -			*q++ = '\0';	/* Zero-pad the rest */
>> -	}
>> +	strlcpy(c->x86_model_id, strim(c->x86_model_id), 48);
>>  }
> 
> Which this clearly isn't!
> 
> Fortunately before complaining about that I read the changelog as 
> well, and realized that the 'whitespace cleanup' is done to 
> /proc/cpuinfo ABI output.
> 
> Could you please make the title less ambiguous, so that sleepy kernel 
> developers get the right idea what the patch looks like, from the 
> title alone? Git shortlogs will vastly improve as well.
> 
> Something like:
> 
>    [PATCH] x86/cpu: Strip leading and trailing spaces from the /proc/cpuinfo CPU model field
> 
> ... or so would work very well for me!

:)  Sorry Ingo -- I'll definitely clean that up in the next version.

P.

> 
> Thanks,
> 
> 	Ingo
> 

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20  9:12       ` Borislav Petkov
@ 2015-05-20 10:22         ` Ingo Molnar
  2015-05-20 10:50           ` Borislav Petkov
  2015-05-21 14:15         ` Huang Rui
  1 sibling, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 10:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li


* Borislav Petkov <bp@suse.de> wrote:

> On Wed, May 20, 2015 at 10:55:20AM +0200, Ingo Molnar wrote:
> > Does it use it to decide how 'deep' a sleep it will go into, i.e. 
> > larger timeouts cause longer entry and exit latencies?
> 
> That's what the HLT thing does. Cores go into C1 and then at some 
> point (hysteresis, etc) the whole core complex enters C1E.

Well, HLT does not get any hint from the OS how long the idling is 
expected to last.

So I don't think it's the same thing.

> The MWAIT* should be used for only shorter sleeps as it remains in 
> C1. IMHO, of course.
> 
> But the problem there is another: what happens if the timeout fires, 
> you wake up and see that you can remain idle? Do HLT? Do another 
> MWAITX round?

Another MWAITX round - we've got no crystal ball, so the hint might be 
wrong if an external event occurs that we did not anticipate.

As long as it's a statistical optimization it's OK: i.e. if the 
hardware only uses the timeout to determine how deep to sleep.

> This means you have an additional unnecessary wakeup which costs.

I don't think MWAITX will wake up in itself. (If yes then it's 
essentially a timer in disguise and needs a whole different approach!)

> > I suppose it's also the case that if an interrupt arrives _before_ 
> > the expected timeout then MWAITX will try to exit immediately, it 
> > won't wait until the timeout, right?
> 
> I'd assume so - I mean, it must, right.
> 
> BUT!, in talking to Andy about it last night on IRC, he pointed out 
> that when using acpi_idle, we never come to calling x86_idle() and 
> from looking quickly at cpuidle_idle_call(), that still might be the 
> case as we go to use_default only when there's an error with the 
> cpuidle driver or so.

Yes, we don't normally see these idle handlers, ACPI takes over on 
most systems.

> So Rui, before you go and do more work on it, you should probably 
> analyze what cpuidle exactly does (if you haven't done so yet). And 
> on AMD we do use acpi_idle - at least on my F15h box that is the 
> case:
> 
> $ grep . /sys/devices/system/cpu/cpuidle/current_*
> /sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
> /sys/devices/system/cpu/cpuidle/current_governor_ro:menu

Yes.

The question would be: on systems that provide ACPI idle but also have 
MWAITX support, which one behaves better on the hardware side?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2015-05-18 16:34 ` [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros Josh Poimboeuf
@ 2015-05-20 10:33 ` Ingo Molnar
  2015-05-20 14:13   ` Josh Poimboeuf
  3 siblings, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 10:33 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> In discussions around the live kernel patching consistency model RFC
> [1], Peter and Ingo correctly pointed out that stack traces aren't
> reliable.  And as Ingo said, there's no "strong force" which ensures we
> can rely on them.
> 
> So I've been thinking about how to fix that.  My goal is to eventually
> make stack traces reliable.  Or at the very least, to be able to detect
> at runtime when a given stack trace *might* be unreliable.  But improved
> stack traces would broadly benefit the entire kernel, regardless of the
> outcome of the live kernel patching consistency model discussions.
> 
> This patch set is just the first in a series of proposed stack trace
> reliability improvements.  Future proposals will include runtime stack
> reliability checking, as well as compile-time and runtime DWARF
> validations.
> 
> As far as I can tell, there are two main obstacles which prevent frame
> pointer based stack traces from being reliable:
> 
> 1) Missing frame pointer logic: currently, most assembly functions don't
>    set up the frame pointer.

Could you please paste here the output of what the new checks print 
for x86/64 defconfig?

> As a first step, all reported non-compliances result in warnings.  
> Right now I'm seeing 200+ warnings.  Once we get them all cleaned 
> up, we can change the warnings to build errors so the asm code can 
> stay clean.

That's quite a bit ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 10:22         ` Ingo Molnar
@ 2015-05-20 10:50           ` Borislav Petkov
  2015-05-20 11:11             ` Ingo Molnar
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 10:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li

On Wed, May 20, 2015 at 12:22:58PM +0200, Ingo Molnar wrote:
> Well, HLT does not get any hint from the OS how long the idling is
> expected to last.

MWAIT on AMD doesn't either:

"EAX specifies optional hints for the MWAIT instruction. There are
currently no hints defined and all bits should be 0. Setting a reserved
bit in EAX is ignored by the processor."

I don't know about MWAITX though as I haven't seen any official doc yet.

> Another MWAITX round - we've got no crystal ball, so the hint might be 
> wrong if an external event occurs that we did not anticipate.

So if we end up doing a bunch of MWAITX rounds instead of HLT and MWAITX
saves less power than HLT, then we practically are worse.

I think the idea with MWAITX is to use it to sleep only when you know
the timeout would be shorter - whatever "shorter" means - and thus you
can save yourself the idle entry/exit latency.

If you keep waking up due to timeout ending - which with u32 in EBX will
be ~1sec on a 4GHz core, or 2 on a 2GHz core - and your MWAITX C-state
is worse wrt power consumption than your HLT state, then you lose. And
your MWAITX C-state *is* worse currently, see below.
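
The "~1sec on a 4GHz core" figure can be checked with a small sketch. It
assumes the 32-bit timeout counts cycles at the core/TSC frequency, which
is what the numbers above imply but not something confirmed by official
documentation in this thread; max_timeout_seconds() is a made-up name:

```c
#include <stdint.h>

/*
 * Longest wait, in seconds, that a 32-bit cycle count can express.
 * Assumption: the timeout counts at tsc_hz cycles per second.
 */
static double max_timeout_seconds(uint64_t tsc_hz)
{
	return (double)UINT32_MAX / (double)tsc_hz;
}
```

With 4000000000 Hz this gives roughly 1.07 seconds, and with 2000000000 Hz
roughly 2.15 seconds, matching the estimate.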

> I don't think MWAITX will wake up in itself. (If yes then it's
> essentially a timer in disguise and needs a whole different approach!)

I mean when the MWAITX timeout expires. When it does, we wake up.

Also, normal MWAIT allows for interrupts to wake it up:

"ECX specifies optional extensions for the MWAIT instruction. The only
extension currently defined is ECX bit 0, which allows interrupts to
wake MWAIT, even when eFLAGS.IF = 0. Support for this extension is
indicated by a feature flag returned by the CPUID instruction."

> The question would be: on systems that provide ACPI idle but also have
> MWAITX support, which one behaves better on the hardware side?

I'd venture a guess here that the ACPI side should be using all C-states
available (think of other OSes and having optimal power savings there)
and MWAITX would be worse or the same. Right now it is entering some
funny state between C0 and C1 reportedly:

"But on AMD platform, mwaitx/mwait cannot go to C1 or C1E like intel.
The power consumption of waiting phase is somewhere in between (C0 and
C1). Actually, it's still in C0 but less power consumption than normal
C0."

So my thinking currently is - provided we want to use it at all:

* Use MWAITX on entry to idle, since on a busy system the probability
that this sleep will be short is high.

* When the timeout expires and we wake up and realize there's still
nothing to do, we do HLT.

But all that is pointless if we end up in acpi_idle anyway...
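
The two-step idea above (MWAITX first, HLT after a timeout) could be
sketched as a pure decision function. Everything here is hypothetical: the
enum, the function name and the "timed_out" flag are illustration only and
do not correspond to existing kernel interfaces:

```c
#include <stdbool.h>

enum idle_action { IDLE_RUN, IDLE_MWAITX, IDLE_HLT };

/*
 * Hypothetical policy helper: decide what to do on each pass through the
 * idle loop. timed_out is true when the previous MWAITX round ended
 * because its timer expired rather than because of an interrupt.
 */
static enum idle_action pick_idle_action(bool work_pending, bool timed_out)
{
	if (work_pending)
		return IDLE_RUN;	/* something to do: leave idle */
	if (timed_out)
		return IDLE_HLT;	/* short-sleep guess was wrong: use HLT */
	return IDLE_MWAITX;		/* first attempt: shallow, fast-exit wait */
}
```

This only captures the ordering of the two steps; it says nothing about
whether the default idle routine is reached at all when acpi_idle is in use.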

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 10:50           ` Borislav Petkov
@ 2015-05-20 11:11             ` Ingo Molnar
  2015-05-20 11:21               ` Borislav Petkov
  2015-05-25  2:42               ` Huang Rui
  0 siblings, 2 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 11:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li


* Borislav Petkov <bp@suse.de> wrote:

> On Wed, May 20, 2015 at 12:22:58PM +0200, Ingo Molnar wrote:
>
> > Well, HLT does not get any hint from the OS how long the idling is 
> > expected to last.
> 
> MWAIT on AMD doesn't either:

Yeah, MWAIT clearly doesn't, but I was talking about MWAITX, which 
takes a timeout parameter as per these patches.

> > Another MWAITX round - we've got no crystal ball, so the hint 
> > might be wrong if an external event occurs that we did not 
> > anticipate.
> 
> So if we end up doing a bunch of MWAITX rounds instead of HLT and 
> MWAITX saves less power than HLT, then we practically are worse.

So the way I think it would work ideally is (and note that this is 
different from how you think it works):

  - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
    like MWAIT: i.e. once idle it won't exit idle on its own

  - based on the 'timeout' hint, MWAITX can internally optimize how 
    deep sleep it enters. If the timeout is large it goes deep, if 
    it's small, it goes shallow. This does not change the fact that no 
    matter which state it enters, it will come back the moment an 
    interrupt is posted.

if actual behavior departs from this ideal behavior then we need to 
know.

> > I don't think MWAITX will wake up in itself. (If yes then it's 
> > essentially a timer in disguise and needs a whole different 
> > approach!)
> 
> I mean when the MWAITX timeout expires. When it does, we wake up.

I'm not sure that's the case, see above what I think would be the 
ideal behavior.

If it's a true timeout, as you suggest, then I don't see any obvious 
way to support it, especially if it does not give access to deeper 
sleep states.

> Also, normal MWAIT allows for interrupts to wake it up:

of course! That's a necessary feature as otherwise we could stay stuck 
indefinitely.

This MWAIT extension:

> "ECX specifies optional extensions for the MWAIT instruction. The 
> only extension currently defined is ECX bit 0, which allows 
> interrupts to wake MWAIT, even when eFLAGS.IF = 0. Support for this 
> extension is indicated by a feature flag returned by the CPUID 
> instruction."

is used by the Intel idle driver so that we can call MWAIT with 
interrupts disabled - but it behaves as if interrupts were enabled.

> > The question would be: on systems that provide ACPI idle but also 
> > have MWAITX support, which one behaves better on the hardware 
> > side?
> 
> I'd venture a guess here that the ACPI side should be using all 
> C-states available (think of other OSes and having optimal power 
> savings there) and MWAITX would be worse or the same. Right now it 
> is entering some funny state between C0 and C1 reportedly:
> 
> "But on AMD platform, mwaitx/mwait cannot go to C1 or C1E like 
> intel. The power consumption of waiting phase is somewhere in 
> between (C0 and C1). Actually, it's still in C0 but less power 
> consumption than normal C0."
> 
> So my thinking currently is - provided we want to use it at all:
> 
> * Use MWAITX on entry to idle, since on a busy system the 
> probability that this sleep will be short is high.
> 
> * When the timeout expires and we wake up and realize there's still 
> nothing to do, we do HLT.
> 
> But all that is pointless if we end up in acpi_idle anyway...

Yeah.

Thanks,

	Ingo


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 11:11             ` Ingo Molnar
@ 2015-05-20 11:21               ` Borislav Petkov
  2015-05-20 11:41                 ` Ingo Molnar
  2015-05-21 14:32                 ` Huang Rui
  2015-05-25  2:42               ` Huang Rui
  1 sibling, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 11:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li

On Wed, May 20, 2015 at 01:11:20PM +0200, Ingo Molnar wrote:
>   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
>     like MWAIT: i.e. once idle it won't exit idle on its own

Let me quote the commit message:

"MWAITT, another name is MWAITX (MWAIT with extensions), has a
configurable timer that causes MWAITX to exit on expiration."

You need to set the second bit in ECX to enable the timer.

I guess if you don't, then you get normal MWAIT but then you don't need
the timeout either...

>   - based on the 'timeout' hint, MWAITX can internally optimize how 
>     deep sleep it enters. If the timeout is large it goes deep, if 
>     it's small, it goes shallow.

I haven't heard anything about handling the timeout this way and if it
is not done this way, maybe Rui could forward this idea to hw people...

> If it's a true timeout, as you suggest, then I don't see any obvious
> way to support it, especially if it does not give access to deeper
> sleep states.

Right.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH] mce: fix fail to set 'monarchtimeout' via boot option
@ 2015-05-20 11:22 Xie XiuQi
  2015-05-20 17:43 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Xie XiuQi @ 2015-05-20 11:22 UTC (permalink / raw)
  To: tony.luck, bp, tglx, mingo, hpa; +Cc: x86, linux-edac, linux-kernel

I use "mce=1,10000000" in cmdline to change the monarch timeout, but
it does not work.

The cause is that get_option() has already consumed the ',', so we
must not check for it again.

--
get_option(): read an int from an option string;
if available accept a subsequent comma as well.

Return values:
0 - no int in string
1 - int found, no subsequent comma
2 - int found including a subsequent comma
3 - hyphen found to denote a range
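
As a rough userspace illustration of the return values listed above:
demo_get_option() below is a simplified stand-in written for this sketch
(it omits the range case, return value 3), and struct demo_cfg and
demo_parse_mce() are hypothetical names, not kernel code:

```c
#include <stdlib.h>

/*
 * Userspace stand-in for get_option(), reduced to return values 0, 1
 * and 2. On success *str is advanced past the number and, if present,
 * past the comma that follows it.
 */
static int demo_get_option(char **str, int *pint)
{
	char *cur = *str;

	if (!cur || !*cur)
		return 0;
	*pint = (int)strtol(cur, str, 0);
	if (cur == *str)
		return 0;		/* no int in string */
	if (**str == ',') {
		(*str)++;		/* consume the comma */
		return 2;		/* int found, comma followed */
	}
	return 1;			/* int found, no comma */
}

struct demo_cfg {
	int tolerant;
	int monarch_timeout;
};

/* Parse "tolerant[,monarch_timeout]" by checking the return value. */
static void demo_parse_mce(char *str, struct demo_cfg *cfg)
{
	if (demo_get_option(&str, &cfg->tolerant) == 2)
		demo_get_option(&str, &cfg->monarch_timeout);
}
```

Because the comma is already consumed when 2 is returned, a second test
for ',' on the remaining string never matches, which is the failure
described above.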

Cc: <stable@vger.kernel.org>	# 2.6.32+
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 2a2bb91..46ca8e7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2020,11 +2020,8 @@ static int __init mcheck_enable(char *str)
 	else if (!strcmp(str, "bios_cmci_threshold"))
 		cfg->bios_cmci_threshold = true;
 	else if (isdigit(str[0])) {
-		get_option(&str, &(cfg->tolerant));
-		if (*str == ',') {
-			++str;
+		if (get_option(&str, &(cfg->tolerant)) == 2)
 			get_option(&str, &(cfg->monarch_timeout));
-		}
 	} else {
 		pr_info("mce argument %s ignored. Please use /sys\n", str);
 		return 0;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 11:21               ` Borislav Petkov
@ 2015-05-20 11:41                 ` Ingo Molnar
  2015-05-20 13:20                   ` Thomas Gleixner
  2015-05-21 14:32                 ` Huang Rui
  1 sibling, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 11:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li,
	Frédéric Weisbecker


* Borislav Petkov <bp@suse.de> wrote:

> On Wed, May 20, 2015 at 01:11:20PM +0200, Ingo Molnar wrote:
> >   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
> >     like MWAIT: i.e. once idle it won't exit idle on its own
> 
> Let me quote the commit message:
> 
> "MWAITT, another name is MWAITX (MWAIT with extensions), has a
> configurable timer that causes MWAITX to exit on expiration."

Ah. A useful skill that is, being able to read.

> You need to set the second bit in ECX to enable the timer.
> 
> I guess if you don't, then you get normal MWAIT but then you don't 
> need the timeout either...

Yeah.

So if it's a true timeout then we could use it to implement irq-less 
timers: that's actually pretty useful, because it could be faster than 
getting a local APIC timer irq, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-19 13:23                     ` Borislav Petkov
@ 2015-05-20 11:55                         ` Ingo Molnar
  2015-05-20 11:55                         ` Ingo Molnar
  1 sibling, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 11:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Toshi Kani, akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle, mcgrof


* Borislav Petkov <bp@alien8.de> wrote:

> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
>  /**
>   * pud_set_huge - setup kernel PUD mapping
>   *
> - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> - * this function does not set up a huge page when the range is covered
> - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> - * disabled.
> + * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
> + * function sets up a huge page only if any of the following conditions are met:
> + *
> + * - MTRRs are disabled, or
> + *
> + * - MTRRs are enabled and the range is completely covered by a single MTRR, or
> + *
> + * - MTRRs are enabled and the range is not completely covered by a single MTRR
> + *   but the memory type of the range is WB, even if covered by multiple MTRRs.
> + *
> + * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
> + * page mapping attempt fails.

This comment should explain why it's ok in the WB case.

Also, the phrase 'the memory type of the range' is ambiguous: it might 
mean the partial MTRR's, or the memory type specified via PAT by the 
huge-pmd entry.
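
For reference, the three conditions in the quoted comment reduce to a small
predicate. This is only a sketch: struct demo_mtrr_info and
demo_huge_map_allowed() are hypothetical, and reading "the memory type of
the range" as the effective MTRR type is an assumption, which is exactly
the ambiguity noted above:

```c
#include <stdbool.h>

/* Hypothetical inputs; real code would derive these from an MTRR lookup. */
struct demo_mtrr_info {
	bool enabled;	/* MTRRs are enabled */
	bool uniform;	/* range is completely covered by a single MTRR */
	bool is_wb;	/* assumed: effective MTRR type of the range is WB */
};

/* Return true when a huge page mapping may be set up for the range. */
static bool demo_huge_map_allowed(const struct demo_mtrr_info *m)
{
	if (!m->enabled)
		return true;	/* MTRRs disabled */
	if (m->uniform)
		return true;	/* one MTRR covers the whole range */
	return m->is_wb;	/* several MTRRs: only allowed for WB */
}
```

A caller that gets false back would retry with the next smaller page size,
as the last sentence of the quoted comment suggests.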

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 11:41                 ` Ingo Molnar
@ 2015-05-20 13:20                   ` Thomas Gleixner
  2015-05-20 14:51                     ` Ingo Molnar
  0 siblings, 1 reply; 710+ messages in thread
From: Thomas Gleixner @ 2015-05-20 13:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Huang Rui, Len Brown, Rafael J. Wysocki, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li,
	Frédéric Weisbecker

On Wed, 20 May 2015, Ingo Molnar wrote:
> * Borislav Petkov <bp@suse.de> wrote:
> 
> > On Wed, May 20, 2015 at 01:11:20PM +0200, Ingo Molnar wrote:
> > >   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
> > >     like MWAIT: i.e. once idle it won't exit idle on its own
> > 
> > Let me quote the commit message:
> > 
> > "MWAITT, another name is MWAITX (MWAIT with extensions), has a
> > configurable timer that causes MWAITX to exit on expiration."
> 
> Ah. A useful skill that is, being able to read.
> 
> > You need to set the second bit in ECX to enable the timer.
> > 
> > I guess if you don't, then you get normal MWAIT but then you don't 
> > need the timeout either...
> 
> Yeah.
> 
> So if it's a true timeout then we could use it to implement irq-less 
> timers: that's actually pretty useful, because it could be faster than 
> getting a local APIC timer irq, etc.

Uurgh, NO NO NO! 

We have enough trouble with non functional timers already, we do not
need another variant of those.

We can supply the estimated sleep time though if that helps the PM
controller underneath to select a state. That's more or less what we
do in the governors as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
@ 2015-05-20 14:13   ` Josh Poimboeuf
  2015-05-20 14:48     ` Ingo Molnar
  0 siblings, 1 reply; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 14:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel

On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > In discussions around the live kernel patching consistency model RFC
> > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > reliable.  And as Ingo said, there's no "strong force" which ensures we
> > can rely on them.
> > 
> > So I've been thinking about how to fix that.  My goal is to eventually
> > make stack traces reliable.  Or at the very least, to be able to detect
> > at runtime when a given stack trace *might* be unreliable.  But improved
> > stack traces would broadly benefit the entire kernel, regardless of the
> > outcome of the live kernel patching consistency model discussions.
> > 
> > This patch set is just the first in a series of proposed stack trace
> > reliability improvements.  Future proposals will include runtime stack
> > reliability checking, as well as compile-time and runtime DWARF
> > validations.
> > 
> > As far as I can tell, there are two main obstacles which prevent frame
> > pointer based stack traces from being reliable:
> > 
> > 1) Missing frame pointer logic: currently, most assembly functions don't
> >    set up the frame pointer.
> 
> Could you please paste here the output of what the new checks print 
> for x86/64 defconfig?

Here are all 89 warnings from defconfig:

arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e.  Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359.  Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be.  Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5.  Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21.  Please use FUNC_ENTER.
arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb.  Please use FUNC_ENTER.
arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b.  Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7.  Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110.  Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145.  Please use FUNC_ENTER.
arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4.  Please use FUNC_ENTER.
arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170.  Please use FUNC_ENTER.
arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176.  Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2.  Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a.  Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69.  Please use FUNC_ENTER.
arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d.  Please use FUNC_ENTER.
arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5.  Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5.  Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1.  Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e.  Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172.  Please use FUNC_ENTER.
arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic.  Please use FUNC_ENTER.
arch/x86/boot/header.o: die() is missing frame pointer logic.  Please use FUNC_ENTER.

> > As a first step, all reported non-compliances result in warnings.  
> > Right now I'm seeing 200+ warnings.  Once we get them all cleaned 
> > up, we can change the warnings to build errors so the asm code can 
> > stay clean.
> 
> That's quite a bit ...

Yeah, a Fedora-based config has over 200 warnings.  Most of the
differences between the above 89 warnings for defconfig and the 200+ for
a Fedora config seem to be caused by xen, crypto and bpf.

-- 
Josh

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-20 11:55                         ` Ingo Molnar
@ 2015-05-20 14:34                           ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-20 14:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, akpm, hpa, tglx, mingo, linux-mm, x86,
	linux-kernel, dave.hansen, Elliott, pebolle, mcgrof

On Wed, 2015-05-20 at 13:55 +0200, Ingo Molnar wrote:
> * Borislav Petkov <bp@alien8.de> wrote:
> 
> > --- a/arch/x86/mm/pgtable.c
> > +++ b/arch/x86/mm/pgtable.c
> > @@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> >  /**
> >   * pud_set_huge - setup kernel PUD mapping
> >   *
> > - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> > - * this function does not set up a huge page when the range is covered
> > - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> > - * disabled.
> > + * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
> > + * function sets up a huge page only if any of the following conditions are met:
> > + *
> > + * - MTRRs are disabled, or
> > + *
> > + * - MTRRs are enabled and the range is completely covered by a single MTRR, or
> > + *
> > + * - MTRRs are enabled and the range is not completely covered by a single MTRR
> > + *   but the memory type of the range is WB, even if covered by multiple MTRRs.
> > + *
> > + * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
> > + * page mapping attempt fails.
> 
> This comment should explain why it's ok in the WB case.
> 
> Also, the phrase 'the memory type of the range' is ambiguous: it might 
> mean the partial MTRR's, or the memory type specified via PAT by the 
> huge-pmd entry.

Agreed.  How about this sentence?

 - MTRRs are enabled and the corresponding MTRR memory type is WB, which
has no effect to the requested PAT memory type.
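
To make the three conditions concrete, here is a minimal sketch of the
decision as I read it from the comment above.  The enum and the helper
name are hypothetical stand-ins, not the kernel's MTRR definitions, and
the "uniform" flag is assumed to mean "the range is completely covered
by a single MTRR":

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MTRR type values. */
enum mtrr_type_sketch {
	MTRR_SKETCH_UC,		/* uncacheable */
	MTRR_SKETCH_WC,		/* write-combining */
	MTRR_SKETCH_WB,		/* write-back */
	MTRR_SKETCH_INVALID,	/* MTRRs are disabled */
};

/*
 * Decide whether a huge mapping may be set up for a range, given the
 * result of an MTRR lookup: the MTRR memory type of the range, and
 * whether a single MTRR covers the whole range (uniform).
 */
static bool huge_mapping_allowed(enum mtrr_type_sketch type, bool uniform)
{
	if (type == MTRR_SKETCH_INVALID)
		return true;	/* MTRRs are disabled */
	if (uniform)
		return true;	/* one MTRR covers the whole range */
	/* WB has no effect on the requested PAT memory type. */
	return type == MTRR_SKETCH_WB;
}
```

A caller that gets "false" here would retry with the next smaller page
size, as the comment suggests (1GB -> 2MB -> 4K).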

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 14:13   ` Josh Poimboeuf
@ 2015-05-20 14:48     ` Ingo Molnar
  2015-05-20 15:51       ` Josh Poimboeuf
                         ` (2 more replies)
  0 siblings, 3 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 14:48 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
	Borislav Petkov, Andrew Morton


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > In discussions around the live kernel patching consistency model RFC
> > > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > > reliable.  And as Ingo said, there's no "strong force" which ensures we
> > > can rely on them.
> > > 
> > > So I've been thinking about how to fix that.  My goal is to eventually
> > > make stack traces reliable.  Or at the very least, to be able to detect
> > > at runtime when a given stack trace *might* be unreliable.  But improved
> > > stack traces would broadly benefit the entire kernel, regardless of the
> > > outcome of the live kernel patching consistency model discussions.
> > > 
> > > This patch set is just the first in a series of proposed stack trace
> > > reliability improvements.  Future proposals will include runtime stack
> > > reliability checking, as well as compile-time and runtime DWARF
> > > validations.
> > > 
> > > As far as I can tell, there are two main obstacles which prevent frame
> > > pointer based stack traces from being reliable:
> > > 
> > > 1) Missing frame pointer logic: currently, most assembly functions don't
> > >    set up the frame pointer.
> > 
> > Could you please paste here the output of what the new checks print 
> > for x86/64 defconfig?
> 
> Here are all 89 warnings from defconfig:
> 
> arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e.  Please use FUNC_ENTER.
> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359.  Please use FUNC_ENTER.
> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be.  Please use FUNC_ENTER.
> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5.  Please use FUNC_ENTER.
> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21.  Please use FUNC_ENTER.
> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb.  Please use FUNC_ENTER.
> arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b.  Please use FUNC_ENTER.
> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7.  Please use FUNC_ENTER.
> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110.  Please use FUNC_ENTER.
> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145.  Please use FUNC_ENTER.
> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176.  Please use FUNC_ENTER.
> arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2.  Please use FUNC_ENTER.
> arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69.  Please use FUNC_ENTER.
> arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d.  Please use FUNC_ENTER.
> arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5.  Please use FUNC_ENTER.
> arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5.  Please use FUNC_ENTER.
> arch/x86/lib/memset_64.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1.  Please use FUNC_ENTER.
> arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e.  Please use FUNC_ENTER.
> arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172.  Please use FUNC_ENTER.
> arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic.  Please use FUNC_ENTER.
> arch/x86/boot/header.o: die() is missing frame pointer logic.  Please use FUNC_ENTER.

Yeah, so many of these seem to be 'leaf only' functions: functions 
that don't ever call functions themselves.

So let's assume we always have CONFIG_FRAME_POINTER=y.

If they don't set up a frame pointer then they in essence won't show 
up in the call chain - but normally they wouldn't because they call 
nothing.

If they trigger an exception/fault or if they get hit by an interrupt
then I think we'll still correctly walk the stack - just those
functions might be missing from the deterministic call chain, right?
(They will still show up as '?' entries.)

If they crash then we'll see them because the crashing RIP will be 
printed.
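
As a rough illustration of the walk being discussed, here is a toy
model of a frame-pointer chain.  The struct layout, the function names
and the addresses are all made up for the sketch; it only assumes the
usual "push %rbp; mov %rsp,%rbp" record of a saved frame pointer next
to a return address:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one frame record on the stack. */
struct frame {
	const struct frame *next;	/* saved frame pointer of the caller */
	uintptr_t ret_addr;		/* return address into the caller */
};

/*
 * Follow the frame-pointer chain starting at fp and store each return
 * address in out[].  Returns the number of entries written.
 */
static size_t walk_frames(const struct frame *fp, uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (fp && n < max) {
		out[n++] = fp->ret_addr;
		fp = fp->next;
	}
	return n;
}

/*
 * Toy call chain a() -> b() -> leaf(), interrupted inside leaf().
 * If leaf() did not set up a frame, the frame pointer still points at
 * b()'s record when the walk starts.
 */
static size_t chain_depth(int leaf_has_frame)
{
	const struct frame a = { NULL, 0x1000 };	/* return into a()'s caller */
	const struct frame b = { &a, 0x2000 };		/* return into a() */
	const struct frame leaf = { &b, 0x3000 };	/* return into b() */
	uintptr_t trace[8];

	return walk_frames(leaf_has_frame ? &leaf : &b, trace, 8);
}
```

In this model chain_depth(1) finds three return addresses and
chain_depth(0) finds two: the return address that the call into leaf()
pushed (0x3000 here) is not linked into any frame record, so only a
scan of the raw stack would find it.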

So I'm wondering what the x86 policy here should be: to create frame 
pointers in them or not. Cc:-ed a few more gents for thoughts.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 13:20                   ` Thomas Gleixner
@ 2015-05-20 14:51                     ` Ingo Molnar
  2015-05-20 15:55                       ` One Thousand Gnomes
  0 siblings, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 14:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov, Huang Rui, Len Brown, Rafael J. Wysocki, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li,
	Frédéric Weisbecker


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Wed, 20 May 2015, Ingo Molnar wrote:
> > * Borislav Petkov <bp@suse.de> wrote:
> > 
> > > On Wed, May 20, 2015 at 01:11:20PM +0200, Ingo Molnar wrote:
> > > >   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
> > > >     like MWAIT: i.e. once idle it won't exit idle on its own
> > > 
> > > Let me quote the commit message:
> > > 
> > > "MWAITT, another name is MWAITX (MWAIT with extensions), has a
> > > configurable timer that causes MWAITX to exit on expiration."
> > 
> > Ah. A useful skill that is, being able to read.
> > 
> > > You need to set the second bit in ECX to enable the timer.
> > > 
> > > I guess if you don't, then you get normal MWAIT but then you don't 
> > > need the timeout either...
> > 
> > Yeah.
> > 
> > So if it's a true timeout then we could use it to implement 
> > irq-less timers: that's actually pretty useful, because it could 
> > be faster than getting a local APIC timer irq, etc.
> 
> Uurgh, NO NO NO!

I know, I know :-)

The XT PIT was a nasty, broken hardware timer, and all x86 timer
generations after that made the situation even worse.

> We have enough trouble with non functional timers already, we do not 
> need another variant of those.
> 
> We can supply the estimated sleep time though if that helps the PM 
> controller underneath to select a state. That's more or less what we 
> do in the governors as well.

That's not what appears to be happening here though: the MWAITX will 
return after the timeout.

Which isn't really useful unless we use it to drive timers.

So 'let's not use it' might be the sane answer.
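
For reference, a small sketch of the register setup under discussion.
Only the ECX timer-enable bit comes from this thread ("the second bit
in ECX", taken here to mean bit 1); putting the timeout value in EBX is
my assumption from reading the AMD documentation, and the struct and
helper names are hypothetical.  It only prepares the values and does
not execute the instruction:

```c
#include <stdint.h>

/* Assumption: "the second bit in ECX" is bit 1, i.e. mask 0x2. */
#define MWAITX_SKETCH_ECX_TIMER_ENABLE	(1u << 1)

/* Hypothetical container for the register arguments of MWAITX. */
struct mwaitx_args {
	uint32_t eax;	/* hints, as for MWAIT */
	uint32_t ebx;	/* timeout value (assumed), used only with the timer */
	uint32_t ecx;	/* extensions, including the timer-enable bit */
};

/*
 * Build the register values for a wait with or without the timer.
 * Without the timer bit this degenerates to a plain MWAIT-style wait
 * and the timeout is ignored.
 */
static struct mwaitx_args mwaitx_prepare(uint32_t hints, uint32_t timeout,
					 int use_timer)
{
	struct mwaitx_args args;

	args.eax = hints;
	args.ecx = use_timer ? MWAITX_SKETCH_ECX_TIMER_ENABLE : 0;
	args.ebx = use_timer ? timeout : 0;
	return args;
}
```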

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-20 14:34                           ` Toshi Kani
@ 2015-05-20 15:01                             ` Ingo Molnar
  -1 siblings, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-20 15:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Borislav Petkov, akpm, hpa, tglx, mingo, linux-mm, x86,
	linux-kernel, dave.hansen, Elliott, pebolle, mcgrof


* Toshi Kani <toshi.kani@hp.com> wrote:

> On Wed, 2015-05-20 at 13:55 +0200, Ingo Molnar wrote:
> > * Borislav Petkov <bp@alien8.de> wrote:
> > 
> > > --- a/arch/x86/mm/pgtable.c
> > > +++ b/arch/x86/mm/pgtable.c
> > > @@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> > >  /**
> > >   * pud_set_huge - setup kernel PUD mapping
> > >   *
> > > - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> > > - * this function does not set up a huge page when the range is covered
> > > - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> > > - * disabled.
> > > + * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
> > > + * function sets up a huge page only if any of the following conditions are met:
> > > + *
> > > + * - MTRRs are disabled, or
> > > + *
> > > + * - MTRRs are enabled and the range is completely covered by a single MTRR, or
> > > + *
> > > + * - MTRRs are enabled and the range is not completely covered by a single MTRR
> > > + *   but the memory type of the range is WB, even if covered by multiple MTRRs.
> > > + *
> > > + * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
> > > + * page mapping attempt fails.
> > 
> > This comment should explain why it's ok in the WB case.
> > 
> > Also, the phrase 'the memory type of the range' is ambiguous: it might 
> > mean the partial MTRR's, or the memory type specified via PAT by the 
> > huge-pmd entry.
> 
> Agreed.  How about this sentence?
> 
>  - MTRRs are enabled and the corresponding MTRR memory type is WB, which
> has no effect to the requested PAT memory type.

s/effect to/effect on

sounds good otherwise!

Btw., if WB MTRR entries can never have an effect on Linux PAT 
specified attributes, why do we allow them to be created? I don't 
think we ever call into real mode for this to matter?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-20 15:01                             ` Ingo Molnar
@ 2015-05-20 15:02                               ` Toshi Kani
  -1 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-20 15:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, akpm, hpa, tglx, mingo, linux-mm, x86,
	linux-kernel, dave.hansen, Elliott, pebolle, mcgrof

On Wed, 2015-05-20 at 17:01 +0200, Ingo Molnar wrote:
> * Toshi Kani <toshi.kani@hp.com> wrote:
> 
> > On Wed, 2015-05-20 at 13:55 +0200, Ingo Molnar wrote:
> > > * Borislav Petkov <bp@alien8.de> wrote:
> > > 
> > > > --- a/arch/x86/mm/pgtable.c
> > > > +++ b/arch/x86/mm/pgtable.c
> > > > @@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
> > > >  /**
> > > >   * pud_set_huge - setup kernel PUD mapping
> > > >   *
> > > > - * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
> > > > - * this function does not set up a huge page when the range is covered
> > > > - * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
> > > > - * disabled.
> > > > + * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
> > > > + * function sets up a huge page only if any of the following conditions are met:
> > > > + *
> > > > + * - MTRRs are disabled, or
> > > > + *
> > > > + * - MTRRs are enabled and the range is completely covered by a single MTRR, or
> > > > + *
> > > > + * - MTRRs are enabled and the range is not completely covered by a single MTRR
> > > > + *   but the memory type of the range is WB, even if covered by multiple MTRRs.
> > > > + *
> > > > + * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
> > > > + * page mapping attempt fails.
> > > 
> > > This comment should explain why it's ok in the WB case.
> > > 
> > > Also, the phrase 'the memory type of the range' is ambiguous: it might 
> > > mean the partial MTRR's, or the memory type specified via PAT by the 
> > > huge-pmd entry.
> > 
> > Agreed.  How about this sentence?
> > 
> >  - MTRRs are enabled and the corresponding MTRR memory type is WB, which
> > has no effect to the requested PAT memory type.
> 
> s/effect to/effect on
> 
> sounds good otherwise!

Great!

Boris, can you update the patch, or do you want me to send you a patch
for this update?

> Btw., if WB MTRR entries can never have an effect on Linux PAT 
> specified attributes, why do we allow them to be created? I don't 
> think we ever call into real mode for this to matter?

MTRRs have a default memory type, which is used when the given range
is not covered by any MTRR entry.  There are two types of BIOS setup:

1) Default UC
 - BIOS sets the default type to UC, and covers all WB-accessible ranges
with MTRR entries of type WB.

2) Default WB
 - BIOS sets the default type to WB, and covers non-WB-accessible ranges
with MTRR entries of other memory types, such as UC.

In both cases, the WB type can be returned.  In case 1), the requested
range may overlap with multiple MTRR entries of WB type, which is still
safe.
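
The two BIOS setups and the huge-page decision discussed in this thread
can be modelled roughly as follows.  This is only a user-space sketch, not
the kernel's mtrr_type_lookup() or pud_set_huge(): the types, the helper
names (type_at, range_is_uniform, huge_map_allowed) and the example
address ranges are all made up for illustration, and it assumes MTRR
entries never overlap each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum mem_type { TYPE_UC, TYPE_WB };	/* hypothetical codes, not the kernel's */

struct mtrr_entry {
	uint64_t base, size;
	enum mem_type type;
};

struct mtrr_state {
	bool enabled;
	enum mem_type def_type;		/* applies where no entry matches */
	const struct mtrr_entry *entries;
	int nr;
};

/* Type of one address: the matching entry's type, else the default type. */
static enum mem_type type_at(const struct mtrr_state *s, uint64_t addr)
{
	for (int i = 0; i < s->nr; i++) {
		const struct mtrr_entry *e = &s->entries[i];

		if (addr >= e->base && addr - e->base < e->size)
			return e->type;
	}
	return s->def_type;
}

/* True if one entry covers all of [start, end) or no entry touches it. */
static bool range_is_uniform(const struct mtrr_state *s, uint64_t start,
			     uint64_t end)
{
	bool touched = false;

	for (int i = 0; i < s->nr; i++) {
		const struct mtrr_entry *e = &s->entries[i];
		uint64_t e_end = e->base + e->size;

		if (start >= e->base && end <= e_end)
			return true;
		if (start < e_end && end > e->base)
			touched = true;
	}
	return !touched;
}

/* Decide whether a huge mapping of [start, start + size) is acceptable. */
static bool huge_map_allowed(const struct mtrr_state *s, uint64_t start,
			     uint64_t size)
{
	if (!s->enabled || range_is_uniform(s, start, start + size))
		return true;

	/* Mixed coverage: fine only if every 4 KiB page resolves to WB. */
	for (uint64_t off = 0; off < size; off += 4096)
		if (type_at(s, start + off) != TYPE_WB)
			return false;
	return true;
}

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* 1) Default UC: RAM is described by WB entries. */
static const struct mtrr_entry uc_entries[] = {
	{ 0,         512 * MiB, TYPE_WB },
	{ 512 * MiB, 512 * MiB, TYPE_WB },
};
static const struct mtrr_state default_uc = { true, TYPE_UC, uc_entries, 2 };

/* 2) Default WB: only the non-WB hole is described. */
static const struct mtrr_entry wb_entries[] = {
	{ 3 * GiB, 256 * MiB, TYPE_UC },
};
static const struct mtrr_state default_wb = { true, TYPE_WB, wb_entries, 1 };

static void check_examples(void)
{
	/* Two WB entries back one 1 GiB range: not uniform, still safe. */
	assert(!range_is_uniform(&default_uc, 0, GiB));
	assert(huge_map_allowed(&default_uc, 0, GiB));

	/* A 1 GiB range that partly hits the UC entry must be refused... */
	assert(!huge_map_allowed(&default_wb, 3 * GiB, GiB));
	/* ...but a 2 MiB range fully inside that entry is uniform. */
	assert(huge_map_allowed(&default_wb, 3 * GiB, 2 * MiB));
}
```

Calling check_examples() from a main() exercises both setups.  The
per-page loop is deliberately naive; it is there to show why a range
backed by several WB entries is harmless, not to suggest how the lookup
should be implemented.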

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-20 16:04                               ` Borislav Petkov
@ 2015-05-20 15:46                                   ` Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: Toshi Kani @ 2015-05-20 15:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle, mcgrof

On Wed, 2015-05-20 at 18:04 +0200, Borislav Petkov wrote:
> On Wed, May 20, 2015 at 09:02:23AM -0600, Toshi Kani wrote:
> > Boris, can you update the patch,
> 
> Done.

Thanks!
-Toshi



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 14:48     ` Ingo Molnar
@ 2015-05-20 15:51       ` Josh Poimboeuf
  2015-05-20 16:09         ` Josh Poimboeuf
  2015-05-20 16:03       ` Andy Lutomirski
  2015-05-21 20:54       ` Josh Poimboeuf
  2 siblings, 1 reply; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 15:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
	Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
> > > 
> > > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > 
> > > > In discussions around the live kernel patching consistency model RFC
> > > > [1], Peter and Ingo correctly pointed out that stack traces aren't
> > > > reliable.  And as Ingo said, there's no "strong force" which ensures we
> > > > can rely on them.
> > > > 
> > > > So I've been thinking about how to fix that.  My goal is to eventually
> > > > make stack traces reliable.  Or at the very least, to be able to detect
> > > > at runtime when a given stack trace *might* be unreliable.  But improved
> > > > stack traces would broadly benefit the entire kernel, regardless of the
> > > > outcome of the live kernel patching consistency model discussions.
> > > > 
> > > > This patch set is just the first in a series of proposed stack trace
> > > > reliability improvements.  Future proposals will include runtime stack
> > > > reliability checking, as well as compile-time and runtime DWARF
> > > > validations.
> > > > 
> > > > As far as I can tell, there are two main obstacles which prevent frame
> > > > pointer based stack traces from being reliable:
> > > > 
> > > > 1) Missing frame pointer logic: currently, most assembly functions don't
> > > >    set up the frame pointer.
> > > 
> > > Could you please paste here the output of what the new checks print 
> > > for x86/64 defconfig?
> > 
> > Here are all 89 warnings from defconfig:
> > 
> > arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e.  Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359.  Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be.  Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5.  Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21.  Please use FUNC_ENTER.
> > arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb.  Please use FUNC_ENTER.
> > arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b.  Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7.  Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110.  Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145.  Please use FUNC_ENTER.
> > arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176.  Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2.  Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69.  Please use FUNC_ENTER.
> > arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d.  Please use FUNC_ENTER.
> > arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5.  Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5.  Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1.  Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e.  Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172.  Please use FUNC_ENTER.
> > arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic.  Please use FUNC_ENTER.
> > arch/x86/boot/header.o: die() is missing frame pointer logic.  Please use FUNC_ENTER.
> 
> Yeah, so many of these seem to be 'leaf only' functions: functions 
> that don't ever call functions themselves.

Yeah, good observation.

> So let's assume we always have CONFIG_FRAME_POINTER=y.
> 
> If they don't set up a frame pointer then they in essence won't show 
> up in the call chain

It's actually the _caller_ of the asm function which gets skipped in the
trace.

(Though it doesn't really matter -- either way it's unreliable.)
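
To illustrate why it is the caller that goes missing, here is a toy model
of a frame pointer walk.  It is a sketch under simplifying assumptions,
not the kernel's unwinder: the stack is a small array indexed by word,
each frame is just a saved frame pointer followed by a return address,
and START, FUNC_A, FUNC_B and FUNC_ASM are made-up stand-ins for code
addresses in a call chain A -> B -> asm function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { START = 100, FUNC_A, FUNC_B, FUNC_ASM };	/* stand-ins for addresses */

/*
 * Walk saved frame pointers.  stack[fp] holds the caller's frame pointer
 * and stack[fp + 1] the return address; fp == 0 ends the chain.
 */
static size_t walk(const uint64_t *stack, uint64_t rip, uint64_t fp,
		   uint64_t *trace, size_t max)
{
	size_t n = 0;

	if (n < max)
		trace[n++] = rip;
	while (fp != 0 && n < max) {
		trace[n++] = stack[fp + 1];
		fp = stack[fp];
	}
	return n;
}

/* asm function sets up its own frame at index 2. */
static const uint64_t with_frame[10] = {
	[2] = 5, [3] = FUNC_B,		/* asm frame: caller fp, return into B */
	[5] = 8, [6] = FUNC_A,		/* B's frame: caller fp, return into A */
	[8] = 0, [9] = START,		/* A's frame: end of chain */
};

/* asm function pushed nothing but the return address; fp is still B's. */
static const uint64_t no_frame[10] = {
	[4] = FUNC_B,			/* return into B, invisible to walk() */
	[5] = 8, [6] = FUNC_A,
	[8] = 0, [9] = START,
};

static void check_walk(void)
{
	uint64_t t[8];
	size_t n;

	/* Interrupted inside the asm function, frame pointer set up. */
	n = walk(with_frame, FUNC_ASM, 2, t, 8);
	assert(n == 4 && t[0] == FUNC_ASM && t[1] == FUNC_B &&
	       t[2] == FUNC_A && t[3] == START);

	/* Same interruption point, no frame pointer logic: B is skipped. */
	n = walk(no_frame, FUNC_ASM, 5, t, 8);
	assert(n == 3 && t[0] == FUNC_ASM && t[1] == FUNC_A && t[2] == START);
}
```

In the second case the interrupted RIP still identifies the asm function,
but the first return address the walker reads belongs to B's frame, so it
points into A and B never appears.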

> but normally they wouldn't because they call nothing.
> 
> If they trigger an exception/fault or if they get hit by an interrupt 
> then I think we'll still correctly walk the stack - just those 
> functions might be missing from the deterministic call chain, right? 
> (it will still show up as a '?' entry.)

Right.  This patch set takes the more conservative approach of requiring
_all_ callable asm functions to have frame pointer logic.  Which has the
benefit of getting rid of some of the cases where we need the '?' stack
entries.

> If they crash then we'll see them because the crashing RIP will be 
> printed.
> 
> So I'm wondering what the x86 policy here should be: to create frame 
> pointers in them or not. Cc:-ed a few more gents for thoughts.

I agree that frame pointers aren't strictly necessary for leaf
functions.

I could easily relax the stackvalidate restrictions to exclude the
checking of leaf functions.  In fact I think that would be more
consistent with how gcc does it, so maybe that's a more reasonable
approach.

-- 
Josh

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 14:51                     ` Ingo Molnar
@ 2015-05-20 15:55                       ` One Thousand Gnomes
  2015-05-20 16:07                         ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: One Thousand Gnomes @ 2015-05-20 15:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Borislav Petkov, Huang Rui, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu,
	Tony Li, Frédéric Weisbecker

> That's not what appears to be happening here though: the MWAITX will 
> return after the timeout.
> 
> Which isn't really useful unless we use it to drive timers.

What about things like mdelay() ?

Alan

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 14:48     ` Ingo Molnar
  2015-05-20 15:51       ` Josh Poimboeuf
@ 2015-05-20 16:03       ` Andy Lutomirski
  2015-05-20 16:25         ` Josh Poimboeuf
  2015-05-20 17:27         ` Peter Zijlstra
  2015-05-21 20:54       ` Josh Poimboeuf
  2 siblings, 2 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-20 16:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Josh Poimboeuf, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
>> On Wed, May 20, 2015 at 12:33:39PM +0200, Ingo Molnar wrote:
>> >
>> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> >
>> > > In discussions around the live kernel patching consistency model RFC
>> > > [1], Peter and Ingo correctly pointed out that stack traces aren't
>> > > reliable.  And as Ingo said, there's no "strong force" which ensures we
>> > > can rely on them.
>> > >
>> > > So I've been thinking about how to fix that.  My goal is to eventually
>> > > make stack traces reliable.  Or at the very least, to be able to detect
>> > > at runtime when a given stack trace *might* be unreliable.  But improved
>> > > stack traces would broadly benefit the entire kernel, regardless of the
>> > > outcome of the live kernel patching consistency model discussions.
>> > >
>> > > This patch set is just the first in a series of proposed stack trace
>> > > reliability improvements.  Future proposals will include runtime stack
>> > > reliability checking, as well as compile-time and runtime DWARF
>> > > validations.
>> > >
>> > > As far as I can tell, there are two main obstacles which prevent frame
>> > > pointer based stack traces from being reliable:
>> > >
>> > > 1) Missing frame pointer logic: currently, most assembly functions don't
>> > >    set up the frame pointer.
>> >
>> > Could you please paste here the output of what the new checks print
>> > for x86/64 defconfig?
>>
>> Here are all 89 warnings from defconfig:
>>
>> arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e.  Please use FUNC_ENTER.
>> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359.  Please use FUNC_ENTER.
>> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be.  Please use FUNC_ENTER.
>> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5.  Please use FUNC_ENTER.
>> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21.  Please use FUNC_ENTER.
>> arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb.  Please use FUNC_ENTER.
>> arch/x86/kernel/acpi/wakeup_64.o: wakeup_long64() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b.  Please use FUNC_ENTER.
>> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7.  Please use FUNC_ENTER.
>> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110.  Please use FUNC_ENTER.
>> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145.  Please use FUNC_ENTER.
>> arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176.  Please use FUNC_ENTER.
>> arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2.  Please use FUNC_ENTER.
>> arch/x86/kernel/head_64.o: start_cpu0() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/realmode/rm/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/int80.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/int80.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/int80.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/syscall.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/syscall.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/syscall.o: __kernel_vsyscall() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/sysenter.o: __kernel_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/vdso/vdso32/sysenter.o: __kernel_rt_sigreturn() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69.  Please use FUNC_ENTER.
>> arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d.  Please use FUNC_ENTER.
>> arch/x86/lib/msr-reg.o: rdmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/msr-reg.o: wrmsr_safe_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/iomap_copy_64.o: __iowrite32_copy() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/clear_page_64.o: clear_page() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/clear_page_64.o: clear_page_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/clear_page_64.o: clear_page_c_e() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/cmpxchg16b_emu.o: this_cpu_cmpxchg16b_emu() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_page_64.o: copy_page() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_page_64.o: copy_page_regs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: _copy_to_user() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: _copy_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: copy_user_generic_unrolled() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: copy_user_generic_string() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: copy_user_enhanced_fast_string() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: __copy_user_nocache() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/copy_user_64.o: bad_from_user() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/csum-copy_64.o: csum_partial_copy_generic() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/getuser.o: __get_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/getuser.o: __get_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/getuser.o: __get_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/getuser.o: __get_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5.  Please use FUNC_ENTER.
>> arch/x86/lib/memcpy_64.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memcpy_64.o: __memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memcpy_64.o: memcpy_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memcpy_64.o: memcpy_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memmove_64.o: memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memmove_64.o: __memmove() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5.  Please use FUNC_ENTER.
>> arch/x86/lib/memset_64.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memset_64.o: __memset() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memset_64.o: memset_erms() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/memset_64.o: memset_orig() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/putuser.o: __put_user_1() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/putuser.o: __put_user_2() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/putuser.o: __put_user_4() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/putuser.o: __put_user_8() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1.  Please use FUNC_ENTER.
>> arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/bioscall.o: intcall() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/copy.o: memcpy() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/copy.o: memset() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/pmjump.o: protected_mode_jump() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/pmjump.o: in_pm32() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e.  Please use FUNC_ENTER.
>> arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172.  Please use FUNC_ENTER.
>> arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic.  Please use FUNC_ENTER.
>> arch/x86/boot/header.o: die() is missing frame pointer logic.  Please use FUNC_ENTER.
>
> Yeah, so many of these seem to be 'leaf only' functions: functions
> that don't ever call functions themselves.
>
> So lets assume we always have CONFIG_FRAME_POINTERS=y.
>
> If they don't set up a frame pointer then they in essence won't show
> up in the call chain - but normally they wouldn't because they call
> nothing.
>
> If they trigger an exception/fault or if they get hit by an interrupt
> then I think we'll still correctly walk the stack - just those
> functions might be missing from the deterministic call chain, right?
> (it will still show up as a '?' entry.)

I've never quite understood what the '?' means.

>
> If they crash then we'll see them because the crashing RIP will be
> printed.
>
> So I'm wondering what the x86 policy here should be: to create frame
> pointers in them or not. Cc:-ed a few more gents for thoughts.
>

I think it would be nice to have full DWARF unwind support for
everything at some point.  Unfortunately, I don't see any easy path to
getting there.  It doesn't help that AFAIK no one has ever proposed a
usable in-kernel DWARF unwinder.

It also doesn't help that writing correct CFI annotations in inline
asm can be very complicated.

I think that ia64 manages to have complete unwind support.  How did
they manage it?

If we had an unwinder, it would be relatively straightforward to write
something perf-based that would frequently check that we can unwind
all the way out of an NMI back to userspace and warn if we couldn't.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping
  2015-05-20 15:02                               ` Toshi Kani
  (?)
@ 2015-05-20 16:04                               ` Borislav Petkov
  2015-05-20 15:46                                   ` Toshi Kani
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 16:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Ingo Molnar, akpm, hpa, tglx, mingo, linux-mm, x86, linux-kernel,
	dave.hansen, Elliott, pebolle, mcgrof

On Wed, May 20, 2015 at 09:02:23AM -0600, Toshi Kani wrote:
> Boris, can you update the patch,

Done.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 15:55                       ` One Thousand Gnomes
@ 2015-05-20 16:07                         ` Borislav Petkov
  2015-05-20 19:12                           ` Thomas Gleixner
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 16:07 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Ingo Molnar, Thomas Gleixner, Huang Rui, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu,
	Tony Li, Frédéric Weisbecker

On Wed, May 20, 2015 at 04:55:58PM +0100, One Thousand Gnomes wrote:
> > That's not what appears to be happening here though: the MWAITX will 
> > return after the timeout.
> > 
> > Which isn't really useful unless we use it to drive timers.
> 
> What about things like mdelay() ?

It has an upper limit on the max timeout though: u32 TSC cycles.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 15:51       ` Josh Poimboeuf
@ 2015-05-20 16:09         ` Josh Poimboeuf
  0 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 16:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
	Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 10:51:56AM -0500, Josh Poimboeuf wrote:
> On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
> > Yeah, so many of these seem to be 'leaf only' functions: functions 
> > that don't ever call functions themselves.
> 
> Yeah, good observation.
> 
> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
> > 
> > If they don't set up a frame pointer then they in essence won't show 
> > up in the call chain
> 
> It's actually the _caller_ of the asm function which gets skipped in the
> trace.
> 
> (Though it doesn't really matter -- either way it's unreliable.)
> 
> > but normally they wouldn't because they call nothing.
> > 
> > If they trigger an exception/fault or if they get hit by an interrupt 
> > then I think we'll still correctly walk the stack - just those 
> > functions might be missing from the deterministic call chain, right? 
> > (it will still show up as a '?' entry.)
> 
> Right.  This patch set takes the more conservative approach of requiring
> _all_ callable asm functions to have frame pointer logic.  Which has the
> benefit of getting rid of some of the cases where we need the '?' stack
> entries.
> 
> > If they crash then we'll see them because the crashing RIP will be 
> > printed.
> > 
> > So I'm wondering what the x86 policy here should be: to create frame 
> > pointers in them or not. Cc:-ed a few more gents for thoughts.
> 
> I agree that frame pointers aren't strictly necessary for leaf
> functions.
> 
> I could easily relax the stackvalidate restrictions to exclude the
> checking of leaf functions.  In fact I think that would be more
> consistent with how gcc does it, so maybe that's a more reasonable
> approach.

I remembered another reason why I went with the more conservative
approach of requiring frame pointers in leaf functions.

It's often hard to pin down where an asm function begins and where it
ends.  For example, you might have something like:

ENTRY(callable_asm_func)
	jmp label
ENDPROC(callable_asm_func)

label:
	call some_c_function
	ret

If we were to relax the stackvalidate restrictions then we'd miss that
kind of (surprisingly common) situation, where a function jumps outside
of its scope.

Then again, I guess it would be pretty easy to add checks for that.
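
For illustration, such a check could look roughly like the sketch below.
It is a minimal userspace model, not code from stackvalidate: the
struct insn / struct func types and count_escaping_jumps() are
hypothetical names, and the instructions are assumed to be decoded
already.  It counts the jumps inside a function whose destination lies
outside the ENTRY()/ENDPROC() range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of an already-decoded instruction. */
enum insn_kind { INSN_OTHER, INSN_JMP, INSN_CALL, INSN_RET };

struct insn {
	unsigned long addr;	/* address of this instruction */
	enum insn_kind kind;
	unsigned long target;	/* destination for INSN_JMP / INSN_CALL */
};

/* Hypothetical function bounds, as ENTRY()/ENDPROC() would mark them. */
struct func {
	const char *name;
	unsigned long start;	/* first byte of the function */
	unsigned long end;	/* one past the last byte */
};

/* Count jumps located inside f whose destination is outside [start, end). */
static int count_escaping_jumps(const struct func *f,
				const struct insn *insns, size_t n)
{
	int escapes = 0;

	assert(f->start < f->end);

	for (size_t i = 0; i < n; i++) {
		const struct insn *in = &insns[i];

		/* Only look at instructions that belong to this function. */
		if (in->addr < f->start || in->addr >= f->end)
			continue;
		if (in->kind != INSN_JMP)
			continue;
		if (in->target < f->start || in->target >= f->end)
			escapes++;
	}
	return escapes;
}
```

For the callable_asm_func example above, the single jmp to label would
be counted once, since label sits after ENDPROC().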

-- 
Josh

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:03       ` Andy Lutomirski
@ 2015-05-20 16:25         ` Josh Poimboeuf
  2015-05-20 16:39           ` Andy Lutomirski
                             ` (2 more replies)
  2015-05-20 17:27         ` Peter Zijlstra
  1 sibling, 3 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 16:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > Yeah, so many of these seem to be 'leaf only' functions: functions
> > that don't ever call functions themselves.
> >
> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
> >
> > If they don't set up a frame pointer then they in essence won't show
> > up in the call chain - but normally they wouldn't because they call
> > nothing.
> >
> > If they trigger an exception/fault or if they get hit by an interrupt
> > then I think we'll still correctly walk the stack - just those
> > functions might be missing from the deterministic call chain, right?
> > (it will still show up as a '?' entry.)
> 
> I've never quite understood what the '?' means.

It basically means "here's a function address we found on the stack,
which may or may not have been called."  It's needed because stack
walking isn't currently 100% reliable.

> > If they crash then we'll see them because the crashing RIP will be
> > printed.
> >
> > So I'm wondering what the x86 policy here should be: to create frame
> > pointers in them or not. Cc:-ed a few more gents for thoughts.
> >
> 
> I think it would be nice to have full DWARF unwind support for
> everything at some point.  Unfortunately, I don't see any easy path to
> getting there.  It doesn't help that AFAIK no one has ever proposed a
> usable in-kernel DWARF unwinder.
> 
> It also doesn't help that writing correct CFI annotations in inline
> asm can be very complicated.
> 
> I think that ia64 manages to have complete unwind support.  How did
> they manage it?
> 
> If we had an unwinder, it would be relatively straightforward to write
> something perf-based that would frequently check that we can unwind
> all the way out of an NMI back to userspace and warn if we couldn't.

I agree that DWARF unwind support would be nice.  I have some plans
about how to achieve that in future patch sets.  It includes several
pieces:

- compile-time DWARF data validation (using some similar approaches to
  this patch set)

- run time DWARF data validation, including:
  - a DWARF unwinder which doesn't blindly trust any of the DWARF data
  - ensuring DWARF and frame pointer data are consistent with each other
  - ensuring it can walk all the way to the bottom of the stack
  - a DEBUG option which validates the stack periodically from an NMI
    and/or schedule()

-- 
Josh

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:25         ` Josh Poimboeuf
@ 2015-05-20 16:39           ` Andy Lutomirski
  2015-05-20 16:52           ` Borislav Petkov
  2015-05-20 16:59           ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
  2 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-20 16:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
>> On Wed, May 20, 2015 at 7:48 AM, Ingo Molnar <mingo@kernel.org> wrote:
>> > Yeah, so many of these seem to be 'leaf only' functions: functions
>> > that don't ever call functions themselves.
>> >
>> > So lets assume we always have CONFIG_FRAME_POINTERS=y.
>> >
>> > If they don't set up a frame pointer then they in essence won't show
>> > up in the call chain - but normally they wouldn't because they call
>> > nothing.
>> >
>> > If they trigger an exception/fault or if they get hit by an interrupt
>> > then I think we'll still correctly walk the stack - just those
>> > functions might be missing from the deterministic call chain, right?
>> > (it will still show up as a '?' entry.)
>>
>> I've never quite understood what the '?' means.
>
> It basically means "here's a function address we found on the stack,
> which may or may not have been called."  It's needed because stack
> walking isn't currently 100% reliable.
>
>> > If they crash then we'll see them because the crashing RIP will be
>> > printed.
>> >
>> > So I'm wondering what the x86 policy here should be: to create frame
>> > pointers in them or not. Cc:-ed a few more gents for thoughts.
>> >
>>
>> I think it would be nice to have full DWARF unwind support for
>> everything at some point.  Unfortunately, I don't see any easy path to
>> getting there.  It doesn't help that AFAIK no one has ever proposed a
>> usable in-kernel DWARF unwinder.
>>
>> It also doesn't help that writing correct CFI annotations in inline
>> asm can be very complicated.
>>
>> I think that ia64 manages to have complete unwind support.  How did
>> they manage it?
>>
>> If we had an unwinder, it would be relatively straightforward to write
>> something perf-based that would frequently check that we can unwind
>> all the way out of an NMI back to userspace and warn if we couldn't.
>
> I agree that DWARF unwind support would be nice.  I have some plans
> about how to achieve that in future patch sets.  It includes several
> pieces:
>
> - compile-time DWARF data validation (using some similar approaches to
>   this patch set)
>
> - run time DWARF data validation, including:
>   - a DWARF unwinder which doesn't blindly trust any of the DWARF data

Fantastic!

>   - ensuring DWARF and frame pointer data are consistent with each other
>   - ensuring it can walk all the way to the bottom of the stack
>   - a DEBUG option which validates the stack periodically from an NMI
>     and/or schedule()

We think alike :)

NMI will be much more interesting than schedule.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:25         ` Josh Poimboeuf
  2015-05-20 16:39           ` Andy Lutomirski
@ 2015-05-20 16:52           ` Borislav Petkov
  2015-05-21 10:16             ` Ingo Molnar
  2015-05-20 16:59           ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
  2 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 16:52 UTC (permalink / raw)
  To: Josh Poimboeuf, Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Andrew Morton

On Wed, May 20, 2015 at 11:25:37AM -0500, Josh Poimboeuf wrote:
> > I've never quite understood what the '?' means.
> 
> It basically means "here's a function address we found on the stack,
> which may or may not have been called."  It's needed because stack
> walking isn't currently 100% reliable.

Yeah, that was not that trivial to figure out at the time:

unsigned long
print_context_stack(struct thread_info *tinfo,
		...

                if (__kernel_text_address(addr)) {
                        if ((unsigned long) stack == bp + sizeof(long)) {
                                ops->address(data, addr, 1);
                                frame = frame->next_frame;
                                bp = (unsigned long) frame;
                        } else {
                                ops->address(data, addr, 0);
                        }

and that ops->address is

print_trace_address()
|-> printk_stack_address()

So if I'm understanding this correctly, if rBP+8 is equal to rSP, i.e.
return address is on the stack, then this frame got called.

Otherwise -> "?".

I might be missing something though...
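
To make that reading concrete, here is a small userspace model of the
check.  It is only a sketch under assumptions: the stack is an array of
words, frames are stored as indices rather than pointers, and
TEXT_START/TEXT_END, NO_FRAME and walk_stack() are made-up names.  A
text address counts as reliable only when it sits in the slot right
above the current saved frame pointer; every other text address gets
the '?' treatment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bounds of "kernel text" in this toy model. */
#define TEXT_START 0x1000UL
#define TEXT_END   0x2000UL
#define NO_FRAME   (~0UL)	/* marks the end of the frame chain */

static bool is_text_address(unsigned long addr)
{
	return addr >= TEXT_START && addr < TEXT_END;
}

/*
 * stack[] holds the stack words from the lowest address upward.
 * A frame is two words: stack[bp] is the index of the caller's frame
 * (or NO_FRAME) and stack[bp + 1] is the return address.
 *
 * mark[i] becomes 1 for a reliable entry, 0 for a '?' entry and -1 for
 * a word that is not a text address at all.
 */
static void walk_stack(const unsigned long *stack, size_t n,
		       unsigned long bp, int *mark)
{
	assert(bp == NO_FRAME || bp + 1 < n);

	for (size_t i = 0; i < n; i++) {
		unsigned long addr = stack[i];

		if (!is_text_address(addr)) {
			mark[i] = -1;
			continue;
		}
		if (bp != NO_FRAME && i == bp + 1) {
			mark[i] = 1;	/* return slot of the current frame */
			bp = stack[bp];	/* follow the frame chain */
		} else {
			mark[i] = 0;	/* text address, but not in a frame slot */
		}
	}
}
```

In this model a stale return address left behind by an earlier call and
a return address whose function never set up a frame look the same:
both end up with mark 0.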

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:25         ` Josh Poimboeuf
  2015-05-20 16:39           ` Andy Lutomirski
  2015-05-20 16:52           ` Borislav Petkov
@ 2015-05-20 16:59           ` Linus Torvalds
  2015-05-20 17:20             ` Josh Poimboeuf
  2015-05-21  7:52             ` Ingo Molnar
  2 siblings, 2 replies; 710+ messages in thread
From: Linus Torvalds @ 2015-05-20 16:59 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
>>
>> I've never quite understood what the '?' means.
>
> It basically means "here's a function address we found on the stack,
> which may or may not have been called."  It's needed because stack
> walking isn't currently 100% reliable.

It is often quite interesting and helpful, because it shows stale data
on the stack, giving clues about what happened just before.

Now, I'd like gcc to generally be better about not wasting so much
stack frame, so in that sense I'd like to see fewer '?' entries just
from a code quality standpoint, but when debugging those things, the
downside of "noise" is often cancelled by the upside of "ahh, it
happens after calling X".

So the "perfect stack frames" is actually not as great a thing as some
people want to make it seem.

                   Linus

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:59           ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
@ 2015-05-20 17:20             ` Josh Poimboeuf
  2015-05-21 10:27               ` Ingo Molnar
  2015-05-21  7:52             ` Ingo Molnar
  1 sibling, 1 reply; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-20 17:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 09:59:18AM -0700, Linus Torvalds wrote:
> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the stack,
> > which may or may not have been called."  It's needed because stack
> > walking isn't currently 100% reliable.
> 
> It is often quite interesting and helpful, because it shows stale data
> on the stack, giving clues about what happened just before.
> 
> Now, I'd like gcc to generally be better about not wasting so much
> stack frame, so in that sense I'd like to see fewer '?' entries just
> from a code quality standpoint, but when debugging those things, the
> downside of "noise" is often cancelled by the upside of "ahh, it
> happens after calling X".
> 
> So the "perfect stack frames" is actually not as great a thing as some
> people want to make it seem.

Ok, I can see how looking at stale stack data could be useful for some
of the really tough problems.

But right now, the meaning of '?' is ambiguous.  It could be stale data,
or it could be part of a frame for the current stack which was skipped
due to missing frame pointers or an exception.

If we can somehow make the stack unwinder reliable, then it would at
least allow us to remove the ambiguity of the '?' entries.  And it would
reduce the "noise" for the majority of issues where we don't care about
stale stack data, and can simply ignore it.

-- 
Josh

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:03       ` Andy Lutomirski
  2015-05-20 16:25         ` Josh Poimboeuf
@ 2015-05-20 17:27         ` Peter Zijlstra
  2015-05-20 19:10           ` Jiri Kosina
  1 sibling, 1 reply; 710+ messages in thread
From: Peter Zijlstra @ 2015-05-20 17:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Josh Poimboeuf, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> I think it would be nice to have full DWARF unwind support for
> everything at some point.  Unfortunately, I don't see any easy path to
> getting there.  It doesn't help that AFAIK no one has ever proposed a
> usable in-kernel DWARF unwinder.

There's a bit of history here; SuSE (iirc) actually has one, however:

  https://lkml.org/lkml/2012/2/10/356

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] mce: fix fail to set 'monarchtimeout' via boot option
  2015-05-20 11:22 [PATCH] mce: fix fail to set 'monarchtimeout' via boot option Xie XiuQi
@ 2015-05-20 17:43 ` Borislav Petkov
  2015-05-21  1:00   ` Xie XiuQi
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 17:43 UTC (permalink / raw)
  To: Xie XiuQi; +Cc: tony.luck, tglx, mingo, hpa, x86, linux-edac, linux-kernel

On Wed, May 20, 2015 at 07:22:23PM +0800, Xie XiuQi wrote:
> I use "mce=1,10000000" in cmdline to change the monarch timeout, but
> it does not work.
> 
> The cause is that get_option() has parsed the ',' already, we need
> not to check the ',' again.
> 
> --
> get_option(): read an int from an option string;
> if available accept a subsequent comma as well.
> 
> Return values:
> 0 - no int in string
> 1 - int found, no subsequent comma
> 2 - int found including a subsequent comma
> 3 - hyphen found to denote a range
> 
> Cc: <stable@vger.kernel.org>	# 2.6.32+

I don't think that's a serious enough bug to justify the stable tag.

> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 2a2bb91..46ca8e7 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2020,11 +2020,8 @@ static int __init mcheck_enable(char *str)
>  	else if (!strcmp(str, "bios_cmci_threshold"))
>  		cfg->bios_cmci_threshold = true;
>  	else if (isdigit(str[0])) {
> -		get_option(&str, &(cfg->tolerant));
> -		if (*str == ',') {
> -			++str;
> +		if (get_option(&str, &(cfg->tolerant) == 2)

This patch wasn't build-tested, right? The build fails:

arch/x86/kernel/cpu/mcheck/mce.c: In function ‘mcheck_enable’:
arch/x86/kernel/cpu/mcheck/mce.c:1993:41: warning: comparison between pointer and integer
   if (get_option(&str, &(cfg->tolerant) == 2)
                                         ^
arch/x86/kernel/cpu/mcheck/mce.c:1993:24: warning: passing argument 2 of ‘get_option’ makes pointer from integer without a cast
   if (get_option(&str, &(cfg->tolerant) == 2)
                        ^
In file included from include/asm-generic/bug.h:13:0,
                 from ./arch/x86/include/asm/bug.h:35,
                 from include/linux/bug.h:4,
                 from include/linux/thread_info.h:11,
                 from arch/x86/kernel/cpu/mcheck/mce.c:13:
include/linux/kernel.h:420:12: note: expected ‘int *’ but argument is of type ‘int’
 extern int get_option(char **str, int *pint);
            ^
arch/x86/kernel/cpu/mcheck/mce.c:1994:4: error: expected ‘)’ before ‘get_option’
    get_option(&str, &(cfg->monarch_timeout));
    ^
arch/x86/kernel/cpu/mcheck/mce.c:1995:2: error: expected expression before ‘}’ token
  } else {
  ^
make[4]: *** [arch/x86/kernel/cpu/mcheck/mce.o] Error 1
make[3]: *** [arch/x86/kernel/cpu/mcheck] Error 2
make[2]: *** [arch/x86/kernel/cpu] Error 2
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2
make: *** Waiting for unfinished jobs....

>  			get_option(&str, &(cfg->monarch_timeout));
> -		}
>  	} else {
>  		pr_info("mce argument %s ignored. Please use /sys\n", str);
>  		return 0;

Anyway, I fixed it up and applied it.
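
For reference, the behaviour behind the bug can be reproduced in
userspace with the sketch below.  get_option_sketch() is a simplified
stand-in written for this example, modelled only on the return values
quoted above; it is not the kernel's get_option().  struct
mce_cfg_sketch and the two parse helpers are likewise hypothetical.
The "new" variant shows what the patch presumably intends, with the
closing parenthesis in the right place.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified stand-in for get_option(): read an int and, if a comma
 * follows, consume it as well.
 * Returns 0 (no int), 1 (int), 2 (int plus comma), 3 (int plus hyphen).
 */
static int get_option_sketch(char **str, int *pint)
{
	char *cur = *str;

	if (!cur || !*cur)
		return 0;

	long val = strtol(cur, str, 0);

	if (cur == *str)
		return 0;		/* nothing was parsed */
	*pint = (int)val;

	if (**str == ',') {
		(*str)++;		/* the comma is consumed here */
		return 2;
	}
	if (**str == '-')
		return 3;
	return 1;
}

/* Hypothetical subset of the MCE configuration. */
struct mce_cfg_sketch {
	int tolerant;
	int monarch_timeout;
};

/* Old logic: looks for a comma that get_option_sketch() already ate. */
static void parse_mce_digits_old(char *str, struct mce_cfg_sketch *cfg)
{
	assert(str && cfg);

	get_option_sketch(&str, &cfg->tolerant);
	if (*str == ',') {	/* not true for input like "1,10000000" */
		++str;
		get_option_sketch(&str, &cfg->monarch_timeout);
	}
}

/* New logic: rely on the return value instead. */
static void parse_mce_digits(char *str, struct mce_cfg_sketch *cfg)
{
	assert(str && cfg);

	if (get_option_sketch(&str, &cfg->tolerant) == 2)
		get_option_sketch(&str, &cfg->monarch_timeout);
}
```

With "1,10000000" as input, the old helper leaves monarch_timeout
untouched while the new one sets it.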

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 17:27         ` Peter Zijlstra
@ 2015-05-20 19:10           ` Jiri Kosina
  0 siblings, 0 replies; 710+ messages in thread
From: Jiri Kosina @ 2015-05-20 19:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Ingo Molnar, Josh Poimboeuf, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Michal Marek, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Borislav Petkov, Andrew Morton

On Wed, 20 May 2015, Peter Zijlstra wrote:

> > I think it would be nice to have full DWARF unwind support for
> > everything at some point.  Unfortunately, I don't see any easy path to
> > getting there.  It doesn't help that AFAIK no one has ever proposed a
> > usable in-kernel DWARF unwinder.
> 
> There's a bit of history here; SuSE (iirc) actually has one, however:
> 
>   https://lkml.org/lkml/2012/2/10/356

Oh absolutely, there are stories behind this :)

Just for the sake of completeness -- the current implementation can be
found in our public git repository; for a not-really-complete picture see
[1] [2] [3] [4].

It turned out to be rather useful on many occasions when debugging customer
reports, but I of course also understand what Linus is saying above.
Bugs in the unwinder can be *really* painful. Our experience so far has been
that it did pay off at the end of the day (and of course analyzing
stacktraces is our daily bread).

[1] http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/stack-unwind?h=SLE12
[2] http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/no-frame-pointer-select?h=SLE12
[3] http://kernel.suse.com/cgit/kernel-source/tree/patches.arch/stack-unwind-cfi_ignore-takes-more-arguments?h=SLE12
[4] http://kernel.suse.com/cgit/kernel-source/tree/patches.arch/x86_64-unwind-annotations?h=SLE12

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 16:07                         ` Borislav Petkov
@ 2015-05-20 19:12                           ` Thomas Gleixner
  2015-05-20 20:15                             ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Thomas Gleixner @ 2015-05-20 19:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: One Thousand Gnomes, Ingo Molnar, Huang Rui, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu,
	Tony Li, Frédéric Weisbecker

On Wed, 20 May 2015, Borislav Petkov wrote:

> On Wed, May 20, 2015 at 04:55:58PM +0100, One Thousand Gnomes wrote:
> > > That's not what appears to be happening here though: the MWAITX will 
> > > return after the timeout.
> > > 
> > > Which isn't really useful unless we use it to drive timers.
> > 
> > What about things like mdelay() ?
> 
> It has an upper limit on the max timeout though: u32 TSC cycles.

Which would be good enough for mdelay/udelay I think, but we'd need to
measure the time spent in MWAITT so we won't return early.

Something like this:

	delay = usec_to_tsc(delay);
	end = rdtsc() + delay;
	while (1) {
		MWAITT(delay);
		now = rdtsc();
		if (end <= now)
			break;
		delay = end - now;
	}

Now we'd need to add alternatives or some other mechanism to make this
conditional on the machines which support it.

Not sure if it's worth the trouble.
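
To make the loop above concrete, here is a minimal user-space sketch in C.
It does not execute MWAITX: sim_rdtsc() and sim_timed_wait() are
hypothetical stand-ins that advance a fake counter, and the wait
deliberately returns early to model a wakeup before the timer expires.
The sketch also clamps each wait to 32 bits, on the assumption that the
hardware timeout is a u32 number of TSC cycles, and it checks the deadline
before the first wait, which differs slightly from the loop quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated TSC; in the kernel this would be rdtsc(). */
static uint64_t fake_tsc;

static uint64_t sim_rdtsc(void)
{
	return fake_tsc;
}

/*
 * Hypothetical stand-in for a timed MWAITX. It "wakes up" after only
 * half of the requested cycles (at least one) to model an early wakeup.
 */
static void sim_timed_wait(uint32_t cycles)
{
	uint32_t slept = cycles / 2;

	assert(cycles > 0);
	if (slept == 0)
		slept = 1;
	fake_tsc += slept;
}

/* Wait at least 'delay' cycles; returns how many waits were needed. */
static unsigned int delay_cycles(uint64_t delay)
{
	uint64_t end = sim_rdtsc() + delay;
	unsigned int wakeups = 0;

	for (;;) {
		uint64_t now = sim_rdtsc();
		uint64_t left;
		uint32_t chunk;

		if (end <= now)
			break;

		/* Re-arm with the remaining time, clamped to 32 bits. */
		left = end - now;
		chunk = left > UINT32_MAX ? UINT32_MAX : (uint32_t)left;
		sim_timed_wait(chunk);
		wakeups++;
	}
	return wakeups;
}
```

The point of recomputing the remaining time on every iteration is that
the caller never returns before the deadline, no matter how often the
wait is cut short.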

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/6] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
  2015-04-29 21:44   ` Luis R. Rodriguez
@ 2015-05-20 19:53     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-20 19:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: mingo, tglx, hpa, bp, plagnioj, tomi.valkeinen, daniel.vetter,
	airlied, dledford, awalls, syrjala, luto, mst, cocci,
	linux-kernel, Toshi Kani, Suresh Siddha, Juergen Gross,
	Daniel Vetter, Dave Airlie, Antonino Daplas, Rob Clark,
	Mathias Krause, Andrzej Hajda, Mel Gorman, Vlastimil Babka,
	Davidlohr Bueso, linux-fbdev

Tomi,

the new required ioremap_uc() which was added in the initial patch set here is
now merged on linux-next but I just noticed a small issue with this atyfb
specific patch, I'll fix that and respin and send to you as v5.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 19:12                           ` Thomas Gleixner
@ 2015-05-20 20:15                             ` Borislav Petkov
  2015-05-21 14:56                               ` Huang Rui
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-20 20:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: One Thousand Gnomes, Ingo Molnar, Huang Rui, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu,
	Tony Li, Frédéric Weisbecker

On Wed, May 20, 2015 at 09:12:15PM +0200, Thomas Gleixner wrote:
> Which would be good enough for mdelay/udelay I think, but we'd need to
> measure the time spent in MWAITT so we won't return early.
> 
> Something like this:

Yeah, with a check maybe:

> 	  delay = usec_to_tsc(delay_usec);

	if (delay > ((1ULL << 32) - 1)) {
		mdelay(delay_usec);
		return;
	}

> 	  end = rdtsc() + delay;
> 	  while (1) {

I guess
		monitorx( ...);

first.

> 		MWAITT(delay);
> 		now = rdtsc();
> 		if (end <= now)
> 		   	  break;
> 		delay = end - now;
> 	}
> 
> Now we'd need to add alternatives or some other mechanism to
> make this conditional for those machines.

alternative_call(mdelay, mdelayx, X86_FEATURE_MWAITT, /* no output */, timeout);

Something like that maybe.

> Not sure if it's worth the trouble.

Could be a use case for MWAITX in the kernel!

:-D

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 6/6] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()
  2015-05-20 19:53     ` Luis R. Rodriguez
@ 2015-05-20 20:57       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-05-20 20:57 UTC (permalink / raw)
  To: Luis R. Rodriguez, Borislav Petkov, Tomi Valkeinen, Bjorn Helgaas
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Jean-Christophe Plagniol-Villard, Daniel Vetter, Dave Airlie,
	Doug Ledford, Andy Walls, Ville Syrjälä,
	Andy Lutomirski, Michael S. Tsirkin, cocci, linux-kernel,
	Toshi Kani, Suresh Siddha, Juergen Gross, Daniel Vetter,
	Dave Airlie, Antonino Daplas, Rob Clark, Mathias Krause,
	Andrzej Hajda, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	linux-fbdev

On Wed, May 20, 2015 at 12:53 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> Tomi,
>
> the new required ioremap_uc() which was added in the initial patch set here is
> now merged on linux-next but I just noticed a small issue with this atyfb
> specific patch, I'll fix that and respin and send to you as v5.

Actually since this series depends on ioremap_uc() and since I need to
respin the last patch in this series for atyfb provided I get an
Acked-by from you for the fbdev changes do you mind if this goes
through the x86 tree as that already has the ioremap_uc() required
call? Otherwise we will have one kernel release with that call
available but not used, and we'll have to wait for 2 releases before
these changes get merged.

Thoughts?

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] mce: fix fail to set 'monarchtimeout' via boot option
  2015-05-20 17:43 ` Borislav Petkov
@ 2015-05-21  1:00   ` Xie XiuQi
  0 siblings, 0 replies; 710+ messages in thread
From: Xie XiuQi @ 2015-05-21  1:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, tglx, mingo, hpa, x86, linux-edac, linux-kernel

On 2015/5/21 1:43, Borislav Petkov wrote:
> On Wed, May 20, 2015 at 07:22:23PM +0800, Xie XiuQi wrote:
>> I use "mce=1,10000000" in cmdline to change the monarch timeout, but
>> it does not work.
>>
>> The cause is that get_option() has parsed the ',' already, we need
>> not to check the ',' again.
>>
>> --
>> get_option(): read an int from an option string;
>> if available accept a subsequent comma as well.
>>
>> Return values:
>> 0 - no int in string
>> 1 - int found, no subsequent comma
>> 2 - int found including a subsequent comma
>> 3 - hyphen found to denote a range
>>
>> Cc: <stable@vger.kernel.org>	# 2.6.32+
> 
> I don't think that's a serious enough a bug to justify the stable tag.
> 
>> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
>> ---

...

>>  		return 0;
> 
> Anyway, I fixed it up and applied it.

Sorry, I will check carefully next time.

Thanks.

> 
> Thanks.
> 
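
For readers following along, the return-value convention quoted above can
be sketched in user space roughly as follows. This is only an
illustration: get_option_sketch() and parse_mce_sketch() are hypothetical
names, the range case (return value 3) is left out, and this is not the
patch that was applied. It shows why a second check for ',' is
unnecessary once the parser has already consumed the comma.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified stand-in for the get_option() behaviour described above:
 *   0 - no int in string
 *   1 - int found, no subsequent comma
 *   2 - int found including a subsequent comma (the comma is consumed)
 */
static int get_option_sketch(const char **str, int *val)
{
	const char *cur = *str;
	char *end;
	long v;

	if (!cur || !*cur)
		return 0;

	v = strtol(cur, &end, 0);
	if (end == cur)
		return 0;

	*val = (int)v;
	if (*end == ',') {
		*str = end + 1;		/* step past the comma */
		return 2;
	}
	*str = end;
	return 1;
}

/* Parse "tolerant[,timeout]" by relying on the return value alone. */
static void parse_mce_sketch(const char *arg, int *tolerant, int *timeout)
{
	if (get_option_sketch(&arg, tolerant) == 2)
		get_option_sketch(&arg, timeout);
}
```

With this shape, "1,10000000" sets both values, while a bare "3" leaves
the timeout untouched.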



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-19  8:01 ` [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Huang Rui
  2015-05-19 11:31   ` Borislav Petkov
@ 2015-05-21  1:34   ` Andy Lutomirski
  2015-05-21  5:48     ` Andy Lutomirski
  2015-05-21  9:41     ` Thomas Gleixner
  1 sibling, 2 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-21  1:34 UTC (permalink / raw)
  To: Huang Rui, Borislav Petkov, Len Brown, Rafael J. Wysocki,
	Thomas Gleixner
  Cc: x86, linux-kernel, Fengguang Wu, Aaron Lu, Tony Li,
	Peter Zijlstra, John Stultz

On 05/19/2015 01:01 AM, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
>
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^64 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.

I think this is the wrong way to do this...

> +		x86_idle = mwaitx_idle;

...this is a legacy thing.  The modern idle path is cpuidle_idle_call, I
believe, and that goes through the cpuidle subsystem, which has little
to do with any of this.

Where is the MWAITX documentation?  It seems that AMD has failed to 
update the obvious reference:

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

From my vague understanding, MWAITX accepts a 32-bit maximum number of
TSC ticks to wait.  If that's correct, and it's not too late to change, 
then: AMD, you blew it.  The correct way to do this would be to accept a 
64-bit absolute TSC deadline.

The 32-bit relative timeout model utterly sucks for two reasons. 
Suppose we tried to use it.  We'd have two major issues:

1. We can't sleep more than about 1.5 seconds because we'll overflow the 
deadline.

2. The relative timeout is annoying.  Imagine:

rdtsc
shove the computed timeout into ebx
<-- IRQ here
mwaitx

now we sleep too long.

We can do:

cli
rdtsc
shove the computed timeout into ebx
mov $1,%ecx
mwaitx
sti

but that's annoying and isn't really correct wrt NMIs.

So this sucks.
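
The difference between the two models can be shown with a small
user-space sketch in C. Nothing here touches real hardware: the function
names are hypothetical, time is a plain cycle counter, and 'irq_cycles'
models an interrupt that lands between reading the TSC and starting the
wait.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Relative model: the timeout is computed from an early TSC read, so
 * any cycles lost before the wait starts are added on top of it.
 */
static uint64_t wake_relative(uint64_t now, uint64_t deadline,
			      uint64_t irq_cycles)
{
	uint64_t timeout = deadline > now ? deadline - now : 0;

	now += irq_cycles;	/* interrupt handled before the wait */
	return now + timeout;
}

/*
 * Absolute model: the hardware compares against the deadline itself,
 * so a late start does not push the wakeup out.
 */
static uint64_t wake_absolute(uint64_t now, uint64_t deadline,
			      uint64_t irq_cycles)
{
	now += irq_cycles;
	return now > deadline ? now : deadline;
}
```

For a deadline 500 cycles away and a 200 cycle interrupt, the relative
model wakes 200 cycles late while the absolute model wakes on time.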

In any event, I think this is barely useful.

That being said, it might be worth teaching the timer code about a 
magical ideal type of clock that is simultaneously a perfect invariant 
high-res clocksource *and* a very fast (in fact free) wakeup source that 
uses the same time base.  In fact, Sandy Bridge and newer Intel CPUs 
have such a thing: it's called the TSC deadline timer.  I think it's 
much faster to reprogram than other timers, and it ought to avoid a 
whole bunch of complicated messy code that handles the fact that 
crappier timers have their own crappy time bases.

If we did that *and* we had a non-crappy mwaitx, then we could apply an 
optimization: when going idle, we could turn off the TSC deadline timer 
and use mwaitx instead.  This would avoid an interrupt if the event that
wakes us is our timer.

In the meantime, I don't really see the point.

John, Peter, Thomas: would it actually make sense to teach the core 
timer/clockevent code about perfect time sources like invariant TSC + 
TSC deadline?  AFAICT right now we're not doing anything particularly 
interesting with the TSC deadline timer.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21  1:34   ` Andy Lutomirski
@ 2015-05-21  5:48     ` Andy Lutomirski
  2015-05-27  1:01       ` Andy Lutomirski
  2015-05-21  9:41     ` Thomas Gleixner
  1 sibling, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-21  5:48 UTC (permalink / raw)
  To: Huang Rui, Thomas Gleixner, Rafael J. Wysocki, Len Brown,
	Borislav Petkov
  Cc: John Stultz, Tony Li, X86 ML, Peter Zijlstra, Aaron Lu,
	Fengguang Wu, linux-kernel

On May 20, 2015 6:34 PM, "Andy Lutomirski" <luto@kernel.org> wrote:
> If we did that *and* we had a non-crappy mwaitx, then we could apply an optimization: when going idle, we could turn off the TSC deadline timer and use mwaitx instead.  This would avoid an interrupt if the event that wakes us is our timer.
>

Hey, Intel, want to document your secret "Timed MWAIT" feature?  It
causes a transition to C0 when the deadline expires (see 4.2.4 of the
Desktop 4th Generation Intel Core Processor Family Datasheet Volume 1,
order number 328897-001) and it even has an erratum (HSD63 / BDM32),
but the instruction itself doesn't appear to be documented.

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:59           ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
  2015-05-20 17:20             ` Josh Poimboeuf
@ 2015-05-21  7:52             ` Ingo Molnar
  2015-05-21 12:12               ` Ingo Molnar
  2015-05-26 23:06               ` Andi Kleen
  1 sibling, 2 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-21  7:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> >>
> >> I've never quite understood what the '?' means.
> >
> > It basically means "here's a function address we found on the 
> > stack, which may or may not have been called."  It's needed 
> > because stack walking isn't currently 100% reliable.
> 
> It is often quite interesting and helpful, because it shows stale 
> data on the stack, giving clues about what happened just before.

Yes, it's basically a zero-cost tracer: often showing a partial trace 
of events that happened before.

> Now, I'd like gcc to generally be better about not wasting so much 
> stack frame, so in that sense I'd like to see fewer '?' entries just 
> from a code quality standpoint, but when debugging those things, the 
> downside of "noise" is often cancelled by the upside of "ahh, it 
> happens after calling X".
> 
> So the "perfect stack frames" is actually not as great a thing as 
> some people want to make it seem.

We should definitely also print out the '?' entries, they are very 
useful especially when analyzing rare, difficult to reproduce, 
sporadic bugs - which are usually the hardest bugs to fix.

The biggest long term plus of 'perfect stack frames' would not be to 
skip the '?' entries (we don't want to skip them!), but to be able to 
eventually build the kernel without frame pointers.

Especially on modern x86 CPUs with stack engines (latest Intel and AMD 
CPUs) that keep ESP updates out of the later stages of execution 
pipelines, going from RBP framepointers to direct ESP use is 
beneficial to performance and compresses I$ footprint as well:

    text           data     bss      dec            hex filename
12150606        2565544 1634304 16350454         f97cf6 linux-CONFIG_FRAME_POINTERS=n/vmlinux
13282884        2571744 1617920 17472548        10a9c24 linux-CONFIG_FRAME_POINTERS=y/vmlinux

Here's the I$ cache-miss rate of the 'vfs-mix' workload that I used
in the -falign-functions measurements, for CONFIG_FRAMEPOINTERS=y,
on Intel Sandy Bridge (best of 9x10 runs):

 #
 # CONFIG_FRAMEPOINTERS=y
 #
 Performance counter stats for 'system wide' (10 runs):

       728,328,347      L1-icache-load-misses                                         ( +-  0.08% )  (100.00%)
    11,891,931,664      instructions                                                  ( +-  0.00% )
           300,023      context-switches                                              ( +-  0.00% )

       7.324048170 seconds time elapsed                                          ( +-  0.09% )

... and these are the I$ miss perf stats from running the same 
workload on a CONFIG_FRAMEPOINTERS=n kernel:

 #
 # CONFIG_FRAMEPOINTERS are not set
 #
 Performance counter stats for 'system wide' (10 runs):

       687,758,078      L1-icache-load-misses                                         ( +-  0.10% )  (100.00%)
    10,984,908,013      instructions                                                  ( +-  0.01% )
           300,021      context-switches                                              ( +-  0.00% )

       7.120867260 seconds time elapsed                                          ( +-  0.29% )

So if we disable frame pointers, then on this workload:

  - the kernel text size is 9.3% smaller
  - the number of instructions executed went down by about 8.2%
  - the cachemiss rate went down by about 5.9%
  - performance went up by about 2.8%.

The speedup is actually even better than 2.8%, if you look at average 
execution time:

linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.324048170 seconds time elapsed                                          ( +-  0.09% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.470166715 seconds time elapsed                                          ( +-  1.01% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.365047474 seconds time elapsed                                          ( +-  0.25% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.828223324 seconds time elapsed                                          ( +-  2.04% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.427164489 seconds time elapsed                                          ( +-  0.70% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.385565350 seconds time elapsed                                          ( +-  0.35% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.560782318 seconds time elapsed                                          ( +-  1.68% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.399741309 seconds time elapsed                                          ( +-  0.74% )
linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.303746766 seconds time elapsed                                          ( +-  0.04% )

 avg = 7.451609

linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.201498813 seconds time elapsed                                          ( +-  0.86% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.120867260 seconds time elapsed                                          ( +-  0.29% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.141642635 seconds time elapsed                                          ( +-  0.15% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.217213506 seconds time elapsed                                          ( +-  0.85% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.163046581 seconds time elapsed                                          ( +-  0.56% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.128939439 seconds time elapsed                                          ( +-  0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.256172853 seconds time elapsed                                          ( +-  0.82% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.122946768 seconds time elapsed                                          ( +-  0.23% )
linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.126018578 seconds time elapsed                                          ( +-  0.18% )

 avg = 7.164260

Then with framepointers disabled this workload gets faster by 4.0% on 
average.
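
As a cross-check of the percentages above, the arithmetic can be
reproduced with a few lines of C. The helper name pct_diff() is made up
for this sketch. The input numbers are the ones quoted in this mail. The
quoted figures come out when the kernel without frame pointers is taken
as the base; with the frame pointer kernel as the base the same
differences are somewhat smaller (roughly 8.5% instead of 9.3% for the
text size, for example).

```c
#include <assert.h>

/*
 * Difference between the frame pointer and the no frame pointer
 * measurement, expressed as a percentage of 'base'.
 */
static double pct_diff(double with_fp, double without_fp, double base)
{
	assert(base > 0.0);
	return (with_fp - without_fp) * 100.0 / base;
}

/* Text size in bytes, from the size(1) output above. */
static const double text_fp   = 13282884.0;
static const double text_nofp = 12150606.0;

/* Average elapsed time in seconds over the nine result lines. */
static const double avg_fp   = 7.451609;
static const double avg_nofp = 7.164260;
```

For example, pct_diff(text_fp, text_nofp, text_nofp) gives about 9.3,
pct_diff(text_fp, text_nofp, text_fp) gives about 8.5, and
pct_diff(avg_fp, avg_nofp, avg_nofp) gives about 4.0.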

The average result is also pretty stable in the no-framepointers case, 
while it fluctuates more in the framepointers case. (and this is why 
the 'best runtime' favors the framepointers case - the average is 
closer to reality.)

So the performance advantage of not using framepointers is not
something we can ignore IMHO: but obviously performance isn't
everything - so if stack unwinding is not robust, then we need and
want frame pointers.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-05-19 11:29   ` Borislav Petkov
@ 2015-05-21  8:54     ` Huang Rui
  2015-05-21  9:35       ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Huang Rui @ 2015-05-21  8:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Li, Tony

On Tue, May 19, 2015 at 07:29:45PM +0800, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 04:01:09PM +0800, Huang Rui wrote:
> > On AMD Carrizo processors (Family 15h, Model 60h-6fh), there is a new
> > feature called MWAITT (Mwait with a timer) as an extension of
> > Monitor/Mwait.
> > 
> > MWAITT, another name is MWAITX (MWAIT with extensions), has a configurable
> > timer that causes MWAITX to exit on expiration.
> > 
> > Compared with MONITOR/MWAIT, there are minor differences in opcode and
> > input parameters.
> > 
> > MWAITX ECX[1]: enable timer if set
> > MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks
> 
> What's the behavior if you set EBX to some value but don't enable the
> timer with ECX? Normal MWAIT?
> 

Apologies for the late reply. I was having some AMD internal discussions and
collecting more data about this feature these two days.

EBX is unused if the timer is disabled. You're right, the behavior
is then like normal MWAIT. :)

> > The software P0 frequency is the same as the TSC frequency.
> > 
> > Max timeout = EBX/(TSC frequency)
> 
> That's max timeout in seconds then.
> 

Actually, timeout = EBX * TSC cycle time, where EBX is the number of TSC
cycles the timer counts. This is a 32-bit counter, so the maximum value is
0xffffffff.
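
As a worked example of that formula, here is a small C sketch that
converts an EBX value into nanoseconds for a given TSC frequency. The
function name and the frequencies are assumptions chosen for
illustration, not values from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest timeout the 32-bit EBX counter can express, in TSC cycles. */
#define MWAITX_MAX_CYCLES 0xffffffffU

/*
 * timeout = EBX * TSC cycle time, returned in nanoseconds.
 * 0xffffffff * 10^9 still fits in 64 bits, so this cannot overflow.
 */
static uint64_t mwaitx_timeout_ns(uint32_t ebx, uint64_t tsc_hz)
{
	assert(tsc_hz > 0);
	return (uint64_t)ebx * 1000000000ULL / tsc_hz;
}
```

With a 2 GHz TSC the maximum comes out at roughly 2.1 seconds, and with a
2.8 GHz TSC at roughly 1.5 seconds, so longer sleeps would have to be
split into several waits.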

> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeature.h |  1 +
> >  arch/x86/include/asm/mwait.h      | 25 +++++++++++++++++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index 3d6606f..3ef1f6e 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -176,6 +176,7 @@
> >  #define X86_FEATURE_PERFCTR_NB  ( 6*32+24) /* NB performance counter extensions */
> >  #define X86_FEATURE_BPEXT	(6*32+26) /* data breakpoint extension */
> >  #define X86_FEATURE_PERFCTR_L2	( 6*32+28) /* L2 performance counter extensions */
> > +#define X86_FEATURE_MWAITT	( 6*32+29) /* Mwait extension (MonitorX/MwaitX) */
> >
> >  /*
> >   * Auxiliary flags: Linux defined - For features scattered in various
> > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> > index 653dfa7..b91136f 100644
> > --- a/arch/x86/include/asm/mwait.h
> > +++ b/arch/x86/include/asm/mwait.h
> > @@ -23,6 +23,14 @@ static inline void __monitor(const void *eax, unsigned long ecx,
> >  		     :: "a" (eax), "c" (ecx), "d"(edx));
> >  }
> >  
> > +static inline void __monitorx(const void *eax, unsigned long ecx,
> > +			     unsigned long edx)
> > +{
> > +	/* "monitorx %eax, %ecx, %edx;" */
> > +	asm volatile(".byte 0x0f, 0x01, 0xfa;"
> 
> Ah ok, ModRM extension to secondary opcode 0x1. Simply filling out the
> empty slots after SWAPGS, RDTSCP, ... :)
> 

Sorry, I may have misunderstood. Should it be updated as below?

asm volatile(".byte 0x0f, 0x01, 0xfa;"
             :: "a" (eax), "c" (ecx), "d" (edx));
                                        ^^^

> > +		     :: "a" (eax), "c" (ecx), "d"(edx));
> > +}
> > +
> >  static inline void __mwait(unsigned long eax, unsigned long ecx)
> >  {
> >  	/* "mwait %eax, %ecx;" */
> > @@ -30,6 +38,14 @@ static inline void __mwait(unsigned long eax, unsigned long ecx)
> >  		     :: "a" (eax), "c" (ecx));
> >  }
> >  
> > +static inline void __mwaitx(unsigned long eax, unsigned long ebx,
> > +		unsigned long ecx)
> > +{
> > +	/* "mwaitx %eax, %ebx, %ecx;" */
> > +	asm volatile(".byte 0x0f, 0x01, 0xfb;"
> > +		     :: "a" (eax), "b" (ebx), "c" (ecx));
> > +}
> > +
> >  static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> >  {
> >  	trace_hardirqs_on();
> > @@ -38,6 +54,15 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> >  		     :: "a" (eax), "c" (ecx));
> >  }
> >  
> > +static inline void __sti_mwaitx(unsigned long eax, unsigned long ebx,
> > +		unsigned long ecx)
> 
> Please align the argument on the new line to the opening brace:
> 
> static inline void __sti_mwaitx(unsigned long eax, unsigned long ebx,
> 				unsigned long ecx)

Got it, thanks.

Thanks,
Rui

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction
  2015-05-21  8:54     ` Huang Rui
@ 2015-05-21  9:35       ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21  9:35 UTC (permalink / raw)
  To: Huang Rui
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Li, Tony

On Thu, May 21, 2015 at 04:54:29PM +0800, Huang Rui wrote:
> > Ah ok, ModRM extension to secondary opcode 0x1. Simply filling out the
> > empty slots after SWAPGS, RDTSCP, ... :)
> > 
> 
> Sorry, I might not get your meaning.

No no - I was simply discussing the instruction encoding. Code is fine.
:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21  1:34   ` Andy Lutomirski
  2015-05-21  5:48     ` Andy Lutomirski
@ 2015-05-21  9:41     ` Thomas Gleixner
  1 sibling, 0 replies; 710+ messages in thread
From: Thomas Gleixner @ 2015-05-21  9:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Huang Rui, Borislav Petkov, Len Brown, Rafael J. Wysocki, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Tony Li, Peter Zijlstra,
	John Stultz

On Wed, 20 May 2015, Andy Lutomirski wrote:
> John, Peter, Thomas: would it actually make sense to teach the core
> timer/clockevent code about perfect time sources like invariant TSC + TSC

Perfect? There is no such concept in timer land.

> deadline?  AFAICT right now we're not doing anything particularly interesting
> with the TSC deadline timer.

Interesting in what way? We still have to convert from and to
nanoseconds and deal with the clock skew ....

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 16:52           ` Borislav Petkov
@ 2015-05-21 10:16             ` Ingo Molnar
  2015-05-21 10:47               ` Borislav Petkov
  2015-05-27 14:17               ` [tip:x86/debug] x86/Documentation: Adapt Ingo' s " tip-bot for Borislav Petkov
  0 siblings, 2 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-21 10:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton


* Borislav Petkov <bp@alien8.de> wrote:

> On Wed, May 20, 2015 at 11:25:37AM -0500, Josh Poimboeuf wrote:
> > > I've never quite understood what the '?' means.
> > 
> > It basically means "here's a function address we found on the stack,
> > which may or may not have been called."  It's needed because stack
> > walking isn't currently 100% reliable.
> 
> Yeah, that was not that trivial to figure out at the time:
> 
> unsigned long
> print_context_stack(struct thread_info *tinfo,
> 		...
> 
>                 if (__kernel_text_address(addr)) {
>                         if ((unsigned long) stack == bp + sizeof(long)) {
>                                 ops->address(data, addr, 1);
>                                 frame = frame->next_frame;
>                                 bp = (unsigned long) frame;
>                         } else {
>                                 ops->address(data, addr, 0);
>                         }
> 
> and that ops->address is
> 
> print_trace_address()
> |-> printk_stack_address()
> 
> So if I'm understanding this correctly, if rBP+8 is equal to rSP, i.e.
> return address is on the stack, then this frame got called.
> 
> Otherwise -> "?".
> 
> I might be missing something though...

So this is how we are printing backtraces on x86:

We always scan the full kernel stack for return addresses stored on 
the kernel stack(s) [*], from stack top to stack bottom, and print out 
anything that 'looks like' a kernel text address.

If it fits into the frame pointer chain, we print it without a 
question mark, knowing that it's part of the real backtrace.

If the address does not fit into our expected frame pointer chain we 
still print it, but we print a '?'. It can mean two things:

 - either the address is not part of the call chain: it's just stale
   values on the kernel stack, from earlier function calls. This is 
   the common case.

 - or it is part of the call chain, but the frame pointer was not set 
   up properly within the function, so we don't recognize it. See the 
   200+ assembly functions that Josh's build time validation found.

This way we will always print out the real call chain (plus a few more 
entries), regardless of whether the frame pointer was set up correctly 
or not - but in most cases we'll get the call chain right as well. The 
entries printed are strictly in stack order, so you can deduce more 
information from that as well.

The most important property of this method is that we _never_ lose 
information: we always strive to print _all_ addresses on the stack(s) 
that look like kernel text addresses, so if debug information is 
wrong, we still print out the real call chain as well - just with more 
question marks than ideal.
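
The scanning scheme described above can be modelled in user space with a
short C sketch. It is a toy: the stack is an array of words, frame
pointers are array indexes instead of addresses, and the "kernel text"
range is an arbitrary made-up interval. The names are hypothetical, and
the logic only mirrors the idea of the print_context_stack() excerpt
quoted earlier in this thread: a text-like word directly above the
current frame pointer slot is reliable, everything else gets a '?'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up "kernel text" range for the simulation. */
#define TEXT_START 0x1000u
#define TEXT_END   0x2000u

static int looks_like_text(uint64_t v)
{
	return v >= TEXT_START && v < TEXT_END;
}

/*
 * Scan slots [0, n) in stack order. 'bp' is the index of the slot that
 * holds the saved frame pointer of the current frame; the return
 * address of that frame lives at bp + 1. Every text-like word is
 * reported: ' ' if it fits the frame pointer chain, '?' otherwise.
 * Returns the number of entries written to addrs[] and marks[].
 */
static size_t scan_stack(const uint64_t *stack, size_t n, size_t bp,
			 uint64_t *addrs, char *marks, size_t max)
{
	size_t out = 0;

	for (size_t i = 0; i < n && out < max; i++) {
		uint64_t v = stack[i];

		if (!looks_like_text(v))
			continue;

		if (i == bp + 1) {
			marks[out] = ' ';
			bp = (size_t)stack[bp];	/* follow the chain */
		} else {
			marks[out] = '?';
		}
		addrs[out++] = v;
	}
	return out;
}

/* Two real frames (0x1200, 0x1400) surrounded by stale text addresses. */
static const uint64_t sample_stack[8] = {
	0x1111, 4, 0x1200, 0x1300, 8, 0x1400, 7, 0x1500,
};
```

Scanning sample_stack with bp = 1 reports all five text-like words in
stack order, with 0x1200 and 0x1400 marked reliable and the three stale
ones marked '?'. No entry is dropped even though only two of them belong
to the frame pointer chain.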

Thanks,

	Ingo

[*] For things like IRQ stacks and ISTs we also scan those stacks, in 
    the right order, and try to cross from one stack into another
    reconstructing the call chain. This works most of the time.


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 17:20             ` Josh Poimboeuf
@ 2015-05-21 10:27               ` Ingo Molnar
  0 siblings, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-21 10:27 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, May 20, 2015 at 09:59:18AM -0700, Linus Torvalds wrote:
> > On Wed, May 20, 2015 at 9:25 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > On Wed, May 20, 2015 at 09:03:37AM -0700, Andy Lutomirski wrote:
> > >>
> > >> I've never quite understood what the '?' means.
> > >
> > > It basically means "here's a function address we found on the 
> > > stack, which may or may not have been called."  It's needed 
> > > because stack walking isn't currently 100% reliable.
> > 
> > It is often quite interesting and helpful, because it shows stale 
> > data on the stack, giving clues about what happened just before.
> > 
> > Now, I'd like gcc to generally be better about not wasting so much 
> > > stack frame, so in that sense I'd like to see fewer '?' entries 
> > just from a code quality standpoint, but when debugging those 
> > things, the downside of "noise" is often cancelled by the upside 
> > of "ahh, it happens after calling X".
> > 
> > So the "perfect stack frames" is actually not as great a thing as 
> > some people want to make it seem.
> 
> Ok, I can see how looking at stale stack data could be useful for 
> some of the really tough problems.

And note that the tough problems are actually the ones where we need 
that information the most. So any stack backtrace printing method must 
be biased towards helping the difficult scenarios - not the trivial 
crashes. That is one of the reasons why we are always printing the 
question marks.

> But right now, the meaning of '?' is ambiguous.  It could be stale 
> data, or it could be part of a frame for the current stack which was 
> skipped due to missing frame pointers or an exception.

Yes, of course. That's not a big problem as the actual symbolic 
information will tell us a lot, which allows us to reconstruct the 
real call chain, plus allows us to see any 'recent execution activity' 
that might be on the stack as stale entries.

> If we can somehow make the stack unwinder reliable, then it would at 
> least allow us to remove the ambiguity of the '?' entries.  And it 
> would reduce the "noise" for the majority of issues where we don't 
> care about stale stack data, and can simply ignore it.

Yes, but note the above consideration - the probability distribution 
of kernel bugs tends to have a _very_ long tail, with bugs that 
sometimes take years to trigger and fix. Kernel developers upstream 
and at distros tend to spend a disproportionately large amount of time 
staring at difficult to decode bugs.

For that reason it is far more important to still stay maintainable 
with those kinds of difficult bugs, than to make the resolution of 
trivial, unambiguous crashes a tiny bit easier by printing fewer 
'distractions'...

Also, note that the '?' entries have another role: they cross-check 
the unwinder.

If you think we'll be able to do a perfect unwinder then think again: 
debug info _will_ be messed up periodically, either by us or by 
tooling, because right now no kernel code or other functionality 
relies on perfect unwinding.

So this is not like C++ exception handling where broken unwinding will 
break the code. This is something that is literally only visible in 
kernel logs currently, as a slight anomaly.

So any x86 stack unwinder code must be fundamentally based on the idea 
and expectation that stack unwinding is always going to be somewhat 
imperfect and somewhat statistical.
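
To make the cross-check idea concrete, here is a small user-space
simulation of such a scan. It is only a sketch: the stack is an array of
words, frame pointers are array indices, and the text range, struct and
function names are all invented for illustration; this is not the code
in arch/x86/kernel/dumpstack.c. Every word that looks like a text address
is reported, and it is marked reliable only when it sits exactly where
the frame pointer chain expects a return address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Made-up text range standing in for an "is this kernel text?" check. */
#define TEXT_START	0x1000UL
#define TEXT_END	0x2000UL

struct trace_entry {
	unsigned long addr;
	int reliable;		/* 0: would be printed with a '?' */
};

int is_text_addr(unsigned long addr)
{
	return addr >= TEXT_START && addr < TEXT_END;
}

/*
 * Scan all n words of a simulated stack, lowest address first.
 * 'bp' is the index of the slot holding the innermost saved frame
 * pointer. That slot stores the index of the next saved frame pointer
 * (any value >= n ends the chain); the slot right after it holds the
 * return address of that frame.
 * Returns the number of entries written to 'out'.
 */
size_t scan_stack(const unsigned long *stack, size_t n, size_t bp,
		  struct trace_entry *out, size_t max)
{
	size_t i, count = 0;

	assert(stack && out);

	for (i = 0; i < n && count < max; i++) {
		unsigned long addr = stack[i];

		if (!is_text_addr(addr))
			continue;	/* not a code address, never printed */

		out[count].addr = addr;
		if (bp < n && i == bp + 1) {
			/* Exactly where the frame chain says it should be. */
			out[count].reliable = 1;
			bp = (size_t)stack[bp];	/* follow the chain */
		} else {
			/* Stale data, or a frame the chain skipped. */
			out[count].reliable = 0;
		}
		count++;
	}
	return count;
}

/* Print the entries, flagging the unverified ones with '?'. */
void print_trace(const struct trace_entry *e, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		printf(" [<%08lx>]%s\n", e[i].addr, e[i].reliable ? "" : " ?");
}
```

If the frame chain is broken, the scan still reports every text address
it finds, only with a '?' attached, which is the property argued for
above.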

Thanks,

	Ingo


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 10:16             ` Ingo Molnar
@ 2015-05-21 10:47               ` Borislav Petkov
  2015-05-21 11:11                 ` Ingo Molnar
  2015-05-27 14:17               ` [tip:x86/debug] x86/Documentation: Adapt Ingo' s " tip-bot for Borislav Petkov
  1 sibling, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 10:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton

On Thu, May 21, 2015 at 12:16:14PM +0200, Ingo Molnar wrote:
> So this is how we are printing backtraces on x86:

<snip useful info>

This is pretty useful info and the question about the '?' keeps popping
up.

How about I moved Documentation/x86/x86_64/kernel-stacks to
Documentation/x86/kernel-stacks and added that info to it?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 10:47               ` Borislav Petkov
@ 2015-05-21 11:11                 ` Ingo Molnar
  2015-05-21 15:49                   ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-05-21 11:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Linus Torvalds, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst, Peter Zijlstra, Andrew Morton


* Borislav Petkov <bp@alien8.de> wrote:

> On Thu, May 21, 2015 at 12:16:14PM +0200, Ingo Molnar wrote:
> > So this is how we are printing backtraces on x86:
> 
> <snip useful info>
> 
> This is pretty useful info and the question about the '?' keeps popping
> up.
> 
> How about I moved Documentation/x86/x86_64/kernel-stacks to
> Documentation/x86/kernel-stacks and added that info to it?

Yeah, please do!

Thanks,

	Ingo


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21  7:52             ` Ingo Molnar
@ 2015-05-21 12:12               ` Ingo Molnar
  2015-05-26 23:06               ` Andi Kleen
  1 sibling, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-21 12:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, linux-kernel, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> Especially on modern x86 CPUs with stack engines (latest Intel and 
> AMD CPUs) that keeps ESP updates out of the later stages of 
> execution pipelines, going from RBP framepointers to direct ESP use 
> is beneficial to performance and compresses I$ footprint as well:
> 
>     text           data     bss      dec            hex filename
> 12150606        2565544 1634304 16350454         f97cf6 linux-CONFIG_FRAME_POINTERS=n/vmlinux
> 13282884        2571744 1617920 17472548        10a9c24 linux-CONFIG_FRAME_POINTERS=y/vmlinux

Correction: I ran that with a 1-byte alignment patch still applied.

I reran all the numbers with the default 16-byte alignment as well, 
and the gap between framepointers and no-framepointers became smaller, 
but the various trends and conclusions still hold.

Here are the updated numbers:

     text           data     bss      dec            hex filename
 13548564        2571744 1617920 17738228        10ea9f4 linux-CONFIG_FRAME_POINTERS=n/vmlinux
 13797773        2571744 1617920 17987437        112776d linux-CONFIG_FRAME_POINTERS=y/vmlinux

> Here's the I$ cachemiss rate with the 'vfs-mix' workload that I used 
> in the -falign-functions measurements, for 
> CONFIG_FRAMEPOINTERS=y, on Intel Sandy Bridge (best of 9x10 runs):
> 
>  #
>  # CONFIG_FRAMEPOINTERS=y
>  #
>  Performance counter stats for 'system wide' (10 runs):
> 
>        728,328,347      L1-icache-load-misses                                         ( +-  0.08% )  (100.00%)
>     11,891,931,664      instructions                                                  ( +-  0.00% )
>            300,023      context-switches                                              ( +-  0.00% )
> 
>        7.324048170 seconds time elapsed                                          ( +-  0.09% )


 Performance counter stats for 'system wide' (10 runs):

       701,525,006      L1-icache-load-misses                                         ( +-  0.06% )  (100.00%)
    11,891,793,196      instructions                                                  ( +-  0.01% )
           300,036      context-switches                                              ( +-  0.00% )

       7.354372294 seconds time elapsed                                          ( +-  0.82% )

> 
> ... and these are the I$ miss perf stats from running the same 
> workload on a CONFIG_FRAMEPOINTERS=n kernel:
> 
>  #
>  # CONFIG_FRAMEPOINTERS are not set
>  #
>  Performance counter stats for 'system wide' (10 runs):
> 
>        687,758,078      L1-icache-load-misses                                         ( +-  0.10% )  (100.00%)
>     10,984,908,013      instructions                                                  ( +-  0.01% )
>            300,021      context-switches                                              ( +-  0.00% )
> 
>        7.120867260 seconds time elapsed                                          ( +-  0.29% )

 Performance counter stats for 'system wide' (10 runs):

       685,107,089      L1-icache-load-misses                                         ( +-  0.08% )  (100.00%)
    10,983,861,590      instructions                                                  ( +-  0.01% )
           300,031      context-switches                                              ( +-  0.00% )

       7.120738452 seconds time elapsed                                          ( +-  0.35% )

> So if we disable frame pointers, then on this workload:
> 
>   - the kernel text size is 9.3% smaller
>   - the number of instructions executed went down by about 8.2%
>   - the cachemiss rate went down by about 5.9%
>   - performance went up by about 2.8%.

    - the kernel text size is 1.8% smaller: with 16 bytes alignment 
      there's quite some extra free space the frame pointer code can 
      grow into, which reduces the size win.

    - the number of instructions executed went down by about 8.2% (as 
      expected this is invariant of alignment.)

    - the cachemiss rate went down by about 2.7%: this is a smaller 
      win again, partly because of the 'free space' 16-byte alignment 
      gives us.

    - the best 'time elapsed' numbers out of 10 runs show a speedup of 
      2.0% - close to the 2.8% with 1-byte alignment.

> The speedup is actually even better than 2.8%, if you look at 
> average execution time:
> 
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.324048170 seconds time elapsed                                          ( +-  0.09% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.470166715 seconds time elapsed                                          ( +-  1.01% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.365047474 seconds time elapsed                                          ( +-  0.25% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.828223324 seconds time elapsed                                          ( +-  2.04% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.427164489 seconds time elapsed                                          ( +-  0.70% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.385565350 seconds time elapsed                                          ( +-  0.35% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.560782318 seconds time elapsed                                          ( +-  1.68% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.399741309 seconds time elapsed                                          ( +-  0.74% )
> linux-CONFIG_FRAME_POINTERS=y/res.txt:       7.303746766 seconds time elapsed                                          ( +-  0.04% )
> 
>  avg = 7.451609

linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.300875812 seconds time elapsed                                          ( +-  0.17% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.491652338 seconds time elapsed                                          ( +-  1.33% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.307877300 seconds time elapsed                                          ( +-  0.20% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.258946461 seconds time elapsed                                          ( +-  0.23% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.295113779 seconds time elapsed                                          ( +-  0.30% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.283375859 seconds time elapsed                                          ( +-  0.21% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.319320205 seconds time elapsed                                          ( +-  0.38% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.354372294 seconds time elapsed                                          ( +-  0.82% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.308955558 seconds time elapsed                                          ( +-  0.26% )
linux-CONFIG_FRAME_POINTERS=y/res2.txt:       7.295267101 seconds time elapsed                                          ( +-  0.26% )

avg=7.32

> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.201498813 seconds time elapsed                                          ( +-  0.86% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.120867260 seconds time elapsed                                          ( +-  0.29% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.141642635 seconds time elapsed                                          ( +-  0.15% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.217213506 seconds time elapsed                                          ( +-  0.85% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.163046581 seconds time elapsed                                          ( +-  0.56% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.128939439 seconds time elapsed                                          ( +-  0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.256172853 seconds time elapsed                                          ( +-  0.82% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.122946768 seconds time elapsed                                          ( +-  0.23% )
> linux-CONFIG_FRAME_POINTERS=n/res.txt:       7.126018578 seconds time elapsed                                          ( +-  0.18% )
> 
>  avg = 7.164260

linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.135061084 seconds time elapsed                                          ( +-  0.39% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.132738388 seconds time elapsed                                          ( +-  0.34% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.174334895 seconds time elapsed                                          ( +-  0.32% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.215143851 seconds time elapsed                                          ( +-  0.71% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.131166029 seconds time elapsed                                          ( +-  0.19% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.270427197 seconds time elapsed                                          ( +-  1.22% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.120738452 seconds time elapsed                                          ( +-  0.35% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.168856127 seconds time elapsed                                          ( +-  0.27% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.268637173 seconds time elapsed                                          ( +-  1.28% )
linux-CONFIG_FRAME_POINTERS=n/res2.txt:       7.178431781 seconds time elapsed                                          ( +-  0.32% )

avg=7.18
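
For anyone who wants to recheck the two averages, here is a small
standalone sketch. The helper names are invented for illustration; the
sample values are the 'seconds time elapsed' figures from the res2.txt
lines above. The second helper expresses the gap between two averages as
a percentage of the faster one:

```c
#include <assert.h>
#include <stddef.h>

/* 'seconds time elapsed' of the ten res2.txt runs, frame pointers on. */
const double fp_on[] = {
	7.300875812, 7.491652338, 7.307877300, 7.258946461, 7.295113779,
	7.283375859, 7.319320205, 7.354372294, 7.308955558, 7.295267101,
};

/* Same workload, frame pointers off. */
const double fp_off[] = {
	7.135061084, 7.132738388, 7.174334895, 7.215143851, 7.131166029,
	7.270427197, 7.120738452, 7.168856127, 7.268637173, 7.178431781,
};

/* Arithmetic mean of n samples; n must be non-zero. */
double runs_mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(v && n > 0);

	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* How much longer 'slow' takes than 'fast', in percent of 'fast'. */
double rel_diff_percent(double slow, double fast)
{
	return (slow - fast) / fast * 100.0;
}
```

Calling runs_mean(fp_on, 10) and runs_mean(fp_off, 10) reproduces the
two avg= values once rounded to two decimals.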

> Then with framepointers disabled this workload gets faster by 4.0% 
> on average.

With 16-byte alignment the average gets faster by 2.8%.

The conclusions are unchanged:

> The average result is also pretty stable in the no-framepointers 
> case, while it fluctuates more in the framepointers case. (and this 
> is why the 'best runtime' favors the framepointers case - the 
> average is closer to reality.)
> 
> So the performance advantages of not doing framepointers is not 
> something we can ignore IMHO: but obviously performance isn't 
> everything - so if stack unwinding is unrobust, then we need and 
> want frame pointers.

Thanks,

	Ingo


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-19 11:31   ` Borislav Petkov
  2015-05-20  8:55     ` Ingo Molnar
@ 2015-05-21 13:26     ` Huang Rui
  1 sibling, 0 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-21 13:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86, linux-kernel,
	Fengguang Wu, Aaron Lu, Tony Li

On Tue, May 19, 2015 at 01:31:21PM +0200, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote:
> > MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> > The cpu core still consumes less power while waiting, and has faster exit
> > from waiting than "Halt". This patch implements an interface using the
> > kernel parameter "idle=" to configure mwaitx type and timer value.
> > 
> > If "idle=mwaitx", the timeout will be set as the maximum value
> > ((2^64 - 1) * TSC cycle).
> > If "idle=mwaitx,100", the timeout will be set as 100ns.
> > If the processor doesn't support MWAITX, then halt is used.
> 
> Ok, I see what you're trying here and I think this is not the optimal
> approach.
> 
> So let me explain how I see it, you correct me if I'm wrong:
> 
> So we want to do MWAITX so that we can save us idle entry/exit overhead
> with HLT. Because MWAITX is faster, reportedly.
> 
> Now, if we want to do that, we want to do it dynamically and adjust the
> MWAITX sleep interval depending on the system, usage pattern, system
> load and so on.

Yes, that's right. Maybe we can even find other use cases in other
kernel components.

> 
> And for that we would need an adaptive scheme which approximates each
> idle interval. Simply taking TSC before we enter idle and after we come
> out would give us each idle residency duration and we can do some simple
> math to approximate it.
> 
> Now, what would that bring us: faster wakeup times.
> 
> And here comes the 10^6 $ question: why are we doing all the fun?
> 

Do you mean the code below?

/*
 * TSC loops (EBX input) = Timer(nsec) *
 * TSC freq(khz) / 1000000
 */
timeout = timeout * tsc_freq;
do_div(timeout, 1000000);

That's because tsc_freq is in kHz and the timeout is in nanoseconds, so
I divide by 10^6 to get the corresponding number of TSC loops.
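
Spelled out as a standalone user-space sketch (the function names are
invented for illustration; only MWAITX_EBX_WAIT_TIMEOUT comes from the
patch, and clamping to it is an assumption about how an oversized value
would be handled, not something the patch does):

```c
#include <assert.h>
#include <stdint.h>

#define MWAITX_EBX_WAIT_TIMEOUT	0xffffffffU	/* value from the patch */

/*
 * A TSC running at tsc_khz ticks (tsc_khz * 1000) times per second and
 * one second is 10^9 ns, so:
 *
 *   cycles = ns * (tsc_khz * 10^3) / 10^9 = ns * tsc_khz / 10^6
 *
 * Note that timeout_ns * tsc_khz can overflow 64 bits for very large
 * timeouts; this sketch does not guard against that.
 */
uint64_t ns_to_tsc_cycles(uint64_t timeout_ns, uint64_t tsc_khz)
{
	assert(tsc_khz > 0);

	return timeout_ns * tsc_khz / 1000000ULL;
}

/* Assumption: saturate at the 32-bit limit instead of truncating. */
uint32_t tsc_cycles_to_ebx(uint64_t cycles)
{
	if (cycles > MWAITX_EBX_WAIT_TIMEOUT)
		return MWAITX_EBX_WAIT_TIMEOUT;
	return (uint32_t)cycles;
}
```

For example, a 100ns timeout on a 2 GHz TSC (tsc_khz = 2000000) works
out to 100 * 2000000 / 1000000 = 200 cycles.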

> I'm thinking we want to find a cutoff duration where for smaller
> durations it is worth to do MWAITX and have faster entry/exit times and
> for bigger durations we want to do HLT because it'll get into C1E and
> give us higher power savings.
> 
> We don't want to do MWAITX too long because that'll burn more power
> relatively to HLT but we don't want to do HLT for shorter periods
> because then entry/exit costs.
> 
> Am I on the right track at least?

Yes, that's totally right.
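
A rough sketch of how such a cutoff scheme could look, in plain C so it
can be tried in user space. Everything here is hypothetical: the struct,
the enum values, the 1/8 weighting and the cutoff parameter are
placeholders for illustration, not part of the patch. It takes the TSC
before and after each idle period, keeps a running average of the idle
residency, and picks MWAITX below the cutoff and HLT above it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, not the idle_boot_override values. */
enum idle_method { IDLE_USE_MWAITX, IDLE_USE_HLT };

struct idle_estimate {
	uint64_t avg_cycles;	/* running idle residency, in TSC cycles */
};

/*
 * Fold one measured idle period into the estimate. The newest sample
 * gets a weight of 1/8 (an arbitrary choice for this sketch).
 */
void idle_estimate_update(struct idle_estimate *est,
			  uint64_t tsc_before, uint64_t tsc_after)
{
	uint64_t sample = tsc_after - tsc_before;

	assert(est);

	if (est->avg_cycles == 0)
		est->avg_cycles = sample;	/* first measurement */
	else
		est->avg_cycles = est->avg_cycles - est->avg_cycles / 8
				  + sample / 8;
}

/*
 * Short expected idle: MWAITX for the cheaper entry/exit.
 * Long expected idle: HLT, which can reach C1E and save more power.
 */
enum idle_method idle_pick_method(const struct idle_estimate *est,
				  uint64_t cutoff_cycles)
{
	assert(est);

	return est->avg_cycles < cutoff_cycles ? IDLE_USE_MWAITX
						: IDLE_USE_HLT;
}
```

Where the cutoff should sit would have to come from measurements of the
entry/exit cost and the power draw on real hardware.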

> 
> > Signed-off-by: Huang Rui <ray.huang@amd.com>
> > ---
> >  arch/x86/include/asm/mwait.h     |  2 +
> >  arch/x86/include/asm/processor.h |  2 +-
> >  arch/x86/kernel/process.c        | 79 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 82 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> > index b91136f..c4e51e7 100644
> > --- a/arch/x86/include/asm/mwait.h
> > +++ b/arch/x86/include/asm/mwait.h
> > @@ -14,6 +14,8 @@
> >  #define CPUID5_ECX_INTERRUPT_BREAK	0x2
> >  
> >  #define MWAIT_ECX_INTERRUPT_BREAK	0x1
> > +#define MWAITX_ECX_TIMER_ENABLE		0x2
> 
> 						Use BIT(1) here.

OK.

> 
> > +#define MWAITX_EBX_WAIT_TIMEOUT		0xffffffff
> >  
> >  static inline void __monitor(const void *eax, unsigned long ecx,
> >  			     unsigned long edx)
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 23ba676..0f60e94 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -733,7 +733,7 @@ extern unsigned long		boot_option_idle_override;
> >  extern bool			amd_e400_c1e_detected;
> >  
> >  enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
> > -			 IDLE_POLL};
> > +			 IDLE_POLL, IDLE_MWAITX};
> >  
> >  extern void enable_sep_cpu(void);
> >  extern int sysenter_setup(void);
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 6e338e3..9d68193 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -30,6 +30,7 @@
> >  #include <asm/debugreg.h>
> >  #include <asm/nmi.h>
> >  #include <asm/tlbflush.h>
> > +#include <asm/x86_init.h>
> >  
> >  /*
> >   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
> > @@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> >  EXPORT_SYMBOL(boot_option_idle_override);
> >  
> >  static void (*x86_idle)(void);
> > +static unsigned long idle_param;
> >  
> >  #ifndef CONFIG_SMP
> >  static inline void play_dead(void)
> > @@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> >  	return 1;
> >  }
> >  
> > +static int not_support_mwaitx(const struct cpuinfo_x86 *c)
> > +{
> > +	if (c->x86_vendor != X86_VENDOR_AMD)
> > +		return 1;
> > +
> > +	if (!cpu_has(c, X86_FEATURE_MWAITT))
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * MONITOR/MWAIT with no hints, used for default default C1 state.
> >   * This invokes MWAIT with interrutps enabled and no flags,
> > @@ -470,12 +483,45 @@ static void mwait_idle(void)
> >  	__current_clr_polling();
> >  }
> >  
> > +/*
> > + * AMD Excavator processors support the new MONITORX/MWAITX instructions.
> 
> No need for that especially when newer than XV processors start
> supporting those too.
> 

OK, got it.

Thanks,
Rui


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20  9:12       ` Borislav Petkov
  2015-05-20 10:22         ` Ingo Molnar
@ 2015-05-21 14:15         ` Huang Rui
  1 sibling, 0 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-21 14:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Li, Tony

On Wed, May 20, 2015 at 05:12:13PM +0800, Borislav Petkov wrote:
> On Wed, May 20, 2015 at 10:55:20AM +0200, Ingo Molnar wrote:
> > Does it use it to decide how 'deep' a sleep it will go into, i.e. 
> > larger timeouts cause longer entry and exit latencies?
> 
> That's what the HLT thing does. Cores go into C1 and then at some point
> (hysteresis, etc) the whole core complex enters C1E.
> 
> The MWAIT* should be used for only shorter sleeps as it remains in C1.
> IMHO, of course.
> 
> But the problem there is another: what happens if the timeout fires,
> you wake up and see that you can remain idle? Do HLT? Do another MWAITX
> round?
> 
> This means you have an additional unnecessary wakeup which costs.
> 
> > I suppose it's also the case that if an interrupt arrives _before_ the 
> > expected timeout then MWAITX will try to exit immediately, it won't 
> > wait until the timeout, right?
> 
> I'd assume so - I mean, it must, right.
> 
> BUT!, in talking to Andy about it last night on IRC, he pointed out
> that when using acpi_idle, we never come to calling x86_idle() and from
> looking quickly at cpuidle_idle_call(), that still might be the case as
> we go to use_default only when there's an error with the cpuidle driver
> or so.
> 
> So Rui, before you go and do more work on it, you should probably
> analyze what cpuidle exactly does (if you haven't done so yet). And on
> AMD we do use acpi_idle - at least on my F15h box that is the case:
> 
> $ grep . /sys/devices/system/cpu/cpuidle/current_*
> /sys/devices/system/cpu/cpuidle/current_driver:acpi_idle
> /sys/devices/system/cpu/cpuidle/current_governor_ro:menu
> 

OK, understood. Thanks for the reminder. :)

Thanks,
Rui


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 11:21               ` Borislav Petkov
  2015-05-20 11:41                 ` Ingo Molnar
@ 2015-05-21 14:32                 ` Huang Rui
  1 sibling, 0 replies; 710+ messages in thread
From: Huang Rui @ 2015-05-21 14:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Len Brown, Rafael J. Wysocki, Thomas Gleixner, x86,
	linux-kernel, Fengguang Wu, Aaron Lu, Li, Tony

On Wed, May 20, 2015 at 07:21:10PM +0800, Borislav Petkov wrote:
> On Wed, May 20, 2015 at 01:11:20PM +0200, Ingo Molnar wrote:
> >   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
> >     like MWAIT: i.e. once idle it won't exit idle on its own
> 
> Let me quote the commit message:
> 
> "MWAITT, another name is MWAITX (MWAIT with extensions), has a
> configurable timer that causes MWAITX to exit on expiration."
> 
> You need to set the second bit in ECX to enable the timer.
> 
> I guess if you don't, then you get normal MWAIT but then you don't need
> the timeout either...

That's right.

This feature will be exposed on Family 15h, Models 60h-6fh. Please check
http://developer.amd.com; sorry, I don't know why the APM has not been
updated yet. But I can raise your question with the HW guys.

> 
> >   - based on the 'timeout' hint, MWAITX can internally optimize how 
> >     deep sleep it enters. If the timeout is large it goes deep, if 
> >     it's small, it goes shallow.
> 
> I haven't heard anything about handling the timeout this way and if it
> is not done this way, maybe Rui could forward this idea to hw people...
> 

Actually, there is another use case in the HSA stack. The timer is used
for synchronization between the CPU and the GPU. The CPU core exits
waiting when a GPU thread modifies the monitored address or the timer
expires. :)

Thanks,
Rui


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 20:15                             ` Borislav Petkov
@ 2015-05-21 14:56                               ` Huang Rui
  2015-05-21 16:02                                 ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Huang Rui @ 2015-05-21 14:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, One Thousand Gnomes, Ingo Molnar, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu, Li,
	Tony, Frédéric Weisbecker

On Thu, May 21, 2015 at 04:15:54AM +0800, Borislav Petkov wrote:
> On Wed, May 20, 2015 at 09:12:15PM +0200, Thomas Gleixner wrote:
> > Which would be good enough for mdelay/udelay I think, but we'd need to
> > measure the time spent in MWAITT so we won't return early.
> > 
> > Something like this:
> 
> Yeah, with a check maybe:
> 
> > 	  delay = usec_to_tsc(delay_usec);
> 
> 	if (delay > ((1ULL << 32) - 1)) {
> 		mdelay(delay_usec);
> 		return;
> 	}
> 
> > 	  end = rdtsc() + delay;
> > 	  while (1) {
> 
> I guess
> 		monitorx( ...);
> 
> first.
> 
> > 		MWAITT(delay);
> > 		now = rdtsc();
> > 		if (end <= now)
> > 		   	  break;
> > 		delay = end - now;
> > 	}
> > 
> > Now we'd need to add alternatives or some other mechanism to it to
> > make this conditionally for those machines.
> 
> alternative_call(mdelay, mdelayx, X86_FEATURE_MWAITT, /* no output */, timeout);
> 
> Something like that maybe.
> 
> > Not sure if it's worth the trouble.
> 
> Could be a use case for MWAITX in the kernel!
> 

Looks like a good use case. Boris, could we try to implement it?
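
To have something concrete to discuss, here is a user-space sketch of
the loop above. It is not kernel code: struct delay_ops and its two hooks
are hypothetical stand-ins for rdtsc() and for a MONITORX followed by a
timed MWAITX, and the fake clock exists only so the loop can be
exercised without the hardware. Instead of falling back when the delay
does not fit into 32 bits, this variant waits in chunks, which is a
design choice of the sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hooks: read the TSC, and arm the monitor + timed wait. */
struct delay_ops {
	uint64_t (*read_tsc)(void *ctx);
	void (*timed_wait)(void *ctx, uint32_t cycles);	/* may wake early */
	void *ctx;
};

/* Wait at least 'delay' TSC cycles, re-arming after every early wakeup. */
void delay_cycles(const struct delay_ops *ops, uint64_t delay)
{
	uint64_t end, now;

	assert(ops && ops->read_tsc && ops->timed_wait);

	end = ops->read_tsc(ops->ctx) + delay;
	for (;;) {
		/* The timeout operand is 32 bits wide, so wait in chunks. */
		ops->timed_wait(ops->ctx, delay > UINT32_MAX ?
				UINT32_MAX : (uint32_t)delay);
		now = ops->read_tsc(ops->ctx);
		if (end <= now)
			break;
		delay = end - now;	/* woke up early, wait for the rest */
	}
}

/* Fake clock: every wait ends after at most 'max_step' cycles. */
struct fake_clock {
	uint64_t now;
	uint64_t max_step;
	unsigned int waits;
};

uint64_t fake_read_tsc(void *ctx)
{
	return ((struct fake_clock *)ctx)->now;
}

void fake_timed_wait(void *ctx, uint32_t cycles)
{
	struct fake_clock *clk = ctx;

	assert(clk->max_step > 0);	/* otherwise time never advances */

	clk->now += cycles < clk->max_step ? cycles : clk->max_step;
	clk->waits++;
}
```

With the fake clock starting at 1000, max_step set to 300 and a
requested delay of 1000 cycles, the loop needs four waits (300, 300, 300
and 100 cycles) and returns with the clock at 2000, never earlier.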

Thanks,
Rui


* [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up
  2015-05-21 11:11                 ` Ingo Molnar
@ 2015-05-21 15:49                   ` Borislav Petkov
  2015-05-21 15:49                     ` [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
  2015-05-21 15:49                     ` [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
  0 siblings, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

From: Borislav Petkov <bp@suse.de>

... to Documentation/x86/ as it is going to collect more and not only
64-bit specific info.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/x86/{x86_64 => }/kernel-stacks | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/x86/{x86_64 => }/kernel-stacks (100%)

diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks
-- 
2.3.5



* [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint
  2015-05-21 15:49                   ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-21 15:49                     ` Borislav Petkov
  2015-05-21 15:49                     ` [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
  1 sibling, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

From: Borislav Petkov <bp@suse.de>

Update the documentation after

  6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/x86/kernel-stacks | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49d1a2f..c3c935b9d56e 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
 Most of the text from Keith Owens, hacked by AK
 
 x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
 
 The currently assigned IST stacks are :-
 
-* STACKFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
-
-  Used for interrupt 12 - Stack Fault Exception (#SS).
-
-  This allows the CPU to recover from invalid stack segments. Rarely
-  happens.
-
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
-- 
2.3.5



* [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces
  2015-05-21 15:49                   ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
  2015-05-21 15:49                     ` [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-21 15:49                     ` Borislav Petkov
  1 sibling, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 15:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, X86 ML,
	live-patching, Linus Torvalds, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

From: Borislav Petkov <bp@suse.de>

Write it down for future reference, as the question about the question
mark in stack traces keeps popping up.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
---
 Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b9d56e..0f3a6c201943 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
   assumptions about the previous state of the kernel stack.
 
 For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up; here's an in-depth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+   values on the kernel stack, from earlier function calls. This is
+   the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+   up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+    the right order, and try to cross from one stack into another
+    reconstructing the call chain. This works most of the time.
-- 
2.3.5
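
For readers following along, the '?' rule from the text above can be
illustrated with a small user-space sketch in C. This is not the kernel's
print_context_stack(); the toy stack layout, the text address range and
the names looks_like_text(), scan_stack() and print_trace() are all
assumptions made up for illustration. Every slot is scanned from the top
of the stack down, anything that looks like a text address is reported,
and only values sitting in the return-address slot of a frame on the
frame pointer chain are reported without a question mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Made-up address range standing in for kernel text. */
#define TEXT_START 0x1000u
#define TEXT_END   0x2000u

enum mark { SKIP, RELIABLE, UNRELIABLE };

static int looks_like_text(uintptr_t v)
{
	return v >= TEXT_START && v < TEXT_END;
}

/*
 * Toy stack: stack[0] is the top. A frame is two adjacent slots,
 * [saved frame pointer][return address], and a saved frame pointer
 * is simply the index of the caller's frame (>= n ends the chain).
 * fp is the index of the innermost frame.
 */
static void scan_stack(const uintptr_t *stack, size_t n, size_t fp,
		       enum mark *marks)
{
	size_t i;

	assert(stack && marks);

	for (i = 0; i < n; i++) {
		if (!looks_like_text(stack[i])) {
			marks[i] = SKIP;
		} else if (fp < n && i == fp + 1) {
			/* Return-address slot of a frame on the chain. */
			marks[i] = RELIABLE;
			fp = (size_t)stack[fp];
		} else {
			/* Text-like value off the chain: printed with '?'. */
			marks[i] = UNRELIABLE;
		}
	}
}

/* Print the reported entries, '?' marking the unreliable ones. */
static void print_trace(const uintptr_t *stack, const enum mark *marks,
			size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (marks[i] != SKIP)
			printf(" [<%#lx>] %s\n", (unsigned long)stack[i],
			       marks[i] == UNRELIABLE ? "?" : "");
}
```

With the toy stack { 0x1111, 3, 0x1234, 6, 0x1500, 0x1abc, 8, 0x1800 }
and fp = 1, slots 2, 4 and 7 are on the chain, while the stale values in
slots 0 and 5 are still reported, just with a '?'.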


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21 14:56                               ` Huang Rui
@ 2015-05-21 16:02                                 ` Borislav Petkov
  2015-05-21 16:45                                   ` Andy Lutomirski
  2015-05-21 19:30                                   ` Thomas Gleixner
  0 siblings, 2 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 16:02 UTC (permalink / raw)
  To: Huang Rui
  Cc: Thomas Gleixner, One Thousand Gnomes, Ingo Molnar, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu, Li,
	Tony, Frédéric Weisbecker, Andy Lutomirski

On Thu, May 21, 2015 at 10:56:32PM +0800, Huang Rui wrote:
> Looks like good use case. Boris, could we try to implement it?

Andy had some suggestions on how to do it here:

https://lkml.kernel.org/r/555D3629.8080002@kernel.org

which should be doable. Also, you'd probably need to set ECX[0]=0b too,
so that MWAITX doesn't get woken up by interrupts while MWAIT-ing with
interrupts disabled. I.e., this sequence:

cli
rdtsc
shove the computed timeout into ebx
mov $2,%ecx				# this enables the timer and disables IRQs while MWAITing
mwaitx
sti

The NMI argument is a problem though - if an NMI gets you out of
MWAITX, a simple perf tool workload would kill all MWAITX executions.
Which is bad. :-\

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21 16:02                                 ` Borislav Petkov
@ 2015-05-21 16:45                                   ` Andy Lutomirski
  2015-05-21 17:08                                     ` Borislav Petkov
  2015-05-21 19:30                                   ` Thomas Gleixner
  1 sibling, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-21 16:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Thomas Gleixner, One Thousand Gnomes, Ingo Molnar,
	Len Brown, Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu,
	Aaron Lu, Li, Tony, Frédéric Weisbecker

On Thu, May 21, 2015 at 9:02 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, May 21, 2015 at 10:56:32PM +0800, Huang Rui wrote:
>> Looks like good use case. Boris, could we try to implement it?
>
> Andy had some suggestions on how to do it here:
>
> https://lkml.kernel.org/r/555D3629.8080002@kernel.org
>
> which should be doable. Also, you'd probably need to set ECX[0]=0b too,
> so that MWAITX doesn't get woken up by interrupts while MWAIT-ing with
> interrupts disabled. I.e., this sequence:
>
> cli
> rdtsc
> shove the computed timeout into ebx
> mov $2,%ecx                             # this enables the timer and disables IRQs while MWAITing
> mwaitx
> sti

I must be missing something.  In this sequence, you're sleeping with
IF=0 and ECX[0] = 0, so an IRQ won't get handled.  Don't we want
ECX[0] = 1?

>
> The NMI argument is a problem though - if an NMI gets you out of
> MWAITX, a simple perf tool workload would kill all MWAITX executions.
> Which is bad. :-\

I'm not sure it's a show-stopper.  NMI handlers are meant to be fast.
If an NMI comes in between rdtsc and mwaitx, then we oversleep, but by
at most the time it takes to handle an NMI, and nothing would have
stopped us from oversleeping that long if an NMI came in right after
mwaitx returned.

--Andy

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21 16:45                                   ` Andy Lutomirski
@ 2015-05-21 17:08                                     ` Borislav Petkov
  2015-05-21 17:12                                       ` Andy Lutomirski
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 17:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Huang Rui, Thomas Gleixner, One Thousand Gnomes, Ingo Molnar,
	Len Brown, Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu,
	Aaron Lu, Li, Tony, Frédéric Weisbecker

On Thu, May 21, 2015 at 09:45:10AM -0700, Andy Lutomirski wrote:
> I must be missing something.  In this sequence, you're sleeping with
> IF=0 and ECX[0] = 0, so an IRQ won't get handled.  Don't we want
> ECX[0] = 1?

Hmm, so actually we don't want to sleep with interrupts disabled. If
ECX[0]=1b, then an interrupt will wake MWAIT. So then you have to do the
loop thing as tglx suggested.

> > The NMI argument is a problem though - if an NMI gets you out of
> > MWAITX, a simple perf tool workload would kill all MWAITX executions.
> > Which is bad. :-\
> 
> I'm not sure it's a show-stopper.  NMI handlers are meant to be fast.
> If an NMI comes in between rdtsc and mwaitx, then we oversleep, but by
> at most the time it takes to handle an NMI, and nothing would have
> stopped us from oversleeping that long if an NMI came in right after
> mwaitx returned.

Actually, I'm thinking about an NMI happening after we've issued MWAIT.
NMIs wake it up. So you have the same problem as above:

NMIs will wake MWAIT so you'd need to check how long you've slept and
sleep for the remaining time. I.e., something like that thing from a
couple of mails ago:

	delay = usec_to_tsc(delay_usec);

	/* the MWAITX timeout is a 32-bit value */
	if (delay > ((1ULL << 32) - 1)) {
		mdelay(delay_usec);
		return;
	}

	end = rdtsc() + delay;
	while (1) {

		monitorx( ...); /* Do we need that here? */
		mwaitx(delay);

		/* possible wakeups */

		now = rdtsc();
		if (end <= now)
			break;
		delay = end - now;
	}


Yes, no?
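
To make the re-arm logic above concrete, here is a self-contained
user-space sketch in C. It only simulates the behaviour: fake_tsc,
max_sleep, wait_calls, read_tsc(), timed_wait() and delay_ticks() are
made-up names, and nothing here executes MONITORX/MWAITX. The simulated
wait may return early (as an NMI or an interrupt would cause), and the
loop then sleeps again for whatever time remains.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated time-stamp counter; stands in for rdtsc(). */
static uint64_t fake_tsc;

/*
 * Simulated early wakeups: every timed wait is cut short after at most
 * max_sleep ticks. Must be non-zero, or the loop below never ends.
 */
static uint64_t max_sleep = UINT64_MAX;

/* Number of timed waits issued so far. */
static unsigned int wait_calls;

static uint64_t read_tsc(void)
{
	return fake_tsc;
}

/* Stand-in for a timed MWAITX: sleeps up to 'timeout' ticks. */
static void timed_wait(uint32_t timeout)
{
	uint64_t slept = timeout < max_sleep ? timeout : max_sleep;

	fake_tsc += slept;
	wait_calls++;
}

/* Wait for 'delay' ticks; returns the TSC value seen on completion. */
static uint64_t delay_ticks(uint64_t delay)
{
	uint64_t end, now;

	/* The timeout operand is 32 bits wide; longer delays need a fallback. */
	assert(delay <= UINT32_MAX);

	end = read_tsc() + delay;
	for (;;) {
		timed_wait((uint32_t)delay);

		/* Possible early wakeup: check how long we really slept. */
		now = read_tsc();
		if (end <= now)
			break;
		delay = end - now;
	}
	return now;
}
```

With max_sleep set to 30 and a requested delay of 100 ticks, the wait is
re-armed with 70, 40 and finally 10 ticks, so four waits are issued in
total and the delay still ends exactly on time.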

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21 17:08                                     ` Borislav Petkov
@ 2015-05-21 17:12                                       ` Andy Lutomirski
  0 siblings, 0 replies; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-21 17:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, Thomas Gleixner, One Thousand Gnomes, Ingo Molnar,
	Len Brown, Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu,
	Aaron Lu, Li, Tony, Frédéric Weisbecker

On Thu, May 21, 2015 at 10:08 AM, Borislav Petkov <bp@suse.de> wrote:
> On Thu, May 21, 2015 at 09:45:10AM -0700, Andy Lutomirski wrote:
>> I must be missing something.  In this sequence, you're sleeping with
>> IF=0 and ECX[0] = 0, so an IRQ won't get handled.  Don't we want
>> ECX[0] = 1?
>
> Hmm, so actually we don't want to sleep with interrupts disabled. If
> ECX[0]=1b, then an interrupt will wake MWAIT. So then you have to do the
> loop thing as tglx suggested.
>
>> > The NMI argument is a problem though - if an NMI gets you out of
>> > MWAITX, a simple perf tool workload would kill all MWAITX executions.
>> > Which is bad. :-\
>>
>> I'm not sure it's a show-stopper.  NMI handlers are meant to be fast.
>> If an NMI comes in between rdtsc and mwaitx, then we oversleep, but by
>> at most the time it takes to handle an NMI, and nothing would have
>> stopped us from oversleeping that long if an NMI came in right after
>> mwaitx returned.
>
> Actually, I'm thinking about an NMI happening after we've issued MWAIT.
> NMIs wake it up. So you have the same problem as above:
>
> NMIs will wake MWAIT so you'd need to check how long you've slept and
> sleep for the remaining time. I.e., something like that thing from a
> couple of mails ago:
>
>         delay = usec_to_tsc(delay_usec);
>
>         if (delay > ((1 << 32) - 1)) {
>                 mdelay(delay_usec);
>                 return;
>         }
>
>         end = rdtsc() + delay;
>         while (1) {
>
>                 monitorx( ...); /* Do we need that here? */
>                 mwaitx(delay);
>
>                 /* possible wakeups */
>
>                 now = rdtsc();
>                 if (end <= now)
>                         break;
>                 delay = end - now;
>         }
>
>
> Yes, no?

Yes, but there should already be an adequate outer loop around the
whole thing.  After all, even regular mwait can have spurious wakeups
due to monitor monitoring the entire cache line.

--Andy

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21 16:02                                 ` Borislav Petkov
  2015-05-21 16:45                                   ` Andy Lutomirski
@ 2015-05-21 19:30                                   ` Thomas Gleixner
  1 sibling, 0 replies; 710+ messages in thread
From: Thomas Gleixner @ 2015-05-21 19:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Huang Rui, One Thousand Gnomes, Ingo Molnar, Len Brown,
	Rafael J. Wysocki, x86, linux-kernel, Fengguang Wu, Aaron Lu, Li,
	Tony, Frédéric Weisbecker, Andy Lutomirski

On Thu, 21 May 2015, Borislav Petkov wrote:

> On Thu, May 21, 2015 at 10:56:32PM +0800, Huang Rui wrote:
> > Looks like good use case. Boris, could we try to implement it?
> 
> Andy had some suggestions on how to do it here:
> 
> https://lkml.kernel.org/r/555D3629.8080002@kernel.org
> 
> which should be doable. Also, you'd probably need to set ECX[0]=0b too,
> so that MWAITX doesn't get woken up by interrupts while MWAIT-ing with
> interrupts disabled. I.e., this sequence:
> 
> cli
> rdtsc
> shove the computed timeout into ebx
> mov $2,%ecx				# this enables the timer and disables IRQs while MWAITing
> mwaitx
> sti

And the above sucks for udelay, because you disable interrupts for
random amounts of time.

Thanks,

	tglx



* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-20 14:48     ` Ingo Molnar
  2015-05-20 15:51       ` Josh Poimboeuf
  2015-05-20 16:03       ` Andy Lutomirski
@ 2015-05-21 20:54       ` Josh Poimboeuf
  2015-05-21 21:53         ` Andy Lutomirski
  2015-05-21 22:01         ` Borislav Petkov
  2 siblings, 2 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-21 20:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michal Marek,
	Peter Zijlstra, x86, live-patching, linux-kernel, Linus Torvalds,
	Andy Lutomirski, Denys Vlasenko, Brian Gerst, Peter Zijlstra,
	Borislav Petkov, Andrew Morton

On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
> Yeah, so many of these seem to be 'leaf only' functions: functions
> that don't ever call functions themselves.
>
> So let's assume we always have CONFIG_FRAME_POINTER=y.
> 
> If they don't set up a frame pointer then they in essence won't show 
> up in the call chain - but normally they wouldn't because they call 
> nothing.
> 
> If they trigger an exception/fault or if they get hit by an interrupt 
> then I think we'll still correctly walk the stack - just those 
> functions might be missing from the deterministic call chain, right? 
> (it will still show up as a '?' entry.)
> 
> If they crash then we'll see them because the crashing RIP will be 
> printed.
> 
> So I'm wondering what the x86 policy here should be: to create frame 
> pointers in them or not. Cc:-ed a few more gents for thoughts.

After removing the frame pointer checks for leaf functions, and adding a
check for all functions which jump outside of their scope, the number of
defconfig warnings dropped from 89 -> 47.  The Fedora config warning
count dropped from 207 -> 83.

Here are the remaining 47 warnings for defconfig:

stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c

Note that only 13 of the 47 warnings are actually due to missing frame
pointer logic.  The rest are ambiguous conditions which prevent
stackvalidate from being able to make sense of things: returning from
outside of a proper ELF function, or jumping from inside of a function
to outside of its scope.

Similarly, in the Fedora config case, only 27 of the 83 warnings are for
missing frame pointer logic.
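
The three kinds of warnings in the list above can be told apart purely
by their message text. As a rough sketch (not part of stackvalidate; the
enum and function names are made up here), a few lines of C are enough
to bucket a saved build log and re-derive the 13-of-47 split:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum warn_kind {
	WARN_MISSING_FP,	/* "... is missing frame pointer logic" */
	WARN_RET_OUTSIDE,	/* "return instruction outside of a function" */
	WARN_JUMP_OUTSIDE,	/* "unsupported jump to outside of the function" */
	WARN_UNKNOWN,
	WARN_NR_KINDS
};

/* Classify one warning line by the message text it contains. */
static enum warn_kind classify_warning(const char *line)
{
	assert(line);

	if (strstr(line, "is missing frame pointer logic"))
		return WARN_MISSING_FP;
	if (strstr(line, "return instruction outside of a function"))
		return WARN_RET_OUTSIDE;
	if (strstr(line, "unsupported jump to outside of the function"))
		return WARN_JUMP_OUTSIDE;
	return WARN_UNKNOWN;
}

/* Count how many of the n lines fall into each category. */
static void tally_warnings(const char *const *lines, size_t n,
			   size_t counts[WARN_NR_KINDS])
{
	size_t i;

	memset(counts, 0, WARN_NR_KINDS * sizeof(counts[0]));
	for (i = 0; i < n; i++)
		counts[classify_warning(lines[i])]++;
}
```

Only the first bucket points at code that is actually missing frame
pointer setup; the other two are the ambiguous cases where the tool
cannot tell which function an instruction belongs to.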

If there are no objections, I'll go with this approach in the next
version of the patch set.

Thanks!

-- 
Josh

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 20:54       ` Josh Poimboeuf
@ 2015-05-21 21:53         ` Andy Lutomirski
  2015-05-22 14:53           ` Josh Poimboeuf
  2015-05-21 22:01         ` Borislav Petkov
  1 sibling, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-21 21:53 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Thu, May 21, 2015 at 1:54 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, May 20, 2015 at 04:48:10PM +0200, Ingo Molnar wrote:
>> Yeah, so many of these seem to be 'leaf only' functions: functions
>> that don't ever call functions themselves.
>>
>> So let's assume we always have CONFIG_FRAME_POINTER=y.
>>
>> If they don't set up a frame pointer then they in essence won't show
>> up in the call chain - but normally they wouldn't because they call
>> nothing.
>>
>> If they trigger an exception/fault or if they get hit by an interrupt
>> then I think we'll still correctly walk the stack - just those
>> functions might be missing from the deterministic call chain, right?
>> (it will still show up as a '?' entry.)
>>
>> If they crash then we'll see them because the crashing RIP will be
>> printed.
>>
>> So I'm wondering what the x86 policy here should be: to create frame
>> pointers in them or not. Cc:-ed a few more gents for thoughts.
>
> After removing the frame pointer checks for leaf functions, and adding a
> check for all functions which jump outside of their scope, the number of
> defconfig warnings dropped from 89 -> 47.  The Fedora config warning
> count dropped from 207 -> 83.
>
> Here are the remaining 47 warnings for defconfig:
>
> stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
> stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
> stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
> stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
> stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
> stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
> stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
> stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
> stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
> stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
> stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
> stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
> stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
> stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
> stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
> stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
> stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
> stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
> stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
> stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
> stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
> stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
> stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
> stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
> stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
> stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
> stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
> stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
> stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
> stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
>
> Note that only 13 of the 47 warnings are actually due to missing frame
> pointer logic.  The rest are ambiguous conditions which prevent
> stackvalidate from being able to make sense of things: returning from
> outside of a proper ELF function, or jumping from inside of a function
> to outside of its scope.
>
> Similarly, in the Fedora config case, only 27 of the 83 warnings are for
> missing frame pointer logic.
>
> If there are no objections, I'll go with this approach in the next
> version of the patch set.

I'm willing to review anything with "entry" in its filename.

--Andy

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 20:54       ` Josh Poimboeuf
  2015-05-21 21:53         ` Andy Lutomirski
@ 2015-05-21 22:01         ` Borislav Petkov
  2015-05-22 14:32           ` Josh Poimboeuf
  1 sibling, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-21 22:01 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
	Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

On Thu, May 21, 2015 at 03:54:25PM -0500, Josh Poimboeuf wrote:
> stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5

That must be something like this:

0000000000000000 <.altinstr_replacement>:
   0:   48 89 d1                mov    %rdx,%rcx
   3:   f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
   5:   c3                      retq

right?

In any case, anything with alternatives is probably a false positive
because even if instructions appear outside of the containing function,
they get patched in and are actually inside. Jump offsets get fixed up
properly too. Should, at least :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 22:01         ` Borislav Petkov
@ 2015-05-22 14:32           ` Josh Poimboeuf
  2015-05-22 21:18             ` Jiri Kosina
  2015-05-23  8:37             ` Borislav Petkov
  0 siblings, 2 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 14:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
	Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

On Fri, May 22, 2015 at 12:01:58AM +0200, Borislav Petkov wrote:
> On Thu, May 21, 2015 at 03:54:25PM -0500, Josh Poimboeuf wrote:
> > stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
> 
> That must be something like this:
> 
> 0000000000000000 <.altinstr_replacement>:
>    0:   48 89 d1                mov    %rdx,%rcx
>    3:   f3 a4                   rep movsb %ds:(%rsi),%es:(%rdi)
>    5:   c3                      retq
> 
> right?
> 
> In any case, anything with alternatives is probably a false positive
> because even if instructions appear outside of the containing function,
> they get patched in and are actually inside. Jump offsets get fixed up
> properly too. Should, at least :-)

Hm, alternatives do complicate things a bit.  It *is* a false positive,
but not necessarily because it's part of an alternative instruction
block.

The above code would be patched into memmove(), which is a leaf function
because it doesn't call any other functions.  Leaf functions don't need
frame pointer logic, so we can ignore them.

If instead the above code were patched into a non-leaf function, we'd
have to change it to restore the frame pointer before returning.
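
The leaf/non-leaf distinction above can be written down as a toy check
in C. This is a deliberately simplified rule over a made-up instruction
enum, not stackvalidate's actual algorithm: a function with no call
instruction is a leaf and passes as-is, while a non-leaf function is
assumed to need the usual push %rbp / mov %rsp,%rbp prologue and a
pop %rbp right before every return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced instruction classes. */
enum insn {
	INSN_OTHER,
	INSN_PUSH_RBP,		/* push %rbp */
	INSN_MOV_RSP_RBP,	/* mov %rsp,%rbp */
	INSN_POP_RBP,		/* pop %rbp */
	INSN_CALL,
	INSN_RET
};

/* A leaf function never calls another function. */
static bool is_leaf(const enum insn *code, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (code[i] == INSN_CALL)
			return false;
	return true;
}

/* Simplified check: leaf functions pass, others need prologue/epilogue. */
static bool frame_pointer_ok(const enum insn *code, size_t n)
{
	size_t i;

	assert(code);

	if (is_leaf(code, n))
		return true;

	/* A non-leaf function must set up the frame pointer first. */
	if (n < 2 || code[0] != INSN_PUSH_RBP || code[1] != INSN_MOV_RSP_RBP)
		return false;

	/* Every return must be directly preceded by the restore. */
	for (i = 0; i < n; i++)
		if (code[i] == INSN_RET &&
		    (i == 0 || code[i - 1] != INSN_POP_RBP))
			return false;
	return true;
}
```

Under this rule the mov / rep movsb / retq replacement passes as long as
it lands in a leaf function such as memmove(), and the same three
instructions would fail once the containing function also makes a call.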

-- 
Josh

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21 21:53         ` Andy Lutomirski
@ 2015-05-22 14:53           ` Josh Poimboeuf
  0 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 14:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, X86 ML, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Borislav Petkov, Andrew Morton

On Thu, May 21, 2015 at 02:53:07PM -0700, Andy Lutomirski wrote:
> On Thu, May 21, 2015 at 1:54 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > After removing the frame pointer checks for leaf functions, and adding a
> > check for all functions which jump outside of their scope, the number of
> > defconfig warnings dropped from 89 -> 47.  The Fedora config warning
> > count dropped from 207 -> 83.
> >
> > Here are the remaining 47 warnings for defconfig:
> >
> > stackvalidate: arch/x86/ia32/ia32entry.o: ia32_sysenter_target() is missing frame pointer logic
> > stackvalidate: arch/x86/ia32/ia32entry.o: return instruction outside of a function at .entry.text+0x52e
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x359
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19be
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x19e5
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1c21
> > stackvalidate: arch/x86/kernel/entry_64.o: return instruction outside of a function at .entry.text+0x1ceb
> > stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: unsupported jump to outside of the function at wakeup_long64+0x15
> > stackvalidate: arch/x86/kernel/acpi/wakeup_64.o: do_suspend_lowlevel() is missing frame pointer logic
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x6b
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0xc7
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x110
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x145
> > stackvalidate: arch/x86/kernel/relocate_kernel_64.o: return instruction outside of a function at .text+0x1c4
> > stackvalidate: arch/x86/kernel/head_64.o: return instruction outside of a function at .head.text+0x1a2
> > stackvalidate: arch/x86/kernel/head_64.o: early_idt_handler() is missing frame pointer logic
> > stackvalidate: arch/x86/platform/efi/efi_stub_64.o: efi_call() is missing frame pointer logic
> > stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x170
> > stackvalidate: arch/x86/realmode/rm/trampoline_64.o: return instruction outside of a function at .text+0x176
> > stackvalidate: arch/x86/realmode/rm/reboot.o: return instruction outside of a function at .text+0x2a
> > stackvalidate: arch/x86/realmode/rm/copy.o: copy_from_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/realmode/rm/copy.o: copy_to_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x69
> > stackvalidate: arch/x86/power/hibernate_asm_64.o: return instruction outside of a function at .text+0x16d
> > stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_to_user+0x25
> > stackvalidate: arch/x86/lib/copy_user_64.o: unsupported jump to outside of the function at _copy_from_user+0x25
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_1+0x14
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_2+0x4
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_4+0x4
> > stackvalidate: arch/x86/lib/getuser.o: unsupported jump to outside of the function at __get_user_8+0x4
> > stackvalidate: arch/x86/lib/getuser.o: return instruction outside of a function at .text+0xc5
> > stackvalidate: arch/x86/lib/memmove_64.o: return instruction outside of a function at .altinstr_replacement+0x5
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_1+0x14
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_2+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_4+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: unsupported jump to outside of the function at __put_user_8+0x1b
> > stackvalidate: arch/x86/lib/putuser.o: return instruction outside of a function at .text+0xc1
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_read_failed() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_down_write_failed() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_wake() is missing frame pointer logic
> > stackvalidate: arch/x86/lib/rwsem.o: call_rwsem_downgrade_wake() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/copy.o: copy_from_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/copy.o: copy_to_fs() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x16e
> > stackvalidate: arch/x86/boot/compressed/head_64.o: return instruction outside of a function at .text+0x172
> > stackvalidate: arch/x86/boot/compressed/head_64.o: startup_32() is missing frame pointer logic
> > stackvalidate: arch/x86/boot/pmjump.o: unsupported jump to outside of the function at in_pm32+0x1c
> >
> > Note that only 13 of the 47 warnings are actually due to missing frame
> > pointer logic.  The rest are ambiguous conditions which prevent
> > stackvalidate from being able to make sense of things: returning from
> > outside of a proper ELF function, or jumping from inside of a function
> > to outside of its scope.
> >
> > Similarly, in the Fedora config case, only 27 of the 83 warnings are for
> > missing frame pointer logic.
> >
> > If there are no objections, I'll go with this approach in the next
> > version of the patch set.
> 
> I'm willing to review anything with "entry" in its filename.

Thanks.  I think the "entry" warnings may be false positives, since that
code isn't called by any C kernel code.  (Now that I'm ignoring leaf
functions, the ratio of false positives to true positives has gone up.)

The false positives for "return instruction outside of a function" can
be marked with the RET_NOVALIDATE macro to tell stackvalidate to ignore
the return instruction, or with FILE_NOVALIDATE to tell it to ignore the
entire file.

I can add some patches to fix the warnings.  I'll put you on CC for the
"entry" changes.

-- 
Josh


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-22 14:32           ` Josh Poimboeuf
@ 2015-05-22 21:18             ` Jiri Kosina
  2015-05-22 22:22               ` Josh Poimboeuf
  2015-05-23  8:37             ` Borislav Petkov
  1 sibling, 1 reply; 710+ messages in thread
From: Jiri Kosina @ 2015-05-22 21:18 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, x86, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Andrew Morton

On Fri, 22 May 2015, Josh Poimboeuf wrote:

> Hm, alternatives do complicate things a bit.  It *is* a false positive,
> but not necessarily because it's part of an alternative instruction
> block.
> 
> The above code would be patched into memmove(), which is a leaf function
> because it doesn't call any other functions.  Leaf functions don't need
> frame pointer logic, so we can ignore them.
> 
> If instead the above code were patched into a non-leaf function, we'd
> have to change it to restore the frame pointer before returning.

Is this really only a problem of alternatives? How about 
dynamically-enabled tracepoints?

-- 
Jiri Kosina
SUSE Labs


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-22 21:18             ` Jiri Kosina
@ 2015-05-22 22:22               ` Josh Poimboeuf
  0 siblings, 0 replies; 710+ messages in thread
From: Josh Poimboeuf @ 2015-05-22 22:22 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Michal Marek, Peter Zijlstra, x86, live-patching,
	linux-kernel, Linus Torvalds, Andy Lutomirski, Denys Vlasenko,
	Brian Gerst, Peter Zijlstra, Andrew Morton

On Fri, May 22, 2015 at 11:18:57PM +0200, Jiri Kosina wrote:
> On Fri, 22 May 2015, Josh Poimboeuf wrote:
> 
> > Hm, alternatives do complicate things a bit.  It *is* a false positive,
> > but not necessarily because it's part of an alternative instruction
> > block.
> > 
> > The above code would be patched into memmove(), which is a leaf function
> > because it doesn't call any other functions.  Leaf functions don't need
> > frame pointer logic, so we can ignore them.
> > 
> > If instead the above code were patched into a non-leaf function, we'd
> > have to change it to restore the frame pointer before returning.
> 
> Is this really only a problem of alternatives? How about 
> dynamically-enabled tracepoints?

I think tracepoints are only in C code, right?  stackvalidate only
analyzes asm code, so it's not a concern for this patch set.

And I think tracepoints rely on normal call instructions, so they
shouldn't cause any problems with frame pointers as far as I can tell.

-- 
Josh


* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-22 14:32           ` Josh Poimboeuf
  2015-05-22 21:18             ` Jiri Kosina
@ 2015-05-23  8:37             ` Borislav Petkov
  1 sibling, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-23  8:37 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Michal Marek, Peter Zijlstra, x86, live-patching, linux-kernel,
	Linus Torvalds, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
	Peter Zijlstra, Andrew Morton

On Fri, May 22, 2015 at 09:32:12AM -0500, Josh Poimboeuf wrote:
> If instead the above code were patched into a non-leaf function, we'd
> have to change it to restore the frame pointer before returning.

Not a problem, I think. One'll need to add the FP restoring before the
retq.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-20 11:11             ` Ingo Molnar
  2015-05-20 11:21               ` Borislav Petkov
@ 2015-05-25  2:42               ` Huang Rui
  2015-05-25 10:43                 ` Ingo Molnar
  1 sibling, 1 reply; 710+ messages in thread
From: Huang Rui @ 2015-05-25  2:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner,
	x86, linux-kernel, Fengguang Wu, Aaron Lu, Li, Tony,
	Andy Lutomirski

On Wed, May 20, 2015 at 07:11:20PM +0800, Ingo Molnar wrote:
> 
> * Borislav Petkov <bp@suse.de> wrote:
> 
> > On Wed, May 20, 2015 at 12:22:58PM +0200, Ingo Molnar wrote:
> >
> > > Well, HLT does not get any hint from the OS how long the idling is 
> > > expected to last.
> > 
> > MWAIT on AMD doesn't either:
> 
> Yeah, MWAIT clearly doesn't, but I was talking about MWAITX, which 
> takes a timeout parameter as per these patches.
> 
> > > Another MWAITX round - we've got no crystal ball, so the hint 
> > > might be wrong if an external event occurs that we did not 
> > > anticipate.
> > 
> > So if we end up doing a bunch of MWAITX rounds instead of HLT and 
> > MWAITX saves less power than HLT, then we practically are worse.
> 
> So the way I think it would work ideally is (and note that this is 
> different from how you think it works):
> 
>   - MWAITX takes a 'timeout' parameter, but otherwise behaves exactly 
>     like MWAIT: i.e. once idle it won't exit idle on its own
> 
>   - based on the 'timeout' hint, MWAITX can internally optimize how 
>     deep sleep it enters. If the timeout is large it goes deep, if 
>     it's small, it goes shallow. This does not change the fact that no 
>     matter which state it enters, it will come back the moment an 
>     interrupt is posted.

No, the timeout value doesn't determine how deep a power state is entered.
Power consumption is basically the same with any timeout.

Here is a summary comparing MWAIT and MWAITX:

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                             unused
EBX[31:0]       unused             |            max wait time (loops)


                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero
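
As a rough illustration of the table above, the MWAITX operands could be
assembled as in the sketch below.  The struct, macro and function names are
hypothetical and exist only for this example; the sketch only builds the
register values and does not execute the instruction.  ECX[0] is left clear
here, which is a choice made for the sketch rather than a requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the table; the macro names are made up. */
#define MWAITX_ECX_IF_SEEN       (1u << 0)	/* ECX[0] */
#define MWAITX_ECX_TIMER_ENABLE  (1u << 1)	/* ECX[1], MWAITX only */

/* Hypothetical container for the three register operands. */
struct mwaitx_operands {
	uint32_t eax;	/* unused by both MWAIT and MWAITX */
	uint32_t ebx;	/* max wait time (loops), MWAITX only */
	uint32_t ecx;	/* flag bits */
};

static struct mwaitx_operands mwaitx_encode(uint32_t max_wait_loops,
					     int enable_timer)
{
	struct mwaitx_operands ops = { 0, 0, 0 };

	if (enable_timer) {
		ops.ecx |= MWAITX_ECX_TIMER_ENABLE;
		ops.ebx = max_wait_loops;
	}

	/* ECX[31:2] must stay clear, otherwise the instruction raises #GP. */
	assert((ops.ecx & ~(MWAITX_ECX_IF_SEEN | MWAITX_ECX_TIMER_ENABLE)) == 0);

	return ops;
}
```

With the timer disabled, the encoded operands are all zero, which matches
the plain MWAIT column of the table.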

Thanks,
Rui


* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-25  2:42               ` Huang Rui
@ 2015-05-25 10:43                 ` Ingo Molnar
  0 siblings, 0 replies; 710+ messages in thread
From: Ingo Molnar @ 2015-05-25 10:43 UTC (permalink / raw)
  To: Huang Rui
  Cc: Borislav Petkov, Len Brown, Rafael J. Wysocki, Thomas Gleixner,
	x86, linux-kernel, Fengguang Wu, Aaron Lu, Li, Tony,
	Andy Lutomirski


* Huang Rui <ray.huang@amd.com> wrote:

> No, the timeout value doesn't determine how deep a power state is 
> entered. Power consumption is basically the same with any timeout.
> 
> Here is a summary comparing MWAIT and MWAITX:
> 
>                 MWAIT                           MWAITX
> opcode          0f 01 c9           |            0f 01 fb
> ECX[0]                  value of RFLAGS.IF seen by instruction
> ECX[1]          unused/#GP if set  |            enable timer if set
> ECX[31:2]                     unused/#GP if set
> EAX                             unused
> EBX[31:0]       unused             |            max wait time (loops)
> 
> 
>                 MONITOR                         MONITORX
> opcode          0f 01 c8           |            0f 01 fa
> EAX                     (logical) address to monitor
> ECX                     #GP if not zero

Ok, thanks for the clarification!

Thanks,

	Ingo


* [PATCH 00/18] tip queue 2015-05-26
@ 2015-05-26  8:28 Borislav Petkov
  2015-05-26  8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
                   ` (17 more replies)
  0 siblings, 18 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Borislav Petkov <bp@suse.de>

Hi Ingo,

some more stuff this week. None of it is urgent material, the bigger
chunk is MTRR/PAT cleanups.

Borislav Petkov (3):
  x86/documentation: Move kernel-stacks doc one level up
  x86/documentation: Remove STACKFAULT_STACK bulletpoint
  x86/documentation: Adapt Ingo's explanation on printing backtraces

Huang Rui (1):
  x86/process: Drop repeated word from comment

Luis R. Rodriguez (6):
  x86/mm/pat: Convert to pr_* usage
  x86: Document Write Combining MTRR type effects on PAT / non-PAT pages
  x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index()
  x86/mtrr: Generalize runtime disabling of MTRRs
  x86/mm/pat: Wrap pat_enabled
  x86/mm/pat: Export pat_enabled()

Prarit Bhargava (1):
  x86/cpu: Strip any /proc/cpuinfo model name field whitespace

Toshi Kani (6):
  x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
  x86/mtrr: Fix MTRR lookup to handle an inclusive entry
  x86/mtrr: Fix MTRR state checks in mtrr_type_lookup()
  x86/mtrr: Use symbolic define as a retval for disabled MTRRs
  x86/mtrr: Clean up mtrr_type_lookup()
  x86/mm: Enhance MTRR checks in kernel mapping helpers

Xie XiuQi (1):
  x86/mce: Fix monarch timeout setting through the mce= cmdline option



* [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:17     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since X86_PAE
depends on X86_32 already.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d5696e1d1..4eb0b0ffae85 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
-- 
1.9.0.258.g00eda23



* [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
  2015-05-26  8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:18     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:

  range_start ... [mtrr_start ... mtrr_end] ... range_end

__mtrr_type_lookup() ignores such a case because both start_state
and end_state are set to zero.

This bug can cause the following issues:

1) reserve_memtype() tracks an effective memory type in case
   a request type is WB (e.g. /dev/mem blindly uses WB). Failing
   to track the effective type causes a subsequent request
   to map the same range with the effective type to fail.

2) pud_set_huge() and pmd_set_huge() check if a requested range
   has any overlap with MTRRs. Failing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.
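
The three overlap cases can be sketched in standalone C as follows.  The
enum and classify_overlap() are hypothetical names for this illustration
only; the three state expressions are copied from the diff below, and
'base' and 'mask' are assumed to be plain byte values rather than the
register halves the kernel combines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for this sketch; the patch itself uses three flags. */
enum overlap_kind {
	OVERLAP_NONE,		/* request does not touch the MTRR entry */
	OVERLAP_WITHIN,		/* request lies entirely inside the entry */
	OVERLAP_START,		/* start_state:1, request starts inside */
	OVERLAP_END,		/* end_state:1, request ends inside */
	OVERLAP_INCLUSIVE,	/* inclusive:1, request contains the entry */
};

/* 'end' is inclusive, as it is after the end-- in __mtrr_type_lookup(). */
static enum overlap_kind classify_overlap(uint64_t start, uint64_t end,
					  uint64_t base, uint64_t mask)
{
	int start_state, end_state, inclusive;

	assert(start <= end);

	start_state = ((start & mask) == (base & mask));
	end_state = ((end & mask) == (base & mask));
	inclusive = ((start < base) && (end > base));

	if (start_state && end_state)
		return OVERLAP_WITHIN;
	if (start_state)
		return OVERLAP_START;
	if (end_state)
		return OVERLAP_END;
	if (inclusive)
		return OVERLAP_INCLUSIVE;
	return OVERLAP_NONE;
}
```

For a 1 MiB entry at 0x100000 (mask ~0xFFFFF), a request covering
0x0-0x2FFFFF has start_state and end_state both zero, so only the new
'inclusive' flag detects the overlap.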

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 5b239679cfc9..e202d26f64a2 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
-		unsigned short start_state, end_state;
+		unsigned short start_state, end_state, inclusive;
 
 		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
 			continue;
@@ -166,19 +166,27 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 		start_state = ((start & mask) == (base & mask));
 		end_state = ((end & mask) == (base & mask));
+		inclusive = ((start < base) && (end > base));
 
-		if (start_state != end_state) {
+		if ((start_state != end_state) || inclusive) {
 			/*
 			 * We have start:end spanning across an MTRR.
-			 * We split the region into
-			 * either
-			 * (start:mtrr_end) (mtrr_end:end)
-			 * or
-			 * (start:mtrr_start) (mtrr_start:end)
+			 * We split the region into either
+			 *
+			 * - start_state:1
+			 * (start:mtrr_end)(mtrr_end:end)
+			 * - end_state:1
+			 * (start:mtrr_start)(mtrr_start:end)
+			 * - inclusive:1
+			 * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
+			 *
 			 * depending on kind of overlap.
-			 * Return the type for first region and a pointer to
-			 * the start of second region so that caller will
-			 * lookup again on the second region.
+			 *
+			 * Return the type of the first region and a pointer
+			 * to the start of next region so that caller will be
+			 * advised to lookup again after having adjusted start
+			 * and end.
+			 *
 			 * Note: This way we handle multiple overlaps as well.
 			 */
 			if (start_state)
-- 
1.9.0.258.g00eda23



* [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
  2015-05-26  8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
  2015-05-26  8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:18     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:

 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no effect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match the SDM.

Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking at the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled. However, the default type
   is UC when the E flag is clear.  Remove the code as this
   case is handled as MTRR disabled with the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.
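
A minimal standalone sketch of the flag semantics described above.  The two
macro values are the ones added by this patch; the helper functions are
hypothetical and only contrast the old "any bit set" test with the
SDM-conforming checks.

```c
#include <assert.h>
#include <stdint.h>

/* Bit fields for 'enabled' in struct mtrr_state_type (from this patch). */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

/* Old check: MTRRs count as disabled only when both flags are clear. */
static int old_mtrrs_disabled(uint8_t enabled)
{
	return !enabled;
}

/* SDM semantics: all MTRRs are disabled whenever the E flag is clear. */
static int mtrrs_enabled(uint8_t enabled)
{
	return !!(enabled & MTRR_STATE_MTRR_ENABLED);
}

/* Fixed ranges are active only when both E and FE are set. */
static int fixed_ranges_enabled(uint8_t enabled)
{
	return mtrrs_enabled(enabled) &&
	       !!(enabled & MTRR_STATE_MTRR_FIXED_ENABLED);
}

/* Small self-check covering the case the old code got wrong. */
void mtrr_flags_selftest(void)
{
	/* FE set, E clear: the old check wrongly treats MTRRs as enabled. */
	assert(!old_mtrrs_disabled(MTRR_STATE_MTRR_FIXED_ENABLED));
	assert(!mtrrs_enabled(MTRR_STATE_MTRR_FIXED_ENABLED));
	assert(!fixed_ranges_enabled(MTRR_STATE_MTRR_FIXED_ENABLED));
}
```

The interesting input is 0x01 (FE set, E clear): per the SDM everything is
disabled, while the old '!mtrr_state.enabled' test lets the lookup proceed.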

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mtrr.h        |  4 ++++
 arch/x86/kernel/cpu/mtrr/cleanup.c |  3 ++-
 arch/x86/kernel/cpu/mtrr/generic.c | 15 ++++++++-------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f6298419..ef927948657c 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
 				 _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry32)
 #endif /* CONFIG_COMPAT */
 
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
+#define MTRR_STATE_MTRR_ENABLED		0x02
+
 #endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85ff22e..70d7c93f4550 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
 			continue;
 		base = range_state[i].base_pfn;
 		if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
-		    (mtrr_state.enabled & 1)) {
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 			/* Var MTRR contains UC entry below 1M? Skip it: */
 			printk(BIOS_BUG_MSG, i);
 			if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26f64a2..b0599dbb899a 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	if (!mtrr_state_set)
 		return 0xFF;
 
-	if (!mtrr_state.enabled)
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
 		return 0xFF;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state.have_fixed && (start < 0x100000)) {
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state.enabled & 2))
-		return mtrr_state.def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
 		 mtrr_attrib_to_str(mtrr_state.def_type));
 	if (mtrr_state.have_fixed) {
 		pr_debug("MTRR fixed ranges %sabled:\n",
-			 mtrr_state.enabled & 1 ? "en" : "dis");
+			((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+			 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+			 "en" : "dis");
 		print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
 		for (i = 0; i < 2; ++i)
 			print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
 		print_fixed_last();
 	}
 	pr_debug("MTRR variable ranges %sabled:\n",
-		 mtrr_state.enabled & 2 ? "en" : "dis");
+		 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
 	high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
 
 	for (i = 0; i < num_var_ranges; ++i) {
-- 
1.9.0.258.g00eda23



* [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (2 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:18     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

mtrr_type_lookup() returns a literal 0xFF when MTRRs are disabled. This
patch defines MTRR_TYPE_INVALID to clarify the meaning of this value,
and documents its usage.

Document the return values of the kernel virtual address mapping helpers
pud_set_huge(), pmd_set_huge(), pud_clear_huge() and pmd_clear_huge().

There is no functional change in this patch.
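
As an illustration of why a named constant helps, a caller might interpret
the lookup result as sketched below.  huge_map_allowed() is a hypothetical
helper, not code from this patch, and the policy it encodes (allow a huge
mapping only when MTRRs are disabled or the range is WB) is an assumption
made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in arch/x86/include/uapi/asm/mtrr.h. */
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xff

/*
 * Hypothetical caller-side policy: treat "MTRRs disabled" and "range is
 * write-back" as the two cases where a huge mapping is acceptable.
 */
static int huge_map_allowed(uint8_t mtrr_type)
{
	return mtrr_type == MTRR_TYPE_INVALID ||
	       mtrr_type == MTRR_TYPE_WRBACK;
}

/* Small self-check for the helper above. */
void huge_map_selftest(void)
{
	assert(huge_map_allowed(MTRR_TYPE_INVALID));
	assert(huge_map_allowed(MTRR_TYPE_WRBACK));
	assert(!huge_map_allowed(0));	/* MTRR_TYPE_UNCACHABLE */
}
```

Comparing against MTRR_TYPE_INVALID instead of a bare 0xFF makes it clear
at the call site that the value means "no MTRR information" rather than an
actual memory type.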

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mtrr.h        |  2 +-
 arch/x86/include/uapi/asm/mtrr.h   |  8 +++++++-
 arch/x86/kernel/cpu/mtrr/generic.c | 14 ++++++-------
 arch/x86/mm/pgtable.c              | 42 +++++++++++++++++++++++++++++---------
 4 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index ef927948657c..bb03a547c1ab 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
 	/*
 	 * Return no-MTRRs:
 	 */
-	return 0xff;
+	return MTRR_TYPE_INVALID;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb658c8f4..7528dcf59691 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
 #define MTRRIOC_GET_PAGE_ENTRY   _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
 #define MTRRIOC_KILL_PAGE_ENTRY  _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry)
 
-/*  These are the region types  */
+/* MTRR memory types, which are defined in SDM */
 #define MTRR_TYPE_UNCACHABLE 0
 #define MTRR_TYPE_WRCOMB     1
 /*#define MTRR_TYPE_         2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
 #define MTRR_TYPE_WRBACK     6
 #define MTRR_NUM_TYPES       7
 
+/*
+ * Invalid MTRR memory type.  mtrr_type_lookup() returns this value when
+ * MTRRs are disabled.  Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID    0xff
 
 #endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b0599dbb899a..7b1491c6232d 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 
 /*
  * Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
  *		corresponds only to [start:*partial_end].
  *		Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	*repeat = 0;
 	if (!mtrr_state_set)
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	prev_match = 0xFF;
+	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
 
@@ -206,7 +206,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			continue;
 
 		curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
-		if (prev_match == 0xFF) {
+		if (prev_match == MTRR_TYPE_INVALID) {
 			prev_match = curr_match;
 			continue;
 		}
@@ -220,7 +220,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return MTRR_TYPE_WRBACK;
 	}
 
-	if (prev_match != 0xFF)
+	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
@@ -229,7 +229,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 /*
  * Returns the effective MTRR type for the region
  * Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c75df3..c30f9819786b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,22 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 }
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -584,16 +590,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -605,6 +617,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
 int pud_clear_huge(pud_t *pud)
 {
 	if (pud_large(*pud)) {
@@ -615,6 +632,11 @@ int pud_clear_huge(pud_t *pud)
 	return 0;
 }
 
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PMD map is found).
+ */
 int pmd_clear_huge(pmd_t *pmd)
 {
 	if (pmd_large(*pmd)) {
-- 
1.9.0.258.g00eda23


* [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup()
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (3 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:19     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that overlaps
with variable entries.

However, __mtrr_type_lookup() also handles the fixed entries, which
do not have to be repeated. Therefore, this patch creates separate
functions, mtrr_type_lookup_fixed() and mtrr_type_lookup_variable(), to
handle the fixed and variable ranges respectively.
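
The fixed-range index arithmetic can be tried out in isolation. The
following is a minimal userspace sketch, not kernel code: the helper
name fixed_range_index() and the FIXED_*_COUNT macros are made up for
illustration, and the function returns only the index that the kernel
code would use to look into mtrr_state.fixed_ranges[].

```c
#include <assert.h>
#include <stdint.h>

/* Sub-range counts for the three fixed-range regions below 1 MiB. */
#define FIXED_64K_COUNT  8	/* 0x00000 - 0x7FFFF, 64 KiB each */
#define FIXED_16K_COUNT 16	/* 0x80000 - 0xBFFFF, 16 KiB each */
#define FIXED_4K_COUNT  64	/* 0xC0000 - 0xFFFFF,  4 KiB each */

static_assert(FIXED_64K_COUNT + FIXED_16K_COUNT + FIXED_4K_COUNT == 88,
	      "the fixed ranges cover 88 sub-ranges in total");

/*
 * Hypothetical helper: return the index of the fixed sub-range that
 * contains 'start', or -1 if 'start' lies at or above 1 MiB.
 */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;

	/* 0x00000 - 0x7FFFF: one entry per 64 KiB */
	if (start < 0x80000)
		return (int)(start >> 16);

	/* 0x80000 - 0xBFFFF: one entry per 16 KiB */
	if (start < 0xC0000)
		return FIXED_64K_COUNT + (int)((start - 0x80000) >> 14);

	/* 0xC0000 - 0xFFFFF: one entry per 4 KiB */
	return FIXED_64K_COUNT + FIXED_16K_COUNT +
	       (int)((start - 0xC0000) >> 12);
}
```

Only 'start' takes part in the computation, which is why the fixed
lookup never needs to be repeated for the rest of the range.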

The patch also updates the function headers to clarify the return values
and output argument. It updates comments to clarify that the repeated
lookup is necessary to handle overlaps with the default type, since
overlaps with multiple entries alone can be handled without repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 138 +++++++++++++++++++++++--------------
 1 file changed, 86 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c6232d..e51100c49eea 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,68 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided into the following ways:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
+ */
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	/* 0x0 - 0x7FFFF */
+	if (start < 0x80000) {
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+	/* 0x80000 - 0xBFFFF */
+	} else if (start < 0xC0000) {
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
-	/* Make end inclusive end, instead of exclusive */
+	/* Make end inclusive instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -186,7 +199,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * advised to lookup again after having adjusted start
 			 * and end.
 			 *
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +229,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -237,22 +248,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return mtrr_type_lookup_fixed(start, end);
+
+	/*
+	 * Look up the variable ranges.  Look of multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 
-- 
1.9.0.258.g00eda23


* [PATCH 06/18] x86/process: Drop repeated word from comment
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (4 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:16   ` [tip:sched/core] sched/x86: Drop repeated word from mwait_idle() comment tip-bot for Huang Rui
  2015-05-26  8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Huang Rui <ray.huang@amd.com>

A single "default" is fine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/process.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6e338e3b1dc0..c648139d68d7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -445,11 +445,10 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 }
 
 /*
- * MONITOR/MWAIT with no hints, used for default default C1 state.
- * This invokes MWAIT with interrutps enabled and no flags,
- * which is backwards compatible with the original MWAIT implementation.
+ * MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
+ * with interrupts enabled and no flags, which is backwards compatible with the
+ * original MWAIT implementation.
  */
-
 static void mwait_idle(void)
 {
 	if (!current_set_polling_and_test()) {
-- 
1.9.0.258.g00eda23


* [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (5 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:19     ` tip-bot for Toshi Kani
  2015-05-26  8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Toshi Kani <toshi.kani@hp.com>

This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets
set to 1 when a given range is covered uniformly by MTRRs, i.e. the
range is fully covered by a single MTRR entry or the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to
see if it is safe to create a huge page mapping in the range.

This allows them to create a huge page mapping in a range covered by
a single MTRR entry of any memory type. It also detects a non-optimal
request properly. They continue to accept the WB type even when a
request spans multiple MTRR entries, since WB has no effect on the
requested PAT memory type.
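
The resulting decision can be written as a small predicate. This is
only a userspace sketch: huge_mapping_allowed() is a hypothetical
helper that does not exist in the kernel, and the constants are copied
from arch/x86/include/uapi/asm/mtrr.h so the example builds on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in arch/x86/include/uapi/asm/mtrr.h. */
#define MTRR_TYPE_WRBACK   6
#define MTRR_NUM_TYPES     7
#define MTRR_TYPE_INVALID  0xff

static_assert(MTRR_TYPE_INVALID >= MTRR_NUM_TYPES,
	      "the sentinel must come from the reserved type values");

/*
 * Hypothetical helper: return 1 if a huge page mapping may be set up
 * for a range whose MTRR lookup returned 'mtrr' and 'uniform'.
 */
static int huge_mapping_allowed(uint8_t mtrr, uint8_t uniform)
{
	/* MTRRs are disabled, so nothing can override the PAT type. */
	if (mtrr == MTRR_TYPE_INVALID)
		return 1;

	/* One MTRR entry (or the default type) covers the whole range. */
	if (uniform)
		return 1;

	/* A non-uniform range is acceptable only if the type is WB. */
	return mtrr == MTRR_TYPE_WRBACK;
}
```

The three early exits correspond one to one to the negated condition
used in pud_set_huge() and pmd_set_huge().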

pmd_set_huge() logs a warning message for a non-optimal request so that
driver writers will be aware of such a case. Drivers should make a
mapping request aligned to a single MTRR entry when the range is covered
by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: Elliott@hp.com
Cc: pebolle@tiscali.nl
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: x86-ml <x86@kernel.org>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mtrr.h        |  4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |  4 ++--
 arch/x86/mm/pgtable.c              | 38 +++++++++++++++++++++++-------------
 4 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a547c1ab..a31759e1edd9 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c49eea..f782d9b62cb3 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive instead of exclusive */
 	end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+		is_uniform = 0;
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af6771a95a..372ad422c2c3 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f9819786b..fb0a9dd1d6e4 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 /**
  * pud_set_huge - setup kernel PUD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
+ * function sets up a huge page only if any of the following conditions are met:
+ *
+ * - MTRRs are disabled, or
+ *
+ * - MTRRs are enabled and the range is completely covered by a single MTRR, or
+ *
+ * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
+ *   has no effect on the requested PAT memory type.
+ *
+ * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
+ * page mapping attempt fails.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -593,20 +602,21 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 /**
  * pmd_set_huge - setup kernel PMD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * See text over pud_set_huge() above.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+			     __func__, addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 
-- 
1.9.0.258.g00eda23


* [PATCH 08/18] x86/mm/pat: Convert to pr_* usage
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (6 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:19   ` [tip:x86/mm] x86/mm/pat: Convert to pr_*() usage tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Use the pr_*() helpers instead of the old printk() calls and prefix each
message with the component it comes from, so that readers know exactly
where a message originates. We define pr_fmt to the empty string and
spell out the prefix in each message, which makes grepping for those
error messages easier.
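
To illustrate why the full string stays greppable, here is a userspace
imitation of the convention. It is a sketch under stated assumptions:
pr_info_to_buf() and log_pat_disabled() are hypothetical stand-ins
that format into a buffer with snprintf(), they are not the kernel's
printk.h helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-in: an empty pr_fmt adds nothing to the format. */
#define pr_fmt(fmt) "" fmt

/*
 * Hypothetical stand-in for pr_info(): pr_fmt(fmt) is pasted in front
 * of the format through string literal concatenation.
 */
#define pr_info_to_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* The "x86/PAT: " prefix is part of the literal, so grep finds it. */
static int log_pat_disabled(char *buf, size_t len, const char *reason)
{
	assert(buf != NULL && len > 0);
	return pr_info_to_buf(buf, len, "x86/PAT: %s\n", reason);
}
```

Had the prefix been placed in pr_fmt instead, the source would contain
only "%s\n" at the call site and a search for the complete message
text would miss it.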

We leave the users of dprintk() in place; these print only when the
debugpat kernel parameter is enabled. We want to keep them as a debug
feature, but also make them use the same prefix.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Cc: x86@kernel.org
Cc: cocci@systeme.lip6.fr
Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
[ Kill pr_fmt. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pat.c          | 44 ++++++++++++++++++++++----------------------
 arch/x86/mm/pat_internal.h |  2 +-
 arch/x86/mm/pat_rbtree.c   |  6 +++---
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 372ad422c2c3..8c50b9bfa996 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,16 @@
 #include "pat_internal.h"
 #include "mm_internal.h"
 
+#undef pr_fmt
+#define pr_fmt(fmt) "" fmt
+
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
 static inline void pat_disable(const char *reason)
 {
 	pat_enabled = 0;
-	printk(KERN_INFO "%s\n", reason);
+	pr_info("x86/PAT: %s\n", reason);
 }
 
 static int __init nopat(char *str)
@@ -188,7 +191,7 @@ void pat_init_cache_modes(void)
 					   pat_msg + 4 * i);
 		update_cache_mode_entry(i, cache);
 	}
-	pr_info("PAT configuration [0-7]: %s\n", pat_msg);
+	pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
 }
 
 #define PAT(x, y)	((u64)PAT_ ## y << ((x)*8))
@@ -211,8 +214,7 @@ void pat_init(void)
 			 * switched to PAT on the boot CPU. We have no way to
 			 * undo PAT.
 			 */
-			printk(KERN_ERR "PAT enabled, "
-			       "but not supported by secondary CPU\n");
+			pr_err("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
 			BUG();
 		}
 	}
@@ -347,7 +349,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != -1) {
-			pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
+			pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
 				start, end - 1, type, req_type);
 			if (new_type)
 				*new_type = type;
@@ -451,9 +453,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
-		printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
-		       start, end - 1,
-		       cattr_name(new->type), cattr_name(req_type));
+		pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+			start, end - 1,
+			cattr_name(new->type), cattr_name(req_type));
 		kfree(new);
 		spin_unlock(&memtype_lock);
 
@@ -497,8 +499,8 @@ int free_memtype(u64 start, u64 end)
 	spin_unlock(&memtype_lock);
 
 	if (!entry) {
-		printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
-		       current->comm, current->pid, start, end - 1);
+		pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+			current->comm, current->pid, start, end - 1);
 		return -EINVAL;
 	}
 
@@ -628,8 +630,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 
 	while (cursor < to) {
 		if (!devmem_is_allowed(pfn)) {
-			printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
-			       current->comm, from, to - 1);
+			pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+				current->comm, from, to - 1);
 			return 0;
 		}
 		cursor += PAGE_SIZE;
@@ -698,8 +700,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
 				size;
 
 	if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
-		printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
-			"for [mem %#010Lx-%#010Lx]\n",
+		pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
 			current->comm, current->pid,
 			cattr_name(pcm),
 			base, (unsigned long long)(base + size-1));
@@ -734,7 +735,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 
 		pcm = lookup_memtype(paddr);
 		if (want_pcm != pcm) {
-			printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+			pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
 				current->comm, current->pid,
 				cattr_name(want_pcm),
 				(unsigned long long)paddr,
@@ -755,13 +756,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 		if (strict_prot ||
 		    !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
 			free_memtype(paddr, paddr + size);
-			printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
-				" for [mem %#010Lx-%#010Lx], got %s\n",
-				current->comm, current->pid,
-				cattr_name(want_pcm),
-				(unsigned long long)paddr,
-				(unsigned long long)(paddr + size - 1),
-				cattr_name(pcm));
+			pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
+			       current->comm, current->pid,
+			       cattr_name(want_pcm),
+			       (unsigned long long)paddr,
+			       (unsigned long long)(paddr + size - 1),
+			       cattr_name(pcm));
 			return -EINVAL;
 		}
 		/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f6411620305d..a739bfc40690 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
 extern int pat_debug_enable;
 
 #define dprintk(fmt, arg...) \
-	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+	do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
 
 struct memtype {
 	u64			start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adcc8bd9..63931080366a 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,9 +160,9 @@ success:
 	return 0;
 
 failure:
-	printk(KERN_INFO "%s:%d conflicting memory types "
-		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
-		end, cattr_name(found_type), cattr_name(match->type));
+	pr_info("x86/PAT: %s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+		current->comm, current->pid, start, end,
+		cattr_name(found_type), cattr_name(match->type));
 	return -EBUSY;
 }
 
-- 
1.9.0.258.g00eda23



* [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (7 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:19   ` [tip:x86/mm] x86/mm/mtrr, pat: " tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

As part of the effort to phase out MTRR use, document write-combining
MTRR effects on pages with different non-PAT page attribute flags and
different PAT entry values. Extend arch_phys_wc_add() documentation
to clarify power of two sizes / boundary requirements as we phase out
mtrr_add() use.

Lastly, hint towards ioremap_uc() for corner cases on device drivers
working with devices with mixed regions where MTRR size requirements
would otherwise not enable write-combining effective memory types.
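
The size / boundary requirement above can be sketched in plain userspace C.
This is only an illustration, assuming that "a power of two size on an
equivalent power of two boundary" means the size is a power of two and the
base is aligned to that size. is_pow2() and wc_region_ok() are hypothetical
names and are not part of this patch or of any kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical check, under the assumption stated above: the size must
 * be a power of two and the base must be aligned to that size.
 */
static bool wc_region_ok(uint64_t base, uint64_t size)
{
	return is_pow2(size) && (base & (size - 1)) == 0;
}
```

A driver author could run this kind of check by hand on a BAR before
deciding whether a WC MTRR can cover it on a non-PAT system.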

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
 Documentation/x86/pat.txt       | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/main.c |  3 +++
 3 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc333c2..860bc3adc223 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
 MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing out MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
+MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
 
   On Intel P6 family processors (Pentium Pro, Pentium II and later)
   the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9fff3cd..521bd8adc3b8 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_cache          |    --    |    WB      |       WB         |
                        |          |            |                  |
+ioremap_uc             |    --    |    UC      |       UC         |
+                       |          |            |                  |
 ioremap_nocache        |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_wc             |    --    |    --      |       WC         |
@@ -102,7 +104,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
 as step 0 above and also track the usage of those pages and use set_memory_wb()
 before the page is freed to free pool.
 
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+is made should already have been ioremapped with WC attributes or PAT entries;
+this can be done by using ioremap_wc() / set_memory_wc().  Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas.  Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where MTRR write-combining would otherwise
+not be effective.
+
+----------------------------------------------------------------------
+MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
+----------------------------------------------------------------------
+                                                  Non-PAT |  PAT
+     PAT
+     |PCD
+     ||PWT
+     |||
+WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
+WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
+WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
+WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
 
 Notes:
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363a1948..04aceb7e6443 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
  * attempts to add a WC MTRR covering size bytes starting at base and
  * logs an error if this fails.
  *
+ * The caller should provide a power of two size on an equivalent
+ * power of two boundary.
+ *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */
-- 
1.9.0.258.g00eda23



* [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index()
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (8 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:20   ` [tip:x86/mm] x86/mm/mtrr: Avoid #ifdeffery " tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

There is only one user, but since we're going to bury MTRR out of reach
of drivers next, expose this last piece of API to drivers in a general
fashion only needing io.h for access to helpers.
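
The approach can be sketched in plain userspace C. This is only a model:
the model_* names and MODEL_ARCH_HAS_WC_INDEX are hypothetical, and the
offset value is copied from MTRR_TO_PHYS_WC_OFFSET in the diff below. The
"arch" half defines the helper and announces it with a same-named macro;
the "generic" half supplies a stub only when nothing was announced.

```c
/* Pretend the architecture provides the helper (hypothetical switch). */
#define MODEL_ARCH_HAS_WC_INDEX 1

/* Same value as MTRR_TO_PHYS_WC_OFFSET in the patch. */
#define MODEL_WC_OFFSET 1000

/* "arch header" half: real translation, announced via a macro. */
#ifdef MODEL_ARCH_HAS_WC_INDEX
static int model_wc_index(int handle)
{
	if (handle < MODEL_WC_OFFSET)
		return -1;
	return handle - MODEL_WC_OFFSET;
}
#define model_wc_index model_wc_index
#endif

/* "generic header" half: fallback stub when no arch version exists. */
#ifndef model_wc_index
static inline int model_wc_index(int handle)
{
	(void)handle;
	return -1;
}
#define model_wc_index model_wc_index
#endif
```

Callers then use model_wc_index() unconditionally, which is the property
that lets drm_getmap() drop its CONFIG_X86 block.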

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/io.h       |  3 +++
 arch/x86/include/asm/mtrr.h     |  5 -----
 arch/x86/kernel/cpu/mtrr/main.c |  6 +++---
 drivers/gpu/drm/drm_ioctl.c     | 14 +-------------
 include/linux/io.h              |  7 +++++++
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 4afc05ffa566..a2b97404922d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -339,6 +339,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 #define IO_SPACE_LIMIT 0xffff
 
 #ifdef CONFIG_MTRR
+extern int __must_check arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
+
 extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a31759e1edd9..b94f6f64e23d 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,7 +48,6 @@ extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
 #  else
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
@@ -84,10 +83,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-static inline int phys_wc_to_mtrr_index(int handle)
-{
-	return -1;
-}
 
 #define mtrr_ap_init() do {} while (0)
 #define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 04aceb7e6443..81baf5fee0e1 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -580,7 +580,7 @@ void arch_phys_wc_del(int handle)
 EXPORT_SYMBOL(arch_phys_wc_del);
 
 /*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
  * @handle: Return value from arch_phys_wc_add
  *
  * This will turn the return value from arch_phys_wc_add into an mtrr
@@ -590,14 +590,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
  * in printk line.  Alas there is an illegitimate use in some ancient
  * drm ioctls.
  */
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
 {
 	if (handle < MTRR_TO_PHYS_WC_OFFSET)
 		return -1;
 	else
 		return handle - MTRR_TO_PHYS_WC_OFFSET;
 }
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
 
 /*
  * HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 266dcd6cdf3b..0a957828b3bd 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
 
 #include <linux/pci.h>
 #include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 
 static int drm_version(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
 	map->type = r_list->map->type;
 	map->flags = r_list->map->flags;
 	map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
-	/*
-	 * There appears to be exactly one user of the mtrr index: dritest.
-	 * It's easy enough to keep it working on non-PAT systems.
-	 */
-	map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
-	map->mtrr = -1;
-#endif
+	map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/include/linux/io.h b/include/linux/io.h
index 986f2bffea1e..04cce4da3685 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,13 @@ static inline void arch_phys_wc_del(int handle)
 }
 
 #define arch_phys_wc_add arch_phys_wc_add
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+	return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
+#endif
 #endif
 
 #endif /* _LINUX_IO_H */
-- 
1.9.0.258.g00eda23



* [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (9 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:20   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end up with
a system with MTRR functionality disabled but PAT functionality enabled.
This can happen, for instance, when the Xen hypervisor is used where
MTRRs are not supported but PAT is. This can happen on Linux as of commit

  47591df50512 ("xen: Support Xen pv-domains using PAT")

by Juergen, introduced in v3.19.

Technically, we should assume the proper CPU bits would be set to
disable MTRRs but we can't always rely on this. At least on the Xen
Hypervisor, for instance, only X86_FEATURE_MTRR was disabled as of
Xen 4.4 through Xen commit 586ab6a [0], but not X86_FEATURE_K6_MTRR,
X86_FEATURE_CENTAUR_MCR, or X86_FEATURE_CYRIX_ARR for instance.

Roger Pau Monné has clarified though that although this is technically
true we will never support PVH on these CPU types so Xen has no need to
disable these bits on those systems. As per Roger, AMD K6, Centaur and
VIA chips don't have the necessary hardware extensions to allow running
PVH guests [1].

As per Toshi, it is also possible for the BIOS to disable MTRR support.
In such cases get_mtrr_state() would update the MTRR state as per the
BIOS, and we need to propagate this information as well.

x86 MTRR code relies on quite a few checks for mtrr_if being set to
see if MTRRs did get set up. Instead, let's provide a generic
getter for that. This also adds a few checks where there were none before,
which could potentially safeguard us against incorrect usage of
MTRR where this was not desirable.

Where possible, match the error codes used when MTRRs are disabled in
arch/x86/include/asm/mtrr.h.

Lastly, since disabling MTRRs can happen at run time and we could end up
with PAT enabled, best record now in our logs when MTRRs are disabled.
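
The resulting decision flow can be modelled in userspace C as below. This
is a rough sketch, not the kernel code: struct model_state and the
model_* functions are hypothetical stand-ins for mtrr_if, use_intel(),
get_mtrr_state() and pat_enabled, and the successful path pretends that
mtrr_add() handed back register 0.

```c
#include <stdbool.h>

/* Same value as MTRR_TO_PHYS_WC_OFFSET in the patch. */
#define MODEL_WC_OFFSET 1000

struct model_state {
	bool have_mtrr_if;	/* stands in for mtrr_if != NULL */
	bool intel_style;	/* stands in for use_intel() */
	bool bios_enabled;	/* stands in for get_mtrr_state()'s result */
	bool pat_enabled;	/* stands in for pat_enabled */
};

/* Mirrors how mtrr_bp_init() settles __mtrr_enabled in this patch. */
static bool model_mtrr_enabled(const struct model_state *s)
{
	bool enabled = s->have_mtrr_if;

	if (s->have_mtrr_if && s->intel_style)
		enabled = s->bios_enabled;	/* BIOS may override */
	return enabled;
}

/* Mirrors the early return added to arch_phys_wc_add(). */
static int model_wc_add(const struct model_state *s)
{
	if (s->pat_enabled || !model_mtrr_enabled(s))
		return 0;	/* nothing to do, still a success */
	return 0 + MODEL_WC_OFFSET;	/* pretend register 0 was used */
}
```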

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
    4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: venkatesh.pallipadi@intel.com
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: konrad.wilk@oracle.com
Cc: ville.syrjala@linux.intel.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: toshi.kani@hp.com
Cc: bhelgaas@google.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mtrr/generic.c |  4 +++-
 arch/x86/kernel/cpu/mtrr/main.c    | 39 ++++++++++++++++++++++++++++++--------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |  2 +-
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index f782d9b62cb3..3b533cf37c74 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -445,7 +445,7 @@ static void __init print_mtrr_state(void)
 }
 
 /* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
 {
 	struct mtrr_var_range *vrs;
 	unsigned long flags;
@@ -489,6 +489,8 @@ void __init get_mtrr_state(void)
 
 	post_set();
 	local_irq_restore(flags);
+
+	return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
 }
 
 /* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 81baf5fee0e1..383efb26e516 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,12 @@
 #define MTRR_TO_PHYS_WC_OFFSET 1000
 
 u32 num_var_ranges;
+static bool __mtrr_enabled;
+
+static bool mtrr_enabled(void)
+{
+	return __mtrr_enabled;
+}
 
 unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
@@ -286,7 +292,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
 	int i, replace, error;
 	mtrr_type ltype;
 
-	if (!mtrr_if)
+	if (!mtrr_enabled())
 		return -ENXIO;
 
 	error = mtrr_if->validate_add_page(base, size, type);
@@ -435,6 +441,8 @@ static int mtrr_check(unsigned long base, unsigned long size)
 int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
 	     bool increment)
 {
+	if (!mtrr_enabled())
+		return -ENODEV;
 	if (mtrr_check(base, size))
 		return -EINVAL;
 	return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
@@ -463,8 +471,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
 	unsigned long lbase, lsize;
 	int error = -EINVAL;
 
-	if (!mtrr_if)
-		return -ENXIO;
+	if (!mtrr_enabled())
+		return -ENODEV;
 
 	max = num_var_ranges;
 	/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +531,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
  */
 int mtrr_del(int reg, unsigned long base, unsigned long size)
 {
+	if (!mtrr_enabled())
+		return -ENODEV;
 	if (mtrr_check(base, size))
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -548,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled)
+	if (pat_enabled || !mtrr_enabled())
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -737,10 +747,12 @@ void __init mtrr_bp_init(void)
 	}
 
 	if (mtrr_if) {
+		__mtrr_enabled = true;
 		set_num_var_ranges();
 		init_table();
 		if (use_intel()) {
-			get_mtrr_state();
+			/* BIOS may override */
+			__mtrr_enabled = get_mtrr_state();
 
 			if (mtrr_cleanup(phys_addr)) {
 				changed_by_mtrr_cleanup = 1;
@@ -748,10 +760,16 @@ void __init mtrr_bp_init(void)
 			}
 		}
 	}
+
+	if (!mtrr_enabled())
+		pr_info("MTRR: Disabled\n");
 }
 
 void mtrr_ap_init(void)
 {
+	if (!mtrr_enabled())
+		return;
+
 	if (!use_intel() || mtrr_aps_delayed_init)
 		return;
 	/*
@@ -777,6 +795,9 @@ void mtrr_save_state(void)
 {
 	int first_cpu;
 
+	if (!mtrr_enabled())
+		return;
+
 	get_online_cpus();
 	first_cpu = cpumask_first(cpu_online_mask);
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -785,6 +806,8 @@ void mtrr_save_state(void)
 
 void set_mtrr_aps_delayed_init(void)
 {
+	if (!mtrr_enabled())
+		return;
 	if (!use_intel())
 		return;
 
@@ -796,7 +819,7 @@ void set_mtrr_aps_delayed_init(void)
  */
 void mtrr_aps_init(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled())
 		return;
 
 	/*
@@ -813,7 +836,7 @@ void mtrr_aps_init(void)
 
 void mtrr_bp_restore(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled())
 		return;
 
 	mtrr_if->set_all();
@@ -821,7 +844,7 @@ void mtrr_bp_restore(void)
 
 static int __init mtrr_init_finialize(void)
 {
-	if (!mtrr_if)
+	if (!mtrr_enabled())
 		return 0;
 
 	if (use_intel()) {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f31a27..951884dcc433 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
 
 void fill_mtrr_var_range(unsigned int index,
 		u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
 
 extern void set_mtrr_ops(const struct mtrr_ops *ops);
 
-- 
1.9.0.258.g00eda23



* [PATCH 12/18] x86/mm/pat: Wrap pat_enabled
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (10 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:20   ` [tip:x86/mm] x86/mm/pat: Wrap pat_enabled into a function API tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

We use pat_enabled in x86-specific code to see if PAT is enabled or not
but we're granting full access to it even though readers do not need to
set it. If, for instance, we granted access to it to modules later they
then could override the variable setting... no bueno.

This renames pat_enabled to a new static variable __pat_enabled. Folks
are redirected to use pat_enabled() now.

Code that sets this can only be internal to pat.c. Apart from the early
kernel parameter "nopat" to disable PAT, we also have a few cases that
disable it later and make use of a helper pat_disable(). It is wrapped
under an ifdef, but since that code cannot run unless PAT was enabled it's
not required to wrap it with ifdefs, so unwrap that. Likewise, since "nopat"
doesn't really change non-PAT systems, just remove that ifdef as well.

Although we could add and use an early_param_off(), these helpers don't
use __read_mostly but we want to keep __read_mostly for __pat_enabled as
this is a hot path -- upon boot, for instance, a simple guest may see
~4k accesses to pat_enabled(). Since __read_mostly early boot params are
not that common we don't add a helper for them just yet.
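
The encapsulation this patch goes for can be sketched in a few lines of
userspace C. The model_* names are hypothetical. The flag is private to
the file, only a local helper may clear it, and everyone else gets a
read-only accessor.

```c
#include <stdbool.h>

/* Private to this file; readers outside cannot assign to it. */
static int model_pat_flag = 1;

/* Only code in this file can turn the feature off. */
static void model_pat_disable(void)
{
	model_pat_flag = 0;
}

/* Read-only view for everyone else. */
bool model_pat_enabled(void)
{
	return !!model_pat_flag;
}
```

The kernel version additionally marks the flag __read_mostly, which has no
userspace equivalent in this sketch.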

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/pat.h      |  7 +------
 arch/x86/kernel/cpu/mtrr/main.c |  2 +-
 arch/x86/mm/iomap_32.c          |  2 +-
 arch/x86/mm/ioremap.c           |  4 ++--
 arch/x86/mm/pageattr.c          |  2 +-
 arch/x86/mm/pat.c               | 33 +++++++++++++++------------------
 arch/x86/pci/i386.c             |  6 +++---
 7 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba95f91..cdcff7f7f694 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
 #include <linux/types.h>
 #include <asm/pgtable_types.h>
 
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
 extern void pat_init(void);
 void pat_init_cache_modes(void);
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 383efb26e516..e7ed0d8ebacb 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -558,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled || !mtrr_enabled())
+	if (pat_enabled() || !mtrr_enabled())
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc60cfe..3a2ec8790ca7 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
 	 * MTRR is UC or WC.  UC_MINUS gets the real intention, of the
 	 * user, which is "WC if the MTRR is WC, UC if you can't do that."
 	 */
-	if (!pat_enabled && pgprot_val(prot) ==
+	if (!pat_enabled() && pgprot_val(prot) ==
 	    (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a493bb83aa89..82d63ed70045 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 {
 	/*
 	 * Ideally, this should be:
-	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+	 *	pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
 	 * UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
  */
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
 					__builtin_return_address(0));
 	else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 397838eb292b..70d221fe2eb4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1571,7 +1571,7 @@ int set_memory_wc(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return set_memory_uc(addr, numpages);
 
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8c50b9bfa996..484dce7f759b 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -36,12 +36,11 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "" fmt
 
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
 
 static inline void pat_disable(const char *reason)
 {
-	pat_enabled = 0;
+	__pat_enabled = 0;
 	pr_info("x86/PAT: %s\n", reason);
 }
 
@@ -51,13 +50,11 @@ static int __init nopat(char *str)
 	return 0;
 }
 early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
 {
-	(void)reason;
+	return !!__pat_enabled;
 }
-#endif
-
 
 int pat_debug_enable;
 
@@ -201,7 +198,7 @@ void pat_init(void)
 	u64 pat;
 	bool boot_cpu = !boot_pat_state;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return;
 
 	if (!cpu_has_pat) {
@@ -402,7 +399,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	BUG_ON(start >= end); /* end is exclusive */
 
-	if (!pat_enabled) {
+	if (!pat_enabled()) {
 		/* This is identical to page table setting without PAT */
 		if (new_type) {
 			if (req_type == _PAGE_CACHE_MODE_WC)
@@ -477,7 +474,7 @@ int free_memtype(u64 start, u64 end)
 	int is_range_ram;
 	struct memtype *entry;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Low ISA region is always mapped WB. No need to track */
@@ -625,7 +622,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 	u64 to = from + size;
 	u64 cursor = from;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 1;
 
 	while (cursor < to) {
@@ -661,7 +658,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 	 * caching for the high addresses through the KEN pin, but
 	 * we maintain the tradition of paranoia in this code.
 	 */
-	if (!pat_enabled &&
+	if (!pat_enabled() &&
 	    !(boot_cpu_has(X86_FEATURE_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_K6_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -730,7 +727,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 	 * the type requested matches the type of first page in the range.
 	 */
 	if (is_ram) {
-		if (!pat_enabled)
+		if (!pat_enabled())
 			return 0;
 
 		pcm = lookup_memtype(paddr);
@@ -844,7 +841,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 		return ret;
 	}
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/*
@@ -872,7 +869,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 {
 	enum page_cache_mode pcm;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Set prot based on lookup */
@@ -913,7 +910,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WC));
 	else
@@ -996,7 +993,7 @@ static const struct file_operations memtype_fops = {
 
 static int __init pat_memtype_list_init(void)
 {
-	if (pat_enabled) {
+	if (pat_enabled()) {
 		debugfs_create_file("pat_memtype_list", S_IRUSR,
 				    arch_debugfs_dir, NULL, &memtype_fops);
 	}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d32cc0b..0a9f2caf358f 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
  	 * Caller can followup with UC MINUS request and add a WC mtrr if there
  	 * is a free mtrr slot.
  	 */
-	if (!pat_enabled && write_combine)
+	if (!pat_enabled() && write_combine)
 		return -EINVAL;
 
-	if (pat_enabled && write_combine)
+	if (pat_enabled() && write_combine)
 		prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
-	else if (pat_enabled || boot_cpu_data.x86 > 3)
+	else if (pat_enabled() || boot_cpu_data.x86 > 3)
 		/*
 		 * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
 		 * To avoid attribute conflicts, request UC MINUS here
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 13/18] x86/mm/pat: Export pat_enabled()
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (11 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:21   ` [tip:x86/mm] " tip-bot for Luis R. Rodriguez
  2015-05-26  8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Two Linux device drivers cannot work with PAT, and the work required to
make them work is significant. There is not enough motivation to convert
these drivers over to use PAT properly, so the compromise reached is to
let drivers that cannot be ported to PAT check whether PAT is enabled
and, if so, fail on probe with a recommendation to boot with the "nopat"
kernel parameter.
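
For illustration only, here is a userspace sketch of the probe-time check
described above. It is not taken from either driver: mock_pat_state,
legacy_probe() and the -ENODEV return value are assumptions made for the
example, and pat_enabled() is a local stand-in shaped like the accessor
in arch/x86/mm/pat.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel's __pat_enabled state (hypothetical in userspace). */
static int mock_pat_state = 1;

/* Local stand-in with the same shape as the exported accessor. */
static bool pat_enabled(void)
{
	return !!mock_pat_state;
}

/*
 * Hypothetical probe routine for a driver that cannot be ported to PAT:
 * refuse to bind when PAT is enabled and point the user at "nopat".
 * The error code is an assumption made for this sketch.
 */
static int legacy_probe(const char *name)
{
	if (pat_enabled()) {
		fprintf(stderr,
			"%s: PAT is enabled, boot with \"nopat\" to use this driver\n",
			name);
		return -ENODEV;
	}

	return 0;
}
```

A modular driver can only make such a call if the symbol is exported,
which is what the one-line change below provides.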

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/mm/pat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 484dce7f759b..a1c96544099d 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -55,6 +55,7 @@ bool pat_enabled(void)
 {
 	return !!__pat_enabled;
 }
+EXPORT_SYMBOL_GPL(pat_enabled);
 
 int pat_debug_enable;
 
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (12 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:16   ` [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo " tip-bot for Prarit Bhargava
  2015-05-26  8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Prarit Bhargava <prarit@redhat.com>

When comparing the 'model name' field of each core in /proc/cpuinfo it
was noticed that there is a whitespace difference between the cores'
model names.

After some quick investigation it was noticed that the model name fields
were actually different -- processor 0's model name field had trailing
whitespace removed, while the other processors did not.

Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows the discrepancy.

This occurs because the kernel calls strim() on cpu 0's x86_model_id
field to output a pretty message to the console in print_cpu_info(),
and as a result strips the whitespace at the end of the ->x86_model_id
field.

But the ->x86_model_id field should be the same for all identical
CPUs in the box. Thus, we need to remove both leading and trailing
whitespace.
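
As a rough userspace sketch of that normalization (not the kernel code):
trim_ws() below approximates the kernel's strim(), and
normalize_model_id() and MODEL_ID_LEN are names made up for the example.
Unlike the patch, the sketch copies only the trimmed string plus its
terminator rather than a fixed 48 bytes, so the read stays inside the
buffer for any amount of leading whitespace.

```c
#include <ctype.h>
#include <string.h>

#define MODEL_ID_LEN 64	/* assumed buffer size for this sketch */

/*
 * Userspace approximation of strim(): cut trailing whitespace in place
 * and return a pointer to the first non-whitespace character.
 */
static char *trim_ws(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	while (isspace((unsigned char)*s))
		s++;

	return s;
}

/* Hypothetical helper: left-justify the trimmed model name in its buffer. */
static void normalize_model_id(char id[MODEL_ID_LEN])
{
	char *start = trim_ws(id);

	/* Regions may overlap, hence memmove(); +1 copies the terminator. */
	memmove(id, start, strlen(start) + 1);
}
```

With this, a right-justified name with leading spaces and a left-justified
name with trailing spaces both end up as the same bare string.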

As a result, the print_cpu_info() output looks like

  smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD platforms:

  _____64_model_name      :_AMD_Opteron(TM)_Processor_6272

Output is still correct on an Intel box:

  ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: lkml <linux-kernel@vger.kernel.org>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/common.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04dac8a..41a8e9cb30bc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
-	char *p, *q;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	c->x86_model_id[48] = 0;
 
 	/*
-	 * Intel chips right-justify this string for some dumb reason;
-	 * undo that brain damage:
+	 * Remove leading whitespace on Intel processors and trailing
+	 * whitespace on AMD processors.
 	 */
-	p = q = &c->x86_model_id[0];
-	while (*p == ' ')
-		p++;
-	if (p != q) {
-		while (*p)
-			*q++ = *p++;
-		while (q <= &c->x86_model_id[48])
-			*q++ = '\0';	/* Zero-pad the rest */
-	}
+	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
@@ -1122,7 +1113,7 @@ void print_cpu_info(struct cpuinfo_x86 *c)
 		printk(KERN_CONT "%s ", vendor);
 
 	if (c->x86_model_id[0])
-		printk(KERN_CONT "%s", strim(c->x86_model_id));
+		printk(KERN_CONT "%s", c->x86_model_id);
 	else
 		printk(KERN_CONT "%d86", c->x86);
 
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (13 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:17   ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
  2015-05-26  8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Borislav Petkov <bp@suse.de>

... to Documentation/x86/ as it is going to collect more and not only
64-bit specific info.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/x86/{x86_64 => }/kernel-stacks | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Documentation/x86/{x86_64 => }/kernel-stacks (100%)

diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks
-- 
1.9.0.258.g00eda23


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (14 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-27 14:17   ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
  2015-05-26  8:28 ` [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
  2015-05-26  8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Borislav Petkov <bp@suse.de>

Update the documentation after

  6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/x86/kernel-stacks | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49d1a2f..c3c935b9d56e 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
 Most of the text from Keith Owens, hacked by AK
 
 x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
 
 The currently assigned IST stacks are :-
 
-* STACKFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
-
-  Used for interrupt 12 - Stack Fault Exception (#SS).
-
-  This allows the CPU to recover from invalid stack segments. Rarely
-  happens.
-
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (15 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-05-26  8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
  17 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Borislav Petkov <bp@suse.de>

Write it down for future reference, as the question about the question
mark in stack traces keeps popping up.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: X86 ML <x86@kernel.org>
Cc: live-patching@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
---
 Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b9d56e..0f3a6c201943 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
   assumptions about the previous state of the kernel stack.
 
 For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up, here's an indepth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+   values on the kernel stack, from earlier function calls. This is
+   the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+   up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+    the right order, and try to cross from one stack into another
+    reconstructing the call chain. This works most of the time.
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option
  2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
                   ` (16 preceding siblings ...)
  2015-05-26  8:28 ` [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
@ 2015-05-26  8:28 ` Borislav Petkov
  2015-06-07 17:39   ` [tip:x86/core] " tip-bot for Xie XiuQi
  17 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-26  8:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: X86-ML, LKML

From: Xie XiuQi <xiexiuqi@huawei.com>

Using "mce=1,10000000" on the kernel cmdline to change the monarch
timeout does not work. The cause is that get_option() does parse a
subsequent comma in the option string and signals that with a return
value. So we don't need to check for a second comma ourselves.
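
To make the failure mode concrete, here is a userspace sketch.
parse_option() approximates the return-value contract of get_option()
using strtol(); struct mce_cfg and parse_mce() are names made up for the
example, not the kernel's.

```c
#include <stdlib.h>

/*
 * Userspace approximation of get_option():
 *   0 - no integer found
 *   1 - integer found, nothing relevant follows
 *   2 - integer found, followed by a comma (the comma is consumed)
 *   3 - integer found, followed by a '-' (start of a range)
 */
static int parse_option(char **str, int *pint)
{
	char *cur = *str;

	if (!cur || !*cur)
		return 0;

	*pint = (int)strtol(cur, str, 0);
	if (cur == *str)
		return 0;

	if (**str == ',') {
		(*str)++;	/* caller never sees the comma */
		return 2;
	}
	if (**str == '-')
		return 3;

	return 1;
}

/* Hypothetical stand-in for the two fields that mce= sets. */
struct mce_cfg {
	int tolerant;
	int monarch_timeout;
};

/* Parse "tolerant[,monarch_timeout]" by trusting the return value. */
static void parse_mce(char *str, struct mce_cfg *cfg)
{
	if (parse_option(&str, &cfg->tolerant) == 2)
		parse_option(&str, &cfg->monarch_timeout);
}
```

Because the comma is already consumed when 2 is returned, a follow-up
test for *str == ',' can never succeed, and the second value is skipped.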

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e535533d5ab8..e6580b9255de 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2008,11 +2008,8 @@ static int __init mcheck_enable(char *str)
 	else if (!strcmp(str, "bios_cmci_threshold"))
 		cfg->bios_cmci_threshold = true;
 	else if (isdigit(str[0])) {
-		get_option(&str, &(cfg->tolerant));
-		if (*str == ',') {
-			++str;
+		if (get_option(&str, &(cfg->tolerant)) == 2)
 			get_option(&str, &(cfg->monarch_timeout));
-		}
 	} else {
 		pr_info("mce argument %s ignored. Please use /sys\n", str);
 		return 0;
-- 
1.9.0.258.g00eda23


^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/3] Compile-time stack frame pointer validation
  2015-05-21  7:52             ` Ingo Molnar
  2015-05-21 12:12               ` Ingo Molnar
@ 2015-05-26 23:06               ` Andi Kleen
  1 sibling, 0 replies; 710+ messages in thread
From: Andi Kleen @ 2015-05-26 23:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Michal Marek, Peter Zijlstra,
	X86 ML, live-patching, linux-kernel, Andy Lutomirski,
	Denys Vlasenko, Brian Gerst, Peter Zijlstra, Borislav Petkov,
	Andrew Morton

Ingo Molnar <mingo@kernel.org> writes:
>
> Especially on modern x86 CPUs with stack engines (latest Intel and AMD 
> CPUs) that keeps ESP updates out of the later stages of execution 
> pipelines, going from RBP framepointers to direct ESP use is 
> beneficial to performance and compresses I$ footprint as well:

Note that Atom doesn't have this stack engine, so you'll likely
see even more difference there.

> So the performance advantages of not doing framepointers is not 
> something we can ignore IMHO:

Agreed.

> but obviously performance isn't 
> everything - so if stack unwinding is unrobust, then we need and
> want frame pointers.

It wasn't that bad in the old days with the approx stack traces.  In
fact I bet it would be possible to write an automated tool that weeds
out many (most?) false positives automatically with a static
compile-time callgraph.

It would be good to at least make it easier building without them
again. Currently it's very difficult because a lot of subsystems force
select frame pointers.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-21  5:48     ` Andy Lutomirski
@ 2015-05-27  1:01       ` Andy Lutomirski
  2015-05-27 11:30         ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Andy Lutomirski @ 2015-05-27  1:01 UTC (permalink / raw)
  To: Huang Rui, Thomas Gleixner, Rafael J. Wysocki, Len Brown,
	Borislav Petkov
  Cc: John Stultz, Tony Li, X86 ML, Peter Zijlstra, Aaron Lu,
	Fengguang Wu, linux-kernel

On Wed, May 20, 2015 at 10:48 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On May 20, 2015 6:34 PM, "Andy Lutomirski" <luto@kernel.org> wrote:
>> If we did that *and* we had a non-crappy mwaitx, then we could apply an optimization: when going idle, we could turn off the TSC deadline timer and use mwaitx instead.  This would avoid an interrupt if the event that wakes us is our timer.
>>
>
> Hey, Intel, want to document your secret "Timed MWAIT" feature?  It
> causes a transition to C0 when the deadline expires (see 4.2.4 of the
> Desktop 4th Generation Intel Core Processor Family Datasheet Volume 1,
> order number 328897-001) and it even has an erratum (HSD63 / BDM32),
> but the instruction itself doesn't appear to be documented.
>

Found more:

https://chromium-review.googlesource.com/#/c/205161/

Oddly, Coreboot seems to have mis-spelled that MSR.  It's
MSR_PKG_CST_CONFIG_CONTROL, and bit 31 isn't defined in the SDM
(unsurprisingly).

--Andy

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer
  2015-05-27  1:01       ` Andy Lutomirski
@ 2015-05-27 11:30         ` Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-05-27 11:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Huang Rui, Thomas Gleixner, Rafael J. Wysocki, Len Brown,
	John Stultz, Tony Li, X86 ML, Peter Zijlstra, Aaron Lu,
	Fengguang Wu, linux-kernel

On Tue, May 26, 2015 at 06:01:03PM -0700, Andy Lutomirski wrote:
> https://chromium-review.googlesource.com/#/c/205161/
> 
> Oddly, Coreboot seems to have mis-spelled that MSR.  It's
> MSR_PKG_CST_CONFIG_CONTROL, and bit 31 isn't defined in the SDM
> (unsurprisingly).

Since this is a control MSR, and judging by the comment in the coreboot
code and how they set that bit, it enables that MWAIT variant. Even if
the MSR write would stick on your hw, though, you'd still need to know
what it takes in EAX/ECX (and possibly some other register... EBX,
EDX...?). It might bring you some fun while trying to figure it out
:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [tip:sched/core] sched/x86: Drop repeated word from mwait_idle() comment
  2015-05-26  8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
@ 2015-05-27 14:16   ` tip-bot for Huang Rui
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Huang Rui @ 2015-05-27 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, luto, peterz, ray.huang, linux-kernel, dvlasenk,
	bp, tglx, brgerst, mingo, bp

Commit-ID:  0fb0328d3458ff2d6ffbb280b75053c99a8a4b1f
Gitweb:     http://git.kernel.org/tip/0fb0328d3458ff2d6ffbb280b75053c99a8a4b1f
Author:     Huang Rui <ray.huang@amd.com>
AuthorDate: Tue, 26 May 2015 10:28:09 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:38:04 +0200

sched/x86: Drop repeated word from mwait_idle() comment

A single "default" is fine.

Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix another typo and reflow comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432022472-2224-5-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1432628901-18044-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/process.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6e338e3..c648139 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -445,11 +445,10 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
 }
 
 /*
- * MONITOR/MWAIT with no hints, used for default default C1 state.
- * This invokes MWAIT with interrutps enabled and no flags,
- * which is backwards compatible with the original MWAIT implementation.
+ * MONITOR/MWAIT with no hints, used for default C1 state. This invokes MWAIT
+ * with interrupts enabled and no flags, which is backwards compatible with the
+ * original MWAIT implementation.
  */
-
 static void mwait_idle(void)
 {
 	if (!current_set_polling_and_test()) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-26  8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
@ 2015-05-27 14:16   ` tip-bot for Prarit Bhargava
  2015-05-27 17:07     ` Joe Perches
  0 siblings, 1 reply; 710+ messages in thread
From: tip-bot for Prarit Bhargava @ 2015-05-27 14:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, imammedo, torvalds, dvlasenk, mingo, brgerst, luto, bp,
	tglx, bp, prarit, dave.hansen, linux-kernel, hpa, fenghua.yu

Commit-ID:  adafb98da6a7af5e45362933a7dae6ab0e5076bf
Gitweb:     http://git.kernel.org/tip/adafb98da6a7af5e45362933a7dae6ab0e5076bf
Author:     Prarit Bhargava <prarit@redhat.com>
AuthorDate: Tue, 26 May 2015 10:28:17 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:38:24 +0200

x86/cpu: Strip any /proc/cpuinfo model name field whitespace

When comparing the 'model name' field of each core in
/proc/cpuinfo it was noticed that there is a whitespace
difference between the cores' model names.

After some quick investigation it was noticed that the model
name fields were actually different -- processor 0's model name
field had trailing whitespace removed, while the other
processors did not.

Another way of seeing this behaviour is to convert spaces into
underscores in the output of /proc/cpuinfo,

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

which shows the discrepancy.

This occurs because the kernel calls strim() on cpu 0's
x86_model_id field to output a pretty message to the console in
print_cpu_info(), and as a result strips the whitespace at the
end of the ->x86_model_id field.

But the ->x86_model_id field should be the same for all
identical CPUs in the box. Thus, we need to remove both leading
and trailing whitespace.

As a result, the print_cpu_info() output looks like

  smpboot: CPU0: AMD Opteron(TM) Processor 6272 (fam: 15, model: 01, stepping: 02)

and the x86_model_id field is correct on all processors on AMD
platforms:

  _____64_model_name      :_AMD_Opteron(TM)_Processor_6272

Output is still correct on an Intel box:

  ____144_model_name      :_Intel(R)_Xeon(R)_CPU_E7-8890_v3_@_2.50GHz

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..41a8e9c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,7 +419,6 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
-	char *p, *q;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	c->x86_model_id[48] = 0;
 
 	/*
-	 * Intel chips right-justify this string for some dumb reason;
-	 * undo that brain damage:
+	 * Remove leading whitespace on Intel processors and trailing
+	 * whitespace on AMD processors.
 	 */
-	p = q = &c->x86_model_id[0];
-	while (*p == ' ')
-		p++;
-	if (p != q) {
-		while (*p)
-			*q++ = *p++;
-		while (q <= &c->x86_model_id[48])
-			*q++ = '\0';	/* Zero-pad the rest */
-	}
+	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
@@ -1122,7 +1113,7 @@ void print_cpu_info(struct cpuinfo_x86 *c)
 		printk(KERN_CONT "%s ", vendor);
 
 	if (c->x86_model_id[0])
-		printk(KERN_CONT "%s", strim(c->x86_model_id));
+		printk(KERN_CONT "%s", c->x86_model_id);
 	else
 		printk(KERN_CONT "%d86", c->x86);
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/debug] x86/Documentation: Move kernel-stacks doc one level up
  2015-05-26  8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
@ 2015-05-27 14:17   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, bp, jpoimboe, torvalds, luto, peterz, linux-kernel,
	brgerst, mingo, a.p.zijlstra, tglx, akpm, hpa, luto, mmarek, bp

Commit-ID:  54fd15780526c47fa29a85b066cf69996be59a59
Gitweb:     http://git.kernel.org/tip/54fd15780526c47fa29a85b066cf69996be59a59
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:18 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:44 +0200

x86/Documentation: Move kernel-stacks doc one level up

... to Documentation/x86/ as it is going to collect more and not
only 64-bit specific info.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/{x86_64 => }/kernel-stacks | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/Documentation/x86/x86_64/kernel-stacks b/Documentation/x86/kernel-stacks
similarity index 100%
rename from Documentation/x86/x86_64/kernel-stacks
rename to Documentation/x86/kernel-stacks

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [tip:x86/debug] x86/Documentation: Remove STACKFAULT_STACK bulletpoint
  2015-05-26  8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
@ 2015-05-27 14:17   ` tip-bot for Borislav Petkov
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, jpoimboe, peterz, brgerst, mmarek, linux-kernel,
	a.p.zijlstra, luto, bp, akpm, luto, hpa, bp, mingo, dvlasenk,
	torvalds

Commit-ID:  d724a9a52b0026ac6a05440c079c9a618acfd8cf
Gitweb:     http://git.kernel.org/tip/d724a9a52b0026ac6a05440c079c9a618acfd8cf
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:19 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:46 +0200

x86/Documentation: Remove STACKFAULT_STACK bulletpoint

Update the documentation after

  6f442be2fb22 ("x86_64, traps: Stop using IST for #SS").

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/kernel-stacks | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index e3c8a49..c3c935b 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -1,3 +1,6 @@
+Kernel stacks on x86-64 bit
+---------------------------
+
 Most of the text from Keith Owens, hacked by AK
 
 x86_64 page size (PAGE_SIZE) is 4K.
@@ -56,13 +59,6 @@ If that assumption is ever broken then the stacks will become corrupt.
 
 The currently assigned IST stacks are :-
 
-* STACKFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
-
-  Used for interrupt 12 - Stack Fault Exception (#SS).
-
-  This allows the CPU to recover from invalid stack segments. Rarely
-  happens.
-
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/debug] x86/Documentation: Adapt Ingo's explanation on printing backtraces
  2015-05-21 10:16             ` Ingo Molnar
  2015-05-21 10:47               ` Borislav Petkov
@ 2015-05-27 14:17               ` tip-bot for Borislav Petkov
  1 sibling, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-05-27 14:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, brgerst, bp, peterz, torvalds, mingo, mmarek, luto,
	dvlasenk, tglx, jpoimboe, luto, akpm, a.p.zijlstra, bp,
	linux-kernel

Commit-ID:  113b5e3720e79ad938374163c1b8e295521dc9cf
Gitweb:     http://git.kernel.org/tip/113b5e3720e79ad938374163c1b8e295521dc9cf
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 26 May 2015 10:28:20 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:47 +0200

x86/Documentation: Adapt Ingo's explanation on printing backtraces

Write it down for future reference, as the question about the
question mark in stack traces keeps popping up.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/1432628901-18044-18-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/20150521101614.GA10889@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/kernel-stacks | 44 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/Documentation/x86/kernel-stacks b/Documentation/x86/kernel-stacks
index c3c935b..0f3a6c2 100644
--- a/Documentation/x86/kernel-stacks
+++ b/Documentation/x86/kernel-stacks
@@ -95,3 +95,47 @@ The currently assigned IST stacks are :-
   assumptions about the previous state of the kernel stack.
 
 For more details see the Intel IA32 or AMD AMD64 architecture manuals.
+
+
+Printing backtraces on x86
+--------------------------
+
+The question about the '?' preceding function names in an x86 stacktrace
+keeps popping up, here's an indepth explanation. It helps if the reader
+stares at print_context_stack() and the whole machinery in and around
+arch/x86/kernel/dumpstack.c.
+
+Adapted from Ingo's mail, Message-ID: <20150521101614.GA10889@gmail.com>:
+
+We always scan the full kernel stack for return addresses stored on
+the kernel stack(s) [*], from stack top to stack bottom, and print out
+anything that 'looks like' a kernel text address.
+
+If it fits into the frame pointer chain, we print it without a question
+mark, knowing that it's part of the real backtrace.
+
+If the address does not fit into our expected frame pointer chain we
+still print it, but we print a '?'. It can mean two things:
+
+ - either the address is not part of the call chain: it's just stale
+   values on the kernel stack, from earlier function calls. This is
+   the common case.
+
+ - or it is part of the call chain, but the frame pointer was not set
+   up properly within the function, so we don't recognize it.
+
+This way we will always print out the real call chain (plus a few more
+entries), regardless of whether the frame pointer was set up correctly
+or not - but in most cases we'll get the call chain right as well. The
+entries printed are strictly in stack order, so you can deduce more
+information from that as well.
+
+The most important property of this method is that we _never_ lose
+information: we always strive to print _all_ addresses on the stack(s)
+that look like kernel text addresses, so if debug information is wrong,
+we still print out the real call chain as well - just with more question
+marks than ideal.
+
+[*] For things like IRQ and IST stacks, we also scan those stacks, in
+    the right order, and try to cross from one stack into another
+    reconstructing the call chain. This works most of the time.

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP
  2015-05-26  8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
@ 2015-05-27 14:17     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, torvalds, hpa, toshi.kani, bp, akpm, tglx, linux-mm,
	linux-kernel, bp, mingo, peterz, mcgrof, brgerst, luto

Commit-ID:  10455f64aff0d715dcdfb09b02393df168fe267e
Gitweb:     http://git.kernel.org/tip/10455f64aff0d715dcdfb09b02393df168fe267e
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:04 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:55 +0200

x86/mm/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP

Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since
X86_PAE depends on X86_32 already.
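
As a rough illustration (not part of the patch), the equivalence of the
old and new conditions can be checked by brute force. The sketch below
assumes that exactly one of X86_32 and X86_64 is set in any valid
configuration and that X86_PAE can only be set together with X86_32;
the helper names old_cond(), new_cond() and check_equivalence() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch: X86_64 || (X86_32 && X86_PAE) */
static bool old_cond(bool x86_64, bool x86_32, bool pae)
{
	return x86_64 || (x86_32 && pae);
}

/* Condition after the patch: X86_64 || X86_PAE */
static bool new_cond(bool x86_64, bool pae)
{
	return x86_64 || pae;
}

/*
 * Walk every combination of the three symbols, skip the ones the
 * assumed dependencies rule out, and assert that both conditions
 * agree. Returns the number of valid configurations checked.
 */
static int check_equivalence(void)
{
	int checked = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool x86_64 = bits & 1;
		bool x86_32 = bits & 2;
		bool pae = bits & 4;

		if (x86_64 == x86_32)	/* assumed: exactly one is set */
			continue;
		if (pae && !x86_32)	/* X86_PAE depends on X86_32 */
			continue;

		assert(old_cond(x86_64, x86_32, pae) == new_cond(x86_64, pae));
		checked++;
	}
	return checked;
}
```

Under these assumptions only three configurations survive the filter,
and the two conditions agree on all of them, which is why the X86_32
term is redundant.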

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 226d569..4eb0b0f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,7 +100,7 @@ config X86
 	select IRQ_FORCED_THREADING
 	select HAVE_BPF_JIT if X86_64
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
-	select HAVE_ARCH_HUGE_VMAP if X86_64 || (X86_32 && X86_PAE)
+	select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE
 	select ARCH_HAS_SG_CHAIN
 	select CLKEVT_I8253
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry
  2015-05-26  8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
@ 2015-05-27 14:18     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, dvlasenk, peterz, bp, luto, torvalds, toshi.kani,
	akpm, mcgrof, hpa, brgerst, linux-mm, bp, linux-kernel

Commit-ID:  7f0431e3dc8953f41e9433581c1fdd7ee45860b0
Gitweb:     http://git.kernel.org/tip/7f0431e3dc8953f41e9433581c1fdd7ee45860b0
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:05 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:56 +0200

x86/mm/mtrr: Fix MTRR lookup to handle an inclusive entry

When an MTRR entry is inclusive to a requested range, i.e. the
start and end of the request are not within the MTRR entry range
but the range contains the MTRR entry entirely:

  range_start ... [mtrr_start ... mtrr_end] ... range_end

__mtrr_type_lookup() ignores such a case because both
start_state and end_state are set to zero.

This bug can cause the following issues:

1) reserve_memtype() tracks an effective memory type in case
   a request type is WB (e.g. /dev/mem blindly uses WB). Failing
   to track the effective type causes a subsequent request
   to map the same range with the effective type to fail.

2) pud_set_huge() and pmd_set_huge() check if a requested range
   has any overlap with MTRRs. Failing to detect an overlap may
   cause a performance penalty or undefined behavior.

This patch fixes the bug by adding a new flag, 'inclusive',
to detect the inclusive case.  This case is then handled in
the same way as end_state:1 since the first region is the same.
With this fix, __mtrr_type_lookup() handles the inclusive case
properly.
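
As a rough illustration (not part of the patch), the three flags can be
read as a classification of how a request overlaps one variable MTRR.
The sketch below reuses the start_state, end_state and inclusive
expressions from the diff; the enum and the classify_overlap() helper
are hypothetical names, and 'end' is assumed to be inclusive, as it is
inside __mtrr_type_lookup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the cases described in the code comment. */
enum overlap_kind {
	OVERLAP_NONE,		/* request does not touch the MTRR */
	OVERLAP_FULL,		/* request lies entirely inside the MTRR */
	OVERLAP_START_IN,	/* start_state:1, split at mtrr_end */
	OVERLAP_END_IN,		/* end_state:1, split at mtrr_start */
	OVERLAP_INCLUSIVE,	/* request contains the whole MTRR */
};

static enum overlap_kind classify_overlap(uint64_t start, uint64_t end,
					  uint64_t base, uint64_t mask)
{
	bool start_state, end_state, inclusive;

	assert(start <= end);	/* 'end' is the last byte of the request */

	/* Same three expressions as in the patched lookup loop. */
	start_state = (start & mask) == (base & mask);
	end_state = (end & mask) == (base & mask);
	inclusive = (start < base) && (end > base);

	if (start_state && end_state)
		return OVERLAP_FULL;
	if (start_state)
		return OVERLAP_START_IN;
	if (end_state)
		return OVERLAP_END_IN;
	if (inclusive)
		return OVERLAP_INCLUSIVE;
	return OVERLAP_NONE;
}
```

For a 1 MiB MTRR at 0x100000 (mask ~0xFFFFF), a request covering
0x80000..0x27FFFF has both start_state and end_state clear, so only the
new 'inclusive' test notices the overlap.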

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 5b23967..e202d26 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -154,7 +154,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
-		unsigned short start_state, end_state;
+		unsigned short start_state, end_state, inclusive;
 
 		if (!(mtrr_state.var_ranges[i].mask_lo & (1 << 11)))
 			continue;
@@ -166,19 +166,27 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 		start_state = ((start & mask) == (base & mask));
 		end_state = ((end & mask) == (base & mask));
+		inclusive = ((start < base) && (end > base));
 
-		if (start_state != end_state) {
+		if ((start_state != end_state) || inclusive) {
 			/*
 			 * We have start:end spanning across an MTRR.
-			 * We split the region into
-			 * either
-			 * (start:mtrr_end) (mtrr_end:end)
-			 * or
-			 * (start:mtrr_start) (mtrr_start:end)
+			 * We split the region into either
+			 *
+			 * - start_state:1
+			 * (start:mtrr_end)(mtrr_end:end)
+			 * - end_state:1
+			 * (start:mtrr_start)(mtrr_start:end)
+			 * - inclusive:1
+			 * (start:mtrr_start)(mtrr_start:mtrr_end)(mtrr_end:end)
+			 *
 			 * depending on kind of overlap.
-			 * Return the type for first region and a pointer to
-			 * the start of second region so that caller will
-			 * lookup again on the second region.
+			 *
+			 * Return the type of the first region and a pointer
+			 * to the start of next region so that caller will be
+			 * advised to lookup again after having adjusted start
+			 * and end.
+			 *
 			 * Note: This way we handle multiple overlaps as well.
 			 */
 			if (start_state)

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
  2015-05-26  8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
@ 2015-05-27 14:18     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, dvlasenk, linux-kernel, toshi.kani, peterz, bp, mcgrof,
	mingo, brgerst, torvalds, luto, akpm, tglx, linux-mm, bp

Commit-ID:  9b3aca620883fc06636737c82a4d024b22182281
Gitweb:     http://git.kernel.org/tip/9b3aca620883fc06636737c82a4d024b22182281
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:06 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:56 +0200

x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()

'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:

 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no effect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match the SDM.

Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking at the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled. However, the default type
   is UC when the E flag is clear.  Remove the code as this
   case is handled as MTRR disabled with the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.
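
As a rough illustration (not part of the patch), the SDM rules listed
above reduce to two predicates on mtrr_state.enabled. The flag values
are the ones this patch defines; the helper names
mtrr_variable_enabled() and mtrr_fixed_enabled() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by the patch in asm/mtrr.h. */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01	/* FE flag */
#define MTRR_STATE_MTRR_ENABLED		0x02	/* E flag */

static_assert((MTRR_STATE_MTRR_FIXED_ENABLED & MTRR_STATE_MTRR_ENABLED) == 0,
	      "E and FE must be distinct bits");

/* Variable ranges and the default type are active whenever E is set. */
static bool mtrr_variable_enabled(uint8_t enabled)
{
	return (enabled & MTRR_STATE_MTRR_ENABLED) != 0;
}

/* Fixed ranges are active only when both E and FE are set. */
static bool mtrr_fixed_enabled(uint8_t enabled)
{
	return (enabled & MTRR_STATE_MTRR_ENABLED) &&
	       (enabled & MTRR_STATE_MTRR_FIXED_ENABLED);
}
```

With FE set but E clear (enabled == 0x01), both helpers return false.
The old '!mtrr_state.enabled' check treated that value as "MTRRs
enabled".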

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mtrr.h        |  4 ++++
 arch/x86/kernel/cpu/mtrr/cleanup.c |  3 ++-
 arch/x86/kernel/cpu/mtrr/generic.c | 15 ++++++++-------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..ef92794 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
 				 _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry32)
 #endif /* CONFIG_COMPAT */
 
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
+#define MTRR_STATE_MTRR_ENABLED		0x02
+
 #endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..70d7c93 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
 			continue;
 		base = range_state[i].base_pfn;
 		if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
-		    (mtrr_state.enabled & 1)) {
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 			/* Var MTRR contains UC entry below 1M? Skip it: */
 			printk(BIOS_BUG_MSG, i);
 			if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26..b0599db 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	if (!mtrr_state_set)
 		return 0xFF;
 
-	if (!mtrr_state.enabled)
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
 		return 0xFF;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state.have_fixed && (start < 0x100000)) {
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state.enabled & 2))
-		return mtrr_state.def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
 		 mtrr_attrib_to_str(mtrr_state.def_type));
 	if (mtrr_state.have_fixed) {
 		pr_debug("MTRR fixed ranges %sabled:\n",
-			 mtrr_state.enabled & 1 ? "en" : "dis");
+			((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+			 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+			 "en" : "dis");
 		print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
 		for (i = 0; i < 2; ++i)
 			print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
 		print_fixed_last();
 	}
 	pr_debug("MTRR variable ranges %sabled:\n",
-		 mtrr_state.enabled & 2 ? "en" : "dis");
+		 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
 	high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
 
 	for (i = 0; i < num_var_ranges; ++i) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()
@ 2015-05-27 14:18     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, dvlasenk, linux-kernel, toshi.kani, peterz, bp, mcgrof,
	mingo, brgerst, torvalds, luto, akpm, tglx, linux-mm, bp

Commit-ID:  9b3aca620883fc06636737c82a4d024b22182281
Gitweb:     http://git.kernel.org/tip/9b3aca620883fc06636737c82a4d024b22182281
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:06 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:56 +0200

x86/mm/mtrr: Fix MTRR state checks in mtrr_type_lookup()

'mtrr_state.enabled' contains the FE (fixed MTRRs enabled)
and E (MTRRs enabled) flags in MSR_MTRRdefType.  Intel SDM,
section 11.11.2.1, defines these flags as follows:

 - All MTRRs are disabled when the E flag is clear.
   The FE flag has no affect when the E flag is clear.
 - The default type is enabled when the E flag is set.
 - MTRR variable ranges are enabled when the E flag is set.
 - MTRR fixed ranges are enabled when both E and FE flags
   are set.

MTRR state checks in __mtrr_type_lookup() do not match with SDM.

Hence, this patch makes the following changes:
 - The current code detects MTRRs disabled when both E and
   FE flags are clear in mtrr_state.enabled.  Fix to detect
   MTRRs disabled when the E flag is clear.
 - The current code does not check if the FE bit is set in
   mtrr_state.enabled when looking at the fixed entries.
   Fix to check the FE flag.
 - The current code returns the default type when the E flag
   is clear in mtrr_state.enabled. However, the default type
   is UC when the E flag is clear.  Remove the code as this
   case is handled as MTRR disabled with the 1st change.

In addition, this patch defines the E and FE flags in
mtrr_state.enabled as follows.
 - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED
 - E  flag: MTRR_STATE_MTRR_ENABLED

print_mtrr_state() and x86_get_mtrr_mem_range() are also updated
accordingly.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mtrr.h        |  4 ++++
 arch/x86/kernel/cpu/mtrr/cleanup.c |  3 ++-
 arch/x86/kernel/cpu/mtrr/generic.c | 15 ++++++++-------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index f768f62..ef92794 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -127,4 +127,8 @@ struct mtrr_gentry32 {
 				 _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry32)
 #endif /* CONFIG_COMPAT */
 
+/* Bit fields for enabled in struct mtrr_state_type */
+#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01
+#define MTRR_STATE_MTRR_ENABLED		0x02
+
 #endif /* _ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c
index 5f90b85..70d7c93 100644
--- a/arch/x86/kernel/cpu/mtrr/cleanup.c
+++ b/arch/x86/kernel/cpu/mtrr/cleanup.c
@@ -98,7 +98,8 @@ x86_get_mtrr_mem_range(struct range *range, int nr_range,
 			continue;
 		base = range_state[i].base_pfn;
 		if (base < (1<<(20-PAGE_SHIFT)) && mtrr_state.have_fixed &&
-		    (mtrr_state.enabled & 1)) {
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+		    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 			/* Var MTRR contains UC entry below 1M? Skip it: */
 			printk(BIOS_BUG_MSG, i);
 			if (base + size <= (1<<(20-PAGE_SHIFT)))
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e202d26..b0599db 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -119,14 +119,16 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	if (!mtrr_state_set)
 		return 0xFF;
 
-	if (!mtrr_state.enabled)
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
 		return 0xFF;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state.have_fixed && (start < 0x100000)) {
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -149,9 +151,6 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state.enabled & 2))
-		return mtrr_state.def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -355,7 +354,9 @@ static void __init print_mtrr_state(void)
 		 mtrr_attrib_to_str(mtrr_state.def_type));
 	if (mtrr_state.have_fixed) {
 		pr_debug("MTRR fixed ranges %sabled:\n",
-			 mtrr_state.enabled & 1 ? "en" : "dis");
+			((mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED) &&
+			 (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) ?
+			 "en" : "dis");
 		print_fixed(0x00000, 0x10000, mtrr_state.fixed_ranges + 0);
 		for (i = 0; i < 2; ++i)
 			print_fixed(0x80000 + i * 0x20000, 0x04000,
@@ -368,7 +369,7 @@ static void __init print_mtrr_state(void)
 		print_fixed_last();
 	}
 	pr_debug("MTRR variable ranges %sabled:\n",
-		 mtrr_state.enabled & 2 ? "en" : "dis");
+		 mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED ? "en" : "dis");
 	high_width = (__ffs64(size_or_mask) - (32 - PAGE_SHIFT) + 3) / 4;
 
 	for (i = 0; i < num_var_ranges; ++i) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread
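
The hunks above replace the literal bit tests on mtrr_state.enabled
with named flags, and treat the fixed ranges as enabled only when the
global enable bit is set as well. A minimal sketch of that logic
follows. The helper names are hypothetical, and the value of
MTRR_STATE_MTRR_FIXED_ENABLED is an assumption inferred from the
"& 1" test it replaces; only MTRR_STATE_MTRR_ENABLED (0x02) is
visible in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value shown in the patch. */
#define MTRR_STATE_MTRR_ENABLED		0x02
/* Assumed value, inferred from the literal "& 1" that the patch replaces. */
#define MTRR_STATE_MTRR_FIXED_ENABLED	0x01

/* Hypothetical helper: are MTRRs enabled at all? */
static bool mtrr_enabled(uint8_t enabled)
{
	return (enabled & MTRR_STATE_MTRR_ENABLED) != 0;
}

/*
 * Hypothetical helper: fixed ranges count as enabled only when the
 * global enable bit and the fixed-range enable bit are both set.
 */
static bool mtrr_fixed_enabled(uint8_t enabled)
{
	return (enabled & MTRR_STATE_MTRR_ENABLED) &&
	       (enabled & MTRR_STATE_MTRR_FIXED_ENABLED);
}
```

With these definitions, a state of 0x01 (fixed bit set, global bit
clear) reports the fixed ranges as disabled, which is the case the
old "& 1" test got wrong.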

* [tip:x86/mm] x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs
  2015-05-26  8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
@ 2015-05-27 14:18     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, tglx, linux-mm, mcgrof, hpa, akpm, torvalds, linux-kernel,
	brgerst, mingo, toshi.kani, dvlasenk, peterz, bp, luto

Commit-ID:  3d3ca416d9b0784cfcf244eeeba1bcaf421bc64d
Gitweb:     http://git.kernel.org/tip/3d3ca416d9b0784cfcf244eeeba1bcaf421bc64d
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:07 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:57 +0200

x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs

mtrr_type_lookup() returns verbatim 0xFF when MTRRs are
disabled. This patch defines MTRR_TYPE_INVALID to clarify the
meaning of this value, and documents its usage.

Document the return values of the kernel virtual address mapping
helpers pud_set_huge(), pmd_set_huge(), pud_clear_huge() and
pmd_clear_huge().

There is no functional change in this patch.
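
As an illustration of how a caller is expected to treat the new
define, here is a minimal sketch modeled on the check in
pud_set_huge() and pmd_set_huge(). The helper name
huge_mapping_allowed() is hypothetical and not part of the patch;
the type values are the ones listed in uapi/asm/mtrr.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Memory type values as listed in uapi/asm/mtrr.h. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRCOMB	1
#define MTRR_TYPE_WRBACK	6
#define MTRR_TYPE_INVALID	0xff

/*
 * Hypothetical helper: a huge mapping is acceptable when the range is
 * write-back, or when MTRRs are disabled and therefore cannot override
 * the page attributes.
 */
static bool huge_mapping_allowed(uint8_t mtrr_type)
{
	return mtrr_type == MTRR_TYPE_WRBACK ||
	       mtrr_type == MTRR_TYPE_INVALID;
}
```

MTRR_TYPE_INVALID is deliberately treated like write-back here: it
means "no MTRR opinion", not "error".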

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mtrr.h        |  2 +-
 arch/x86/include/uapi/asm/mtrr.h   |  8 +++++++-
 arch/x86/kernel/cpu/mtrr/generic.c | 14 ++++++-------
 arch/x86/mm/pgtable.c              | 42 +++++++++++++++++++++++++++++---------
 4 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index ef92794..bb03a54 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -55,7 +55,7 @@ static inline u8 mtrr_type_lookup(u64 addr, u64 end)
 	/*
 	 * Return no-MTRRs:
 	 */
-	return 0xff;
+	return MTRR_TYPE_INVALID;
 }
 #define mtrr_save_fixed_ranges(arg) do {} while (0)
 #define mtrr_save_state() do {} while (0)
diff --git a/arch/x86/include/uapi/asm/mtrr.h b/arch/x86/include/uapi/asm/mtrr.h
index d0acb65..7528dcf 100644
--- a/arch/x86/include/uapi/asm/mtrr.h
+++ b/arch/x86/include/uapi/asm/mtrr.h
@@ -103,7 +103,7 @@ struct mtrr_state_type {
 #define MTRRIOC_GET_PAGE_ENTRY   _IOWR(MTRR_IOCTL_BASE, 8, struct mtrr_gentry)
 #define MTRRIOC_KILL_PAGE_ENTRY  _IOW(MTRR_IOCTL_BASE,  9, struct mtrr_sentry)
 
-/*  These are the region types  */
+/* MTRR memory types, which are defined in SDM */
 #define MTRR_TYPE_UNCACHABLE 0
 #define MTRR_TYPE_WRCOMB     1
 /*#define MTRR_TYPE_         2*/
@@ -113,5 +113,11 @@ struct mtrr_state_type {
 #define MTRR_TYPE_WRBACK     6
 #define MTRR_NUM_TYPES       7
 
+/*
+ * Invalid MTRR memory type.  mtrr_type_lookup() returns this value when
+ * MTRRs are disabled.  Note, this value is allocated from the reserved
+ * values (0x7-0xff) of the MTRR memory types.
+ */
+#define MTRR_TYPE_INVALID    0xff
 
 #endif /* _UAPI_ASM_X86_MTRR_H */
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index b0599db..7b1491c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -104,7 +104,7 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 
 /*
  * Error/Semi-error returns:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
  *		corresponds only to [start:*partial_end].
  *		Caller has to lookup again for [*partial_end:end].
@@ -117,10 +117,10 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 
 	*repeat = 0;
 	if (!mtrr_state_set)
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return 0xFF;
+		return MTRR_TYPE_INVALID;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
@@ -151,7 +151,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	prev_match = 0xFF;
+	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
 
@@ -206,7 +206,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			continue;
 
 		curr_match = mtrr_state.var_ranges[i].base_lo & 0xff;
-		if (prev_match == 0xFF) {
+		if (prev_match == MTRR_TYPE_INVALID) {
 			prev_match = curr_match;
 			continue;
 		}
@@ -220,7 +220,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return MTRR_TYPE_WRBACK;
 	}
 
-	if (prev_match != 0xFF)
+	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
@@ -229,7 +229,7 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 /*
  * Returns the effective MTRR type for the region
  * Error return:
- * 0xFF - when MTRR is not enabled
+ * MTRR_TYPE_INVALID - when MTRR is not enabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 0b97d2c..c30f981 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -563,16 +563,22 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 }
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+/**
+ * pud_set_huge - setup kernel PUD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -584,16 +590,22 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pmd_set_huge - setup kernel PMD mapping
+ *
+ * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
+ * this function does not set up a huge page when the range is covered
+ * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
+ * disabled.
+ *
+ * Returns 1 on success and 0 on failure.
+ */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
 	u8 mtrr;
 
-	/*
-	 * Do not use a huge page when the range is covered by non-WB type
-	 * of MTRRs.
-	 */
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != 0xFF))
+	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -605,6 +617,11 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	return 1;
 }
 
+/**
+ * pud_clear_huge - clear kernel PUD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PUD map is found).
+ */
 int pud_clear_huge(pud_t *pud)
 {
 	if (pud_large(*pud)) {
@@ -615,6 +632,11 @@ int pud_clear_huge(pud_t *pud)
 	return 0;
 }
 
+/**
+ * pmd_clear_huge - clear kernel PMD mapping when it is set
+ *
+ * Returns 1 on success and 0 on failure (no PMD map is found).
+ */
 int pmd_clear_huge(pmd_t *pmd)
 {
 	if (pmd_large(*pmd)) {

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-05-26  8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
@ 2015-05-27 14:19     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, dvlasenk, bp, bp, mingo, luto, linux-mm, linux-kernel,
	torvalds, mcgrof, toshi.kani, brgerst, peterz, akpm, tglx

Commit-ID:  0cc705f56e400764a171055f727d28a48260bb4b
Gitweb:     http://git.kernel.org/tip/0cc705f56e400764a171055f727d28a48260bb4b
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:08 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:57 +0200

x86/mm/mtrr: Clean up mtrr_type_lookup()

MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.

However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.
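
The index arithmetic used for the fixed entries can be sketched on
its own. This is an illustration only: fixed_range_index() is a
hypothetical helper that returns the index into
mtrr_state.fixed_ranges[] rather than the memory type, and it
returns -1 where the patch returns MTRR_TYPE_INVALID.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the index computation in
 * mtrr_type_lookup_fixed():
 *   0x00000 - 0x7FFFF : eight 64KB sub-ranges       (indices  0..7)
 *   0x80000 - 0xBFFFF : sixteen 16KB sub-ranges     (indices  8..23)
 *   0xC0000 - 0xFFFFF : sixty-four 4KB sub-ranges   (indices 24..87)
 */
static int fixed_range_index(uint64_t start)
{
	if (start >= 0x100000)
		return -1;	/* not covered by the fixed ranges */

	if (start < 0x80000)
		return (int)(start >> 16);

	if (start < 0xC0000)
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

The three regions together yield 8 + 16 + 64 = 88 indices, one per
fixed sub-range below 1MB.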

The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.
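
The repeated lookup can be pictured with a simplified model. This is
a sketch under stated assumptions, not the kernel implementation:
struct var_entry, the entries[] table, lookup_one(), merge_types()
and lookup_range() are all hypothetical, the entries are assumed not
to overlap each other, and base/size pairs stand in for the real
base/mask matching. The merge rule (UC wins, WB combined with WT
gives WT, any other mix gives UC) is loosely modeled on
check_type_overlap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Memory type values as listed in uapi/asm/mtrr.h. */
#define MTRR_TYPE_UNCACHABLE	0
#define MTRR_TYPE_WRTHROUGH	4
#define MTRR_TYPE_WRBACK	6

/* Hypothetical, simplified variable entry covering [base, base + size). */
struct var_entry {
	uint64_t base;
	uint64_t size;
	uint8_t type;
};

/* Made-up table for illustration; entries do not overlap each other. */
static const struct var_entry entries[] = {
	{ 0x100000, 0x100000, MTRR_TYPE_WRBACK },
	{ 0x200000, 0x100000, MTRR_TYPE_WRTHROUGH },
};
static const uint8_t def_type = MTRR_TYPE_UNCACHABLE;

/* Combine two types: UC wins, WB + WT gives WT, any other mix gives UC. */
static uint8_t merge_types(uint8_t a, uint8_t b)
{
	if (a == b)
		return a;
	if ((a == MTRR_TYPE_WRBACK && b == MTRR_TYPE_WRTHROUGH) ||
	    (a == MTRR_TYPE_WRTHROUGH && b == MTRR_TYPE_WRBACK))
		return MTRR_TYPE_WRTHROUGH;
	return MTRR_TYPE_UNCACHABLE;
}

/*
 * Return the type that applies at 'start' ('end' is exclusive).  If that
 * answer stops being valid before 'end', set *repeat and report the
 * boundary in *partial_end so the caller can look up the remainder.
 */
static uint8_t lookup_one(uint64_t start, uint64_t end,
			  uint64_t *partial_end, int *repeat)
{
	uint64_t boundary = end;
	uint8_t type = def_type;
	size_t i;

	for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
		uint64_t lo = entries[i].base;
		uint64_t hi = lo + entries[i].size;

		if (start >= lo && start < hi) {
			type = entries[i].type;
			boundary = hi;
			break;
		}
		/* Entry begins later: the default type ends where it starts. */
		if (lo > start && lo < boundary)
			boundary = lo;
	}

	*partial_end = boundary;
	*repeat = boundary < end;
	return type;
}

/* Walk [start, end) piece by piece and merge the types that were found. */
static uint8_t lookup_range(uint64_t start, uint64_t end)
{
	uint64_t partial_end;
	int repeat;
	uint8_t type = lookup_one(start, end, &partial_end, &repeat);

	/* UC cannot be overridden, so stop as soon as it shows up. */
	while (repeat && type != MTRR_TYPE_UNCACHABLE) {
		uint8_t next = lookup_one(partial_end, end, &partial_end, &repeat);

		type = merge_types(type, next);
	}
	return type;
}
```

In this model a request that starts in default-type memory and runs
into an entry needs a second lookup, which is the overlap with the
default type that the paragraph above refers to.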

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 138 +++++++++++++++++++++++--------------
 1 file changed, 86 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c..e51100c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,68 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided into the following ways:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
+ */
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	/* 0x0 - 0x7FFFF */
+	if (start < 0x80000) {
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+	/* 0x80000 - 0xBFFFF */
+	} else if (start < 0xC0000) {
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
-	/* Make end inclusive end, instead of exclusive */
+	/* Make end inclusive instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -186,7 +199,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * advised to lookup again after having adjusted start
 			 * and end.
 			 *
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +229,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -237,22 +248,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return mtrr_type_lookup_fixed(start, end);
+
+	/*
+	 * Look up the variable ranges.  Look of multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
@ 2015-05-27 14:19     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, dvlasenk, bp, bp, mingo, luto, linux-mm, linux-kernel,
	torvalds, mcgrof, toshi.kani, brgerst, peterz, akpm, tglx

Commit-ID:  0cc705f56e400764a171055f727d28a48260bb4b
Gitweb:     http://git.kernel.org/tip/0cc705f56e400764a171055f727d28a48260bb4b
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:08 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:57 +0200

x86/mm/mtrr: Clean up mtrr_type_lookup()

MTRRs contain fixed and variable entries. mtrr_type_lookup() may
repeatedly call __mtrr_type_lookup() to handle a request that
overlaps with variable entries.

However, __mtrr_type_lookup() also handles the fixed entries,
which do not have to be repeated. Therefore, this patch creates
separate functions, mtrr_type_lookup_fixed() and
mtrr_type_lookup_variable(), to handle the fixed and variable
ranges respectively.

The patch also updates the function headers to clarify the
return values and output argument. It updates comments to
clarify that the repeating is necessary to handle overlaps with
the default type, since overlaps with multiple entries alone can
be handled without such repeating.

There is no functional change in this patch.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mtrr/generic.c | 138 +++++++++++++++++++++++--------------
 1 file changed, 86 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 7b1491c..e51100c 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -102,55 +102,68 @@ static int check_type_overlap(u8 *prev, u8 *curr)
 	return 0;
 }
 
-/*
- * Error/Semi-error returns:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
- * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
- *		corresponds only to [start:*partial_end].
- *		Caller has to lookup again for [*partial_end:end].
+/**
+ * mtrr_type_lookup_fixed - look up memory type in MTRR fixed entries
+ *
+ * Return the MTRR fixed memory type of 'start'.
+ *
+ * MTRR fixed entries are divided into the following ways:
+ *  0x00000 - 0x7FFFF : This range is divided into eight 64KB sub-ranges
+ *  0x80000 - 0xBFFFF : This range is divided into sixteen 16KB sub-ranges
+ *  0xC0000 - 0xFFFFF : This range is divided into sixty-four 4KB sub-ranges
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - Matched memory type
+ * MTRR_TYPE_INVALID - Unmatched
+ */
+static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
+{
+	int idx;
+
+	if (start >= 0x100000)
+		return MTRR_TYPE_INVALID;
+
+	/* 0x0 - 0x7FFFF */
+	if (start < 0x80000) {
+		idx = 0;
+		idx += (start >> 16);
+		return mtrr_state.fixed_ranges[idx];
+	/* 0x80000 - 0xBFFFF */
+	} else if (start < 0xC0000) {
+		idx = 1 * 8;
+		idx += ((start - 0x80000) >> 14);
+		return mtrr_state.fixed_ranges[idx];
+	}
+
+	/* 0xC0000 - 0xFFFFF */
+	idx = 3 * 8;
+	idx += ((start - 0xC0000) >> 12);
+	return mtrr_state.fixed_ranges[idx];
+}
+
+/**
+ * mtrr_type_lookup_variable - look up memory type in MTRR variable entries
+ *
+ * Return Value:
+ * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
+ *
+ * Output Argument:
+ * repeat - Set to 1 when [start:end] spanned across MTRR range and type
+ *	    returned corresponds only to [start:*partial_end].  Caller has
+ *	    to lookup again for [*partial_end:end].
  */
-static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
+static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
+				    int *repeat)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
-	if (!mtrr_state_set)
-		return MTRR_TYPE_INVALID;
-
-	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
-		return MTRR_TYPE_INVALID;
 
-	/* Make end inclusive end, instead of exclusive */
+	/* Make end inclusive instead of exclusive */
 	end--;
 
-	/* Look in fixed ranges. Just return the type as per start */
-	if ((start < 0x100000) &&
-	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
-		int idx;
-
-		if (start < 0x80000) {
-			idx = 0;
-			idx += (start >> 16);
-			return mtrr_state.fixed_ranges[idx];
-		} else if (start < 0xC0000) {
-			idx = 1 * 8;
-			idx += ((start - 0x80000) >> 14);
-			return mtrr_state.fixed_ranges[idx];
-		} else {
-			idx = 3 * 8;
-			idx += ((start - 0xC0000) >> 12);
-			return mtrr_state.fixed_ranges[idx];
-		}
-	}
-
-	/*
-	 * Look in variable ranges
-	 * Look of multiple ranges matching this address and pick type
-	 * as per MTRR precedence
-	 */
 	prev_match = MTRR_TYPE_INVALID;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state, inclusive;
@@ -186,7 +199,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			 * advised to lookup again after having adjusted start
 			 * and end.
 			 *
-			 * Note: This way we handle multiple overlaps as well.
+			 * Note: This way we handle overlaps with multiple
+			 * entries and the default type properly.
 			 */
 			if (start_state)
 				*partial_end = base + get_mtrr_size(mask);
@@ -215,21 +229,18 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
 			return curr_match;
 	}
 
-	if (mtrr_tom2) {
-		if (start >= (1ULL<<32) && (end < mtrr_tom2))
-			return MTRR_TYPE_WRBACK;
-	}
-
 	if (prev_match != MTRR_TYPE_INVALID)
 		return prev_match;
 
 	return mtrr_state.def_type;
 }
 
-/*
- * Returns the effective MTRR type for the region
- * Error return:
- * MTRR_TYPE_INVALID - when MTRR is not enabled
+/**
+ * mtrr_type_lookup - look up memory type in MTRR
+ *
+ * Return Values:
+ * MTRR_TYPE_(type)  - The effective MTRR type for the region
+ * MTRR_TYPE_INVALID - MTRR is disabled
  */
 u8 mtrr_type_lookup(u64 start, u64 end)
 {
@@ -237,22 +248,45 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	int repeat;
 	u64 partial_end;
 
-	type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+	if (!mtrr_state_set)
+		return MTRR_TYPE_INVALID;
+
+	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
+		return MTRR_TYPE_INVALID;
+
+	/*
+	 * Look up the fixed ranges first, which take priority over
+	 * the variable ranges.
+	 */
+	if ((start < 0x100000) &&
+	    (mtrr_state.have_fixed) &&
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
+		return mtrr_type_lookup_fixed(start, end);
+
+	/*
+	 * Look up the variable ranges.  Look of multiple ranges matching
+	 * this address and pick type as per MTRR precedence.
+	 */
+	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 	/*
 	 * Common path is with repeat = 0.
 	 * However, we can have cases where [start:end] spans across some
-	 * MTRR range. Do repeated lookups for that case here.
+	 * MTRR ranges and/or the default type.  Do repeated lookups for
+	 * that case here.
 	 */
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
 
 		if (check_type_overlap(&prev_type, &type))
 			return type;
 	}
 
+	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
+		return MTRR_TYPE_WRBACK;
+
 	return type;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread
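
The fixed-range lookup that the patch above splits out into
mtrr_type_lookup_fixed() is plain index arithmetic over a flat array of
88 type bytes. Below is a minimal user-space sketch of that arithmetic,
for illustration only. The name fixed_range_index() and the
FIXED_RANGE_ENTRIES constant are assumptions made for this example and
are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 8 x 64 KiB + 16 x 16 KiB + 64 x 4 KiB entries cover the first 1 MiB. */
#define FIXED_RANGE_ENTRIES 88

/*
 * Map a physical address below 1 MiB to its slot in a flat array of
 * fixed-range types, using the same shifts as the patch.
 * Hypothetical helper, not a kernel function.
 */
int fixed_range_index(uint64_t start)
{
	assert(start < 0x100000);

	/* 0x00000 - 0x7FFFF: eight 64 KiB granules */
	if (start < 0x80000)
		return (int)(start >> 16);

	/* 0x80000 - 0xBFFFF: sixteen 16 KiB granules */
	if (start < 0xC0000)
		return 1 * 8 + (int)((start - 0x80000) >> 14);

	/* 0xC0000 - 0xFFFFF: sixty-four 4 KiB granules */
	return 3 * 8 + (int)((start - 0xC0000) >> 12);
}
```

Only the start address selects the slot, which matches the "just return
the type as per start" behaviour of the code being moved.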

* [tip:x86/mm] x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
  2015-05-26  8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
@ 2015-05-27 14:19     ` tip-bot for Toshi Kani
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Toshi Kani @ 2015-05-27 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, toshi.kani, torvalds, dvlasenk, brgerst, peterz, bp, tglx,
	linux-mm, mcgrof, akpm, linux-kernel, hpa, bp, mingo

Commit-ID:  b73522e0c1be58d3c69b124985b8ccf94e3677f7
Gitweb:     http://git.kernel.org/tip/b73522e0c1be58d3c69b124985b8ccf94e3677f7
Author:     Toshi Kani <toshi.kani@hp.com>
AuthorDate: Tue, 26 May 2015 10:28:10 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:58 +0200

x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers

This patch adds the argument 'uniform' to mtrr_type_lookup(),
which gets set to 1 when a given range is covered uniformly by
MTRRs, i.e. the range is fully covered by a single MTRR entry or
the default type.

Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
flag to see if it is safe to create a huge page mapping in the
range.

This allows them to create a huge page mapping in a range
covered by a single MTRR entry of any memory type. It also
detects a non-optimal request properly. They continue to accept
the WB type, since WB has no effect on the requested PAT memory
type even if a request spans multiple MTRR entries.

pmd_set_huge() logs a warning message for a non-optimal request
so that driver writers will be aware of such a case. Drivers
should make a mapping request aligned to a single MTRR entry
when the range is covered by MTRRs.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
[ Realign, flesh out comments, improve warning message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Elliott@hp.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: pebolle@tiscali.nl
Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/mtrr.h        |  4 ++--
 arch/x86/kernel/cpu/mtrr/generic.c | 40 ++++++++++++++++++++++++++++----------
 arch/x86/mm/pat.c                  |  4 ++--
 arch/x86/mm/pgtable.c              | 38 +++++++++++++++++++++++-------------
 4 files changed, 58 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index bb03a54..a31759e 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -31,7 +31,7 @@
  * arch_phys_wc_add and arch_phys_wc_del.
  */
 # ifdef CONFIG_MTRR
-extern u8 mtrr_type_lookup(u64 addr, u64 end);
+extern u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform);
 extern void mtrr_save_fixed_ranges(void *);
 extern void mtrr_save_state(void);
 extern int mtrr_add(unsigned long base, unsigned long size,
@@ -50,7 +50,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
 extern int phys_wc_to_mtrr_index(int handle);
 #  else
-static inline u8 mtrr_type_lookup(u64 addr, u64 end)
+static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
 	/*
 	 * Return no-MTRRs:
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index e51100c..f782d9b 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -147,19 +147,24 @@ static u8 mtrr_type_lookup_fixed(u64 start, u64 end)
  * Return Value:
  * MTRR_TYPE_(type) - Matched memory type or default memory type (unmatched)
  *
- * Output Argument:
+ * Output Arguments:
  * repeat - Set to 1 when [start:end] spanned across MTRR range and type
  *	    returned corresponds only to [start:*partial_end].  Caller has
  *	    to lookup again for [*partial_end:end].
+ *
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
 static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
-				    int *repeat)
+				    int *repeat, u8 *uniform)
 {
 	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
 
 	*repeat = 0;
+	*uniform = 1;
 
 	/* Make end inclusive instead of exclusive */
 	end--;
@@ -214,6 +219,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 
 			end = *partial_end - 1; /* end is inclusive */
 			*repeat = 1;
+			*uniform = 0;
 		}
 
 		if ((start & mask) != (base & mask))
@@ -225,6 +231,7 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
 			continue;
 		}
 
+		*uniform = 0;
 		if (check_type_overlap(&prev_match, &curr_match))
 			return curr_match;
 	}
@@ -241,10 +248,15 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end,
  * Return Values:
  * MTRR_TYPE_(type)  - The effective MTRR type for the region
  * MTRR_TYPE_INVALID - MTRR is disabled
+ *
+ * Output Argument:
+ * uniform - Set to 1 when an MTRR covers the region uniformly, i.e. the
+ *	     region is fully covered by a single MTRR entry or the default
+ *	     type.
  */
-u8 mtrr_type_lookup(u64 start, u64 end)
+u8 mtrr_type_lookup(u64 start, u64 end, u8 *uniform)
 {
-	u8 type, prev_type;
+	u8 type, prev_type, is_uniform = 1, dummy;
 	int repeat;
 	u64 partial_end;
 
@@ -260,14 +272,18 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	 */
 	if ((start < 0x100000) &&
 	    (mtrr_state.have_fixed) &&
-	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
-		return mtrr_type_lookup_fixed(start, end);
+	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED)) {
+		is_uniform = 0;
+		type = mtrr_type_lookup_fixed(start, end);
+		goto out;
+	}
 
 	/*
 	 * Look up the variable ranges.  Look of multiple ranges matching
 	 * this address and pick type as per MTRR precedence.
 	 */
-	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+	type = mtrr_type_lookup_variable(start, end, &partial_end,
+					 &repeat, &is_uniform);
 
 	/*
 	 * Common path is with repeat = 0.
@@ -278,15 +294,19 @@ u8 mtrr_type_lookup(u64 start, u64 end)
 	while (repeat) {
 		prev_type = type;
 		start = partial_end;
-		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
+		is_uniform = 0;
+		type = mtrr_type_lookup_variable(start, end, &partial_end,
+						 &repeat, &dummy);
 
 		if (check_type_overlap(&prev_type, &type))
-			return type;
+			goto out;
 	}
 
 	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
-		return MTRR_TYPE_WRBACK;
+		type = MTRR_TYPE_WRBACK;
 
+out:
+	*uniform = is_uniform;
 	return type;
 }
 
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 35af677..372ad42 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -267,9 +267,9 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end,
 	 * request is for WB.
 	 */
 	if (req_type == _PAGE_CACHE_MODE_WB) {
-		u8 mtrr_type;
+		u8 mtrr_type, uniform;
 
-		mtrr_type = mtrr_type_lookup(start, end);
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
 		if (mtrr_type != MTRR_TYPE_WRBACK)
 			return _PAGE_CACHE_MODE_UC_MINUS;
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c30f981..fb0a9dd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -566,19 +566,28 @@ void native_set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
 /**
  * pud_set_huge - setup kernel PUD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * MTRRs can override PAT memory types with 4KiB granularity. Therefore, this
+ * function sets up a huge page only if any of the following conditions are met:
+ *
+ * - MTRRs are disabled, or
+ *
+ * - MTRRs are enabled and the range is completely covered by a single MTRR, or
+ *
+ * - MTRRs are enabled and the corresponding MTRR memory type is WB, which
+ *   has no effect on the requested PAT memory type.
+ *
+ * Callers should try to decrease page size (1GB -> 2MB -> 4K) if the bigger
+ * page mapping attempt fails.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK))
 		return 0;
 
 	prot = pgprot_4k_2_large(prot);
@@ -593,20 +602,21 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 /**
  * pmd_set_huge - setup kernel PMD mapping
  *
- * MTRR can override PAT memory types with 4KiB granularity.  Therefore,
- * this function does not set up a huge page when the range is covered
- * by a non-WB type of MTRR.  MTRR_TYPE_INVALID indicates that MTRR are
- * disabled.
+ * See text over pud_set_huge() above.
  *
  * Returns 1 on success and 0 on failure.
  */
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 {
-	u8 mtrr;
+	u8 mtrr, uniform;
 
-	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE);
-	if ((mtrr != MTRR_TYPE_WRBACK) && (mtrr != MTRR_TYPE_INVALID))
+	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
+	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
+	    (mtrr != MTRR_TYPE_WRBACK)) {
+		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
+			     __func__, addr, addr + PMD_SIZE);
 		return 0;
+	}
 
 	prot = pgprot_4k_2_large(prot);
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread
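
The decision that pud_set_huge() and pmd_set_huge() make after the
lookup, together with the suggested 1GB -> 2MB -> 4K fallback, can be
modelled in user space. This is a rough sketch under stated
assumptions: struct toy_mtrr, toy_lookup() and pick_mapping_size() are
hypothetical names, the model knows a single variable-range entry plus
a default type, and a straddling range with differing types is
simplified to UC instead of going through check_type_overlap().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Numeric values follow the x86 MTRR type encoding. */
enum {
	TYPE_UNCACHABLE = 0x00,
	TYPE_WRCOMB     = 0x01,
	TYPE_WRBACK     = 0x06,
	TYPE_INVALID    = 0xFF,	/* lookup result when MTRRs are disabled */
};

/* Hypothetical toy model: one variable-range entry. */
struct toy_mtrr {
	uint64_t base;
	uint64_t size;
	uint8_t type;
};

/* The same test the patch applies after mtrr_type_lookup(). */
int huge_mapping_allowed(uint8_t mtrr_type, uint8_t uniform)
{
	return !(mtrr_type != TYPE_INVALID && !uniform &&
		 mtrr_type != TYPE_WRBACK);
}

/*
 * Toy stand-in for mtrr_type_lookup() over [start, end).  A range that
 * straddles the entry boundary is reported as non-uniform.
 */
uint8_t toy_lookup(const struct toy_mtrr *m, uint8_t def_type,
		   uint64_t start, uint64_t end, uint8_t *uniform)
{
	uint64_t m_end = m->base + m->size;

	assert(start < end);

	if (start >= m->base && end <= m_end) {	/* fully inside the entry */
		*uniform = 1;
		return m->type;
	}
	if (end <= m->base || start >= m_end) {	/* default type only */
		*uniform = 1;
		return def_type;
	}
	*uniform = 0;
	return m->type == def_type ? def_type : TYPE_UNCACHABLE;
}

/* Try 1 GiB, then 2 MiB, and fall back to 4 KiB. */
uint64_t pick_mapping_size(const struct toy_mtrr *m, uint8_t def_type,
			   uint64_t addr)
{
	static const uint64_t sizes[] = { 1ULL << 30, 1ULL << 21 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint8_t uniform, type;

		if (addr & (sizes[i] - 1))
			continue;	/* not aligned for this page size */
		type = toy_lookup(m, def_type, addr, addr + sizes[i], &uniform);
		if (huge_mapping_allowed(type, uniform))
			return sizes[i];
	}
	return 1ULL << 12;
}
```

With a 2 MiB WC entry at 1 GiB and a WB default, a request at 1 GiB is
refused a 1 GiB mapping because the range is not uniform, but gets a
2 MiB mapping because that range sits entirely inside the entry.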

* [tip:x86/mm] x86/mm/pat: Convert to pr_*() usage
  2015-05-26  8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
@ 2015-05-27 14:19   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mst, torvalds, mcgrof, mingo, airlied, linux-kernel, luto,
	brgerst, awalls, peterz, hpa, bp, bp, jgross, bhelgaas, tglx,
	dledford, dvlasenk, daniel.vetter

Commit-ID:  9e76561f6a8a1a1c4f3152a3fb403ef9d6cfc2ff
Gitweb:     http://git.kernel.org/tip/9e76561f6a8a1a1c4f3152a3fb403ef9d6cfc2ff
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:59 +0200

x86/mm/pat: Convert to pr_*() usage

Use the pr_*() helpers instead of the old printk() and prefix each
message with the component it comes from, so that readers know
exactly where a message originates. We use the pr_*() helpers
but define pr_fmt to the empty string and spell the prefix out
in each message, for easier grepping for those messages.

We leave the users of dprintk() in place; they print only when
the debugpat kernel parameter is enabled. We want to keep those
as a debug feature, but also make them use the same prefix.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
[ Kill pr_fmt. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cocci@systeme.lip6.fr
Cc: plagnioj@jcrosoft.com
Cc: tomi.valkeinen@ti.com
Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/pat.c          | 44 ++++++++++++++++++++++----------------------
 arch/x86/mm/pat_internal.h |  2 +-
 arch/x86/mm/pat_rbtree.c   |  6 +++---
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 372ad42..8c50b9b 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -33,13 +33,16 @@
 #include "pat_internal.h"
 #include "mm_internal.h"
 
+#undef pr_fmt
+#define pr_fmt(fmt) "" fmt
+
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
 static inline void pat_disable(const char *reason)
 {
 	pat_enabled = 0;
-	printk(KERN_INFO "%s\n", reason);
+	pr_info("x86/PAT: %s\n", reason);
 }
 
 static int __init nopat(char *str)
@@ -188,7 +191,7 @@ void pat_init_cache_modes(void)
 					   pat_msg + 4 * i);
 		update_cache_mode_entry(i, cache);
 	}
-	pr_info("PAT configuration [0-7]: %s\n", pat_msg);
+	pr_info("x86/PAT: Configuration [0-7]: %s\n", pat_msg);
 }
 
 #define PAT(x, y)	((u64)PAT_ ## y << ((x)*8))
@@ -211,8 +214,7 @@ void pat_init(void)
 			 * switched to PAT on the boot CPU. We have no way to
 			 * undo PAT.
 			 */
-			printk(KERN_ERR "PAT enabled, "
-			       "but not supported by secondary CPU\n");
+			pr_err("x86/PAT: PAT enabled, but not supported by secondary CPU\n");
 			BUG();
 		}
 	}
@@ -347,7 +349,7 @@ static int reserve_ram_pages_type(u64 start, u64 end,
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != -1) {
-			pr_info("reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
+			pr_info("x86/PAT: reserve_ram_pages_type failed [mem %#010Lx-%#010Lx], track 0x%x, req 0x%x\n",
 				start, end - 1, type, req_type);
 			if (new_type)
 				*new_type = type;
@@ -451,9 +453,9 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
-		printk(KERN_INFO "reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
-		       start, end - 1,
-		       cattr_name(new->type), cattr_name(req_type));
+		pr_info("x86/PAT: reserve_memtype failed [mem %#010Lx-%#010Lx], track %s, req %s\n",
+			start, end - 1,
+			cattr_name(new->type), cattr_name(req_type));
 		kfree(new);
 		spin_unlock(&memtype_lock);
 
@@ -497,8 +499,8 @@ int free_memtype(u64 start, u64 end)
 	spin_unlock(&memtype_lock);
 
 	if (!entry) {
-		printk(KERN_INFO "%s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
-		       current->comm, current->pid, start, end - 1);
+		pr_info("x86/PAT: %s:%d freeing invalid memtype [mem %#010Lx-%#010Lx]\n",
+			current->comm, current->pid, start, end - 1);
 		return -EINVAL;
 	}
 
@@ -628,8 +630,8 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 
 	while (cursor < to) {
 		if (!devmem_is_allowed(pfn)) {
-			printk(KERN_INFO "Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
-			       current->comm, from, to - 1);
+			pr_info("x86/PAT: Program %s tried to access /dev/mem between [mem %#010Lx-%#010Lx], PAT prevents it\n",
+				current->comm, from, to - 1);
 			return 0;
 		}
 		cursor += PAGE_SIZE;
@@ -698,8 +700,7 @@ int kernel_map_sync_memtype(u64 base, unsigned long size,
 				size;
 
 	if (ioremap_change_attr((unsigned long)__va(base), id_sz, pcm) < 0) {
-		printk(KERN_INFO "%s:%d ioremap_change_attr failed %s "
-			"for [mem %#010Lx-%#010Lx]\n",
+		pr_info("x86/PAT: %s:%d ioremap_change_attr failed %s for [mem %#010Lx-%#010Lx]\n",
 			current->comm, current->pid,
 			cattr_name(pcm),
 			base, (unsigned long long)(base + size-1));
@@ -734,7 +735,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 
 		pcm = lookup_memtype(paddr);
 		if (want_pcm != pcm) {
-			printk(KERN_WARNING "%s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
+			pr_warn("x86/PAT: %s:%d map pfn RAM range req %s for [mem %#010Lx-%#010Lx], got %s\n",
 				current->comm, current->pid,
 				cattr_name(want_pcm),
 				(unsigned long long)paddr,
@@ -755,13 +756,12 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 		if (strict_prot ||
 		    !is_new_memtype_allowed(paddr, size, want_pcm, pcm)) {
 			free_memtype(paddr, paddr + size);
-			printk(KERN_ERR "%s:%d map pfn expected mapping type %s"
-				" for [mem %#010Lx-%#010Lx], got %s\n",
-				current->comm, current->pid,
-				cattr_name(want_pcm),
-				(unsigned long long)paddr,
-				(unsigned long long)(paddr + size - 1),
-				cattr_name(pcm));
+			pr_err("x86/PAT: %s:%d map pfn expected mapping type %s for [mem %#010Lx-%#010Lx], got %s\n",
+			       current->comm, current->pid,
+			       cattr_name(want_pcm),
+			       (unsigned long long)paddr,
+			       (unsigned long long)(paddr + size - 1),
+			       cattr_name(pcm));
 			return -EINVAL;
 		}
 		/*
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index f641162..a739bfc 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -4,7 +4,7 @@
 extern int pat_debug_enable;
 
 #define dprintk(fmt, arg...) \
-	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+	do { if (pat_debug_enable) pr_info("x86/PAT: " fmt, ##arg); } while (0)
 
 struct memtype {
 	u64			start;
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index 6582adc..6393108 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -160,9 +160,9 @@ success:
 	return 0;
 
 failure:
-	printk(KERN_INFO "%s:%d conflicting memory types "
-		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
-		end, cattr_name(found_type), cattr_name(match->type));
+	pr_info("x86/PAT: %s:%d conflicting memory types %Lx-%Lx %s<->%s\n",
+		current->comm, current->pid, start, end,
+		cattr_name(found_type), cattr_name(match->type));
 	return -EBUSY;
 }
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread
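
The prefixing scheme in the patch above relies on string-literal
concatenation at macro-expansion time: pr_fmt() contributes an empty
string, and each call site carries "x86/PAT: " itself. A small
user-space sketch of that mechanism follows. pr_info_buf(),
dprintk_buf() and format_pat_disable() are hypothetical stand-ins that
write into a buffer instead of the kernel log, and ##__VA_ARGS__ is the
GNU extension that the kernel macros also use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Empty prefix, as in the patch: the tag is written out in each message. */
#define pr_fmt(fmt) "" fmt

/* Stand-in for pr_info() that formats into a caller-supplied buffer. */
#define pr_info_buf(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)

/* Mirrors pat_debug_enable: debug messages are emitted only when set. */
int pat_debug_enable;

/* Stand-in for dprintk(): same prefix, gated on the debug switch. */
#define dprintk_buf(buf, len, fmt, ...) \
	(pat_debug_enable ? \
	 pr_info_buf(buf, len, "x86/PAT: " fmt, ##__VA_ARGS__) : 0)

/* Formats a message the way pat_disable() does after the conversion. */
int format_pat_disable(char *buf, size_t len, const char *reason)
{
	assert(buf != NULL && len > 0);
	return pr_info_buf(buf, len, "x86/PAT: %s\n", reason);
}
```

Because the prefix is part of the literal at each call site, grepping
the source for the full message text finds the line that printed it.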

* [tip:x86/mm] x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages
  2015-05-26  8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
@ 2015-05-27 14:19   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adaplas, sbsiddha, plagnioj, airlied, vbabka, dbueso, luto,
	peterz, bp, corbet, tomi.valkeinen, hpa, mcgrof, brgerst,
	daniel.vetter, jgross, mingo, bp, linux-kernel, dvlasenk,
	torvalds, dave.hansen, tglx, mgorman, syrjala

Commit-ID:  2f9e897353fcb99effd6eff22f7b464f8e2a659a
Gitweb:     http://git.kernel.org/tip/2f9e897353fcb99effd6eff22f7b464f8e2a659a
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:12 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:40:59 +0200

x86/mm/mtrr, pat: Document Write Combining MTRR type effects on PAT / non-PAT pages

As part of the effort to phase out MTRR use, document
write-combining MTRR effects on pages with different non-PAT
page attributes flags and different PAT entry values. Extend
arch_phys_wc_add() documentation to clarify power of two sizes /
boundary requirements as we phase out mtrr_add() use.
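
The table this commit adds to Documentation/x86/pat.txt can be read as a
simple lookup. The sketch below is a userspace model only, not kernel code:
the enum and the helper effective_type_under_wc_mtrr() are hypothetical names
chosen for illustration, and the strings are copied from the table rows.

```c
/* Userspace model; names are hypothetical and mirror _PAGE_CACHE_MODE_*. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC };

/*
 * Effective memory type when a WC MTRR covers the page, as listed in the
 * pat.txt table. A trailing '*' marks the rows the table flags as
 * implementation defined and discouraged.
 */
static const char *effective_type_under_wc_mtrr(enum cache_mode mode, int pat)
{
	/* Rows follow the table; columns are { non-PAT, PAT }. */
	static const char *const table[4][2] = {
		[MODE_WB]       = { "WC",  "WC" },
		[MODE_WC]       = { "WC*", "WC" },
		[MODE_UC_MINUS] = { "WC*", "UC" },
		[MODE_UC]       = { "UC",  "UC" },
	};

	return table[mode][pat ? 1 : 0];
}
```

Only the WB and UC rows give the same result on both kinds of system, which
is why the documentation steers drivers towards ioremap_wc() plus
arch_phys_wc_add() rather than relying on the starred combinations.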

Lastly, hint towards ioremap_uc() for corner cases on device
drivers working with devices with mixed regions where MTRR size
requirements would otherwise not enable write-combining
effective memory types.
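
For illustration, the size / boundary requirement mentioned above could be
checked as follows. This is a userspace sketch under one assumption: that
"a power of two size on an equivalent power of two boundary" means the base
is aligned to the size. wc_range_ok() is a hypothetical helper and is not
part of the kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if exactly one bit is set in x. */
static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical check: the size is a power of two and the base sits on a
 * boundary that is a multiple of that size (assumed reading of the
 * requirement).
 */
static bool wc_range_ok(uint64_t base, uint64_t size)
{
	return is_power_of_two(size) && (base & (size - 1)) == 0;
}
```

Under this reading a 16 MiB region at 0xd0000000 qualifies, while the same
size at 0xd0800000 does not. Mixed regions that cannot meet the constraint
are the corner case the ioremap_uc() hint is aimed at.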

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-fbdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/mtrr.txt      | 18 +++++++++++++++---
 Documentation/x86/pat.txt       | 35 ++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/mtrr/main.c |  3 +++
 3 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/mtrr.txt b/Documentation/x86/mtrr.txt
index cc071dc..860bc3a 100644
--- a/Documentation/x86/mtrr.txt
+++ b/Documentation/x86/mtrr.txt
@@ -1,7 +1,19 @@
 MTRR (Memory Type Range Register) control
-3 Jun 1999
-Richard Gooch
-<rgooch@atnf.csiro.au>
+
+Richard Gooch <rgooch@atnf.csiro.au> - 3 Jun 1999
+Luis R. Rodriguez <mcgrof@do-not-panic.com> - April 9, 2015
+
+===============================================================================
+Phasing out MTRR use
+
+MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
+of effective MTRR that is expected to be supported will be for write-combining.
+As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
+MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
+
+For details refer to Documentation/x86/pat.txt.
+
+===============================================================================
 
   On Intel P6 family processors (Pentium Pro, Pentium II and later)
   the Memory Type Range Registers (MTRRs) may be used to control
diff --git a/Documentation/x86/pat.txt b/Documentation/x86/pat.txt
index cf08c9f..521bd8a 100644
--- a/Documentation/x86/pat.txt
+++ b/Documentation/x86/pat.txt
@@ -34,6 +34,8 @@ ioremap                |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_cache          |    --    |    WB      |       WB         |
                        |          |            |                  |
+ioremap_uc             |    --    |    UC      |       UC         |
+                       |          |            |                  |
 ioremap_nocache        |    --    |    UC-     |       UC-        |
                        |          |            |                  |
 ioremap_wc             |    --    |    --      |       WC         |
@@ -102,7 +104,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
 as step 0 above and also track the usage of those pages and use set_memory_wb()
 before the page is freed to free pool.
 
-
+MTRR effects on PAT / non-PAT systems
+-------------------------------------
+
+The following table provides the effects of using write-combining MTRRs when
+using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
+mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
+be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
+is made should already have been ioremapped with WC attributes or PAT entries;
+this can be done by using ioremap_wc() / set_memory_wc().  Devices which
+combine areas of IO memory desired to remain uncacheable with areas where
+write-combining is desirable should consider use of ioremap_uc() followed by
+set_memory_wc() to white-list effective write-combined areas.  Such use is
+nevertheless discouraged as the effective memory type is considered
+implementation defined, yet this strategy can be used as last resort on devices
+with size-constrained regions where MTRR write-combining would
+otherwise not be effective.
+
+----------------------------------------------------------------------
+MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
+----------------------------------------------------------------------
+                                                  Non-PAT |  PAT
+     PAT
+     |PCD
+     ||PWT
+     |||
+WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
+WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
+WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
+WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
+----------------------------------------------------------------------
+
+(*) denotes implementation defined and is discouraged
 
 Notes:
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index ea5f363..04aceb7 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -538,6 +538,9 @@ EXPORT_SYMBOL(mtrr_del);
  * attempts to add a WC MTRR covering size bytes starting at base and
  * logs an error if this fails.
  *
+ * The caller should provide a power of two size on an equivalent
+ * power of two boundary.
+ *
  * Drivers must store the return value to pass to mtrr_del_wc_if_needed,
  * but drivers should not try to interpret that return value.
  */


* [tip:x86/mm] x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()
  2015-05-26  8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
@ 2015-05-27 14:20   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: matthias.bgg, plagnioj, akpm, hpa, luto, sbsiddha, will.deacon,
	bp, dbueso, brgerst, vbabka, a.kesavan, mcgrof, adaplas, syrjala,
	mgorman, dave.hansen, peterz, tomi.valkeinen, linux-kernel,
	jgross, airlied, mingo, treding, daniel.vetter, tglx,
	catalin.marinas, toshi.kani, cristian.stoica, torvalds, gregkh,
	dvlasenk, bp

Commit-ID:  7d010fdf299929f9583ce5e17da629dcd83c36ef
Gitweb:     http://git.kernel.org/tip/7d010fdf299929f9583ce5e17da629dcd83c36ef
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:13 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:00 +0200

x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index()

There is only one user, but since we're next going to bury MTRR
out of the reach of drivers, expose this last piece of API to
drivers in a general fashion, only needing io.h for access to
the helpers.
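
The pattern used here (an arch header defines the helper plus a same-named
macro, and the generic header supplies a fallback only if that macro is
absent) can be modelled in userspace roughly as below. MODEL_HAVE_MTRR and
model_phys_wc_index() are hypothetical stand-ins for CONFIG_MTRR and
arch_phys_wc_index(); the offset of 1000 matches MTRR_TO_PHYS_WC_OFFSET in
the patch.

```c
/* Stand-in for CONFIG_MTRR; remove this line to get the generic fallback. */
#define MODEL_HAVE_MTRR 1

#define MODEL_MTRR_TO_PHYS_WC_OFFSET 1000

#ifdef MODEL_HAVE_MTRR
/* "Arch" version: translate a WC handle back into an MTRR register index. */
static inline int model_phys_wc_index(int handle)
{
	if (handle < MODEL_MTRR_TO_PHYS_WC_OFFSET)
		return -1;
	return handle - MODEL_MTRR_TO_PHYS_WC_OFFSET;
}
/* Self-referential define tells the generic code a real version exists. */
#define model_phys_wc_index model_phys_wc_index
#endif

#ifndef model_phys_wc_index
/* "Generic" fallback: no MTRR support, so there is never an index. */
static inline int model_phys_wc_index(int handle)
{
	(void)handle;
	return -1;
}
#define model_phys_wc_index model_phys_wc_index
#endif
```

With this arrangement a caller such as drm_getmap() can invoke the helper
unconditionally and needs neither an #ifdef CONFIG_X86 block nor a direct
include of asm/mtrr.h.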

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Cristian Stoica <cristian.stoica@freescale.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: dri-devel@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/io.h       |  3 +++
 arch/x86/include/asm/mtrr.h     |  5 -----
 arch/x86/kernel/cpu/mtrr/main.c |  6 +++---
 drivers/gpu/drm/drm_ioctl.c     | 14 +-------------
 include/linux/io.h              |  7 +++++++
 5 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 4afc05f..a2b9740 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -339,6 +339,9 @@ extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
 #define IO_SPACE_LIMIT 0xffff
 
 #ifdef CONFIG_MTRR
+extern int __must_check arch_phys_wc_index(int handle);
+#define arch_phys_wc_index arch_phys_wc_index
+
 extern int __must_check arch_phys_wc_add(unsigned long base,
 					 unsigned long size);
 extern void arch_phys_wc_del(int handle);
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index a31759e..b94f6f6 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -48,7 +48,6 @@ extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
-extern int phys_wc_to_mtrr_index(int handle);
 #  else
 static inline u8 mtrr_type_lookup(u64 addr, u64 end, u8 *uniform)
 {
@@ -84,10 +83,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-static inline int phys_wc_to_mtrr_index(int handle)
-{
-	return -1;
-}
 
 #define mtrr_ap_init() do {} while (0)
 #define mtrr_bp_init() do {} while (0)
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 04aceb7..81baf5f 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -580,7 +580,7 @@ void arch_phys_wc_del(int handle)
 EXPORT_SYMBOL(arch_phys_wc_del);
 
 /*
- * phys_wc_to_mtrr_index - translates arch_phys_wc_add's return value
+ * arch_phys_wc_index - translates arch_phys_wc_add's return value
  * @handle: Return value from arch_phys_wc_add
  *
  * This will turn the return value from arch_phys_wc_add into an mtrr
@@ -590,14 +590,14 @@ EXPORT_SYMBOL(arch_phys_wc_del);
  * in printk line.  Alas there is an illegitimate use in some ancient
  * drm ioctls.
  */
-int phys_wc_to_mtrr_index(int handle)
+int arch_phys_wc_index(int handle)
 {
 	if (handle < MTRR_TO_PHYS_WC_OFFSET)
 		return -1;
 	else
 		return handle - MTRR_TO_PHYS_WC_OFFSET;
 }
-EXPORT_SYMBOL_GPL(phys_wc_to_mtrr_index);
+EXPORT_SYMBOL_GPL(arch_phys_wc_index);
 
 /*
  * HACK ALERT!
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 266dcd6..0a95782 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -36,9 +36,6 @@
 
 #include <linux/pci.h>
 #include <linux/export.h>
-#ifdef CONFIG_X86
-#include <asm/mtrr.h>
-#endif
 
 static int drm_version(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv);
@@ -197,16 +194,7 @@ static int drm_getmap(struct drm_device *dev, void *data,
 	map->type = r_list->map->type;
 	map->flags = r_list->map->flags;
 	map->handle = (void *)(unsigned long) r_list->user_token;
-
-#ifdef CONFIG_X86
-	/*
-	 * There appears to be exactly one user of the mtrr index: dritest.
-	 * It's easy enough to keep it working on non-PAT systems.
-	 */
-	map->mtrr = phys_wc_to_mtrr_index(r_list->map->mtrr);
-#else
-	map->mtrr = -1;
-#endif
+	map->mtrr = arch_phys_wc_index(r_list->map->mtrr);
 
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/include/linux/io.h b/include/linux/io.h
index 986f2bf..04cce4d 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -111,6 +111,13 @@ static inline void arch_phys_wc_del(int handle)
 }
 
 #define arch_phys_wc_add arch_phys_wc_add
+#ifndef arch_phys_wc_index
+static inline int arch_phys_wc_index(int handle)
+{
+	return -1;
+}
+#define arch_phys_wc_index arch_phys_wc_index
+#endif
 #endif
 
 #endif /* _LINUX_IO_H */


* [tip:x86/mm] x86/mm/mtrr: Generalize runtime disabling of MTRRs
  2015-05-26  8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
@ 2015-05-27 14:20   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, bp, brgerst, mingo, linux-kernel, sbsiddha, mgorman, airlied,
	roger.pau, tomi.valkeinen, plagnioj, adaplas, dvlasenk,
	dave.hansen, torvalds, luto, syrjala, toshi.kani, tglx,
	stefan.bader, peterz, jgross, vbabka, hpa, mcgrof, dbueso,
	daniel.vetter

Commit-ID:  f9626104a5b6815ec7d65789dfb900af5fa51e64
Gitweb:     http://git.kernel.org/tip/f9626104a5b6815ec7d65789dfb900af5fa51e64
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:14 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:01 +0200

x86/mm/mtrr: Generalize runtime disabling of MTRRs

It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end
up with a system with MTRR functionality disabled but PAT
functionality enabled. This can happen, for instance, when the
Xen hypervisor is used where MTRRs are not supported but PAT is.
This can happen on Linux as of commit

  47591df50512 ("xen: Support Xen pv-domains using PAT")

by Juergen, introduced in v3.19.

Technically, we should assume the proper CPU bits would be set
to disable MTRRs but we can't always rely on this. At least on
the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was
disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not
X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or
X86_FEATURE_CYRIX_ARR for instance.

Roger Pau Monné has clarified though that although this is
technically true we will never support PVH on these CPU types so
Xen has no need to disable these bits on those systems. As per
Roger, AMD K6, Centaur and VIA chips don't have the necessary
hardware extensions to allow running PVH guests [1].

As per Toshi, it is also possible for the BIOS to disable MTRR
support. In such cases get_mtrr_state() would update the MTRR
state as per the BIOS, and we need to propagate this information
as well.

x86 MTRR code relies on quite a few checks of mtrr_if being
set to see if MTRRs did get set up. Instead, let's provide a
generic getter for that. This also adds a few checks where
there were none before, which could potentially safeguard us
against incorrect usage of MTRR where this was not
desirable.
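
As a rough userspace model of the getter approach: one private flag, one
accessor, and early returns in the entry points. All model_* names are
hypothetical, the register index returned by model_mtrr_add() is made up for
the example, and the handle arithmetic assumes the MTRR_TO_PHYS_WC_OFFSET of
1000 seen elsewhere in this series.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MTRR_TO_PHYS_WC_OFFSET 1000

/* Private state, set once at "boot"; stands in for __mtrr_enabled. */
static bool model_mtrr_enabled_flag;
/* Stand-in for the PAT state consulted by arch_phys_wc_add(). */
static bool model_pat_enabled_flag;

static bool model_mtrr_enabled(void)
{
	return model_mtrr_enabled_flag;
}

/* Models mtrr_add(): refuse early when MTRRs are disabled at run time. */
static int model_mtrr_add(unsigned long base, unsigned long size)
{
	(void)base;
	(void)size;
	if (!model_mtrr_enabled())
		return -ENODEV;
	return 2; /* made-up register index for the example */
}

/* Models arch_phys_wc_add(): nothing to do with PAT on or MTRRs off. */
static int model_phys_wc_add(unsigned long base, unsigned long size)
{
	int ret;

	if (model_pat_enabled_flag || !model_mtrr_enabled())
		return 0;

	ret = model_mtrr_add(base, size);
	if (ret < 0)
		return ret;
	return ret + MODEL_MTRR_TO_PHYS_WC_OFFSET;
}
```

Keeping the flag behind an accessor means the BIOS-override case only has to
update one variable, and every caller sees the result.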

Where possible, match the error codes used when MTRRs are disabled
in arch/x86/include/asm/mtrr.h.

Lastly, since disabling MTRRs can happen at run time and we
could end up with PAT enabled, best record now in our logs when
MTRRs are disabled.

[0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a
    4.4.0-rc1~18
[1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.html

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Antonino Daplas <adaplas@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Ville Syrjälä <syrjala@sci.fi>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: bhelgaas@google.com
Cc: david.vrabel@citrix.com
Cc: jbeulich@suse.com
Cc: konrad.wilk@oracle.com
Cc: venkatesh.pallipadi@intel.com
Cc: ville.syrjala@linux.intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mtrr/generic.c |  4 +++-
 arch/x86/kernel/cpu/mtrr/main.c    | 39 ++++++++++++++++++++++++++++++--------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |  2 +-
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index f782d9b..3b533cf 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -445,7 +445,7 @@ static void __init print_mtrr_state(void)
 }
 
 /* Grab all of the MTRR state for this CPU into *state */
-void __init get_mtrr_state(void)
+bool __init get_mtrr_state(void)
 {
 	struct mtrr_var_range *vrs;
 	unsigned long flags;
@@ -489,6 +489,8 @@ void __init get_mtrr_state(void)
 
 	post_set();
 	local_irq_restore(flags);
+
+	return !!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED);
 }
 
 /* Some BIOS's are messed up and don't set all MTRRs the same! */
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 81baf5f..383efb2 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -59,6 +59,12 @@
 #define MTRR_TO_PHYS_WC_OFFSET 1000
 
 u32 num_var_ranges;
+static bool __mtrr_enabled;
+
+static bool mtrr_enabled(void)
+{
+	return __mtrr_enabled;
+}
 
 unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
@@ -286,7 +292,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
 	int i, replace, error;
 	mtrr_type ltype;
 
-	if (!mtrr_if)
+	if (!mtrr_enabled())
 		return -ENXIO;
 
 	error = mtrr_if->validate_add_page(base, size, type);
@@ -435,6 +441,8 @@ static int mtrr_check(unsigned long base, unsigned long size)
 int mtrr_add(unsigned long base, unsigned long size, unsigned int type,
 	     bool increment)
 {
+	if (!mtrr_enabled())
+		return -ENODEV;
 	if (mtrr_check(base, size))
 		return -EINVAL;
 	return mtrr_add_page(base >> PAGE_SHIFT, size >> PAGE_SHIFT, type,
@@ -463,8 +471,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
 	unsigned long lbase, lsize;
 	int error = -EINVAL;
 
-	if (!mtrr_if)
-		return -ENXIO;
+	if (!mtrr_enabled())
+		return -ENODEV;
 
 	max = num_var_ranges;
 	/* No CPU hotplug when we change MTRR entries */
@@ -523,6 +531,8 @@ int mtrr_del_page(int reg, unsigned long base, unsigned long size)
  */
 int mtrr_del(int reg, unsigned long base, unsigned long size)
 {
+	if (!mtrr_enabled())
+		return -ENODEV;
 	if (mtrr_check(base, size))
 		return -EINVAL;
 	return mtrr_del_page(reg, base >> PAGE_SHIFT, size >> PAGE_SHIFT);
@@ -548,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled)
+	if (pat_enabled || !mtrr_enabled())
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
@@ -737,10 +747,12 @@ void __init mtrr_bp_init(void)
 	}
 
 	if (mtrr_if) {
+		__mtrr_enabled = true;
 		set_num_var_ranges();
 		init_table();
 		if (use_intel()) {
-			get_mtrr_state();
+			/* BIOS may override */
+			__mtrr_enabled = get_mtrr_state();
 
 			if (mtrr_cleanup(phys_addr)) {
 				changed_by_mtrr_cleanup = 1;
@@ -748,10 +760,16 @@ void __init mtrr_bp_init(void)
 			}
 		}
 	}
+
+	if (!mtrr_enabled())
+		pr_info("MTRR: Disabled\n");
 }
 
 void mtrr_ap_init(void)
 {
+	if (!mtrr_enabled())
+		return;
+
 	if (!use_intel() || mtrr_aps_delayed_init)
 		return;
 	/*
@@ -777,6 +795,9 @@ void mtrr_save_state(void)
 {
 	int first_cpu;
 
+	if (!mtrr_enabled())
+		return;
+
 	get_online_cpus();
 	first_cpu = cpumask_first(cpu_online_mask);
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
@@ -785,6 +806,8 @@ void mtrr_save_state(void)
 
 void set_mtrr_aps_delayed_init(void)
 {
+	if (!mtrr_enabled())
+		return;
 	if (!use_intel())
 		return;
 
@@ -796,7 +819,7 @@ void set_mtrr_aps_delayed_init(void)
  */
 void mtrr_aps_init(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled())
 		return;
 
 	/*
@@ -813,7 +836,7 @@ void mtrr_aps_init(void)
 
 void mtrr_bp_restore(void)
 {
-	if (!use_intel())
+	if (!use_intel() || !mtrr_enabled())
 		return;
 
 	mtrr_if->set_all();
@@ -821,7 +844,7 @@ void mtrr_bp_restore(void)
 
 static int __init mtrr_init_finialize(void)
 {
-	if (!mtrr_if)
+	if (!mtrr_enabled())
 		return 0;
 
 	if (use_intel()) {
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index df5e41f..951884d 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -51,7 +51,7 @@ void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
 
 void fill_mtrr_var_range(unsigned int index,
 		u32 base_lo, u32 base_hi, u32 mask_lo, u32 mask_hi);
-void get_mtrr_state(void);
+bool get_mtrr_state(void);
 
 extern void set_mtrr_ops(const struct mtrr_ops *ops);
 


* [tip:x86/mm] x86/mm/pat: Wrap pat_enabled into a function API
  2015-05-26  8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
@ 2015-05-27 14:20   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, awalls, bp, tglx, dledford, brgerst, mingo, bhelgaas,
	mcgrof, torvalds, jgross, dvlasenk, hpa, kyle, bp, cl,
	linux-kernel, luto, airlied, mst, daniel.vetter

Commit-ID:  cb32edf65bf2197a2d2226e94c7602dc92e295bb
Gitweb:     http://git.kernel.org/tip/cb32edf65bf2197a2d2226e94c7602dc92e295bb
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:01 +0200

x86/mm/pat: Wrap pat_enabled into a function API

We use pat_enabled in x86-specific code to see if PAT is enabled
or not but we're granting full access to it even though readers
do not need to set it. If, for instance, we granted access to it
to modules later they then could override the variable
setting... no bueno.

This renames pat_enabled to a new static variable __pat_enabled.
Folks are redirected to use pat_enabled() now.
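
A minimal userspace sketch of that encapsulation, assuming hypothetical
model_* names (the kernel version additionally marks the variable
__read_mostly and initialises it from IS_ENABLED(CONFIG_X86_PAT)):

```c
#include <stdbool.h>
#include <stdio.h>

/* File-local state: other translation units cannot write to it. */
static int model_pat_state = 1;

/* The only writer; models pat_disable(), which logs why PAT was turned off. */
static void model_pat_disable(const char *reason)
{
	model_pat_state = 0;
	fprintf(stderr, "x86/PAT: %s\n", reason);
}

/* Read-only accessor; models pat_enabled(). */
bool model_pat_enabled(void)
{
	return !!model_pat_state;
}
```

Readers call the accessor instead of touching the variable, so exporting the
function later would not hand out write access to the state.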

Code that sets this can only be internal to pat.c. Apart from
the early kernel parameter "nopat" to disable PAT, we also have
a few cases that disable it later and make use of a helper
pat_disable(). It is wrapped under an ifdef, but since that code
cannot run unless PAT was enabled it's not required to wrap it
with ifdefs, so unwrap that. Likewise, since "nopat" doesn't really
change non-PAT systems just remove that ifdef as well.

Although we could add and use an early_param_off(), these
helpers don't use __read_mostly but we want to keep
__read_mostly for __pat_enabled as this is a hot path -- upon
boot, for instance, a simple guest may see ~4k accesses to
pat_enabled(). Since __read_mostly early boot params are not
that common we don't add a helper for them just yet.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/pat.h      |  7 +------
 arch/x86/kernel/cpu/mtrr/main.c |  2 +-
 arch/x86/mm/iomap_32.c          |  2 +-
 arch/x86/mm/ioremap.c           |  4 ++--
 arch/x86/mm/pageattr.c          |  2 +-
 arch/x86/mm/pat.c               | 33 +++++++++++++++------------------
 arch/x86/pci/i386.c             |  6 +++---
 7 files changed, 24 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/pat.h b/arch/x86/include/asm/pat.h
index 91bc4ba..cdcff7f 100644
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -4,12 +4,7 @@
 #include <linux/types.h>
 #include <asm/pgtable_types.h>
 
-#ifdef CONFIG_X86_PAT
-extern int pat_enabled;
-#else
-static const int pat_enabled;
-#endif
-
+bool pat_enabled(void);
 extern void pat_init(void);
 void pat_init_cache_modes(void);
 
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 383efb2..e7ed0d8 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -558,7 +558,7 @@ int arch_phys_wc_add(unsigned long base, unsigned long size)
 {
 	int ret;
 
-	if (pat_enabled || !mtrr_enabled())
+	if (pat_enabled() || !mtrr_enabled())
 		return 0;  /* Success!  (We don't need to do anything.) */
 
 	ret = mtrr_add(base, size, MTRR_TYPE_WRCOMB, true);
diff --git a/arch/x86/mm/iomap_32.c b/arch/x86/mm/iomap_32.c
index 9ca35fc..3a2ec87 100644
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -82,7 +82,7 @@ iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot)
 	 * MTRR is UC or WC.  UC_MINUS gets the real intention, of the
 	 * user, which is "WC if the MTRR is WC, UC if you can't do that."
 	 */
-	if (!pat_enabled && pgprot_val(prot) ==
+	if (!pat_enabled() && pgprot_val(prot) ==
 	    (__PAGE_KERNEL | cachemode2protval(_PAGE_CACHE_MODE_WC)))
 		prot = __pgprot(__PAGE_KERNEL |
 				cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS));
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index a493bb8..82d63ed 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -234,7 +234,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 {
 	/*
 	 * Ideally, this should be:
-	 *	pat_enabled ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
+	 *	pat_enabled() ? _PAGE_CACHE_MODE_UC : _PAGE_CACHE_MODE_UC_MINUS;
 	 *
 	 * Till we fix all X drivers to use ioremap_wc(), we will use
 	 * UC MINUS. Drivers that are certain they need or can already
@@ -292,7 +292,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
  */
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
 					__builtin_return_address(0));
 	else
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 397838e..70d221f 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1571,7 +1571,7 @@ int set_memory_wc(unsigned long addr, int numpages)
 {
 	int ret;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return set_memory_uc(addr, numpages);
 
 	ret = reserve_memtype(__pa(addr), __pa(addr) + numpages * PAGE_SIZE,
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 8c50b9b..484dce7 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -36,12 +36,11 @@
 #undef pr_fmt
 #define pr_fmt(fmt) "" fmt
 
-#ifdef CONFIG_X86_PAT
-int __read_mostly pat_enabled = 1;
+static int __read_mostly __pat_enabled = IS_ENABLED(CONFIG_X86_PAT);
 
 static inline void pat_disable(const char *reason)
 {
-	pat_enabled = 0;
+	__pat_enabled = 0;
 	pr_info("x86/PAT: %s\n", reason);
 }
 
@@ -51,13 +50,11 @@ static int __init nopat(char *str)
 	return 0;
 }
 early_param("nopat", nopat);
-#else
-static inline void pat_disable(const char *reason)
+
+bool pat_enabled(void)
 {
-	(void)reason;
+	return !!__pat_enabled;
 }
-#endif
-
 
 int pat_debug_enable;
 
@@ -201,7 +198,7 @@ void pat_init(void)
 	u64 pat;
 	bool boot_cpu = !boot_pat_state;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return;
 
 	if (!cpu_has_pat) {
@@ -402,7 +399,7 @@ int reserve_memtype(u64 start, u64 end, enum page_cache_mode req_type,
 
 	BUG_ON(start >= end); /* end is exclusive */
 
-	if (!pat_enabled) {
+	if (!pat_enabled()) {
 		/* This is identical to page table setting without PAT */
 		if (new_type) {
 			if (req_type == _PAGE_CACHE_MODE_WC)
@@ -477,7 +474,7 @@ int free_memtype(u64 start, u64 end)
 	int is_range_ram;
 	struct memtype *entry;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Low ISA region is always mapped WB. No need to track */
@@ -625,7 +622,7 @@ static inline int range_is_allowed(unsigned long pfn, unsigned long size)
 	u64 to = from + size;
 	u64 cursor = from;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 1;
 
 	while (cursor < to) {
@@ -661,7 +658,7 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
 	 * caching for the high addresses through the KEN pin, but
 	 * we maintain the tradition of paranoia in this code.
 	 */
-	if (!pat_enabled &&
+	if (!pat_enabled() &&
 	    !(boot_cpu_has(X86_FEATURE_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_K6_MTRR) ||
 	      boot_cpu_has(X86_FEATURE_CYRIX_ARR) ||
@@ -730,7 +727,7 @@ static int reserve_pfn_range(u64 paddr, unsigned long size, pgprot_t *vma_prot,
 	 * the type requested matches the type of first page in the range.
 	 */
 	if (is_ram) {
-		if (!pat_enabled)
+		if (!pat_enabled())
 			return 0;
 
 		pcm = lookup_memtype(paddr);
@@ -844,7 +841,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
 		return ret;
 	}
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/*
@@ -872,7 +869,7 @@ int track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 {
 	enum page_cache_mode pcm;
 
-	if (!pat_enabled)
+	if (!pat_enabled())
 		return 0;
 
 	/* Set prot based on lookup */
@@ -913,7 +910,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 
 pgprot_t pgprot_writecombine(pgprot_t prot)
 {
-	if (pat_enabled)
+	if (pat_enabled())
 		return __pgprot(pgprot_val(prot) |
 				cachemode2protval(_PAGE_CACHE_MODE_WC));
 	else
@@ -996,7 +993,7 @@ static const struct file_operations memtype_fops = {
 
 static int __init pat_memtype_list_init(void)
 {
-	if (pat_enabled) {
+	if (pat_enabled()) {
 		debugfs_create_file("pat_memtype_list", S_IRUSR,
 				    arch_debugfs_dir, NULL, &memtype_fops);
 	}
diff --git a/arch/x86/pci/i386.c b/arch/x86/pci/i386.c
index 349c0d3..0a9f2ca 100644
--- a/arch/x86/pci/i386.c
+++ b/arch/x86/pci/i386.c
@@ -429,12 +429,12 @@ int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma,
  	 * Caller can followup with UC MINUS request and add a WC mtrr if there
  	 * is a free mtrr slot.
  	 */
-	if (!pat_enabled && write_combine)
+	if (!pat_enabled() && write_combine)
 		return -EINVAL;
 
-	if (pat_enabled && write_combine)
+	if (pat_enabled() && write_combine)
 		prot |= cachemode2protval(_PAGE_CACHE_MODE_WC);
-	else if (pat_enabled || boot_cpu_data.x86 > 3)
+	else if (pat_enabled() || boot_cpu_data.x86 > 3)
 		/*
 		 * ioremap() and ioremap_nocache() defaults to UC MINUS for now.
 		 * To avoid attribute conflicts, request UC MINUS here

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* [tip:x86/mm] x86/mm/pat: Export pat_enabled()
  2015-05-26  8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
@ 2015-05-27 14:21   ` tip-bot for Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Luis R. Rodriguez @ 2015-05-27 14:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mcgrof, torvalds, peterz, tglx, mst, awalls, brgerst, jgross,
	hpa, linux-kernel, dvlasenk, dledford, bp, bp, luto,
	daniel.vetter, airlied, bhelgaas, mingo

Commit-ID:  fbe7193aa4787f27c84216d130ab877efc310d57
Gitweb:     http://git.kernel.org/tip/fbe7193aa4787f27c84216d130ab877efc310d57
Author:     Luis R. Rodriguez <mcgrof@suse.com>
AuthorDate: Tue, 26 May 2015 10:28:16 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:41:02 +0200

x86/mm/pat: Export pat_enabled()

Two Linux device drivers cannot work with PAT and the work
required to make them work is significant. There is not enough
motivation to convert these drivers over to use PAT properly, so
the compromise reached is to let drivers that cannot be ported
to PAT check if PAT was enabled and if so fail on probe with a
recommendation to boot with the "nopat" kernel parameter.

Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com
Link: http://lkml.kernel.org/r/1432628901-18044-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/mm/pat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 484dce7..a1c9654 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -55,6 +55,7 @@ bool pat_enabled(void)
 {
 	return !!__pat_enabled;
 }
+EXPORT_SYMBOL_GPL(pat_enabled);
 
 int pat_debug_enable;
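
For illustration only, here is a userspace sketch of how a driver that
cannot be ported might use the exported accessor at probe time. The
names pat_enabled_flag and legacy_driver_probe() and the -ENODEV return
value are assumptions made for this sketch; they are not taken from the
patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the flag kept private in arch/x86/mm/pat.c. */
static int pat_enabled_flag = 1;

/* Mirrors the accessor in the patch: callers never see the raw flag. */
static bool pat_enabled(void)
{
	return !!pat_enabled_flag;
}

/* Hypothetical probe routine for a driver that still depends on MTRR. */
static int legacy_driver_probe(void)
{
	if (pat_enabled()) {
		/* Assumed error code; a real driver may pick another one. */
		fprintf(stderr, "PAT is enabled, boot with \"nopat\"\n");
		return -ENODEV;
	}

	return 0;
}
```

Going through a function rather than the variable is what makes the
EXPORT_SYMBOL_GPL() above sufficient for modules.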
 

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-27 14:16   ` [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo " tip-bot for Prarit Bhargava
@ 2015-05-27 17:07     ` Joe Perches
  2015-05-27 19:06       ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Joe Perches @ 2015-05-27 17:07 UTC (permalink / raw)
  To: luto, bp, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
	prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp
  Cc: linux-tip-commits

On Wed, 2015-05-27 at 07:16 -0700, tip-bot for Prarit Bhargava wrote:

> x86/cpu: Strip any /proc/cpuinfo model name field whitespace
[]
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> @@ -431,18 +430,10 @@ static void get_model_name(struct cpuinfo_x86 *c)
>  	c->x86_model_id[48] = 0;
>  
>  	/*
> -	 * Intel chips right-justify this string for some dumb reason;
> -	 * undo that brain damage:
> +	 * Remove leading whitespace on Intel processors and trailing
> +	 * whitespace on AMD processors.
>  	 */
> -	p = q = &c->x86_model_id[0];
> -	while (*p == ' ')
> -		p++;
> -	if (p != q) {
> -		while (*p)
> -			*q++ = *p++;
> -		while (q <= &c->x86_model_id[48])
> -			*q++ = '\0';	/* Zero-pad the rest */
> -	}
> +	memmove(c->x86_model_id, strim(c->x86_model_id), 48);

This code can memmove from beyond the x86_model_id field.

If the id was a single right justified char, to avoid overrunning
the field, it'd be safer moving only the actual string and
terminating 0 though this code is sub-optimal:

	memmove(c->x86_model_id, strim(c->x86_model_id),
		strlen(strim(c->x86_model_id)) + 1);

Maybe:
	char *model = strim(c->x86_model_id);
	memmove(c->x86_model_id, model, strlen(model) + 1);
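
A userspace sketch of the same idea, for anyone who wants to try it
outside the kernel. trim_ws() is a hypothetical stand-in written for
this sketch to behave roughly like the kernel's strim(); it is not the
kernel implementation, and trim_model_id_memmove() is likewise an
assumed name.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical stand-in for strim(): cut trailing whitespace in place
 * and return a pointer to the first non-whitespace character.
 */
static char *trim_ws(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		len--;
	s[len] = '\0';

	while (isspace((unsigned char)*s))
		s++;

	return s;
}

/* Move only the trimmed string plus its terminating NUL to the front. */
static void trim_model_id_memmove(char *id)
{
	char *model = trim_ws(id);

	memmove(id, model, strlen(model) + 1);
}
```

Because the length comes from the trimmed string itself, the copy never
reads past the terminator, no matter how much leading whitespace there
was.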


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH] x86, cpuinfo x86_model_id whitespace cleanup
  2015-05-19 19:22       ` Borislav Petkov
  2015-05-19 20:16         ` Andy Lutomirski
@ 2015-05-27 17:18         ` H. Peter Anvin
  1 sibling, 0 replies; 710+ messages in thread
From: H. Peter Anvin @ 2015-05-27 17:18 UTC (permalink / raw)
  To: Borislav Petkov, Andy Lutomirski
  Cc: linux-kernel, Fenghua Yu, Dave Hansen, Thomas Gleixner,
	Denys Vlasenko, Ingo Molnar, Brian Gerst, Igor Mammedov,
	the arch/x86 maintainers, Prarit Bhargava

On 05/19/2015 12:22 PM, Borislav Petkov wrote:
> 
> I guess I'm trying to find out why don't we have a BIG FAT WARNING over
> memcpy saying not to use it with overlapping buffers and larger than
> byte sizes. Or maybe this is something everyone, except me, just knows
> and that's a "Doh, Boris, of course!".
> 

It kind of is, and doesn't just apply to kernel programming.  In C99+
the memcpy() prototype has "restrict" in it to denote that memcpy()
buffers have to be non-overlapping.

	-hpa
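
To make the restrict point concrete, a small userspace sketch follows.
shift_right() is a hypothetical helper invented for this example; the
only library call it relies on is the standard memmove().

```c
#include <string.h>

/*
 * C99 declares memcpy() with restrict-qualified pointers:
 *   void *memcpy(void * restrict dst, const void * restrict src, size_t n);
 * so passing overlapping buffers is undefined behaviour. memmove() has
 * no such qualifier and is specified to handle overlap.
 */

/*
 * Hypothetical helper: shift the first 'len' bytes of buf right by
 * 'gap' bytes. Source and destination overlap whenever gap < len, so
 * memmove() is required. The caller must provide len + gap bytes.
 */
static void shift_right(char *buf, size_t len, size_t gap)
{
	memmove(buf + gap, buf, len);
}
```

With a 16-byte buffer holding "abcdef", shifting 7 bytes (the string
plus its NUL) right by 2 leaves "ababcdef" in the buffer.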



^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-27 17:07     ` Joe Perches
@ 2015-05-27 19:06       ` Borislav Petkov
  2015-05-27 19:16         ` Joe Perches
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-27 19:06 UTC (permalink / raw)
  To: Joe Perches
  Cc: luto, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
	prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
	linux-tip-commits

On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
> This code can memmove from beyond the x86_model_id field.

... in the theoretical case where some model ID has more than 64 - 48
preceding white spaces.

I guess we want to be prepared here for insane CPU model IDs coming from
virtualization.

> Maybe:
> 	char *model = strim(c->x86_model_id);
> 	memmove(c->x86_model_id, model, strlen(model) + 1);

Yes, and additionally limit that string length:

---
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b35c777df6df..9d1fd48486d6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
+	const char *model;
+
+#define MODEL_ID_MAXLEN 48
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
 	cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
 	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
-	c->x86_model_id[48] = 0;
+	c->x86_model_id[MODEL_ID_MAXLEN] = 0;
 
 	/*
 	 * Remove leading whitespace on Intel processors and trailing
 	 * whitespace on AMD processors.
 	 */
-	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+	model = strim(c->x86_model_id);
+
+	memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-27 19:06       ` Borislav Petkov
@ 2015-05-27 19:16         ` Joe Perches
  2015-05-28 11:27           ` Prarit Bhargava
  0 siblings, 1 reply; 710+ messages in thread
From: Joe Perches @ 2015-05-27 19:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: luto, peterz, dvlasenk, torvalds, imammedo, brgerst, mingo,
	prarit, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
	linux-tip-commits

On Wed, 2015-05-27 at 21:06 +0200, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
> > This code can memmove from beyond the x86_model_id field.
> 
> ... in the theoretical case where some model ID has more than 64 - 48
> preceding white spaces.
> 
> I guess we want to be prepared here for insane CPU model IDs coming from
> virtualization.
> 
> > Maybe:
> > 	char *model = strim(c->x86_model_id);
> > 	memmove(c->x86_model_id, model, strlen(model) + 1);
> 
> Yes, and additionally limit that string length:
> 
> ---
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
[]
> @@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>  static void get_model_name(struct cpuinfo_x86 *c)
>  {
>  	unsigned int *v;
> +	const char *model;
> +
> +#define MODEL_ID_MAXLEN 48
>  
>  	if (c->extended_cpuid_level < 0x80000004)
>  		return;
> @@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
>  	cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
>  	cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
>  	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
> -	c->x86_model_id[48] = 0;
> +	c->x86_model_id[MODEL_ID_MAXLEN] = 0;
>  
>  	/*
>  	 * Remove leading whitespace on Intel processors and trailing
>  	 * whitespace on AMD processors.
>  	 */
> -	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
> +	model = strim(c->x86_model_id);
> +
> +	memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);

I don't see any value in the #define or strnlen over strlen as
it's guaranteed terminated by the = 0 above, but <shrug> thanks.

cheers, Joe


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-27 19:16         ` Joe Perches
@ 2015-05-28 11:27           ` Prarit Bhargava
  2015-05-28 11:32             ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Prarit Bhargava @ 2015-05-28 11:27 UTC (permalink / raw)
  To: Joe Perches
  Cc: Borislav Petkov, luto, peterz, dvlasenk, torvalds, imammedo,
	brgerst, mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx,
	bp, linux-tip-commits

On 05/27/2015 03:16 PM, Joe Perches wrote:
> On Wed, 2015-05-27 at 21:06 +0200, Borislav Petkov wrote:
>> On Wed, May 27, 2015 at 10:07:34AM -0700, Joe Perches wrote:
>>> This code can memmove from beyond the x86_model_id field.
>>
>> ... in the theoretical case where some model ID has more than 64 - 48
>> preceding white spaces.
>>
>> I guess we want to be prepared here for insane CPU model IDs coming from
>> virtualization.
>>
>>> Maybe:
>>> 	char *model = strim(c->x86_model_id);
>>> 	memmove(c->x86_model_id, model, strlen(model) + 1);
>>
>> Yes, and additionally limit that string length:
>>
>> ---
>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> []
>> @@ -383,6 +383,9 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
>>  static void get_model_name(struct cpuinfo_x86 *c)
>>  {
>>  	unsigned int *v;
>> +	const char *model;
>> +
>> +#define MODEL_ID_MAXLEN 48
>>  
>>  	if (c->extended_cpuid_level < 0x80000004)
>>  		return;
>> @@ -391,13 +394,15 @@ static void get_model_name(struct cpuinfo_x86 *c)
>>  	cpuid(0x80000002, &v[0], &v[1], &v[2], &v[3]);
>>  	cpuid(0x80000003, &v[4], &v[5], &v[6], &v[7]);
>>  	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
>> -	c->x86_model_id[48] = 0;
>> +	c->x86_model_id[MODEL_ID_MAXLEN] = 0;
>>  
>>  	/*
>>  	 * Remove leading whitespace on Intel processors and trailing
>>  	 * whitespace on AMD processors.
>>  	 */
>> -	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
>> +	model = strim(c->x86_model_id);
>> +
>> +	memmove(c->x86_model_id, model, strnlen(model, MODEL_ID_MAXLEN) + 1);
> 
> I don't see any value in the #define or strnlen over strlen as
> it's guaranteed terminated by the = 0 above, but <shrug> thanks.
> 

FWIW, I agree with Joe here and don't think the #define is necessary.
I will post a follow-up patch against tip on LKML shortly.

P.

> cheers, Joe
> 
> 
> 


^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-28 11:27           ` Prarit Bhargava
@ 2015-05-28 11:32             ` Borislav Petkov
  2015-05-28 12:58               ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-28 11:32 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
	mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
	linux-tip-commits

On Thu, May 28, 2015 at 07:27:19AM -0400, Prarit Bhargava wrote:
> FWIW, I agree with Joe here and don't think the #define is necessary.
> I will post a follow-up patch against tip on LKML shortly.

No need, I have a better one:

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 26 May 2015 10:28:17 +0200
Subject: [PATCH] x86/cpu: Trim model id whitespace

We did try trimming whitespace surrounding the 'model name' field
in /proc/cpuinfo since reportedly some userspace uses it in string
comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string sizes and
non-byte-sized copies in the previous proposed solutions; see Link tags
below for the whole farce.

So, instead of diddling with this more, let's simply extend what was
there originally with trimming any present trailing whitespace. Final
result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/common.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 41a8e9cb30bc..351197cbbc8e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -5,6 +5,7 @@
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/init.h>
@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
+	char *p, *q, *s;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -429,11 +431,21 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
 	c->x86_model_id[48] = 0;
 
-	/*
-	 * Remove leading whitespace on Intel processors and trailing
-	 * whitespace on AMD processors.
-	 */
-	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+	/* Trim whitespace */
+	p = q = s = &c->x86_model_id[0];
+
+	while (*p == ' ')
+		p++;
+
+	while (*p) {
+		/* Note the last non-whitespace index */
+		if (!isspace(*p))
+			s = q;
+
+		*q++ = *p++;
+	}
+
+	*(s + 1) = '\0';
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-28 11:32             ` Borislav Petkov
@ 2015-05-28 12:58               ` Borislav Petkov
  2015-05-28 16:57                 ` H. Peter Anvin
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-28 12:58 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
	mingo, dave.hansen, fenghua.yu, hpa, linux-kernel, tglx, bp,
	linux-tip-commits

On Thu, May 28, 2015 at 01:32:29PM +0200, Borislav Petkov wrote:
> +	while (*p) {
> +		/* Note the last non-whitespace index */
> +		if (!isspace(*p))
> +			s = q;
> +
> +		*q++ = *p++;

This should be optimized to not copy if there's no preceding whitespace
and p == q:

From: Borislav Petkov <bp@suse.de>
Date: Tue, 26 May 2015 10:28:17 +0200
Subject: [PATCH] x86/cpu: Trim model id whitespace

We did try trimming whitespace surrounding the 'model name' field
in /proc/cpuinfo since reportedly some userspace uses it in string
comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string sizes and
non-byte-sized copies in the previous proposed solutions; see Link tags
below for the whole farce.

So, instead of diddling with this more, let's simply extend what was
there originally with trimming any present trailing whitespace. Final
result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/common.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 41a8e9cb30bc..18120a33a2c1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -5,6 +5,7 @@
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/init.h>
@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
+	char *p, *q, *s;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -429,11 +431,26 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
 	c->x86_model_id[48] = 0;
 
-	/*
-	 * Remove leading whitespace on Intel processors and trailing
-	 * whitespace on AMD processors.
-	 */
-	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+	/* Trim whitespace */
+	p = q = s = &c->x86_model_id[0];
+
+	while (*p == ' ')
+		p++;
+
+	while (*p) {
+		/* Note the last non-whitespace index: */
+		if (!isspace(*p))
+			s = q;
+
+		/* Only copy if p advanced due to whitespace: */
+		if (p != q)
+			*q = *p;
+
+		p++;
+		q++;
+	}
+
+	*(s + 1) = '\0';
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-28 12:58               ` Borislav Petkov
@ 2015-05-28 16:57                 ` H. Peter Anvin
  2015-05-28 18:33                   ` Borislav Petkov
  0 siblings, 1 reply; 710+ messages in thread
From: H. Peter Anvin @ 2015-05-28 16:57 UTC (permalink / raw)
  To: Borislav Petkov, Prarit Bhargava
  Cc: Joe Perches, luto, peterz, dvlasenk, torvalds, imammedo, brgerst,
	mingo, dave.hansen, fenghua.yu, linux-kernel, tglx, bp,
	linux-tip-commits

Why?!

We are talking about 48 bytes run once per cpu.  It isn't worth it to optimize, in fact the extra code size hurts more.

On May 28, 2015 5:58:19 AM PDT, Borislav Petkov <bp@alien8.de> wrote:
>On Thu, May 28, 2015 at 01:32:29PM +0200, Borislav Petkov wrote:
>> +	while (*p) {
>> +		/* Note the last non-whitespace index */
>> +		if (!isspace(*p))
>> +			s = q;
>> +
>> +		*q++ = *p++;
>
>This should be optimized to not copy if there's no preceding whitespace
>and p == q:
>
>From: Borislav Petkov <bp@suse.de>
>Date: Tue, 26 May 2015 10:28:17 +0200
>Subject: [PATCH] x86/cpu: Trim model id whitespace
>
>We did try trimming whitespace surrounding the 'model name' field
>in /proc/cpuinfo since reportedly some userspace uses it in string
>comparisons and there were discrepancies:
>
>  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
>  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
>  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________
>
>However, there were issues with overlapping buffers, string sizes and
>non-byte-sized copies in the previous proposed solutions; see Link tags
>below for the whole farce.
>
>So, instead of diddling with this more, let's simply extend what was
>there originally with trimming any present trailing whitespace. Final
>result is really simple and obvious.
>
>Testing with the most insane model IDs qemu can generate, looks good:
>
>  .model_id = "            My funny model ID CPU          ",
>  ______4_model_name      :_My_funny_model_ID_CPU
>
>  .model_id = "My funny model ID CPU          ",
>  ______4_model_name      :_My_funny_model_ID_CPU
>
>  .model_id = "            My funny model ID CPU",
>  ______4_model_name      :_My_funny_model_ID_CPU
>
>  .model_id = "            ",
>  ______4_model_name      :__
>
>  .model_id = "",
>  ______4_model_name      :_15/02
>
>Cc: Andy Lutomirski <luto@amacapital.net>
>Cc: Brian Gerst <brgerst@gmail.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Denys Vlasenko <dvlasenk@redhat.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: H. Peter Anvin <hpa@zytor.com>
>Cc: Igor Mammedov <imammedo@redhat.com>
>Cc: Linus Torvalds <torvalds@linux-foundation.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
>Link: http://lkml.kernel.org/r/1432628901-18044-15-git-send-email-bp@alien8.de
>Signed-off-by: Borislav Petkov <bp@suse.de>
>---
> arch/x86/kernel/cpu/common.c | 27 ++++++++++++++++++++++-----
> 1 file changed, 22 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>index 41a8e9cb30bc..18120a33a2c1 100644
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -5,6 +5,7 @@
> #include <linux/module.h>
> #include <linux/percpu.h>
> #include <linux/string.h>
>+#include <linux/ctype.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/init.h>
>@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
> static void get_model_name(struct cpuinfo_x86 *c)
> {
> 	unsigned int *v;
>+	char *p, *q, *s;
> 
> 	if (c->extended_cpuid_level < 0x80000004)
> 		return;
>@@ -429,11 +431,26 @@ static void get_model_name(struct cpuinfo_x86 *c)
> 	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
> 	c->x86_model_id[48] = 0;
> 
>-	/*
>-	 * Remove leading whitespace on Intel processors and trailing
>-	 * whitespace on AMD processors.
>-	 */
>-	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
>+	/* Trim whitespace */
>+	p = q = s = &c->x86_model_id[0];
>+
>+	while (*p == ' ')
>+		p++;
>+
>+	while (*p) {
>+		/* Note the last non-whitespace index: */
>+		if (!isspace(*p))
>+			s = q;
>+
>+		/* Only copy if p advanced due to whitespace: */
>+		if (p != q)
>+			*q = *p;
>+
>+		p++;
>+		q++;
>+	}
>+
>+	*(s + 1) = '\0';
> }
> 
> void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-28 16:57                 ` H. Peter Anvin
@ 2015-05-28 18:33                   ` Borislav Petkov
  2015-05-28 20:39                     ` H. Peter Anvin
  0 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-05-28 18:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Prarit Bhargava, Joe Perches, luto, peterz, dvlasenk, torvalds,
	imammedo, brgerst, mingo, dave.hansen, fenghua.yu, linux-kernel,
	tglx, bp, linux-tip-commits

On Thu, May 28, 2015 at 09:57:15AM -0700, H. Peter Anvin wrote:
> Why?!
>
> We are talking about 48 bytes run once per cpu. It isn't worth it to
> optimize, in fact the extra code size hurts more.

I wanted to save us the redundant copying of the exact same bytes.
Because when there's no preceding whitespace, p and q point at the same
thing so we end up doing *p = *p.

OTOH, without the optimization, the code is even simpler.

I can remove it if you wanna - I don't care all that much.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/cpu] x86/cpu: Strip any /proc/cpuinfo model name field whitespace
  2015-05-28 18:33                   ` Borislav Petkov
@ 2015-05-28 20:39                     ` H. Peter Anvin
  0 siblings, 0 replies; 710+ messages in thread
From: H. Peter Anvin @ 2015-05-28 20:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Prarit Bhargava, Joe Perches, luto, peterz, dvlasenk, torvalds,
	imammedo, brgerst, mingo, dave.hansen, fenghua.yu, linux-kernel,
	tglx, bp, linux-tip-commits

On 05/28/2015 11:33 AM, Borislav Petkov wrote:
> On Thu, May 28, 2015 at 09:57:15AM -0700, H. Peter Anvin wrote:
>> Why?!
>>
>> We are talking about 48 bytes run once per cpu. It isn't worth it to
>> optimize, in fact the extra code size hurts more.
> 
> I wanted to save us the redundant copying of the exact same bytes.
> Because when there's no preceding whitespace, p and q point at the same
> thing so we end up doing *p = *p.
> 
> OTOH, without the optimization, the code is even simpler.
> 
> I can remove it if you wanna - I don't care all that much.
> 

Yes, please.  Actually, with a test inside the loop the way you have it,
the resulting code will almost certainly be slower -- a redundant write
to an already dirty cache line is way cheaper than a branch.

	-hpa


^ permalink raw reply	[flat|nested] 710+ messages in thread

* [tip:x86/cpu] x86/cpu: Trim model ID whitespace
  2015-05-19 15:43 [PATCH] x86, cpuinfo x86_model_id whitespace cleanup Prarit Bhargava
                   ` (2 preceding siblings ...)
  2015-05-20  6:34 ` Ingo Molnar
@ 2015-06-02  8:42 ` tip-bot for Borislav Petkov
  3 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Borislav Petkov @ 2015-06-02  8:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, dvlasenk, brgerst, fenghua.yu, tglx, bp, peterz, luto,
	linux-kernel, dave.hansen, torvalds, hpa, imammedo

Commit-ID:  ee098e1aed67715f0ce4651813d0c33ab3a56e0b
Gitweb:     http://git.kernel.org/tip/ee098e1aed67715f0ce4651813d0c33ab3a56e0b
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Mon, 1 Jun 2015 12:06:57 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 2 Jun 2015 10:38:11 +0200

x86/cpu: Trim model ID whitespace

We did try trimming whitespace surrounding the 'model name'
field in /proc/cpuinfo since reportedly some userspace uses it
in string comparisons and there were discrepancies:

  [thetango@prarit ~]# grep "^model name" /proc/cpuinfo | uniq -c | sed 's/\ /_/g'
  ______1_model_name      :_AMD_Opteron(TM)_Processor_6272
  _____63_model_name      :_AMD_Opteron(TM)_Processor_6272_________________

However, there were issues with overlapping buffers, string
sizes and non-byte-sized copies in the previous proposed
solutions; see Link tags below for the whole farce.

So, instead of diddling with this more, let's simply extend what
was there originally with trimming any present trailing
whitespace. Final result is really simple and obvious.

Testing with the most insane model IDs qemu can generate, looks
good:

  .model_id = "            My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "My funny model ID CPU          ",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            My funny model ID CPU",
  ______4_model_name      :_My_funny_model_ID_CPU

  .model_id = "            ",
  ______4_model_name      :__

  .model_id = "",
  ______4_model_name      :_15/02

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1432050210-32036-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/common.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 41a8e9c..351197c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -5,6 +5,7 @@
 #include <linux/module.h>
 #include <linux/percpu.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/init.h>
@@ -419,6 +420,7 @@ static const struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};
 static void get_model_name(struct cpuinfo_x86 *c)
 {
 	unsigned int *v;
+	char *p, *q, *s;
 
 	if (c->extended_cpuid_level < 0x80000004)
 		return;
@@ -429,11 +431,21 @@ static void get_model_name(struct cpuinfo_x86 *c)
 	cpuid(0x80000004, &v[8], &v[9], &v[10], &v[11]);
 	c->x86_model_id[48] = 0;
 
-	/*
-	 * Remove leading whitespace on Intel processors and trailing
-	 * whitespace on AMD processors.
-	 */
-	memmove(c->x86_model_id, strim(c->x86_model_id), 48);
+	/* Trim whitespace */
+	p = q = s = &c->x86_model_id[0];
+
+	while (*p == ' ')
+		p++;
+
+	while (*p) {
+		/* Note the last non-whitespace index */
+		if (!isspace(*p))
+			s = q;
+
+		*q++ = *p++;
+	}
+
+	*(s + 1) = '\0';
 }
 
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
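
For readers who want to try the trimming loop outside the kernel, here is a
minimal userspace sketch of the same idea. The function name trim_model_id()
is hypothetical, and the sketch assumes a NUL-terminated buffer of at least
two bytes (the kernel buffer is larger, so the final store always stays in
bounds there). It is a restatement for illustration, not the kernel code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Userspace restatement of the trimming loop (hypothetical helper).
 * buf must be NUL-terminated and at least two bytes long, because the
 * final store writes to s + 1 even when nothing was copied.
 */
static void trim_model_id(char *buf)
{
	char *p, *q, *s;

	assert(buf != NULL);
	p = q = s = buf;

	/* Skip leading blanks. */
	while (*p == ' ')
		p++;

	/* Copy down, noting where the last non-whitespace byte landed. */
	while (*p) {
		if (!isspace((unsigned char)*p))
			s = q;

		*q++ = *p++;
	}

	/* Terminate right after the last non-whitespace byte. */
	*(s + 1) = '\0';
}
```

Calling it on "            My funny model ID CPU          " leaves
"My funny model ID CPU" in the buffer. An all-blank input is left as a single
blank, because s never moves off the start of the buffer; that agrees with the
":__" line in the test output quoted in the commit message.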

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
@ 2015-06-03 23:50   ` Luis R. Rodriguez
  2015-04-29 21:44   ` Luis R. Rodriguez
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-03 23:50 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Doug Ledford, Andy Walls, Andy Lutomirski, Michael S. Tsirkin,
	cocci, linux-kernel, Luis R. Rodriguez, Dave Airlie,
	Daniel Vetter, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Tomi Valkeinen,
	Jean-Christophe Plagniol-Villard

On Wed, Apr 29, 2015 at 2:44 PM, Luis R. Rodriguez
<mcgrof@do-not-panic.com> wrote:
> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> This series addresses one comment fix on the table for mtrr_add()
> effect on the PAT case when UC- is used. Other than that it is
> the same as v4.
>
> Luis R. Rodriguez (6):
>   x86: add ioremap_uc() - force strong UC, PCD=1, PWT=1
>   x86: document WC MTRR effects on PAT / non-PAT pages
>   video: fbdev: atyfb: move framebuffer length fudging to helper
>   video: fbdev: atyfb: clarify ioremap() base and length used
>   video: fbdev: atyfb: replace MTRR UC hole with strong UC
>   video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc()

Ville,

the x86 patches are in and on their way to the next version of Linux.
Can I trouble you for your review of the atyfb driver changes?

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* [tip:x86/core] x86/mce: Fix monarch timeout setting through the mce= cmdline option
  2015-05-26  8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
@ 2015-06-07 17:39   ` tip-bot for Xie XiuQi
  0 siblings, 0 replies; 710+ messages in thread
From: tip-bot for Xie XiuQi @ 2015-06-07 17:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: xiexiuqi, linux-kernel, dvlasenk, tglx, peterz, luto, mingo, hpa,
	bp, brgerst, tony.luck, torvalds, bp

Commit-ID:  5c31b2800d8d3e735e5ecac8fc13d1cf862fd330
Gitweb:     http://git.kernel.org/tip/5c31b2800d8d3e735e5ecac8fc13d1cf862fd330
Author:     Xie XiuQi <xiexiuqi@huawei.com>
AuthorDate: Tue, 26 May 2015 10:28:21 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 27 May 2015 14:39:14 +0200

x86/mce: Fix monarch timeout setting through the mce= cmdline option

Using "mce=1,10000000" on the kernel cmdline to change the
monarch timeout does not work. The cause is that get_option()
does parse a subsequent comma in the option string and signals
that with a return value. So we don't need to check for a second
comma ourselves.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1432120943-25028-1-git-send-email-xiexiuqi@huawei.com
Link: http://lkml.kernel.org/r/1432628901-18044-19-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 521e501..0cbcd31 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2014,11 +2014,8 @@ static int __init mcheck_enable(char *str)
 	else if (!strcmp(str, "bios_cmci_threshold"))
 		cfg->bios_cmci_threshold = true;
 	else if (isdigit(str[0])) {
-		get_option(&str, &(cfg->tolerant));
-		if (*str == ',') {
-			++str;
+		if (get_option(&str, &cfg->tolerant) == 2)
 			get_option(&str, &(cfg->monarch_timeout));
-		}
 	} else {
 		pr_info("mce argument %s ignored. Please use /sys\n", str);
 		return 0;
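
The return-value convention the fix relies on can be sketched in userspace.
The model below is a simplified stand-in written for illustration, not the
kernel's get_option(): get_option_model(), struct mce_cfg_model and the two
parse helpers are hypothetical names, and strtol() replaces the kernel's own
integer parsing. It assumes, as the commit message describes, that a comma
following the integer is consumed and reported through the return value.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified model of get_option() (assumption, not kernel code):
 *   0 - no integer found
 *   1 - integer found, nothing special after it
 *   2 - integer found and a following comma was consumed
 *   3 - integer found, followed by '-' (start of a range)
 */
static int get_option_model(char **str, int *pint)
{
	char *cur;

	assert(str != NULL && pint != NULL);
	cur = *str;
	if (!cur || !*cur)
		return 0;

	*pint = (int)strtol(cur, str, 0);
	if (cur == *str)
		return 0;
	if (**str == ',') {
		(*str)++;	/* the comma is eaten here */
		return 2;
	}
	if (**str == '-')
		return 3;
	return 1;
}

/* Hypothetical stand-in for the two mce config fields involved. */
struct mce_cfg_model {
	int tolerant;
	int monarch_timeout;
};

/* Old logic: looks for a comma that has already been consumed. */
static void parse_mce_old(char *str, struct mce_cfg_model *cfg)
{
	get_option_model(&str, &cfg->tolerant);
	if (*str == ',') {	/* false for "1,10000000" */
		++str;
		get_option_model(&str, &cfg->monarch_timeout);
	}
}

/* Fixed logic: trust the return value instead. */
static void parse_mce_new(char *str, struct mce_cfg_model *cfg)
{
	if (get_option_model(&str, &cfg->tolerant) == 2)
		get_option_model(&str, &cfg->monarch_timeout);
}
```

With the input "1,10000000", parse_mce_old() sets tolerant to 1 and leaves
monarch_timeout untouched, while parse_mce_new() sets both fields.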

^ permalink raw reply related	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-03 23:50   ` [Cocci] " Luis R. Rodriguez
@ 2015-06-08 23:43     ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-08 23:43 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Doug Ledford, Andy Walls, Andy Lutomirski, Michael S. Tsirkin,
	cocci, linux-kernel, Luis R. Rodriguez, Dave Airlie,
	Daniel Vetter, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Tomi Valkeinen,
	Jean-Christophe Plagniol-Villard

On Wed, Jun 3, 2015 at 4:50 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> Ville,
>
> the x86 patches are in and on their way to the next version of Linux.
> Can I trouble you for your review of the atyfb driver changes?

Hey Ville, just a friendly *poke*.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-08 23:43     ` [Cocci] " Luis R. Rodriguez
@ 2015-06-16 19:31       ` Luis R. Rodriguez
  -1 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-16 19:31 UTC (permalink / raw)
  To: Ville Syrjälä, Ville Syrjälä
  Cc: Doug Ledford, Andy Walls, Andy Lutomirski, Michael S. Tsirkin,
	cocci, linux-kernel, Luis R. Rodriguez, Dave Airlie,
	Daniel Vetter, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Tomi Valkeinen,
	Jean-Christophe Plagniol-Villard

On Mon, Jun 8, 2015 at 4:43 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Jun 3, 2015 at 4:50 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> Ville,
>>
>> the x86 patches are in and on their way to the next version of Linux.
>> Can I trouble you for your review of the atyfb driver changes?
>
> Hey Ville, just a friendly *poke*.

Hey, Ville, trying your Intel address, just in case. Here is the full
context of the series, in case it helps, as this happens to be the most
complex of the changes in it:

http://lkml.kernel.org/r/CAB=NE6UgtdSoBsA=8+ueYRAZHDnWUSmQAoHhAaefqudBrSY7Zw@mail.gmail.com

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-16 19:31       ` [Cocci] " Luis R. Rodriguez
  (?)
@ 2015-06-19 22:22       ` Luis R. Rodriguez
  2015-06-25  1:24         ` Luis R. Rodriguez
  -1 siblings, 1 reply; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-19 22:22 UTC (permalink / raw)
  To: Ville Syrjälä, Ville Syrjälä,
	Dave Airlie, Andy Lutomirski, Tomi Valkeinen
  Cc: Doug Ledford, Andy Walls, Michael S. Tsirkin, linux-kernel,
	Luis R. Rodriguez, Daniel Vetter, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov,
	Jean-Christophe Plagniol-Villard

On Tue, Jun 16, 2015 at 12:31 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Mon, Jun 8, 2015 at 4:43 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> On Wed, Jun 3, 2015 at 4:50 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>> Ville,
>>>
>>> the x86 patches are in and on their way to the next version of Linux.
>>> Can I trouble you for your review of the atyfb driver changes?
>>
>> Hey Ville, just a friendly *poke*.
>
> Hey, Ville, trying your Intel address, just in case. Full context of
> the entire series, in case it helps as this happens to be the more
> complex of the changes in the entire series:
>
> http://lkml.kernel.org/r/CAB=NE6UgtdSoBsA=8+ueYRAZHDnWUSmQAoHhAaefqudBrSY7Zw@mail.gmail.com

Tomi, Dave, Andy,

It's been one month now since I posted the last unmodified version
(other than the commit log) of this series [0], with no word or follow-up
from Ville. The merge window is closing in, and other than the PCI
changes this would be the last pending series. Can I trouble one of
you for your review? I will note that this series depends on
ioremap_uc(), which went in through Ingo's tree and is visible in
linux-next.

[0] http://lkml.kernel.org/r/20150529174051.GC23057@wotan.suse.de

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-19 22:22       ` Luis R. Rodriguez
@ 2015-06-25  1:24         ` Luis R. Rodriguez
  2015-06-25  6:59           ` Ingo Molnar
  0 siblings, 1 reply; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-25  1:24 UTC (permalink / raw)
  To: Ville Syrjälä, Ville Syrjälä,
	Dave Airlie, Andy Lutomirski, Tomi Valkeinen, Andrew Morton
  Cc: Doug Ledford, Andy Walls, Michael S. Tsirkin, linux-kernel,
	Luis R. Rodriguez, Daniel Vetter, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov,
	Jean-Christophe Plagniol-Villard

On Fri, Jun 19, 2015 at 3:22 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> Tomi, Dave, Andy,
>
> It's been one month now since I posted the last unmodified version
> (other than the commit log) of this series [0], with no word or follow-up
> from Ville. The merge window is closing in, and other than the PCI
> changes this would be the last pending series. Can I trouble one of
> you for your review? I will note that this series depends on
> ioremap_uc(), which went in through Ingo's tree and is visible in
> linux-next.
>
> [0] http://lkml.kernel.org/r/20150529174051.GC23057@wotan.suse.de

Alright, I'll poke to see if Andrew might take these then. I'll post a
new clean series just to be crystal clear as this is a complex set, I
admit and it may be worth re-iterating things.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-25  1:24         ` Luis R. Rodriguez
@ 2015-06-25  6:59           ` Ingo Molnar
  2015-06-25 16:41             ` Luis R. Rodriguez
  0 siblings, 1 reply; 710+ messages in thread
From: Ingo Molnar @ 2015-06-25  6:59 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Ville Syrjälä, Ville Syrjälä,
	Dave Airlie, Andy Lutomirski, Tomi Valkeinen, Andrew Morton,
	Doug Ledford, Andy Walls, Michael S. Tsirkin, linux-kernel,
	Daniel Vetter, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Jean-Christophe Plagniol-Villard


* Luis R. Rodriguez <mcgrof@suse.com> wrote:

> On Fri, Jun 19, 2015 at 3:22 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > Tomi, Dave, Andy,
> >
> > It's been one month now since I posted the last unmodified version
> > (other than the commit log) of this series [0], with no word or follow-up
> > from Ville. The merge window is closing in, and other than the PCI
> > changes this would be the last pending series. Can I trouble one of
> > you for your review? I will note that this series depends on
> > ioremap_uc(), which went in through Ingo's tree and is visible in
> > linux-next.
> >
> > [0] http://lkml.kernel.org/r/20150529174051.GC23057@wotan.suse.de
> 
> Alright, I'll poke to see if Andrew might take these then. I'll post a new clean 
> series just to be crystal clear as this is a complex set, I admit and it may be 
> worth re-iterating things.

No, please send it to us against -tip as the previous patches, I'd like all these 
changes that materially impact x86 to go through the x86 tree.

I can deal with the generic impact.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [PATCH v4 0/6] x86: document and address MTRR corner cases
  2015-06-25  6:59           ` Ingo Molnar
@ 2015-06-25 16:41             ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-06-25 16:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ville Syrjälä, Ville Syrjälä,
	Dave Airlie, Andy Lutomirski, Tomi Valkeinen, Andrew Morton,
	Doug Ledford, Andy Walls, Michael S. Tsirkin, linux-kernel,
	Daniel Vetter, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Jean-Christophe Plagniol-Villard

On Thu, Jun 25, 2015 at 08:59:45AM +0200, Ingo Molnar wrote:
> 
> * Luis R. Rodriguez <mcgrof@suse.com> wrote:
> 
> > On Fri, Jun 19, 2015 at 3:22 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > Tomi, Dave, Andy,
> > >
> > > It's been one month now since I posted the last unmodified version
> > > (other than the commit log) of this series [0], with no word or follow-up
> > > from Ville. The merge window is closing in, and other than the PCI
> > > changes this would be the last pending series. Can I trouble one of
> > > you for your review? I will note that this series depends on
> > > ioremap_uc(), which went in through Ingo's tree and is visible in
> > > linux-next.
> > >
> > > [0] http://lkml.kernel.org/r/20150529174051.GC23057@wotan.suse.de
> > 
> > Alright, I'll poke to see if Andrew might take these then. I'll post a new clean 
> > series just to be crystal clear as this is a complex set, I admit and it may be 
> > worth re-iterating things.
> 
> No, please send it to us against -tip as the previous patches, I'd like all these 
> changes that materially impact x86 to go through the x86 tree.

Oh OK -- Andrew please ignore that atyfb series I sent yesterday to you then.

> I can deal with the generic impact.

Great, thanks.

 Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-05-27 14:19     ` tip-bot for Toshi Kani
@ 2015-07-31 13:18       ` Peter Zijlstra
  -1 siblings, 0 replies; 710+ messages in thread
From: Peter Zijlstra @ 2015-07-31 13:18 UTC (permalink / raw)
  To: mingo, hpa, bp, dvlasenk, bp, akpm, brgerst, tglx, linux-mm,
	luto, mcgrof, toshi.kani, torvalds, linux-kernel
  Cc: linux-tip-commits

On Wed, May 27, 2015 at 07:19:05AM -0700, tip-bot for Toshi Kani wrote:
> +/**
> + * mtrr_type_lookup - look up memory type in MTRR
> + *
> + * Return Values:
> + * MTRR_TYPE_(type)  - The effective MTRR type for the region
> + * MTRR_TYPE_INVALID - MTRR is disabled
>   */
>  u8 mtrr_type_lookup(u64 start, u64 end)
>  {

>  	int repeat;
>  	u64 partial_end;
>  
> +	if (!mtrr_state_set)
> +		return MTRR_TYPE_INVALID;
> +
> +	if (!(mtrr_state.enabled & MTRR_STATE_MTRR_ENABLED))
> +		return MTRR_TYPE_INVALID;
> +
> +	/*
> +	 * Look up the fixed ranges first, which take priority over
> +	 * the variable ranges.
> +	 */
> +	if ((start < 0x100000) &&
> +	    (mtrr_state.have_fixed) &&
> +	    (mtrr_state.enabled & MTRR_STATE_MTRR_FIXED_ENABLED))
> +		return mtrr_type_lookup_fixed(start, end);
> +
> +	/*
> +	 * Look up the variable ranges.  Look for multiple ranges matching
> +	 * this address and pick type as per MTRR precedence.
> +	 */
> +	type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
>  
>  	/*
>  	 * Common path is with repeat = 0.
>  	 * However, we can have cases where [start:end] spans across some
> +	 * MTRR ranges and/or the default type.  Do repeated lookups for
> +	 * that case here.
>  	 */
>  	while (repeat) {
>  		prev_type = type;
>  		start = partial_end;
> +		type = mtrr_type_lookup_variable(start, end, &partial_end, &repeat);
>  
>  		if (check_type_overlap(&prev_type, &type))
>  			return type;
>  	}
>  
> +	if (mtrr_tom2 && (start >= (1ULL<<32)) && (end < mtrr_tom2))
> +		return MTRR_TYPE_WRBACK;
> +
>  	return type;
>  }

So I got staring at this MTRR horror show because I _really_ _Really_
want to kill stop_machine_from_inactive_cpu().

But I wondered about these lookup functions, should they not have an
assertion that preemption is disabled?

Using these functions with preemption enabled is racy against MTRR
updates. And if that race is ok, at the very least explain that it is
indeed racy and why this is not a problem.

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-07-31 13:18       ` Peter Zijlstra
  (?)
@ 2015-07-31 14:44       ` Borislav Petkov
  2015-07-31 15:08           ` Peter Zijlstra
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-07-31 14:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
	mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits

On Fri, Jul 31, 2015 at 03:18:02PM +0200, Peter Zijlstra wrote:
> Using these functions with preemption enabled is racy against MTRR
> updates. And if that race is ok, at the very least explain that it is
> indeed racy and why this is not a problem.

Right, so Luis has been working on burying direct MTRR access so
after that work is done, we'll be using only PAT for changing memory
attributes. Look at arch_phys_wc_add() and all those fbdev users of
mtrr_add() which get converted to that thing...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-07-31 14:44       ` Borislav Petkov
@ 2015-07-31 15:08           ` Peter Zijlstra
  0 siblings, 0 replies; 710+ messages in thread
From: Peter Zijlstra @ 2015-07-31 15:08 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
	mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits

On Fri, Jul 31, 2015 at 04:44:52PM +0200, Borislav Petkov wrote:
> On Fri, Jul 31, 2015 at 03:18:02PM +0200, Peter Zijlstra wrote:
> > Using these functions with preemption enabled is racy against MTRR
> > updates. And if that race is ok, at the very least explain that it is
> > indeed racy and why this is not a problem.
> 
> Right, so Luis has been working on burying direct MTRR access so
> after that work is done, we'll be using only PAT for changing memory
> attributes. Look at arch_phys_wc_add() and all those fbdev users of
> mtrr_add() which get converted to that thing...

Drivers don't do those lookups afaict.

But it's things like set_memory_XX(), and afaict that's all buggy against
MTRR modifications.

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-07-31 15:08           ` Peter Zijlstra
  (?)
@ 2015-07-31 15:27           ` Borislav Petkov
  2015-08-01 14:28               ` Luis R. Rodriguez
  -1 siblings, 1 reply; 710+ messages in thread
From: Borislav Petkov @ 2015-07-31 15:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx, linux-mm, luto,
	mcgrof, toshi.kani, torvalds, linux-kernel, linux-tip-commits

On Fri, Jul 31, 2015 at 05:08:06PM +0200, Peter Zijlstra wrote:
> But it's things like set_memory_XX(), and afaict that's all buggy against
> MTRR modifications.

I think the idea is to not do any MTRR modifications at some point:

From Documentation/x86/pat.txt:

"... Ideally mtrr_add() usage will be phased out in favor of
arch_phys_wc_add() which will be a no-op on PAT enabled systems. The
region over which a arch_phys_wc_add() is made, should already have been
ioremapped with WC attributes or PAT entries, this can be done by using
ioremap_wc() / set_memory_wc()."

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-07-31 15:27           ` Borislav Petkov
@ 2015-08-01 14:28               ` Luis R. Rodriguez
  0 siblings, 0 replies; 710+ messages in thread
From: Luis R. Rodriguez @ 2015-08-01 14:28 UTC (permalink / raw)
  To: Borislav Petkov, Toshi Kani
  Cc: Peter Zijlstra, mingo, hpa, dvlasenk, bp, akpm, brgerst, tglx,
	linux-mm, luto, torvalds, linux-kernel, linux-tip-commits

On Fri, Jul 31, 2015 at 05:27:13PM +0200, Borislav Petkov wrote:
> On Fri, Jul 31, 2015 at 05:08:06PM +0200, Peter Zijlstra wrote:
> > But it's things like set_memory_XX(), and afaict that's all buggy against
> > MTRR modifications.
> 
> I think the idea is to not do any MTRR modifications at some point:
> 
> From Documentation/x86/pat.txt:
> 
> "... Ideally mtrr_add() usage will be phased out in favor of
> arch_phys_wc_add() which will be a no-op on PAT enabled systems. The
> region over which a arch_phys_wc_add() is made, should already have been
> ioremapped with WC attributes or PAT entries, this can be done by using
> ioremap_wc() / set_memory_wc()."

I need to update this documentation to remove set_memory_wc() there, as we've
learned with the MTRR --> PAT conversion that set_memory_wc() cannot be used on
IO memory; it can only be used for RAM. I am not sure I would call it
broken that you cannot use set_memory_*() on IO memory; that may have been by
design.

  Luis

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-08-01 14:28               ` Luis R. Rodriguez
@ 2015-08-01 16:33                 ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-08-01 16:33 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Toshi Kani, Peter Zijlstra, mingo, hpa, dvlasenk, bp, akpm,
	brgerst, tglx, linux-mm, luto, torvalds, linux-kernel,
	linux-tip-commits

On Sat, Aug 01, 2015 at 04:28:20PM +0200, Luis R. Rodriguez wrote:
> I need to update this documentation to remove set_memory_wc() there, as we've
> learned with the MTRR --> PAT conversion that set_memory_wc() cannot be used on
> IO memory; it can only be used for RAM. I am not sure I would call it
> broken that you cannot use set_memory_*() on IO memory; that may have been by
> design.

Well, it doesn't really make sense to write-combine IO memory, does it?
My simplistic impression is that an IO range behind which there's a
device, cannot stomach any caching of IO as all commands/data accesses
need to happen as they get issued...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-08-01 16:33                 ` Borislav Petkov
@ 2015-08-01 16:39                   ` Linus Torvalds
  -1 siblings, 0 replies; 710+ messages in thread
From: Linus Torvalds @ 2015-08-01 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
	Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
	Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
	Linux Kernel Mailing List, linux-tip-commits

On Sat, Aug 1, 2015 at 9:33 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> Well, it doesn't really make sense to write-combine IO memory, does it?

Quite the reverse.

It makes no sense to write-combine normal memory (RAM), because caches
work and sane memory is always cache-coherent. So marking regular
memory write-combining is a sign of crap hardware (which admittedly
exists all too much, but hopefully goes away).

In contrast, marking MMIO memory write-combining is not a sign of crap
hardware - it's just a sign of things like frame buffers on the card
etc. Which very much wants write combining. So WC for MMIO at least
makes sense.

Yes, yes, I realize that "crap hardware" may actually be the more
common case, but still..

             Linus
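
The contrast drawn in the message above, between write-combined and one-at-a-time MMIO stores, can be made concrete with a toy model. The sketch below is not kernel code: it only counts bus transactions, under two assumptions that do not come from the thread. These are a 64-byte combining buffer (WC_LINE) and a single open buffer that is flushed whenever a store lands in a different line. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of one write-combining buffer; an illustration value only. */
#define WC_LINE 64u

/* Uncached model: every store reaches the device as its own bus write. */
static size_t bus_writes_uncached(size_t nstores)
{
	return nstores;
}

/*
 * Write-combining model: `nstores` sequential stores of `width` bytes,
 * beginning at byte offset `start`. Stores that fall in the same WC_LINE
 * are merged; a new burst is counted each time the line changes.
 * Assumes `width` divides WC_LINE and `start` is a multiple of `width`,
 * so no store straddles two lines.
 */
static size_t bus_writes_combined(uint64_t start, size_t width, size_t nstores)
{
	size_t bursts = 0;
	uint64_t open_line = UINT64_MAX;	/* no buffer open yet */

	assert(width != 0 && WC_LINE % width == 0 && start % width == 0);

	for (size_t i = 0; i < nstores; i++) {
		uint64_t line = (start + (uint64_t)i * width) / WC_LINE;

		if (line != open_line) {
			bursts++;
			open_line = line;
		}
	}
	return bursts;
}
```

Under these assumptions, filling 4 KiB of frame buffer with 1024 four-byte stores costs 1024 bus writes in the uncached model and 64 bursts in the combining model. Real hardware has several buffers and flushes them on its own schedule, so the numbers are only indicative.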

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-08-01 16:39                   ` Linus Torvalds
@ 2015-08-01 16:49                     ` Borislav Petkov
  -1 siblings, 0 replies; 710+ messages in thread
From: Borislav Petkov @ 2015-08-01 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
	Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
	Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
	Linux Kernel Mailing List, linux-tip-commits

On Sat, Aug 01, 2015 at 09:39:07AM -0700, Linus Torvalds wrote:
> Quite the reverse.
> 
> It makes no sense to write-combine normal memory (RAM), because caches
> work and sane memory is always cache-coherent. So marking regular
> memory write-combining is a sign of crap hardware (which admittedly
> exists all too much, but hopefully goes away).
> 
> In contrast, marking MMIO memory write-combining is not a sign of crap
> hardware - it's just a sign of things like frame buffers on the card
> etc. Which very much wants write combining. So WC for MMIO at least
> makes sense.
> 
> Yes, yes, I realize that "crap hardware" may actually be the more
> common case, but still..

Hmm, ok.

My simplistic mental picture while thinking of this is the IO range
where you send the commands to the device and you don't really want to
delay those but they should reach the device as they get issued.

OTOH, your example with frame buffers really wants to WC because sending
down each write separately is plain dumb.

Ok, I see, so it can make sense to have WC IO memory, depending on the
range and what you're going to use it for, I guess...

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 710+ messages in thread

* Re: [tip:x86/mm] x86/mm/mtrr: Clean up mtrr_type_lookup()
  2015-08-01 16:49                     ` Borislav Petkov
@ 2015-08-01 17:03                       ` Linus Torvalds
  -1 siblings, 0 replies; 710+ messages in thread
From: Linus Torvalds @ 2015-08-01 17:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luis R. Rodriguez, Toshi Kani, Peter Zijlstra, Ingo Molnar,
	Peter Anvin, Denys Vlasenko, Borislav Petkov, Andrew Morton,
	Brian Gerst, Thomas Gleixner, linux-mm, Andy Lutomirski,
	Linux Kernel Mailing List, linux-tip-commits

On Sat, Aug 1, 2015 at 9:49 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> My simplistic mental picture while thinking of this is the IO range
> where you send the commands to the device and you don't really want to
> delay those but they should reach the device as they get issued.

Well, even for command streams, people often do go for a
write-combining approach, simply because it is *so* much more
efficient on the bus to buffer and burst things. The interface is set
up to not really "combine" things in the over-writing sense, but just
in the "combine continuous writes into bigger buffers on the CPU, and
then write it out as efficiently as possible" sense.

Of course, the device (and the driver) has to be designed properly for
that, and it makes sense only with certain kinds of models, but it can
actually be much more efficient to make the device interface be
something like "write 32-byte command packets to a circular
write-combining buffer" than it is to do things other ways. Back in
the days, that was one of the most efficient ways to try to fill up
the PCI bandwidth.

There are other approaches too, of course, with the modern variation
tending to be "the device does all real accesses by reading over DMA,
and the only time you use IO accesses is for setup and as a 'start
your DMA transfers now' kind of interface". But write-combining MMIO
used to be a very common model for high-performance IO not that long
ago, because DMA didn't actually use to be all that efficient at all
(nasty behavior with caches and snooping etc back before the memory
controller was on-die and DMA accesses snooped caches directly). So
the "DMA is efficient even for smaller things" thing is relatively
recent.

                     Linus
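
The "write 32-byte command packets to a circular write-combining buffer" interface described in the message above can be sketched in plain userspace C. This is a minimal simulation, not a driver: the `slots` array stands in for a device window that a real driver would map with something like ioremap_wc(), and the ring depth, struct layout, and function names are all hypothetical. Only the 32-byte packet size comes from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_PACKET_SIZE 32u	/* packet size named in the message */
#define RING_PACKETS    8u	/* hypothetical ring depth */

struct cmd_ring {
	/* Stands in for the write-combining MMIO window (assumption). */
	uint8_t  slots[RING_PACKETS][CMD_PACKET_SIZE];
	uint32_t head;	/* free-running count of packets written by the CPU */
	uint32_t tail;	/* free-running count of packets consumed by the device */
};

static void cmd_ring_init(struct cmd_ring *r)
{
	memset(r, 0, sizeof(*r));
}

static uint32_t cmd_ring_used(const struct cmd_ring *r)
{
	return r->head - r->tail;
}

/*
 * Queue one command. The command is padded to a full packet first and
 * then copied in one contiguous 32-byte run, which is the access pattern
 * a write-combining buffer can turn into a single burst.
 * Returns 0 on success, -1 if the command is too long or the ring is full.
 */
static int cmd_ring_push(struct cmd_ring *r, const void *cmd, size_t len)
{
	uint8_t packet[CMD_PACKET_SIZE] = { 0 };

	if (len > CMD_PACKET_SIZE || cmd_ring_used(r) == RING_PACKETS)
		return -1;

	memcpy(packet, cmd, len);
	memcpy(r->slots[r->head % RING_PACKETS], packet, CMD_PACKET_SIZE);
	r->head++;
	return 0;
}

/* Simulated device side: fetch the oldest packet, or -1 if none is queued. */
static int cmd_ring_pop(struct cmd_ring *r, void *out)
{
	if (cmd_ring_used(r) == 0)
		return -1;

	memcpy(out, r->slots[r->tail % RING_PACKETS], CMD_PACKET_SIZE);
	r->tail++;
	return 0;
}
```

In a real driver the packet stores would presumably be followed by a write barrier before the device is told about the new head, so that the combined writes have actually left the CPU. That step is left out here because the simulation has no device to notify.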

^ permalink raw reply	[flat|nested] 710+ messages in thread

end of thread, other threads:[~2015-08-01 17:03 UTC | newest]

Thread overview: 710+ messages
2015-03-20 23:17 [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Luis R. Rodriguez
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 01/47] x86: mtrr: annotate mtrr_type_lookup() is only implemented on generic_mtrr_ops Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-25 19:59   ` Konrad Rzeszutek Wilk
2015-03-25 19:59     ` Konrad Rzeszutek Wilk
2015-03-26  4:38     ` Juergen Gross
2015-03-26  4:38       ` Juergen Gross
2015-03-26 23:35     ` Luis R. Rodriguez
2015-03-26 23:35       ` Luis R. Rodriguez
2015-04-02 20:13       ` Bjorn Helgaas
2015-04-02 20:13         ` Bjorn Helgaas
2015-04-02 20:13         ` Bjorn Helgaas
2015-04-02 20:20         ` Luis R. Rodriguez
2015-04-02 20:20           ` Luis R. Rodriguez
2015-04-02 20:20           ` Luis R. Rodriguez
2015-04-02 20:28           ` Bjorn Helgaas
2015-04-02 20:28             ` Bjorn Helgaas
2015-04-02 20:28             ` Bjorn Helgaas
2015-04-02 21:02             ` Luis R. Rodriguez
2015-04-02 21:02               ` Luis R. Rodriguez
2015-04-02 21:02               ` Luis R. Rodriguez
2015-04-02 22:09               ` Bjorn Helgaas
2015-04-02 22:09                 ` Bjorn Helgaas
2015-04-02 22:09                 ` Bjorn Helgaas
2015-04-02 22:12                 ` [Xen-devel] " Luis R. Rodriguez
2015-04-02 22:12                   ` Luis R. Rodriguez
2015-04-02 22:12                   ` Luis R. Rodriguez
2015-03-27 20:40   ` Toshi Kani
2015-03-27 20:40     ` Toshi Kani
2015-03-27 23:56     ` Luis R. Rodriguez
2015-03-27 23:56       ` Luis R. Rodriguez
2015-04-02 21:49       ` Luis R. Rodriguez
2015-04-02 21:49         ` Luis R. Rodriguez
2015-04-02 23:52         ` Toshi Kani
2015-04-02 23:52           ` Toshi Kani
2015-04-03  1:08           ` Luis R. Rodriguez
2015-04-03  1:08             ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 03/47] devres: add devm_ioremap_wc() Luis R. Rodriguez
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:49   ` Andy Lutomirski
2015-03-20 23:49     ` Andy Lutomirski
2015-03-25 19:50     ` Luis R. Rodriguez
2015-03-25 19:50       ` Luis R. Rodriguez
2015-03-25 19:50     ` Luis R. Rodriguez
2015-03-20 23:49   ` Andy Lutomirski
2015-03-20 23:17 ` [PATCH v1 04/47] pci: add pci_ioremap_wc_bar() Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:50   ` Andy Lutomirski
2015-03-20 23:50   ` Andy Lutomirski
2015-03-20 23:50     ` Andy Lutomirski
2015-03-25 20:06     ` Luis R. Rodriguez
2015-03-25 20:06     ` Luis R. Rodriguez
2015-03-25 20:06       ` Luis R. Rodriguez
2015-03-25 20:03   ` Konrad Rzeszutek Wilk
2015-03-25 20:03   ` [Xen-devel] " Konrad Rzeszutek Wilk
2015-03-25 20:03     ` Konrad Rzeszutek Wilk
2015-03-25 20:39     ` Luis R. Rodriguez
2015-03-25 20:39       ` Luis R. Rodriguez
2015-03-25 20:39     ` Luis R. Rodriguez
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 05/47] pci: add pci_iomap_wc() variants Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-23 17:20   ` Bjorn Helgaas
2015-03-23 17:20     ` Bjorn Helgaas
2015-03-23 17:20     ` Bjorn Helgaas
2015-03-26  3:00     ` Luis R. Rodriguez
2015-03-26  3:00     ` Luis R. Rodriguez
2015-03-26  3:00       ` Luis R. Rodriguez
2015-04-21 17:52       ` Luis R. Rodriguez
2015-04-21 18:46         ` Michael S. Tsirkin
2015-04-21 18:46         ` Michael S. Tsirkin
2015-04-21 17:52       ` Luis R. Rodriguez
2015-03-27 19:18     ` Toshi Kani
2015-03-27 19:18       ` Toshi Kani
2015-03-27 19:18       ` Toshi Kani
2015-04-21 19:25     ` Michael S. Tsirkin
2015-04-21 19:25       ` Michael S. Tsirkin
2015-04-21 19:25       ` Michael S. Tsirkin
2015-04-21 19:27       ` Luis R. Rodriguez
2015-04-21 19:27         ` Luis R. Rodriguez
2015-04-21 19:27         ` Luis R. Rodriguez
2015-03-25 20:07   ` Konrad Rzeszutek Wilk
2015-03-25 20:07     ` Konrad Rzeszutek Wilk
2015-03-25 20:07     ` Konrad Rzeszutek Wilk
2015-03-27 18:40     ` Luis R. Rodriguez
2015-03-27 18:40       ` Luis R. Rodriguez
2015-03-27 18:40       ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 06/47] mtrr: add __arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:48   ` Andy Lutomirski
2015-03-20 23:48   ` Andy Lutomirski
2015-03-20 23:48     ` Andy Lutomirski
2015-03-27 19:53     ` Luis R. Rodriguez
2015-03-27 19:53     ` Luis R. Rodriguez
2015-03-27 19:53       ` Luis R. Rodriguez
2015-03-27 19:58       ` Andy Lutomirski
2015-03-27 19:58       ` Andy Lutomirski
2015-03-27 19:58         ` Andy Lutomirski
2015-03-27 20:30         ` Luis R. Rodriguez
2015-03-27 20:30         ` Luis R. Rodriguez
2015-03-27 20:30           ` Luis R. Rodriguez
2015-03-27 21:23           ` Andy Lutomirski
2015-03-27 21:23             ` Andy Lutomirski
2015-03-27 23:04             ` Luis R. Rodriguez
2015-03-27 23:04             ` Luis R. Rodriguez
2015-03-27 23:04               ` Luis R. Rodriguez
2015-03-27 23:10               ` Andy Lutomirski
2015-03-27 23:10                 ` Andy Lutomirski
2015-03-27 23:33                 ` Luis R. Rodriguez
2015-03-27 23:33                   ` Luis R. Rodriguez
2015-03-27 23:33                 ` Luis R. Rodriguez
2015-03-27 23:10               ` Andy Lutomirski
2015-03-27 21:23           ` Andy Lutomirski
2015-04-02 20:21   ` Bjorn Helgaas
2015-04-02 20:21     ` Bjorn Helgaas
2015-04-02 20:55     ` Luis R. Rodriguez
2015-04-02 20:55       ` Luis R. Rodriguez
2015-04-02 22:35       ` Bjorn Helgaas
2015-04-02 22:35         ` Bjorn Helgaas
2015-04-02 22:54         ` Luis R. Rodriguez
2015-04-02 22:54           ` Luis R. Rodriguez
2015-04-02 22:54         ` Luis R. Rodriguez
2015-04-02 22:35       ` Bjorn Helgaas
2015-04-02 20:55     ` Luis R. Rodriguez
2015-04-02 20:21   ` Bjorn Helgaas
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 07/47] video: fbdev: atyfb: move framebuffer length fudging to helper Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 08/47] video: fbdev: atyfb: clarify ioremap() base and length used Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:17 ` [PATCH v1 09/47] vidoe: fbdev: atyfb: remove and fix MTRR MMIO "hole" work around Luis R. Rodriguez
2015-03-20 23:17   ` Luis R. Rodriguez
2015-03-20 23:52   ` Andy Lutomirski
2015-03-20 23:52     ` Andy Lutomirski
2015-03-27 20:12     ` Luis R. Rodriguez
2015-03-27 20:12     ` Luis R. Rodriguez
2015-03-27 20:12       ` Luis R. Rodriguez
2015-03-27 21:21       ` Andy Lutomirski
2015-03-27 21:21       ` Andy Lutomirski
2015-03-27 21:21         ` Andy Lutomirski
2015-03-27 23:31         ` Luis R. Rodriguez
2015-03-27 23:31           ` Luis R. Rodriguez
2015-03-27 23:31         ` Luis R. Rodriguez
2015-03-20 23:52   ` Andy Lutomirski
2015-03-21  9:15   ` Ville Syrjälä
2015-03-21  9:15   ` Ville Syrjälä
2015-03-21  9:15     ` Ville Syrjälä
2015-03-27  8:37     ` Ville Syrjälä
2015-03-27  8:37     ` Ville Syrjälä
2015-03-27  8:37       ` Ville Syrjälä
2015-03-27 19:38       ` Luis R. Rodriguez
2015-03-27 19:38         ` Luis R. Rodriguez
2015-03-27 19:38       ` Luis R. Rodriguez
2015-03-27 19:38     ` Luis R. Rodriguez
2015-03-27 19:38     ` Luis R. Rodriguez
2015-03-27 19:38       ` Luis R. Rodriguez
2015-03-27 19:43       ` Andy Lutomirski
2015-03-27 19:43       ` Andy Lutomirski
2015-03-27 19:43         ` Andy Lutomirski
2015-03-27 19:57         ` Luis R. Rodriguez
2015-03-27 19:57         ` Luis R. Rodriguez
2015-03-27 19:57           ` Luis R. Rodriguez
2015-03-27 21:56           ` Ville Syrjälä
2015-03-27 21:56             ` Ville Syrjälä
2015-03-27 22:02             ` Andy Lutomirski
2015-03-27 22:02             ` Andy Lutomirski
2015-03-27 22:02               ` Andy Lutomirski
2015-03-28  0:28               ` Luis R. Rodriguez
2015-03-28  0:28               ` Luis R. Rodriguez
2015-03-28  0:28                 ` Luis R. Rodriguez
2015-03-28 12:23                 ` Ville Syrjälä
2015-03-28 12:23                   ` Ville Syrjälä
2015-04-01 23:52                   ` Luis R. Rodriguez
2015-04-01 23:52                   ` Luis R. Rodriguez
2015-04-01 23:52                     ` Luis R. Rodriguez
2015-04-02  0:04                     ` Andy Lutomirski
2015-04-02  0:04                     ` Andy Lutomirski
2015-04-02  0:04                       ` Andy Lutomirski
2015-04-02 19:45                       ` Luis R. Rodriguez
2015-04-02 19:45                         ` Luis R. Rodriguez
2015-04-02 19:50                         ` Andy Lutomirski
2015-04-02 19:50                         ` Andy Lutomirski
2015-04-02 19:50                           ` Andy Lutomirski
2015-04-02 19:45                       ` Luis R. Rodriguez
2015-03-28 12:23                 ` Ville Syrjälä
2015-03-28  0:21             ` Luis R. Rodriguez
2015-03-28  0:21             ` Luis R. Rodriguez
2015-03-28  0:21               ` Luis R. Rodriguez
2015-03-27 21:56           ` Ville Syrjälä
2015-03-20 23:17 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 10/47] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 11/47] IB/qib: add acounting for MTRR Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 12/47] IB/qib: use arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 13/47] IB/ipath: add counting for MTRR Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 14/47] IB/ipath: use __arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 15/47] [media] media: ivtv: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 16/47] fusion: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 17/47] video: fbdev: vesafb: only support MTRR_TYPE_WRCOMB Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 18/47] vidoe: fbdev: vesafb: add missing mtrr_del() for added MTRR Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 19/47] video: fbdev: vesafb: use arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 20/47] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 21/47] ethernet: myri10ge: use arch_phys_wc_add() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-21  7:08   ` Hyong-Youb Kim
2015-03-21  7:08   ` Hyong-Youb Kim
2015-03-21  7:08     ` Hyong-Youb Kim
2015-03-27 20:36     ` Luis R. Rodriguez
2015-03-27 20:36     ` Luis R. Rodriguez
2015-03-27 20:36       ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 22/47] staging: sm750fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 23/47] staging: xgifb: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-04-30 17:40   ` Luis R. Rodriguez
2015-04-30 17:40   ` Luis R. Rodriguez
2015-04-30 17:40     ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 24/47] video: fbdev: arkfb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 25/47] video: fbdev: radeonfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 26/47] video: fbdev: gbefb: add missing mtrr_del() calls Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 27/47] video: fbdev: gbefb: use arch_phys_wc_add() and devm_ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 28/47] video: fbdev: intelfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 29/47] video: fbdev: matrox: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 30/47] video: fbdev: neofb: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 31/47] video: fbdev: s3fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 32/47] video: fbdev: nvidia: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 33/47] video: fbdev: savagefb: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 34/47] video: fbdev: sisfb: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 35/47] video: fbdev: aty: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 36/47] video: fbdev: i810: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 37/47] video: fbdev: i740fb: use arch_phys_wc_add() and pci_ioremap_wc_bar() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 38/47] video: fbdev: kyrofb: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 39/47] video: fbdev: pm2fb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 40/47] video: fbdev: pm3fb: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 41/47] video: fbdev: rivafb: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 42/47] video: fbdev: tdfxfb: " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 43/47] video: fbdev: vt8623fb: use arch_phys_wc_add() and pci_iomap_wc() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 44/47] video: fbdev: atmel_lcdfb: use ioremap_wc() for framebuffer Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 45/47] video: fbdev: geode gxfb: " Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 46/47] video: fbdev: gxt4500: use pci_ioremap_wc_bar() " Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18 ` [PATCH v1 47/47] mtrr: bury MTRR - unexport mtrr_add() and mtrr_del() Luis R. Rodriguez
2015-03-20 23:18 ` Luis R. Rodriguez
2015-03-20 23:18   ` Luis R. Rodriguez
2015-03-21  1:08 ` [PATCH v1 00/47] mtrr/x86/drivers: bury MTRR Andy Lutomirski
2015-03-21  1:08 ` Andy Lutomirski
2015-03-21  1:08   ` Andy Lutomirski
2015-03-24 22:08 [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping Toshi Kani
2015-03-24 22:08 ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 1/7] mm, x86: Document return values of mapping funcs Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-05 11:19   ` Borislav Petkov
2015-05-05 11:19     ` Borislav Petkov
2015-05-05 13:46     ` Toshi Kani
2015-05-05 13:46       ` Toshi Kani
2015-05-05 14:19       ` Borislav Petkov
2015-05-05 14:19         ` Borislav Petkov
2015-05-05 14:14         ` Toshi Kani
2015-05-05 14:14           ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 2/7] mtrr, x86: Fix MTRR lookup to handle inclusive entry Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-05 17:11   ` Borislav Petkov
2015-05-05 17:11     ` Borislav Petkov
2015-05-05 17:32     ` Toshi Kani
2015-05-05 17:32       ` Toshi Kani
2015-05-05 18:39       ` Borislav Petkov
2015-05-05 18:39         ` Borislav Petkov
2015-05-05 19:31         ` Toshi Kani
2015-05-05 19:31           ` Toshi Kani
2015-05-05 20:09           ` Borislav Petkov
2015-05-05 20:09             ` Borislav Petkov
2015-05-05 20:06             ` Toshi Kani
2015-05-05 20:06               ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 3/7] mtrr, x86: Remove a wrong address check in __mtrr_type_lookup() Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-06 10:46   ` Borislav Petkov
2015-05-06 10:46     ` Borislav Petkov
2015-03-24 22:08 ` [PATCH v4 4/7] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup() Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-06 11:47   ` Borislav Petkov
2015-05-06 11:47     ` Borislav Petkov
2015-05-06 15:23     ` Toshi Kani
2015-05-06 15:23       ` Toshi Kani
2015-05-06 22:39       ` Borislav Petkov
2015-05-06 22:39         ` Borislav Petkov
2015-05-06 23:08         ` Toshi Kani
2015-05-06 23:08           ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 5/7] mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup() Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 6/7] mtrr, x86: Clean up mtrr_type_lookup() Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-06 13:41   ` Borislav Petkov
2015-05-06 13:41     ` Borislav Petkov
2015-05-06 16:00     ` Toshi Kani
2015-05-06 16:00       ` Toshi Kani
2015-05-06 22:49       ` Borislav Petkov
2015-05-06 22:49         ` Borislav Petkov
2015-05-06 23:42         ` Toshi Kani
2015-05-06 23:42           ` Toshi Kani
2015-05-07  7:52           ` Borislav Petkov
2015-05-07  7:52             ` Borislav Petkov
2015-05-07 13:45             ` Toshi Kani
2015-05-07 13:45               ` Toshi Kani
2015-03-24 22:08 ` [PATCH v4 7/7] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping Toshi Kani
2015-03-24 22:08   ` Toshi Kani
2015-05-09  9:08   ` Borislav Petkov
2015-05-09  9:08     ` Borislav Petkov
2015-05-11 19:25     ` Toshi Kani
2015-05-11 19:25       ` Toshi Kani
2015-05-11 20:18       ` Borislav Petkov
2015-05-11 20:18         ` Borislav Petkov
2015-05-11 20:38         ` Toshi Kani
2015-05-11 20:38           ` Toshi Kani
2015-05-11 21:42           ` Borislav Petkov
2015-05-11 21:42             ` Borislav Petkov
2015-05-11 22:09             ` Toshi Kani
2015-05-11 22:09               ` Toshi Kani
2015-05-12  7:28               ` Borislav Petkov
2015-05-12  7:28                 ` Borislav Petkov
2015-05-12 14:30                 ` Toshi Kani
2015-05-12 14:30                   ` Toshi Kani
2015-05-12 16:31                   ` Borislav Petkov
2015-05-12 16:31                     ` Borislav Petkov
2015-05-12 16:57                     ` Toshi Kani
2015-05-12 16:57                       ` Toshi Kani
2015-03-24 22:43 ` [PATCH v4 0/7] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping Andrew Morton
2015-03-24 22:43   ` Andrew Morton
2015-04-03  6:33   ` Ingo Molnar
2015-04-03  6:33     ` Ingo Molnar
2015-04-03 15:22     ` Toshi Kani
2015-04-03 15:22       ` Toshi Kani
2015-04-27 14:31       ` Toshi Kani
2015-04-27 14:31         ` Toshi Kani
2015-04-14 11:35 [PATCH] x86/kaslr: Fix typo in documentation Miroslav Benes
2015-04-14 11:37 ` Borislav Petkov
2015-04-22 17:12 [PATCH v4] mtrr: avoid ifdef'ery with phys_wc_to_mtrr_index() Luis R. Rodriguez
2015-04-22 17:12 ` Luis R. Rodriguez
2015-04-22 17:12 ` Luis R. Rodriguez
2015-04-28 22:13 [PATCH] x86: improve algorithm in clflush_cache_range Ross Zwisler
2015-04-29 10:28 ` Borislav Petkov
2015-04-28 22:46 [PATCH v2] x86: Add kerneldoc for pcommit_sfence() Ross Zwisler
2015-04-29 14:23 ` Borislav Petkov
2015-04-29 21:44 [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
2015-04-29 21:44 ` [PATCH v4 1/6] x86: add ioremap_uc() - force strong UC, PCD=1, PWT=1 Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-04-30 10:18   ` Borislav Petkov
2015-04-30 10:18     ` Borislav Petkov
2015-04-29 21:44 ` [PATCH v4 2/6] x86: document WC MTRR effects on PAT / non-PAT pages Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-04-30 22:01   ` Randy Dunlap
2015-04-30 22:01     ` Randy Dunlap
2015-05-05  0:45     ` Luis R. Rodriguez
2015-05-05  0:45       ` Luis R. Rodriguez
2015-05-05  7:22       ` Borislav Petkov
2015-05-05  7:22         ` Borislav Petkov
2015-05-05  7:46         ` Luis R. Rodriguez
2015-05-05  7:46           ` Luis R. Rodriguez
2015-05-05  7:53           ` Borislav Petkov
2015-05-05  7:53             ` Borislav Petkov
2015-05-05  7:31     ` Luis R. Rodriguez
2015-05-05  7:31       ` Luis R. Rodriguez
2015-05-04 12:23   ` Borislav Petkov
2015-05-04 12:23     ` Borislav Petkov
2015-05-05  7:35     ` Luis R. Rodriguez
2015-05-05  7:35       ` Luis R. Rodriguez
2015-04-29 21:44 ` [PATCH v4 3/6] video: fbdev: atyfb: move framebuffer length fudging to helper Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-04-29 21:44 ` [PATCH v4 4/6] video: fbdev: atyfb: clarify ioremap() base and length used Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-04-29 21:44 ` [PATCH v4 5/6] video: fbdev: atyfb: replace MTRR UC hole with strong UC Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-04-29 21:44 ` [PATCH v4 6/6] video: fbdev: atyfb: use arch_phys_wc_add() and ioremap_wc() Luis R. Rodriguez
2015-04-29 21:44   ` Luis R. Rodriguez
2015-05-20 19:53   ` Luis R. Rodriguez
2015-05-20 19:53     ` Luis R. Rodriguez
2015-05-20 20:57     ` Luis R. Rodriguez
2015-05-20 20:57       ` Luis R. Rodriguez
2015-06-03 23:50 ` [PATCH v4 0/6] x86: document and address MTRR corner cases Luis R. Rodriguez
2015-06-03 23:50   ` [Cocci] " Luis R. Rodriguez
2015-06-08 23:43   ` Luis R. Rodriguez
2015-06-08 23:43     ` [Cocci] " Luis R. Rodriguez
2015-06-16 19:31     ` Luis R. Rodriguez
2015-06-16 19:31       ` [Cocci] " Luis R. Rodriguez
2015-06-19 22:22       ` Luis R. Rodriguez
2015-06-25  1:24         ` Luis R. Rodriguez
2015-06-25  6:59           ` Ingo Molnar
2015-06-25 16:41             ` Luis R. Rodriguez
2015-04-30 20:25 [PATCH v5 0/6] x86: address drivers that do not work with PAT Luis R. Rodriguez
2015-04-30 20:25 ` [PATCH v5 1/6] x86/mm/pat: use pr_info() and friends Luis R. Rodriguez
2015-05-04 14:58   ` Borislav Petkov
2015-05-07  3:36   ` Elliott, Robert (Server Storage)
2015-05-14 15:55     ` Luis R. Rodriguez
2015-04-30 20:25 ` [PATCH v5 2/6] x86/mm/pat: redefine pat_enabled Luis R. Rodriguez
2015-05-04 15:22   ` Borislav Petkov
2015-05-05  0:42     ` Luis R. Rodriguez
2015-04-30 20:25 ` [PATCH v5 3/6] arch/x86/mm/pat: export pat_enabled() Luis R. Rodriguez
2015-05-04 15:29   ` Borislav Petkov
2015-04-30 20:25 ` [PATCH v5 4/6] ivtv: use arch_phys_wc_add() and require PAT disabled Luis R. Rodriguez
2015-04-30 20:25   ` Luis R. Rodriguez
2015-04-30 20:25   ` Luis R. Rodriguez
2015-04-30 20:25 ` [PATCH v5 5/6] IB/ipath: add counting for MTRR Luis R. Rodriguez
2015-04-30 20:25   ` Luis R. Rodriguez
2015-04-30 20:25 ` [PATCH v5 6/6] IB/ipath: use arch_phys_wc_add() and require PAT disabled Luis R. Rodriguez
2015-04-30 20:25   ` Luis R. Rodriguez
2015-04-30 20:25   ` Luis R. Rodriguez
2015-05-06 16:54 tools: Consolidate types.h Oleg Nesterov
2015-05-06 17:17 ` Borislav Petkov
2015-05-06 17:30   ` Oleg Nesterov
2015-05-06 17:37     ` Borislav Petkov
2015-05-07  2:53       ` Andy Lutomirski
2015-05-07 16:58         ` [PATCH 0/1] x86/vdso: add -Iarch/x86/include/uapi into HOST_EXTRACFLAGS Oleg Nesterov
2015-05-07 16:58           ` [PATCH 1/1] " Oleg Nesterov
2015-05-07 19:46             ` Andy Lutomirski
2015-05-07 21:55               ` Borislav Petkov
2015-05-11 12:44             ` [tip:x86/urgent] x86/vdso: Fix 'make bzImage' on older distros tip-bot for Oleg Nesterov
2015-05-11  8:15 [0/8] tip queue 2015-05-11 Borislav Petkov
2015-05-11  8:15 ` [PATCH] x86/alternatives: Switch AMD F15h and later to the P6 NOPs Borislav Petkov
2015-05-11 12:44   ` [tip:x86/asm] " tip-bot for Borislav Petkov
2015-05-11  8:15 ` [PATCH] x86/cpu/microcode: Zap changelog Borislav Petkov
2015-05-11 12:45   ` [tip:x86/microcode] " tip-bot for Borislav Petkov
2015-05-11  8:15 ` [PATCH] x86/kaslr: Fix typo in KASLR_FLAG documentation Borislav Petkov
2015-05-11 12:45   ` [tip:x86/boot] x86/kaslr: Fix typo in the " tip-bot for Miroslav Benes
2015-05-11  8:15 ` [PATCH 1/5] x86/mm: Do not flush last cacheline twice in clflush_cache_range() Borislav Petkov
2015-05-11 12:45   ` [tip:x86/mm] " tip-bot for Ross Zwisler
2015-05-11  8:15 ` [PATCH] x86/vdso: Add arch/x86/include/uapi include path to HOST_EXTRACFLAGS Borislav Petkov
2015-05-11  8:15 ` [PATCH 2/5] x86/mm: Add kerneldoc comments for pcommit_sfence() Borislav Petkov
2015-05-11 12:45   ` [tip:x86/mm] " tip-bot for Ross Zwisler
2015-05-11  8:15 ` [PATCH 3/5] x86/MTRR: Remove wrong address check in __mtrr_type_lookup() Borislav Petkov
2015-05-11 12:46   ` [tip:x86/mm] x86/mm/mtrr: Remove incorrect " tip-bot for Toshi Kani
2015-05-11 12:46     ` tip-bot for Toshi Kani
2015-05-11  8:15 ` [PATCH 4/5] x86/mm: Add ioremap_uc() helper to map memory uncacheable (not UC-) Borislav Petkov
2015-05-11 12:46   ` [tip:x86/mm] " tip-bot for Luis R. Rodriguez
2015-05-15 18:23 [PATCH v5 0/6] mtrr, mm, x86: Enhance MTRR checks for huge I/O mapping Toshi Kani
2015-05-15 18:23 ` Toshi Kani
2015-05-15 18:23 ` [PATCH v5 1/6] mm, x86: Simplify conditions of HAVE_ARCH_HUGE_VMAP Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-17  8:30   ` Borislav Petkov
2015-05-15 18:23 ` [PATCH v5 2/6] mtrr, x86: Fix MTRR lookup to handle inclusive entry Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-15 18:23 ` [PATCH v5 3/6] mtrr, x86: Fix MTRR state checks in mtrr_type_lookup() Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-15 18:23 ` [PATCH v5 4/6] mtrr, x86: Define MTRR_TYPE_INVALID for mtrr_type_lookup() Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-15 18:23 ` [PATCH v5 5/6] mtrr, x86: Clean up mtrr_type_lookup() Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-15 18:23 ` [PATCH v5 6/6] mtrr, mm, x86: Enhance MTRR checks for KVA huge page mapping Toshi Kani
2015-05-15 18:23   ` Toshi Kani
2015-05-18 13:33   ` Borislav Petkov
2015-05-18 17:22     ` Toshi Kani
2015-05-18 17:22       ` Toshi Kani
2015-05-18 19:01       ` Borislav Petkov
2015-05-18 19:31         ` Toshi Kani
2015-05-18 19:31           ` Toshi Kani
2015-05-18 20:01           ` Borislav Petkov
2015-05-18 20:21             ` Toshi Kani
2015-05-18 20:21               ` Toshi Kani
2015-05-18 20:51               ` Borislav Petkov
2015-05-18 21:53                 ` Toshi Kani
2015-05-18 21:53                   ` Toshi Kani
2015-05-19 11:44                   ` Borislav Petkov
2015-05-19 13:23                     ` Borislav Petkov
2015-05-19 13:47                       ` Toshi Kani
2015-05-19 13:47                         ` Toshi Kani
2015-05-20 11:55                       ` Ingo Molnar
2015-05-20 11:55                         ` Ingo Molnar
2015-05-20 14:34                         ` Toshi Kani
2015-05-20 14:34                           ` Toshi Kani
2015-05-20 15:01                           ` Ingo Molnar
2015-05-20 15:01                             ` Ingo Molnar
2015-05-20 15:02                             ` Toshi Kani
2015-05-20 15:02                               ` Toshi Kani
2015-05-20 16:04                               ` Borislav Petkov
2015-05-20 15:46                                 ` Toshi Kani
2015-05-20 15:46                                   ` Toshi Kani
2015-05-18 16:34 [PATCH v4 0/3] Compile-time stack frame pointer validation Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 1/3] x86, stackvalidate: " Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 2/3] x86: Make push/pop CFI macros arch-independent Josh Poimboeuf
2015-05-18 16:34 ` [PATCH v4 3/3] x86, stackvalidate: Add asm frame pointer setup macros Josh Poimboeuf
2015-05-20 10:33 ` [PATCH v4 0/3] Compile-time stack frame pointer validation Ingo Molnar
2015-05-20 14:13   ` Josh Poimboeuf
2015-05-20 14:48     ` Ingo Molnar
2015-05-20 15:51       ` Josh Poimboeuf
2015-05-20 16:09         ` Josh Poimboeuf
2015-05-20 16:03       ` Andy Lutomirski
2015-05-20 16:25         ` Josh Poimboeuf
2015-05-20 16:39           ` Andy Lutomirski
2015-05-20 16:52           ` Borislav Petkov
2015-05-21 10:16             ` Ingo Molnar
2015-05-21 10:47               ` Borislav Petkov
2015-05-21 11:11                 ` Ingo Molnar
2015-05-21 15:49                   ` [PATCH 1/3] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
2015-05-21 15:49                     ` [PATCH 2/3] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
2015-05-21 15:49                     ` [PATCH 3/3] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
2015-05-27 14:17               ` [tip:x86/debug] x86/Documentation: Adapt Ingo' s " tip-bot for Borislav Petkov
2015-05-20 16:59           ` [PATCH v4 0/3] Compile-time stack frame pointer validation Linus Torvalds
2015-05-20 17:20             ` Josh Poimboeuf
2015-05-21 10:27               ` Ingo Molnar
2015-05-21  7:52             ` Ingo Molnar
2015-05-21 12:12               ` Ingo Molnar
2015-05-26 23:06               ` Andi Kleen
2015-05-20 17:27         ` Peter Zijlstra
2015-05-20 19:10           ` Jiri Kosina
2015-05-21 20:54       ` Josh Poimboeuf
2015-05-21 21:53         ` Andy Lutomirski
2015-05-22 14:53           ` Josh Poimboeuf
2015-05-21 22:01         ` Borislav Petkov
2015-05-22 14:32           ` Josh Poimboeuf
2015-05-22 21:18             ` Jiri Kosina
2015-05-22 22:22               ` Josh Poimboeuf
2015-05-23  8:37             ` Borislav Petkov
2015-05-19  8:01 [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Huang Rui
2015-05-19  8:01 ` [RFC PATCH 1/4] x86, mwaitt: add monitorx and mwaitx instruction Huang Rui
2015-05-19 11:29   ` Borislav Petkov
2015-05-21  8:54     ` Huang Rui
2015-05-21  9:35       ` Borislav Petkov
2015-05-19  8:01 ` [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer Huang Rui
2015-05-19 11:31   ` Borislav Petkov
2015-05-20  8:55     ` Ingo Molnar
2015-05-20  9:12       ` Borislav Petkov
2015-05-20 10:22         ` Ingo Molnar
2015-05-20 10:50           ` Borislav Petkov
2015-05-20 11:11             ` Ingo Molnar
2015-05-20 11:21               ` Borislav Petkov
2015-05-20 11:41                 ` Ingo Molnar
2015-05-20 13:20                   ` Thomas Gleixner
2015-05-20 14:51                     ` Ingo Molnar
2015-05-20 15:55                       ` One Thousand Gnomes
2015-05-20 16:07                         ` Borislav Petkov
2015-05-20 19:12                           ` Thomas Gleixner
2015-05-20 20:15                             ` Borislav Petkov
2015-05-21 14:56                               ` Huang Rui
2015-05-21 16:02                                 ` Borislav Petkov
2015-05-21 16:45                                   ` Andy Lutomirski
2015-05-21 17:08                                     ` Borislav Petkov
2015-05-21 17:12                                       ` Andy Lutomirski
2015-05-21 19:30                                   ` Thomas Gleixner
2015-05-21 14:32                 ` Huang Rui
2015-05-25  2:42               ` Huang Rui
2015-05-25 10:43                 ` Ingo Molnar
2015-05-21 14:15         ` Huang Rui
2015-05-21 13:26     ` Huang Rui
2015-05-21  1:34   ` Andy Lutomirski
2015-05-21  5:48     ` Andy Lutomirski
2015-05-27  1:01       ` Andy Lutomirski
2015-05-27 11:30         ` Borislav Petkov
2015-05-21  9:41     ` Thomas Gleixner
2015-05-19  8:01 ` [RFC PATCH 3/4] x86, mwaitt: add document to describe mwaitx Huang Rui
2015-05-19  8:01 ` [RFC PATCH 4/4] x86, mwait: fix redundant comment Huang Rui
2015-05-19  9:40   ` Borislav Petkov
2015-05-19  8:57 ` [RFC PATCH 0/4] x86, mwaitt: introduce AMD mwaitt support Borislav Petkov
2015-05-19  9:44   ` Huang Rui
2015-05-19 15:43 [PATCH] x86, cpuinfo x86_model_id whitespace cleanup Prarit Bhargava
2015-05-19 16:56 ` Borislav Petkov
2015-05-19 17:25 ` Brian Gerst
2015-05-19 18:13   ` Borislav Petkov
2015-05-19 18:44     ` Andy Lutomirski
2015-05-19 19:22       ` Borislav Petkov
2015-05-19 20:16         ` Andy Lutomirski
2015-05-19 20:26           ` Joe Perches
2015-05-19 20:28             ` Joe Perches
2015-05-19 20:31           ` Borislav Petkov
2015-05-19 22:17             ` Prarit Bhargava
2015-05-27 17:18         ` H. Peter Anvin
2015-05-20  6:34 ` Ingo Molnar
2015-05-20 10:15   ` Prarit Bhargava
2015-06-02  8:42 ` [tip:x86/cpu] x86/cpu: Trim model ID whitespace tip-bot for Borislav Petkov
2015-05-20 11:22 [PATCH] mce: fix fail to set 'monarchtimeout' via boot option Xie XiuQi
2015-05-20 17:43 ` Borislav Petkov
2015-05-21  1:00   ` Xie XiuQi
2015-05-26  8:28 [PATCH 00/18] tip queue 2015-05-26 Borislav Petkov
2015-05-26  8:28 ` [PATCH 01/18] x86/kconfig: Simplify conditions for HAVE_ARCH_HUGE_VMAP Borislav Petkov
2015-05-27 14:17   ` [tip:x86/mm] x86/mm/kconfig: " tip-bot for Toshi Kani
2015-05-27 14:17     ` tip-bot for Toshi Kani
2015-05-26  8:28 ` [PATCH 02/18] x86/mtrr: Fix MTRR lookup to handle an inclusive entry Borislav Petkov
2015-05-27 14:18   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-27 14:18     ` tip-bot for Toshi Kani
2015-05-26  8:28 ` [PATCH 03/18] x86/mtrr: Fix MTRR state checks in mtrr_type_lookup() Borislav Petkov
2015-05-27 14:18   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-27 14:18     ` tip-bot for Toshi Kani
2015-05-26  8:28 ` [PATCH 04/18] x86/mtrr: Use symbolic define as a retval for disabled MTRRs Borislav Petkov
2015-05-27 14:18   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-27 14:18     ` tip-bot for Toshi Kani
2015-05-26  8:28 ` [PATCH 05/18] x86/mtrr: Clean up mtrr_type_lookup() Borislav Petkov
2015-05-27 14:19   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-27 14:19     ` tip-bot for Toshi Kani
2015-07-31 13:18     ` Peter Zijlstra
2015-07-31 13:18       ` Peter Zijlstra
2015-07-31 14:44       ` Borislav Petkov
2015-07-31 15:08         ` Peter Zijlstra
2015-07-31 15:08           ` Peter Zijlstra
2015-07-31 15:27           ` Borislav Petkov
2015-08-01 14:28             ` Luis R. Rodriguez
2015-08-01 14:28               ` Luis R. Rodriguez
2015-08-01 16:33               ` Borislav Petkov
2015-08-01 16:33                 ` Borislav Petkov
2015-08-01 16:39                 ` Linus Torvalds
2015-08-01 16:39                   ` Linus Torvalds
2015-08-01 16:49                   ` Borislav Petkov
2015-08-01 16:49                     ` Borislav Petkov
2015-08-01 17:03                     ` Linus Torvalds
2015-08-01 17:03                       ` Linus Torvalds
2015-05-26  8:28 ` [PATCH 06/18] x86/process: Drop repeated word from comment Borislav Petkov
2015-05-27 14:16   ` [tip:sched/core] sched/x86: Drop repeated word from mwait_idle() comment tip-bot for Huang Rui
2015-05-26  8:28 ` [PATCH 07/18] x86/mm: Enhance MTRR checks in kernel mapping helpers Borislav Petkov
2015-05-27 14:19   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Toshi Kani
2015-05-27 14:19     ` tip-bot for Toshi Kani
2015-05-26  8:28 ` [PATCH 08/18] x86/mm/pat: Convert to pr_* usage Borislav Petkov
2015-05-27 14:19   ` [tip:x86/mm] x86/mm/pat: Convert to pr_*() usage tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 09/18] x86: Document Write Combining MTRR type effects on PAT / non-PAT pages Borislav Petkov
2015-05-27 14:19   ` [tip:x86/mm] x86/mm/mtrr, pat: " tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 10/18] x86/mtrr: Avoid ifdeffery with phys_wc_to_mtrr_index() Borislav Petkov
2015-05-27 14:20   ` [tip:x86/mm] x86/mm/mtrr: Avoid #ifdeffery " tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 11/18] x86/mtrr: Generalize runtime disabling of MTRRs Borislav Petkov
2015-05-27 14:20   ` [tip:x86/mm] x86/mm/mtrr: " tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 12/18] x86/mm/pat: Wrap pat_enabled Borislav Petkov
2015-05-27 14:20   ` [tip:x86/mm] x86/mm/pat: Wrap pat_enabled into a function API tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 13/18] x86/mm/pat: Export pat_enabled() Borislav Petkov
2015-05-27 14:21   ` [tip:x86/mm] " tip-bot for Luis R. Rodriguez
2015-05-26  8:28 ` [PATCH 14/18] x86/cpu: Strip any /proc/cpuinfo model name field whitespace Borislav Petkov
2015-05-27 14:16   ` [tip:x86/cpu] x86/cpu: Strip any /proc/ cpuinfo " tip-bot for Prarit Bhargava
2015-05-27 17:07     ` Joe Perches
2015-05-27 19:06       ` Borislav Petkov
2015-05-27 19:16         ` Joe Perches
2015-05-28 11:27           ` Prarit Bhargava
2015-05-28 11:32             ` Borislav Petkov
2015-05-28 12:58               ` Borislav Petkov
2015-05-28 16:57                 ` H. Peter Anvin
2015-05-28 18:33                   ` Borislav Petkov
2015-05-28 20:39                     ` H. Peter Anvin
2015-05-26  8:28 ` [PATCH 15/18] x86/documentation: Move kernel-stacks doc one level up Borislav Petkov
2015-05-27 14:17   ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
2015-05-26  8:28 ` [PATCH 16/18] x86/documentation: Remove STACKFAULT_STACK bulletpoint Borislav Petkov
2015-05-27 14:17   ` [tip:x86/debug] x86/Documentation: " tip-bot for Borislav Petkov
2015-05-26  8:28 ` [PATCH 17/18] x86/documentation: Adapt Ingo's explanation on printing backtraces Borislav Petkov
2015-05-26  8:28 ` [PATCH 18/18] x86/mce: Fix monarch timeout setting through the mce= cmdline option Borislav Petkov
2015-06-07 17:39   ` [tip:x86/core] " tip-bot for Xie XiuQi
