linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
@ 2015-04-15 20:42 Andy Lutomirski
  2015-04-15 20:56 ` H. Peter Anvin
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-15 20:42 UTC (permalink / raw)
  To: Luis R. Rodriguez, linux-rdma
  Cc: Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Mike Marciniszyn, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Andy Walls, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:

> c) ivtv: the driver does not have the PCI space mapped out separately, and
> in fact it actually does not do the math for the framebuffer, instead it lets
> the device's own CPU do that and assume where its at, see
> ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> but not a setter. Its not clear if the firmware would make a split easy.
> We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
>

IMO this should be conceptually easy to split.  Once we get the
framebuffer address, just unmap it (or don't prematurely map it) and
then ioremap the thing.

> From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> seems infiniband drivers also find good use for for it for PIO TX buffers to
> blast some sort of data, in the future I would not be surprised if other
> devices found use for it.

IMO the Infiniband maintainers should fix their code.  Especially in
the server space, there aren't that many MTRRs to go around.  I wrote
arch_phys_wc_add in the first place because my server *ran out of
MTRRs*.

Hey, IB people: can you fix your drivers to use arch_phys_wc_add
(which is permitted to be a no-op) along with ioremap_wc?  Your users
will thank you.

> It may be true that the existing drivers that
> requires the above type of work are corner cases -- but I wouldn't hold my
> breath for that. The ivtv device is a good example of the worst type of
> situations and these days. So perhap __arch_phys_wc_add() and a
> ioremap_ucminus() might be something more than transient unless hardware folks
> get a good memo or already know how to just Do The Right Thing (TM).

I disagree.  We should try to NACK any new code that can't function
without MTRRs.

(Plus, ARM is growing in popularity in the server space, and ARM quite
sensibly doesn't have MTRRs.)

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 20:42 ioremap_uc() followed by set_memory_wc() - burrying MTRR Andy Lutomirski
@ 2015-04-15 20:56 ` H. Peter Anvin
  2015-04-15 22:15 ` Luis R. Rodriguez
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: H. Peter Anvin @ 2015-04-15 20:56 UTC (permalink / raw)
  To: Andy Lutomirski, Luis R. Rodriguez, linux-rdma
  Cc: Toshi Kani, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab, Andy Walls,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On 04/15/2015 01:42 PM, Andy Lutomirski wrote:
> 
> I disagree.  We should try to NACK any new code that can't function
> without MTRRs.
> 
> (Plus, ARM is growing in popularity in the server space, and ARM quite
> sensibly doesn't have MTRRs.)
> 

<NOT SPEAKING FOR INTEL HERE>

Yes.  People need to understand that MTRRs are fundamentally a
transitional solution, a replacement for the KEN# logic in the P4 and P5
generation processors.  The KEN# logic in the chipset would notify the
CPU that a specific address should not be cached, without affecting the
software (which may have been written for x86s built before caching
existed, even.)

MTRRs move this to the head end, so the CPU knows ahead of time what to
do, as is required with newer architectures.  It also enabled write
combining in a transparent fashion.  However, it is still transitional;
it is there to describe the underlying constraints of the memory system
so that code which doesn't use paging can run at all, but the only thing
that can actually scale is PAT.

	-hpa


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 20:42 ioremap_uc() followed by set_memory_wc() - burrying MTRR Andy Lutomirski
  2015-04-15 20:56 ` H. Peter Anvin
@ 2015-04-15 22:15 ` Luis R. Rodriguez
  2015-04-15 22:50 ` Andy Walls
  2015-04-21 22:46 ` Luis R. Rodriguez
  3 siblings, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-15 22:15 UTC (permalink / raw)
  To: Andy Lutomirski, Mauro Carvalho Chehab, linux-media
  Cc: linux-rdma, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Andy Walls, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, X86 ML

On Wed, Apr 15, 2015 at 01:42:47PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> 
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
> 
> IMO this should be conceptually easy to split.  Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

The driver has split code for handling framebuffer devices, the framebuffer
base address will also vary depending on the type of device it has, for
some its on the encoder, for others its on the decoder. We'd have to account
for the removal of the framebuffer on either of those regions, and would also
need some vetting that the driver doesn't use areas beyond that for MMIO.
Using the trick you suggest though we could overlap ioremap calls and if that
truly works on PAT and non-PAT adding a new ioremap_wc() could do the trick,
I'd appreciate a Tested-by or Acked-by to be done with this. Mauro, any chance
we can get a tested-by of ivtvfb for both non-PAT and PAT systems with this:

diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..1838738 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,10 +44,6 @@
 #include <linux/ivtvfb.h>
 #include <linux/slab.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
-#endif
-
 #include "ivtv-driver.h"
 #include "ivtv-cards.h"
 #include "ivtv-i2c.h"
@@ -155,12 +151,11 @@ struct osd_info {
 	/* Buffer size */
 	u32 video_buffer_size;
 
-#ifdef CONFIG_MTRR
 	/* video_base rounded down as required by hardware MTRRs */
 	unsigned long fb_start_aligned_physaddr;
 	/* video_base rounded up as required by hardware MTRRs */
 	unsigned long fb_end_aligned_physaddr;
-#endif
+	int wc_cookie;
 
 	/* Store the buffer offset */
 	int set_osd_coords_x;
@@ -1099,6 +1094,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
 static int ivtvfb_init_io(struct ivtv *itv)
 {
 	struct osd_info *oi = itv->osd_info;
+	/* Find the largest power of two that maps the whole buffer */
+	int size_shift = 31;
 
 	mutex_lock(&itv->serialize_lock);
 	if (ivtv_init_on_first_open(itv)) {
@@ -1120,7 +1117,7 @@ static int ivtvfb_init_io(struct ivtv *itv)
 	oi->video_buffer_size = 1704960;
 
 	oi->video_pbase = itv->base_addr + IVTV_DECODER_OFFSET + oi->video_rbase;
-	oi->video_vbase = itv->dec_mem + oi->video_rbase;
+	oi->video_vbase = ioremap_wc(oi->video_pbase, oi->video_buffer_size);
 
 	if (!oi->video_vbase) {
 		IVTVFB_ERR("abort, video memory 0x%x @ 0x%lx isn't mapped!\n",
@@ -1132,29 +1129,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
 			oi->video_pbase, oi->video_vbase,
 			oi->video_buffer_size / 1024);
 
-#ifdef CONFIG_MTRR
-	{
-		/* Find the largest power of two that maps the whole buffer */
-		int size_shift = 31;
-
-		while (!(oi->video_buffer_size & (1 << size_shift))) {
-			size_shift--;
-		}
-		size_shift++;
-		oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
-		oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
-		oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
-		oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
-		if (mtrr_add(oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
-			     MTRR_TYPE_WRCOMB, 1) < 0) {
-			IVTVFB_INFO("disabled mttr\n");
-			oi->fb_start_aligned_physaddr = 0;
-			oi->fb_end_aligned_physaddr = 0;
-		}
-	}
-#endif
-
+	while (!(oi->video_buffer_size & (1 << size_shift)))
+		size_shift--;
+	size_shift++;
+	oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+	oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+	oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+	oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+	oi->wc_cookie = arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+					 oi->fb_end_aligned_physaddr -
+					 oi->fb_start_aligned_physaddr);
 	/* Blank the entire osd. */
 	memset_io(oi->video_vbase, 0, oi->video_buffer_size);
 
@@ -1172,14 +1156,8 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
 
 	/* Release pseudo palette */
 	kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
-	if (oi->fb_end_aligned_physaddr) {
-		mtrr_del(-1, oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
-	}
-#endif
-
+	iounmap(oi->video_vbase);
+	arch_phys_wc_del(oi->wc_cookie);
 	kfree(oi);
 	itv->osd_info = NULL;
 }

> 
> > From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> > seems infiniband drivers also find good use for for it for PIO TX buffers to
> > blast some sort of data, in the future I would not be surprised if other
> > devices found use for it.
> 
> IMO the Infiniband maintainers should fix their code.  Especially in
> the server space, there aren't that many MTRRs to go around.  I wrote
> arch_phys_wc_add in the first place because my server *ran out of
> MTRRs*.
> 
> Hey, IB people: can you fix your drivers to use arch_phys_wc_add
> (which is permitted to be a no-op) along with ioremap_wc?  Your users
> will thank you.

Provided the above ivtv driver changes are OK this would be the *last* and only
driver required to be changed.

> > It may be true that the existing drivers that
> > requires the above type of work are corner cases -- but I wouldn't hold my
> > breath for that. The ivtv device is a good example of the worst type of
> > situations and these days. So perhap __arch_phys_wc_add() and a
> > ioremap_ucminus() might be something more than transient unless hardware folks
> > get a good memo or already know how to just Do The Right Thing (TM).
> 
> I disagree.  We should try to NACK any new code that can't function
> without MTRRs.
> 
> (Plus, ARM is growing in popularity in the server space, and ARM quite
> sensibly doesn't have MTRRs.)

Great, happy with this, but we need to address the last few drivers and their
exisitng code then.

 Luis

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 20:42 ioremap_uc() followed by set_memory_wc() - burrying MTRR Andy Lutomirski
  2015-04-15 20:56 ` H. Peter Anvin
  2015-04-15 22:15 ` Luis R. Rodriguez
@ 2015-04-15 22:50 ` Andy Walls
  2015-04-15 23:52   ` Andy Lutomirski
  2015-04-21 22:46 ` Luis R. Rodriguez
  3 siblings, 1 reply; 40+ messages in thread
From: Andy Walls @ 2015-04-15 22:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, linux-rdma, Toshi Kani, H. Peter Anvin,
	Ingo Molnar, linux-kernel, Hal Rosenstock, Sean Hefty,
	Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> 
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
> 
> IMO this should be conceptually easy to split.  Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

Not so easy.  The main ivtv driver has already set up the PCI device and
done the mapping for the MPEG-2 decoder/video output engine.  The video
decoder/output device nodes might already be open by user space calling
into the main driver, before the ivtvfb module is even loaded.

This could be mitigated by integrating all the ivtvfb module code into
the main ivtv module.  But even then not every PVR-350 owner wants to
use the video output OSD as a framebuffer.  Users might just want an
actual OSD overlaying their TV video playback.

Regards,
Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 22:50 ` Andy Walls
@ 2015-04-15 23:52   ` Andy Lutomirski
  2015-04-16  0:33     ` Andy Walls
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-15 23:52 UTC (permalink / raw)
  To: Andy Walls
  Cc: Luis R. Rodriguez, linux-rdma, Toshi Kani, H. Peter Anvin,
	Ingo Molnar, linux-kernel, Hal Rosenstock, Sean Hefty,
	Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, Apr 15, 2015 at 3:50 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>>
>> > c) ivtv: the driver does not have the PCI space mapped out separately, and
>> > in fact it actually does not do the math for the framebuffer, instead it lets
>> > the device's own CPU do that and assume where its at, see
>> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
>> > but not a setter. Its not clear if the firmware would make a split easy.
>> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
>> >
>>
>> IMO this should be conceptually easy to split.  Once we get the
>> framebuffer address, just unmap it (or don't prematurely map it) and
>> then ioremap the thing.
>
> Not so easy.  The main ivtv driver has already set up the PCI device and
> done the mapping for the MPEG-2 decoder/video output engine.  The video
> decoder/output device nodes might already be open by user space calling
> into the main driver, before the ivtvfb module is even loaded.

Surely the MPEG-2 decoder/video engine won't overlap the framebuffer,
though.  Am I missing something?

--Andy

>
> This could be mitigated by integrating all the ivtvfb module code into
> the main ivtv module.  But even then not every PVR-350 owner wants to
> use the video output OSD as a framebuffer.  Users might just want an
> actual OSD overlaying their TV video playback.
>
> Regards,
> Andy
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 23:52   ` Andy Lutomirski
@ 2015-04-16  0:33     ` Andy Walls
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Walls @ 2015-04-16  0:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, linux-rdma, Toshi Kani, H. Peter Anvin,
	Ingo Molnar, linux-kernel, Hal Rosenstock, Sean Hefty,
	Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, 2015-04-15 at 16:52 -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:50 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> > On Wed, 2015-04-15 at 13:42 -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> >>
> >> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> >> > in fact it actually does not do the math for the framebuffer, instead it lets
> >> > the device's own CPU do that and assume where its at, see
> >> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> >> > but not a setter. Its not clear if the firmware would make a split easy.
> >> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >> >
> >>
> >> IMO this should be conceptually easy to split.  Once we get the
> >> framebuffer address, just unmap it (or don't prematurely map it) and
> >> then ioremap the thing.
> >
> > Not so easy.  The main ivtv driver has already set up the PCI device and
> > done the mapping for the MPEG-2 decoder/video output engine.  The video
> > decoder/output device nodes might already be open by user space calling
> > into the main driver, before the ivtvfb module is even loaded.
> 
> Surely the MPEG-2 decoder/video engine won't overlap the framebuffer,
> though.  Am I missing something?

ivtvfb is stealing the decoders' OSD for use as a framebuffer.
The decoder video output memory doesn't overlap the decoder OSD memory,
but there is a functional overlap.  ivtv driver video output device
nodes can manipulate the OSD that ivtvfb is stealing.

It would be a dumb thing for the user to want to use ivtvfb, and to also
manipulate the OSD via the video output device nodes at the same time,
for anything other than setting up the TV video standard.  However the
current ivtv driver code doesn't prevent the OSD from being manipulated
by the video output device nodes when ivtvfb is in use.

-Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 20:42 ioremap_uc() followed by set_memory_wc() - burrying MTRR Andy Lutomirski
                   ` (2 preceding siblings ...)
  2015-04-15 22:50 ` Andy Walls
@ 2015-04-21 22:46 ` Luis R. Rodriguez
  2015-04-21 22:57   ` Jason Gunthorpe
  3 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 22:46 UTC (permalink / raw)
  To: Andy Lutomirski, jgunthorpe, mike.marciniszyn, infinipath,
	linux-rdma, awalls
  Cc: linux-rdma, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Andy Walls,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 15, 2015 at 01:42:47PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 10:49 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> 
> > c) ivtv: the driver does not have the PCI space mapped out separately, and
> > in fact it actually does not do the math for the framebuffer, instead it lets
> > the device's own CPU do that and assume where its at, see
> > ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> > but not a setter. Its not clear if the firmware would make a split easy.
> > We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
> >
> 
> IMO this should be conceptually easy to split.  Once we get the
> framebuffer address, just unmap it (or don't prematurely map it) and
> then ioremap the thing.

Side note to ipath driver folks: as reviewed with Andy Walls, the
ivtv driver cannot easily be ported to use PAT so we are evaluating
simply removing write-combing from that driver on future kernels.

> 
> > From the beginning it seems only framebuffer devices used MTRR/WC, lately it
> > seems infiniband drivers also find good use for for it for PIO TX buffers to
> > blast some sort of data, in the future I would not be surprised if other
> > devices found use for it.
> 
> IMO the Infiniband maintainers should fix their code.  Especially in
> the server space, there aren't that many MTRRs to go around.  I wrote
> arch_phys_wc_add in the first place because my server *ran out of
> MTRRs*.
> 
> Hey, IB people: can you fix your drivers to use arch_phys_wc_add
> (which is permitted to be a no-op) along with ioremap_wc?  Your users

ipath driver maintainers:

The ipath driver is one of two drivers left to convert over to
arch_phys_wc_add(). MTRR use is being deprecated, and its use is actually
highly discouraged now that we have proper PAT implemenation on Linux. Since we
are talking about annotating the qib driver as "known to be broken without PAT"
and since the ipath driver needs considerable work to be ported to use PAT (the
userspace register is just one area) I wanted to review if we can just remove
MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
item.

This would help a lot in our journey to bury MTRR use.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-21 22:46 ` Luis R. Rodriguez
@ 2015-04-21 22:57   ` Jason Gunthorpe
  2015-04-21 23:39     ` Luis R. Rodriguez
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2015-04-21 22:57 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 12:46:01AM +0200, Luis R. Rodriguez wrote:

> are talking about annotating the qib driver as "known to be broken without PAT"
> and since the ipath driver needs considerable work to be ported to
> use PAT (the

This only seems to be true for one of the chips that driver supports,
not all possibilities.

> userspace register is just one area) I wanted to review if we can just remove
> MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
> item.

AFAIK, dropping MTRR support will completely break the performance to
the point the driver is unusable. If we drop MTRR we may as well
remove the driver.

Mike, do you think the time is right to just remove the iPath driver?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-21 22:57   ` Jason Gunthorpe
@ 2015-04-21 23:39     ` Luis R. Rodriguez
  2015-04-22  5:39       ` Jason Gunthorpe
  0 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 23:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Tue, Apr 21, 2015 at 04:57:32PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 12:46:01AM +0200, Luis R. Rodriguez wrote:
> 
> > are talking about annotating the qib driver as "known to be broken without PAT"
> > and since the ipath driver needs considerable work to be ported to
> > use PAT (the
> 
> This only seems to be true for one of the chips that driver supports,
> not all possibilities.
> 
> > userspace register is just one area) I wanted to review if we can just remove
> > MTRR use on the ipath driver and annotate write-combining with PAT as a TODO
> > item.
> 
> AFAIK, dropping MTRR support will completely break the performance to
> the point the driver is unusable. If we drop MTRR we may as well
> remove the driver.

To be clear, the arch_phys_wc_add() API will be a no-op when PAT is
enabled on a system. Although some folks think PAT is new, its not,
its just that our implementation on Linux lacked a bit of push, recent
changes however make PAT support complete and that means now we'll have
PAT enabled on systems that likely didn't before on recent kernels.

There are other important motivations to use PAT:

 * Xen won't work with MTRR, having a unified PAT enabled kernel
   will ensure that when Xen is used write-combinging will take
   effect

 * Long term we want to make strong UC the default to ioremap() on
   x86, right now its not, its UC-, we need to convert all drivers
   that want write-combining over to use ioremap_wc() for their
   specific area, and it must not overlap. There are issues with
   using mtrr_add() on regions marked as UC-, since its the default
   this  means that mtrr_add() use on ioramp'd areas on PAT systems
   will actually make write-combining *void*. Refer to this table
   for combinatorial effects for non-PAT / PAT of wc MTRR:

   https://marc.info/?l=linux-kernel&m=142964809710517&w=1

So as the train of PAT enablement moves forward it means systems
that have PAT enabled now might not have write-combining effective.
In order to get the best of both worlds, non-PAT and PAT systems
we must design drivers cleanly for the non-writecombining and
write-combining areas. This mean using ioremap_nocache() for MMIO
and ioremap_wc() *only* for the desired, write-combining area,
followed by arch_phys_wc_add().

> Mike, do you think the time is right to just remove the iPath driver?

With PAT now being default the driver effectively won't work
with write-combining on modern kernels. Even if systems are old
they likely had PAT support, when upgrading kernels PAT will work
but write-combing won't on ipath.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-21 23:39     ` Luis R. Rodriguez
@ 2015-04-22  5:39       ` Jason Gunthorpe
  2015-04-22 15:23         ` Luis R. Rodriguez
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2015-04-22  5:39 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > Mike, do you think the time is right to just remove the iPath driver?
> 
> With PAT now being default the driver effectively won't work
> with write-combining on modern kernels. Even if systems are old
> they likely had PAT support, when upgrading kernels PAT will work
> but write-combing won't on ipath.

Sorry, do you mean the driver already doesn't get WC? Or do you mean
after some more pending patches are applied?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22  5:39       ` Jason Gunthorpe
@ 2015-04-22 15:23         ` Luis R. Rodriguez
  2015-04-22 15:54           ` Luis R. Rodriguez
                             ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 15:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > Mike, do you think the time is right to just remove the iPath driver?
> > 
> > With PAT now being default the driver effectively won't work
> > with write-combining on modern kernels. Even if systems are old
> > they likely had PAT support, when upgrading kernels PAT will work
> > but write-combing won't on ipath.
> 
> Sorry, do you mean the driver already doesn't get WC? Or do you mean
> after some more pending patches are applied?

No, you have to consider the system used and the effects of calls used
on the driver in light of this table:

----------------------------------------------------------------------
MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
----------------------------------------------------------------------
                                                  Non-PAT |  PAT
     PAT
     |PCD
     ||PWT
     |||
WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
----------------------------------------------------------------------

(*) denotes implementation defined and is discouraged

ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
the default. When that flip occurs it will mean ipath cannot get
write-combining on both non-PAT and PAT systems. Now that is for
the future, lets review the current situation for ipath.

For PAT capable systems if mtrr_add() is used today on a Linux system on a
region mapped with ioremap_nocache() that will mean you effectively nullify the
mtrr_add() effect as the combinatorial effect above yields an effective memory
type of UC.  For PAT systems you want to use ioremap_wc() on the region in
which you need write-combining followed by arch_phys_wc_add() which will *only*
call mtrr_add() *iff* PAT was not enabled. This also means we need to split
the ioremap'd areas so that the area that is using ioremap_nocache() can never
get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
split just as was done for the qib driver.

Now we could just say that leaving things as-is is a non-issue if you are OK
with non-write-combining effects being the default behaviour left on the ipath
driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
have any effect. We just typically don't want to see use of ioremap_nocache()
paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
on non-PAT systems. If the ipath driver is not going to get he work required
to split the regions though perhaps we can live with a corner case driver that
annotates PAT must be disabled on the systems that use it and convert it to
arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
With this strategy if and when ipath driver gets a split done it would gain WC
on both PAT and non-PAT.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 15:23         ` Luis R. Rodriguez
@ 2015-04-22 15:54           ` Luis R. Rodriguez
  2015-04-22 15:59             ` Luis R. Rodriguez
  2015-04-22 16:17           ` Jason Gunthorpe
  2015-04-22 16:53           ` Andy Lutomirski
  2 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 15:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Andy Walls
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > Mike, do you think the time is right to just remove the iPath driver?
> > > 
> > > With PAT now being default the driver effectively won't work
> > > with write-combining on modern kernels. Even if systems are old
> > > they likely had PAT support, when upgrading kernels PAT will work
> > > but write-combing won't on ipath.
> > 
> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > after some more pending patches are applied?
> 
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:
> 
> ----------------------------------------------------------------------
> MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> ----------------------------------------------------------------------
>                                                   Non-PAT |  PAT
>      PAT
>      |PCD
>      ||PWT
>      |||
> WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
> WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> ----------------------------------------------------------------------
> 
> (*) denotes implementation defined and is discouraged
> 
> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> the default. When that flip occurs it will mean ipath cannot get
> write-combining on both non-PAT and PAT systems. Now that is for
> the future, lets review the current situation for ipath.
> 
> For PAT capable systems if mtrr_add() is used today on a Linux system on a
> region mapped with ioremap_nocache() that will mean you effectively nullify the
> mtrr_add() effect as the combinatorial effect above yields an effective memory
> type of UC.  For PAT systems you want to use ioremap_wc() on the region in
> which you need write-combining followed by arch_phys_wc_add() which will *only*
> call mtrr_add() *iff* PAT was not enabled. This also means we need to split
> the ioremap'd areas so that the area that is using ioremap_nocache() can never
> get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
> split just as was done for the qib driver.
> 
> Now we could just say that leaving things as-is is a non-issue if you are OK
> with non-write-combining effects being the default behaviour left on the ipath
> driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
> driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
> have any effect. We just typically don't want to see use of ioremap_nocache()
> paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
> ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
> on non-PAT systems. If the ipath driver is not going to get he work required
> to split the regions though perhaps we can live with a corner case driver that
> annotates PAT must be disabled on the systems that use it and convert it to
> arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
> With this strategy if and when ipath driver gets a split done it would gain WC
> on both PAT and non-PAT.

Folks, after some thought I do believe the above temporary strategy would
avoid issues and would not have to stir people up to go and make code
changes. We can use the same strategy for both ivtv and ipath:

  * Annotate via Kconfig for the driver that it depends on !X86_PAT
    that will ensure that PAT systems won't use it, and convert it
    to use arch_phys_wc_add() to help phase out direct access to mtrr_add()

This would be correct given that the current situation on the driver
makes write-combining non-effective on PAT systems, we in fact gain
avoiding these type of use-cases, and annotate this as a big TODO item
for folks who do want it for PAT systems.

Thoughts?

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 15:54           ` Luis R. Rodriguez
@ 2015-04-22 15:59             ` Luis R. Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 15:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Andy Walls
  Cc: Andy Lutomirski, Mike Marciniszyn, Mike Marciniszyn, linux-rdma,
	Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML,
	Luis R. Rodriguez

On Wed, Apr 22, 2015 at 8:54 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
>> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
>> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
>> > > > Mike, do you think the time is right to just remove the iPath driver?
>> > >
>> > > With PAT now being default the driver effectively won't work
>> > > with write-combining on modern kernels. Even if systems are old
>> > > they likely had PAT support, when upgrading kernels PAT will work
>> > > but write-combing won't on ipath.
>> >
>> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
>> > after some more pending patches are applied?
>>
>> No, you have to consider the system used and the effects of calls used
>> on the driver in light of this table:
>>
>> ----------------------------------------------------------------------
>> MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
>> ----------------------------------------------------------------------
>>                                                   Non-PAT |  PAT
>>      PAT
>>      |PCD
>>      ||PWT
>>      |||
>> WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
>> WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
>> WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
>> WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
>> ----------------------------------------------------------------------
>>
>> (*) denotes implementation defined and is discouraged
>>
>> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
>> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
>> the default. When that flip occurs it will mean ipath cannot get
>> write-combining on both non-PAT and PAT systems. Now that is for
>> the future, lets review the current situation for ipath.
>>
>> For PAT capable systems if mtrr_add() is used today on a Linux system on a
>> region mapped with ioremap_nocache() that will mean you effectively nullify the
>> mtrr_add() effect as the combinatorial effect above yields an effective memory
>> type of UC.  For PAT systems you want to use ioremap_wc() on the region in
>> which you need write-combining followed by arch_phys_wc_add() which will *only*
>> call mtrr_add() *iff* PAT was not enabled. This also means we need to split
>> the ioremap'd areas so that the area that is using ioremap_nocache() can never
>> get write-combining (_PAGE_CACHE_MODE_UC). The ipath driver needs the regions
>> split just as was done for the qib driver.
>>
>> Now we could just say that leaving things as-is is a non-issue if you are OK
>> with non-write-combining effects being the default behaviour left on the ipath
>> driver for PAT systems. In that case we can just use arch_phys_wc_add() on the
>> driver and while it won't trigger the mtrr_add() on PAT systems it sill won't
>> have any effect. We just typically don't want to see use of ioremap_nocache()
>> paired with arch_phys_wc_add(), grammatically the correct thing to do is pair
>> ioremap_wc() areas with a arch_phys_wc_add() to make the write-combining effects
>> on non-PAT systems. If the ipath driver is not going to get he work required
>> to split the regions though perhaps we can live with a corner case driver that
>> annotates PAT must be disabled on the systems that use it and convert it to
>> arch_phys_wc_add() to just help with phasing out of direct use of mtrr_add().
>> With this strategy if and when ipath driver gets a split done it would gain WC
>> on both PAT and non-PAT.
>
> Folks, after some thought I do believe the above temporary strategy would
> avoid issues and would not have to stir people up to go and make code
> changes. We can use the same strategy for both ivtv and ipath:
>
>   * Annotate via Kconfig for the driver that it depends on !X86_PAT
>     that will ensure that PAT systems won't use it, and convert it
>     to use arch_phys_wc_add() to help phase out direct access to mtrr_add()
>
> This would be correct given that the current situation on the driver
> makes write-combining non-effective on PAT systems, we in fact gain
> avoiding these type of use-cases, and annotate this as a big TODO item
> for folks who do want it for PAT systems.
>
> Thoughts?

Another option in order to enable this type of checks at run time and
still be able to build the driver on standard distributions and just
prevent if from loading on PAT systems is to have some code in place
which would prevent the driver from loading if PAT was enabled, this
would enable folks to disable PAT via a kernel command line option,
and if that was used then the driver probe would complete.

 Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 15:23         ` Luis R. Rodriguez
  2015-04-22 15:54           ` Luis R. Rodriguez
@ 2015-04-22 16:17           ` Jason Gunthorpe
  2015-04-22 16:51             ` Luis R. Rodriguez
  2015-04-22 18:53             ` Doug Ledford
  2015-04-22 16:53           ` Andy Lutomirski
  2 siblings, 2 replies; 40+ messages in thread
From: Jason Gunthorpe @ 2015-04-22 16:17 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > Mike, do you think the time is right to just remove the iPath driver?
> > > 
> > > With PAT now being default the driver effectively won't work
> > > with write-combining on modern kernels. Even if systems are old
> > > they likely had PAT support, when upgrading kernels PAT will work
> > > but write-combing won't on ipath.
> > 
> > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > after some more pending patches are applied?
> 
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:

So, just to be clear:

At some point Linux started setting the PAT bits during
ioremap_nocache, which overrides MTRR, and at that point the driver
became broken on all PAT capable systems?

Not only that, but we've only just noticed it now, and no user ever
complained?

So that means either no users exist, or all users are on non-PAT
systems?

This driver only works on x86-64 systems. Are there any x86-64 systems
that are not PAT capable? IIRC even the first Opteron had PAT, but my
memory is fuzzy from back then :|

> Another option in order to enable this type of checks at run time
> and still be able to build the driver on standard distributions and
> just prevent if from loading on PAT systems is to have some code in
> place which would prevent the driver from loading if PAT was
> enabled, this would enable folks to disable PAT via a kernel command
> line option, and if that was used then the driver probe would
> complete.

This seems like a reasonble option to me. At the very least we might
learn if anyone is still using these cards.

I'd also love to remove the driver if it turns out there are actually
no users. qib substantially replaces it except for a few very old
cards.

Mike?

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 16:17           ` Jason Gunthorpe
@ 2015-04-22 16:51             ` Luis R. Rodriguez
  2015-04-22 18:53             ` Doug Ledford
  1 sibling, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 16:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Juergen Gross, Andy Walls
  Cc: Andy Lutomirski, mike.marciniszyn, infinipath, linux-rdma,
	awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 10:17:55AM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > > 
> > > > With PAT now being default the driver effectively won't work
> > > > with write-combining on modern kernels. Even if systems are old
> > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > but write-combing won't on ipath.
> > > 
> > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > after some more pending patches are applied?
> > 
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
> 
> So, just to be clear:
> 
> At some point Linux started setting the PAT bits during
> ioremap_nocache, which overrides MTRR, and at that point the driver
> became broken on all PAT capable systems?

No, PAT code lacked quite a bit of love, and Juergen and some others have
been giving it some love and now we expect PAT to be enabled by default on
more systems. When and and on what systemes and as of what commits? Not
sure, there's quite a bit of PAT work but hoping Juergen might fill
in the details, CC'd.

> Not only that, but we've only just noticed it now, and no user ever
> complained?

No, well this is all recent, so we expect more PAT enabled systems now.

> So that means either no users exist, or all users are on non-PAT
> systems?

PAT was not being enabled before on most systems, it will now be on
most systems, for some systems there may be some errata that needs to
be addressed for PAT.

> This driver only works on x86-64 systems. Are there any x86-64 systems
> that are not PAT capable? 

Not that I know of, but I've heard of some "PAT errata" systems, and
that may need some attention.

> IIRC even the first Opteron had PAT, but my
> memory is fuzzy from back then :|
> 
> > Another option in order to enable this type of checks at run time
> > and still be able to build the driver on standard distributions and
> > just prevent if from loading on PAT systems is to have some code in
> > place which would prevent the driver from loading if PAT was
> > enabled, this would enable folks to disable PAT via a kernel command
> > line option, and if that was used then the driver probe would
> > complete.
> 
> This seems like a reasonble option to me. At the very least we might
> learn if anyone is still using these cards.

OK great.

> I'd also love to remove the driver if it turns out there are actually
> no users. qib substantially replaces it except for a few very old
> cards.
> 
> Mike?

By replacing do you mean same hardware? BTW below are the changes
which I describe above which would prevent both ipath and ivtv
to load on PAT enabled systems. I think its a reasonable compromise.
If this is OK I can proceed with a split of the patches and move 
on with the last series that burries MTRR.

Andy Walls, please review too.

  Luis

diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..3ef592c 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -42,6 +42,9 @@
 #include <linux/bitmap.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
+#endif
 
 #include "ipath_kernel.h"
 #include "ipath_verbs.h"
@@ -395,6 +398,14 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	unsigned long long addr;
 	u32 bar0 = 0, bar1 = 0;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled,
+		 "ipath needs PAT disabled, boot with nopat kernel parameter\n")) {
+		ret = EINVAL;
+		goto bail;
+	}
+#endif
+
 	dd = ipath_alloc_devdata(pdev);
 	if (IS_ERR(dd)) {
 		ret = PTR_ERR(dd);
@@ -542,6 +553,7 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dd->ipath_kregbase = __ioremap(addr, len,
 		(_PAGE_NO_CACHE|_PAGE_WRITETHRU));
 #else
+	/* XXX: split this properly to enable on PAT */
 	dd->ipath_kregbase = ioremap_nocache(addr, len);
 #endif
 
@@ -587,12 +599,8 @@ static int ipath_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	ret = ipath_enable_wc(dd);
 
-	if (ret) {
-		ipath_dev_err(dd, "Write combining not enabled "
-			      "(err %d): performance may be poor\n",
-			      -ret);
+	if (ret)
 		ret = 0;
-	}
 
 	ipath_verify_pioperf(dd);
 
diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h
index e08db70..f0f9471 100644
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h
@@ -463,9 +463,7 @@ struct ipath_devdata {
 	/* offset in HT config space of slave/primary interface block */
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
-	unsigned long ipath_wc_cookie;
-	unsigned long ipath_wc_base;
-	unsigned long ipath_wc_len;
+	int wc_cookie;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of struct page *'s for exp tid pages */
diff --git a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
index 70c1f3a..7b6e4c8 100644
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
@@ -37,7 +37,6 @@
  */
 
 #include <linux/pci.h>
-#include <asm/mtrr.h>
 #include <asm/processor.h>
 
 #include "ipath_kernel.h"
@@ -122,27 +121,14 @@ int ipath_enable_wc(struct ipath_devdata *dd)
 	}
 
 	if (!ret) {
-		int cookie;
-		ipath_cdbg(VERBOSE, "Setting mtrr for chip to WC "
-			   "(addr %llx, len=0x%llx)\n",
-			   (unsigned long long) pioaddr,
-			   (unsigned long long) piolen);
-		cookie = mtrr_add(pioaddr, piolen, MTRR_TYPE_WRCOMB, 1);
-		if (cookie < 0) {
-			{
-				dev_info(&dd->pcidev->dev,
-					 "mtrr_add()  WC for PIO bufs "
-					 "failed (%d)\n",
-					 cookie);
-				ret = -EINVAL;
-			}
-		} else {
-			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
-				   "cookie is %d\n", cookie);
-			dd->ipath_wc_cookie = cookie;
-			dd->ipath_wc_base = (unsigned long) pioaddr;
-			dd->ipath_wc_len = (unsigned long) piolen;
-		}
+		dd->wc_cookie = arch_phys_wc_add(pioaddr, piolen);
+		if (dd->wc_cookie < 0) {
+			ipath_dev_err(dd, "Seting mtrr failed on PIO buffers\n");
+			ret = -ENODEV;
+		} else if (dd->wc_cookie == 0)
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC not needed\n");
+		else
+			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC\n");
 	}
 
 	return ret;
@@ -154,16 +140,5 @@ int ipath_enable_wc(struct ipath_devdata *dd)
  */
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
-	if (dd->ipath_wc_cookie) {
-		int r;
-		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
-			     dd->ipath_wc_len);
-		if (r < 0)
-			dev_info(&dd->pcidev->dev,
-				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
-				 dd->ipath_wc_cookie, dd->ipath_wc_base,
-				 dd->ipath_wc_len, r);
-		dd->ipath_wc_cookie = 0; /* even on failure */
-	}
+	arch_phys_wc_del(dd->wc_cookie);
 }
diff --git a/drivers/media/pci/ivtv/ivtvfb.c b/drivers/media/pci/ivtv/ivtvfb.c
index 9ff1230..369598c 100644
--- a/drivers/media/pci/ivtv/ivtvfb.c
+++ b/drivers/media/pci/ivtv/ivtvfb.c
@@ -44,8 +44,8 @@
 #include <linux/ivtvfb.h>
 #include <linux/slab.h>
 
-#ifdef CONFIG_MTRR
-#include <asm/mtrr.h>
+#ifdef CONFIG_X86_64
+#include <asm/pat.h>
 #endif
 
 #include "ivtv-driver.h"
@@ -155,12 +155,11 @@ struct osd_info {
 	/* Buffer size */
 	u32 video_buffer_size;
 
-#ifdef CONFIG_MTRR
 	/* video_base rounded down as required by hardware MTRRs */
 	unsigned long fb_start_aligned_physaddr;
 	/* video_base rounded up as required by hardware MTRRs */
 	unsigned long fb_end_aligned_physaddr;
-#endif
+	int wc_cookie;
 
 	/* Store the buffer offset */
 	int set_osd_coords_x;
@@ -1099,6 +1098,8 @@ static int ivtvfb_init_vidmode(struct ivtv *itv)
 static int ivtvfb_init_io(struct ivtv *itv)
 {
 	struct osd_info *oi = itv->osd_info;
+	/* Find the largest power of two that maps the whole buffer */
+	int size_shift = 31;
 
 	mutex_lock(&itv->serialize_lock);
 	if (ivtv_init_on_first_open(itv)) {
@@ -1120,6 +1121,7 @@ static int ivtvfb_init_io(struct ivtv *itv)
 	oi->video_buffer_size = 1704960;
 
 	oi->video_pbase = itv->base_addr + IVTV_DECODER_OFFSET + oi->video_rbase;
+	/* XXX: split this for PAT */
 	oi->video_vbase = itv->dec_mem + oi->video_rbase;
 
 	if (!oi->video_vbase) {
@@ -1132,29 +1134,16 @@ static int ivtvfb_init_io(struct ivtv *itv)
 			oi->video_pbase, oi->video_vbase,
 			oi->video_buffer_size / 1024);
 
-#ifdef CONFIG_MTRR
-	{
-		/* Find the largest power of two that maps the whole buffer */
-		int size_shift = 31;
-
-		while (!(oi->video_buffer_size & (1 << size_shift))) {
-			size_shift--;
-		}
-		size_shift++;
-		oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
-		oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
-		oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
-		oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
-		if (mtrr_add(oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr,
-			     MTRR_TYPE_WRCOMB, 1) < 0) {
-			IVTVFB_INFO("disabled mttr\n");
-			oi->fb_start_aligned_physaddr = 0;
-			oi->fb_end_aligned_physaddr = 0;
-		}
-	}
-#endif
-
+	while (!(oi->video_buffer_size & (1 << size_shift)))
+		size_shift--;
+	size_shift++;
+	oi->fb_start_aligned_physaddr = oi->video_pbase & ~((1 << size_shift) - 1);
+	oi->fb_end_aligned_physaddr = oi->video_pbase + oi->video_buffer_size;
+	oi->fb_end_aligned_physaddr += (1 << size_shift) - 1;
+	oi->fb_end_aligned_physaddr &= ~((1 << size_shift) - 1);
+	oi->wc_cookie = arch_phys_wc_add(oi->fb_start_aligned_physaddr,
+					 oi->fb_end_aligned_physaddr -
+					 oi->fb_start_aligned_physaddr);
 	/* Blank the entire osd. */
 	memset_io(oi->video_vbase, 0, oi->video_buffer_size);
 
@@ -1172,14 +1161,8 @@ static void ivtvfb_release_buffers (struct ivtv *itv)
 
 	/* Release pseudo palette */
 	kfree(oi->ivtvfb_info.pseudo_palette);
-
-#ifdef CONFIG_MTRR
-	if (oi->fb_end_aligned_physaddr) {
-		mtrr_del(-1, oi->fb_start_aligned_physaddr,
-			oi->fb_end_aligned_physaddr - oi->fb_start_aligned_physaddr);
-	}
-#endif
-
+	iounmap(oi->video_vbase);
+	arch_phys_wc_del(oi->wc_cookie);
 	kfree(oi);
 	itv->osd_info = NULL;
 }
@@ -1190,6 +1173,13 @@ static int ivtvfb_init_card(struct ivtv *itv)
 {
 	int rc;
 
+#ifdef CONFIG_X86_64
+	if (WARN(pat_enabled,
+		 "ivtv needs PAT disabled, boot with nopat kernel parameter\n")) {
+		return EINVAL;
+	}
+#endif
+
 	if (itv->osd_info) {
 		IVTVFB_ERR("Card %d already initialised\n", ivtvfb_card_id);
 		return -EBUSY;

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 15:23         ` Luis R. Rodriguez
  2015-04-22 15:54           ` Luis R. Rodriguez
  2015-04-22 16:17           ` Jason Gunthorpe
@ 2015-04-22 16:53           ` Andy Lutomirski
  2015-04-22 17:07             ` Luis R. Rodriguez
  2 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-22 16:53 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Jason Gunthorpe, Mike Marciniszyn, Mike Marciniszyn, linux-rdma,
	Andy Walls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML,
	Luis R. Rodriguez

On Wed, Apr 22, 2015 at 8:23 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
>> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
>> > > Mike, do you think the time is right to just remove the iPath driver?
>> >
>> > With PAT now being default the driver effectively won't work
>> > with write-combining on modern kernels. Even if systems are old
>> > they likely had PAT support, when upgrading kernels PAT will work
>> > but write-combing won't on ipath.
>>
>> Sorry, do you mean the driver already doesn't get WC? Or do you mean
>> after some more pending patches are applied?
>
> No, you have to consider the system used and the effects of calls used
> on the driver in light of this table:
>
> ----------------------------------------------------------------------
> MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> ----------------------------------------------------------------------
>                                                   Non-PAT |  PAT
>      PAT
>      |PCD
>      ||PWT
>      |||
> WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
> WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> ----------------------------------------------------------------------
>
> (*) denotes implementation defined and is discouraged
>
> ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> the default. When that flip occurs it will mean ipath cannot get
> write-combining on both non-PAT and PAT systems. Now that is for
> the future, lets review the current situation for ipath.
>
> For PAT capable systems if mtrr_add() is used today on a Linux system on a
> region mapped with ioremap_nocache() that will mean you effectively nullify the
> mtrr_add() effect as the combinatorial effect above yields an effective memory
> type of UC.

Are you sure?  I thought that ioremap_nocache currently is UC-, so
mtrr_add + ioremap_nocache gets WC even on PAT systems.

Going forward, when mtrr_add is gone, this will change, of course.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 16:53           ` Andy Lutomirski
@ 2015-04-22 17:07             ` Luis R. Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 17:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jason Gunthorpe, Mike Marciniszyn, Mike Marciniszyn, linux-rdma,
	Andy Walls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML,
	Luis R. Rodriguez

On Wed, Apr 22, 2015 at 09:53:03AM -0700, Andy Lutomirski wrote:
> On Wed, Apr 22, 2015 at 8:23 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> >> On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> >> > > Mike, do you think the time is right to just remove the iPath driver?
> >> >
> >> > With PAT now being default the driver effectively won't work
> >> > with write-combining on modern kernels. Even if systems are old
> >> > they likely had PAT support, when upgrading kernels PAT will work
> >> > but write-combing won't on ipath.
> >>
> >> Sorry, do you mean the driver already doesn't get WC? Or do you mean
> >> after some more pending patches are applied?
> >
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
> >
> > ----------------------------------------------------------------------
> > MTRR Non-PAT   PAT    Linux ioremap value        Effective memory type
> > ----------------------------------------------------------------------
> >                                                   Non-PAT |  PAT
> >      PAT
> >      |PCD
> >      ||PWT
> >      |||
> > WC   000      WB      _PAGE_CACHE_MODE_WB            WC   |   WC
> > WC   001      WC      _PAGE_CACHE_MODE_WC            WC*  |   WC
> > WC   010      UC-     _PAGE_CACHE_MODE_UC_MINUS      WC*  |   UC
> > WC   011      UC      _PAGE_CACHE_MODE_UC            UC   |   UC
> > ----------------------------------------------------------------------
> >
> > (*) denotes implementation defined and is discouraged
> >
> > ioremap_nocache() will use _PAGE_CACHE_MODE_UC_MINUS by default today,
> > in the future we want to flip the switch and make _PAGE_CACHE_MODE_UC
> > the default. When that flip occurs it will mean ipath cannot get
> > write-combining on both non-PAT and PAT systems. Now that is for
> > the future, lets review the current situation for ipath.
> >
> > For PAT capable systems if mtrr_add() is used today on a Linux system on a
> > region mapped with ioremap_nocache() that will mean you effectively nullify the
> > mtrr_add() effect as the combinatorial effect above yields an effective memory
> > type of UC.
> 
> Are you sure?

Well lets double check.

>  I thought that ioremap_nocache currently is UC-,

It is.

> so mtrr_add + ioremap_nocache gets WC even on PAT systems.

https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

    As per Intel SDM "11.5.2.2 Selecting Memory Types for Pentium
    III and More Recent Processor Families" the ffect of a WC MTRR
    for a region with a PAT entry value of UC will be UC. The effect
    of a WC MTRR on a region with a PAT entry UC- will be WC. The
    effect of a WC MTRR on a regoin with PAT entry WC is WC.

And indeed as per table 11-7 mtrr WC on PAT UC- yields WC. So ineed the above
table needs adjustment for this. So for PAT systems write-combing would be
effective with mtrr_add(), but once strong UC (_PAGE_CACHE_MODE_UC) is used by
default for ioremap_nocache() what I mentioned will be true. Furhtermore if we
switch the drivers to use arch_phys_wc_add() then for sure write-combining will
also not be effective.

Jason, Andy, is the change still a reasonable compromise? We'd just be asking
users to boot with noat for users for ipath, ivtv until the drivers gets proper
PAT support with a split.

There are two motivations for this:

  * help move to strong UC as default
  * bury MTRR

> Going forward, when mtrr_add is gone, this will change, of course.

Indeed.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 16:17           ` Jason Gunthorpe
  2015-04-22 16:51             ` Luis R. Rodriguez
@ 2015-04-22 18:53             ` Doug Ledford
  2015-04-22 19:05               ` Luis R. Rodriguez
  2015-04-22 20:46               ` Jason Gunthorpe
  1 sibling, 2 replies; 40+ messages in thread
From: Doug Ledford @ 2015-04-22 18:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Luis R. Rodriguez, Andy Lutomirski, mike.marciniszyn, infinipath,
	linux-rdma, awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

[-- Attachment #1: Type: text/plain, Size: 2885 bytes --]

On Wed, 2015-04-22 at 10:17 -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > > 
> > > > With PAT now being default the driver effectively won't work
> > > > with write-combining on modern kernels. Even if systems are old
> > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > but write-combing won't on ipath.
> > > 
> > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > after some more pending patches are applied?
> > 
> > No, you have to consider the system used and the effects of calls used
> > on the driver in light of this table:
> 
> So, just to be clear:
> 
> At some point Linux started setting the PAT bits during
> ioremap_nocache, which overrides MTRR, and at that point the driver
> became broken on all PAT capable systems?
> 
> Not only that, but we've only just noticed it now, and no user ever
> complained?
> 
> So that means either no users exist, or all users are on non-PAT
> systems?
> 
> This driver only works on x86-64 systems. Are there any x86-64 systems
> that are not PAT capable? IIRC even the first Opteron had PAT, but my
> memory is fuzzy from back then :|
> 
> > Another option in order to enable this type of checks at run time
> > and still be able to build the driver on standard distributions and
> > just prevent if from loading on PAT systems is to have some code in
> > place which would prevent the driver from loading if PAT was
> > enabled, this would enable folks to disable PAT via a kernel command
> > line option, and if that was used then the driver probe would
> > complete.
> 
> This seems like a reasonble option to me. At the very least we might
> learn if anyone is still using these cards.
> 
> I'd also love to remove the driver if it turns out there are actually
> no users. qib substantially replaces it except for a few very old
> cards.

To be precise, the split is that ipath powers the old HTX bus cards that
only work in AMD systems, qib is all PCI-e cards.  I still have a few
HTX cards, but I no longer have any systems with HTX slots, so we
haven't even used this driver in testing for 3 or 4 years now.  And
these are all old SDR cards, where the performance numbers were 800MB/s
with WC enabled, 50MB/s without it.

> Mike?
> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 18:53             ` Doug Ledford
@ 2015-04-22 19:05               ` Luis R. Rodriguez
  2015-04-22 19:10                 ` Doug Ledford
  2015-04-22 20:46               ` Jason Gunthorpe
  1 sibling, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 19:05 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Andy Lutomirski, mike.marciniszyn, infinipath,
	linux-rdma, awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:
> On Wed, 2015-04-22 at 10:17 -0600, Jason Gunthorpe wrote:
> > On Wed, Apr 22, 2015 at 05:23:28PM +0200, Luis R. Rodriguez wrote:
> > > On Tue, Apr 21, 2015 at 11:39:39PM -0600, Jason Gunthorpe wrote:
> > > > On Wed, Apr 22, 2015 at 01:39:07AM +0200, Luis R. Rodriguez wrote:
> > > > > > Mike, do you think the time is right to just remove the iPath driver?
> > > > > 
> > > > > With PAT now being default the driver effectively won't work
> > > > > with write-combining on modern kernels. Even if systems are old
> > > > > they likely had PAT support, when upgrading kernels PAT will work
> > > > > but write-combing won't on ipath.
> > > > 
> > > > Sorry, do you mean the driver already doesn't get WC? Or do you mean
> > > > after some more pending patches are applied?
> > > 
> > > No, you have to consider the system used and the effects of calls used
> > > on the driver in light of this table:
> > 
> > So, just to be clear:
> > 
> > At some point Linux started setting the PAT bits during
> > ioremap_nocache, which overrides MTRR, and at that point the driver
> > became broken on all PAT capable systems?
> > 
> > Not only that, but we've only just noticed it now, and no user ever
> > complained?
> > 
> > So that means either no users exist, or all users are on non-PAT
> > systems?
> > 
> > This driver only works on x86-64 systems. Are there any x86-64 systems
> > that are not PAT capable? IIRC even the first Opteron had PAT, but my
> > memory is fuzzy from back then :|
> > 
> > > Another option in order to enable this type of checks at run time
> > > and still be able to build the driver on standard distributions and
> > > just prevent if from loading on PAT systems is to have some code in
> > > place which would prevent the driver from loading if PAT was
> > > enabled, this would enable folks to disable PAT via a kernel command
> > > line option, and if that was used then the driver probe would
> > > complete.
> > 
> > This seems like a reasonble option to me. At the very least we might
> > learn if anyone is still using these cards.
> > 
> > I'd also love to remove the driver if it turns out there are actually
> > no users. qib substantially replaces it except for a few very old
> > cards.
> 
> To be precise, the split is that ipath powers the old HTX bus cards that
> only work in AMD systems,

Do those systems have PAT support? CAn anyone check if PAT is enabled
if booted on a recent kernel?


 Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 19:05               ` Luis R. Rodriguez
@ 2015-04-22 19:10                 ` Doug Ledford
  2015-04-22 19:14                   ` Luis R. Rodriguez
  0 siblings, 1 reply; 40+ messages in thread
From: Doug Ledford @ 2015-04-22 19:10 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Jason Gunthorpe, Andy Lutomirski, mike.marciniszyn, infinipath,
	linux-rdma, awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]

On Wed, 2015-04-22 at 21:05 +0200, Luis R. Rodriguez wrote:

> > > I'd also love to remove the driver if it turns out there are actually
> > > no users. qib substantially replaces it except for a few very old
> > > cards.
> > 
> > To be precise, the split is that ipath powers the old HTX bus cards that
> > only work in AMD systems,
> 
> Do those systems have PAT support? CAn anyone check if PAT is enabled
> if booted on a recent kernel?

I don't have one of these systems any more.  The *only* one I ever had
was a monster IBM box...I can't even find a reference to it any more.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 19:10                 ` Doug Ledford
@ 2015-04-22 19:14                   ` Luis R. Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-22 19:14 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Jason Gunthorpe, Andy Lutomirski, Mike Marciniszyn,
	Mike Marciniszyn, linux-rdma, Andy Walls, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, Apr 22, 2015 at 12:10 PM, Doug Ledford <dledford@redhat.com> wrote:
> On Wed, 2015-04-22 at 21:05 +0200, Luis R. Rodriguez wrote:
>
>> > > I'd also love to remove the driver if it turns out there are actually
>> > > no users. qib substantially replaces it except for a few very old
>> > > cards.
>> >
>> > To be precise, the split is that ipath powers the old HTX bus cards that
>> > only work in AMD systems,
>>
>> Do those systems have PAT support? CAn anyone check if PAT is enabled
>> if booted on a recent kernel?
>
> I don't have one of these systems any more.  The *only* one I ever had
> was a monster IBM box...I can't even find a reference to it any more.

Um, yeah if its so rare then I think the compromise proposed might
make sense, specially since folks were even *considering* seriously
removing this device driver. I'll send some patches to propose the
strategy explained to require booting with pat disabled.

 Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 18:53             ` Doug Ledford
  2015-04-22 19:05               ` Luis R. Rodriguez
@ 2015-04-22 20:46               ` Jason Gunthorpe
  2015-04-22 20:58                 ` Doug Ledford
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Gunthorpe @ 2015-04-22 20:46 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Luis R. Rodriguez, Andy Lutomirski, mike.marciniszyn, infinipath,
	linux-rdma, awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:

> To be precise, the split is that ipath powers the old HTX bus cards that
> only work in AMD systems, qib is all PCI-e cards.  I still have a few
> HTX cards, but I no longer have any systems with HTX slots, so we
> haven't even used this driver in testing for 3 or 4 years now.  And
> these are all old SDR cards, where the performance numbers were 800MB/s
> with WC enabled, 50MB/s without it.

Wow, I doubt any HTX systems are still in any kind of use.

It would be a nice clean up to drop the PPC support out of this driver
too. PPC never had HTX.

Jason

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-22 20:46               ` Jason Gunthorpe
@ 2015-04-22 20:58                 ` Doug Ledford
  0 siblings, 0 replies; 40+ messages in thread
From: Doug Ledford @ 2015-04-22 20:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Luis R. Rodriguez, Andy Lutomirski, mike.marciniszyn, infinipath,
	linux-rdma, awalls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner, Ville Syrj?l?,
	Linux Fbdev development list, linux-media, X86 ML, mcgrof

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

On Wed, 2015-04-22 at 14:46 -0600, Jason Gunthorpe wrote:
> On Wed, Apr 22, 2015 at 02:53:11PM -0400, Doug Ledford wrote:
> 
> > To be precise, the split is that ipath powers the old HTX bus cards that
> > only work in AMD systems, qib is all PCI-e cards.  I still have a few
> > HTX cards, but I no longer have any systems with HTX slots, so we
> > haven't even used this driver in testing for 3 or 4 years now.  And
> > these are all old SDR cards, where the performance numbers were 800MB/s
> > with WC enabled, 50MB/s without it.
> 
> Wow, I doubt any HTX systems are still in any kind of use.
> 
> It would be a nice clean up to drop the PPC support out of this driver
> too. PPC never had HTX.

commit f6d60848baf9f4015c76c665791875ed623cd5b7
Author: Ralph Campbell <ralph.campbell@qlogic.com>
Date:   Thu May 6 17:03:19 2010 -0700

    IB/ipath: Remove support for QLogic PCIe QLE devices
    
    The ib_qib driver is taking over support for QLogic PCIe QLE
devices,
    so remove support for them from ib_ipath.  The ib_ipath driver now
    supports only the obsolete QLogic Hyper-Transport IB host channel
    adapter (model QHT7140).
    
    Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

There you go.  It's been HTX only since 2010, and those cards were
already old then.  I think we should seriously consider deprecating and
then removing the driver.


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
       [not found]                               ` <5536d47a.95968c0a.1d12.ffffbf85@mx.google.com>
@ 2015-04-21 23:14                                 ` Luis R. Rodriguez
  0 siblings, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 23:14 UTC (permalink / raw)
  To: Andy Walls
  Cc: Andy Lutomirski, Hyong-Youb Kim, netdev, infinipath, jgunthorpe,
	mike.marciniszyn, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	linux-fbdev, linux-media, X86ML

On Tue, Apr 21, 2015 at 06:51:26PM -0400, Andy Walls wrote:
> Sorry for the top post; mobile work email account.
> 
> Luis,
> 
> You do the changes to remove MTTR and point me to your dev repo and branch.
> Also point me to the new functions/primitives I'll need.

There is nothing new actually needed for ivtv, unless of course
the ivtv driver is bounded to use a large MTRR that includes
the non-framebuffer region, if so then the ioremap_uc() would
be needed, and you can just cherry pick that patch:

https://marc.info/?l=linux-kernel&m=142964809110516&w=1

I'll bounce that patch to you as well. Might help reading this
patch too:

https://marc.info/?l=linux-kernel&m=142964809710517&w=1

If your write-combining area is not restricted by size constraints
so that it also include the non-framebuffer areas then you can just
do a simple conversion of the driver to use ioremap_wc() on the
framebuffer followed by arch_phys_wc_add().

An example driver that required changes to split this with size
contraints is atyfb, here are the changes for it:

https://marc.info/?l=linux-kernel&m=142964818810539&w=1
https://marc.info/?l=linux-kernel&m=142964813610531&w=1
https://marc.info/?l=linux-kernel&m=142964811010524&w=1
https://marc.info/?l=linux-kernel&m=142964814810532&w=1

If you are not constrained by MTRR's limitation on size then
a simple trivial driver conversion is sufficient. For example:

https://marc.info/?l=linux-kernel&m=142964744610286&w=1

I should also note that we are strivoing to also not use overlapping ioremap()
calls as we want to avoid that mess. Overlapping iroemap() calls with different
types could in theory work but its best we just design clean drivers and avoid
this.

As per Andy Lutomirski, what we'd need done on ivtv likely is
for it to ioremap() for an initial bring up of the device, then
infer the framebuffer offset, and only when that is being used
then iounmap and then ioremap() again split areas on the driver,
one with ioremap.

> I'll do the changes to add write-combining back into ivtv and ivtvfb, test
> them with my hardware and push them to my linuxtv.org git repo.

Great! The above sounded like a complexity you did not wish to
take on, but if you're up for the change, that'd be awesome!

> I know there is at least one English speaking user in India using ivtv with
> old PVR hardware, and probably folks in less developed places also using it.

If the above is too much work for that few amount of users I'd hope
we can just have them use older kernels, for the sake of sane APIs and
clean driver architecture.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-21 22:08                           ` Luis R. Rodriguez
@ 2015-04-21 22:09                             ` Andy Lutomirski
       [not found]                               ` <5536d47a.95968c0a.1d12.ffffbf85@mx.google.com>
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-21 22:09 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Walls, Hyong-Youb Kim, netdev, Toshi Kani, H. Peter Anvin,
	Ingo Molnar, linux-kernel, Hal Rosenstock, Sean Hefty,
	Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	linux-fbdev, linux-media, X86 ML

On Tue, Apr 21, 2015 at 3:08 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
>> Andy, can we live without MTRR support on this driver for future kernels? This
>> would only leave ipath as the only offending driver.
>
> Sorry to be clear, can we live with removal of write-combining on this driver?
>

I personally think so, but a driver maintainer's ack would be nice.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-21 22:02                         ` Luis R. Rodriguez
@ 2015-04-21 22:08                           ` Luis R. Rodriguez
  2015-04-21 22:09                             ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 22:08 UTC (permalink / raw)
  To: Andy Walls
  Cc: Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Dave Hansen, Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Tue, Apr 21, 2015 at 3:02 PM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> Andy, can we live without MTRR support on this driver for future kernels? This
> would only leave ipath as the only offending driver.

Sorry to be clear, can we live with removal of write-combining on this driver?

 Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16  1:07                       ` Andy Walls
@ 2015-04-21 22:02                         ` Luis R. Rodriguez
  2015-04-21 22:08                           ` Luis R. Rodriguez
  0 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 22:02 UTC (permalink / raw)
  To: Andy Walls
  Cc: Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	dave.hansen, plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Wed, Apr 15, 2015 at 09:07:37PM -0400, Andy Walls wrote:
> On Thu, 2015-04-16 at 01:58 +0200, Luis R. Rodriguez wrote:
> > Hey Andy, thanks for your review,  adding Hyong-Youb Kim for  review of the
> > full range ioremap_wc() idea below.
> > 
> > On Wed, Apr 15, 2015 at 06:38:51PM -0400, Andy Walls wrote:
> > > Hi All,
> > > 
> > > On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> > > > From the beginning it seems only framebuffer devices used MTRR/WC,
> > > [snip]
> > > >  The ivtv device is a good example of the worst type of
> > > > situations and these days. So perhap __arch_phys_wc_add() and a
> > > > ioremap_ucminus() might be something more than transient unless hardware folks
> > > > get a good memo or already know how to just Do The Right Thing (TM).
> > > 
> > > Just to reiterate a subtle point, use of the ivtvfb is *optional*.  A
> > > user may or may not load it.  When the user does load the ivtvfb driver,
> > > the ivtv driver has already been initialized and may have functions of
> > > the card already in use by userspace.
> > 
> > I suspected this and its why I note that a rewrite to address a clean
> > split with separate ioremap seems rather difficult in this case.
> > 
> > > Hopefully no one is trying to use the OSD as framebuffer and the video
> > > decoder/output engine for video display at the same time. 
> > 
> > Worst case concern I have also is the implications of having overlapping
> > ioremap() calls (as proposed in my last reply) for different memory types
> > and having the different virtual memory addresse used by different parts
> > of the driver. Its not clear to me what the hardware implications of this
> > are.
> > 
> > >  But the video
> > > decoder/output device nodes may already be open for performing ioctl()
> > > functions so unmapping the decoder IO space out from under them, when
> > > loading the ivtvfb driver module, might not be a good thing. 
> > 
> > Using overlapping ioremap() calls with different memory types would address
> > this concern provided hardware won't barf both on the device and CPU. Hardware
> > folks could provide feedback or an ivtvfb user could test the patch supplied
> > on both non-PAT and PAT systems. Even so, who knows,  this might work on some
> > systems while not on others, only hardware folks would know.
> 
> The CX2341[56] firmware+hardware has a track record for being really
> picky about sytem hardware.  It's primary symptoms are for the DMA
> engine or Mailbox protocol to get hung up.  So yeah, it could barf
> easily on some users.
> 
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices.
> 
> That's my thought; as long as implementing PCI write then read can force
> writes to be posted and that setting that many pages as WC doesn't cause
> some sort of PAT resource exhaustion. (I know very little about PAT).

So upon review that strategy won't work well unless we implemnt some
sort of of hack on the driver. That's also quite a bit of work.

Andy, can we live without MTRR support on this driver for future kernels? This
would only leave ipath as the only offending driver.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16  0:58                         ` Andy Lutomirski
  2015-04-16  1:01                           ` Andy Walls
@ 2015-04-21 17:35                           ` Luis R. Rodriguez
  1 sibling, 0 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-21 17:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Walls, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML,
	Greg Kroah-Hartman, mcgrof

On Wed, Apr 15, 2015 at 05:58:14PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 4:59 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> > On Wed, 2015-04-15 at 16:42 -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> >
> >> >
> >>
> >> IMO the right solution would be to avoid ioremapping the whole bar at
> >> startup.  Instead ioremap pieces once the driver learns what they are.
> >> This wouldn't have any of these problems -- you'd ioremap() register
> >> regions and you'd ioremap_wc() the framebuffer once you find it.  If
> >> there are regions of unknown purpose, just don't map them all.
> >>
> >> Would this be feasible?
> >
> > Feasible? Maybe.
> >
> > Worth the time and effort for end-of-life, convential PCI hardware so I
> > can have an optimally performing X display on a Standard Def Analog TV
> > screen?   Nope. I don't have that level of nostalgia.
> >
> 
> The point is actually to let us unexport or delete mtrr_add.  We can
> either severely regress performance on ivtv on PAT-capable hardware if
> we naively switch it to arch_phys_wc_add or we can do something else.
> The something else remains to be determined.

Back to square one: I can't come up with anything not too instrusive
or that dotes not requires substantial amount of work as an alternative to
removing MTRR completely right now (with the long term goal of also
making strong UC default) and its because of 2 device drivers:

  * ivtv: firmware API is poo and device is legacy, no one cares
  * ipath: driver needs a clean split and work is considerable,
    maintainers have not been responsive, do they care?

What do we want to do with these drivers? Let us be straight shooters,
if we are serious about having a performance regression on the drivers
for the sake of removing MTRR why not just seriously discuss removal
of these drivers. This way we can remain sane, upkeep a policy to
never even consider overlapping ioremap*() calls, and have a clean
expected strategy we can expect for new drivers.

I'm going to split up my patches now into 4 series:

1) things which are straight forward in converting drivers over
   to arch_phys_wc_add() and ioremap_wc(). These are subsystem
   wide though, so just a heads up, my hope is that each subsystem
   maintainer can take their own series unless someone else is
   comfortable in taking this in for x86

2) a few helpers in the like of ioremap_wc() needed for other drivers.
   These are straight forward but since they depend on x86 / core
   helpers it would be nice for them to go I guess through x86 folks.
   What maintainer is up to take these?

3) MTRR run time changes

4) corner cases - TBD - lets discuss here what we want to do with
   ivtv and ipath. I will however remove fusion's mtrr code use
   as its all commented out.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16 18:54                         ` Luis R. Rodriguez
@ 2015-04-17  8:00                           ` Hyong-Youb Kim
  0 siblings, 0 replies; 40+ messages in thread
From: Hyong-Youb Kim @ 2015-04-17  8:00 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Walls, Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Johannes Berg, Felix Fietkau, Benjamin Poirier, dave.hansen,
	plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Thu, Apr 16, 2015 at 08:54:26PM +0200, Luis R. Rodriguez wrote:
> > Without WC, descriptors would end up as multiple 4B or 8B MWr packets
> > to the NIC, which has a pretty big performance impact on this
> > particular NIC.
> 
> How big are the descriptors?

Some are 64B (a batch of eight 8B descriptors).  Some are 16B.

> > Most registers that do not want WC are actually in BAR2, which is not
> > mapped as WC.  For registers that are in BAR0, we do "write to the
> > register; wmb".  If we want to wait till the NIC has seen the write,
> > we do "write; wmb; read".
> 
> Interesting, thanks, yeah using this as a work around to the problem sounds
> plausible however it still would require likely making just as many changes to
> the ivtv and ipath driver as to just do a proper split. I do wonder however if
> this sort of work around can be generalized somehow though so that others could
> use, if this sort of thing is going to become prevalent. If so then this would
> serve two purposes: work around for the corner cases of MTRR use on Linux and
> also these sorts of device constraints.

These Myricom devices are very non-standard in my opinion, at least in
the Ethernet world.  Few, if any, other devices depend so much on WC
like these do.  I think almost all devices now have rings in host
memory.  The NIC pulls them via DMA.  No need for WC, and no need to
special case registers...

> In order to determine if this is likely to be generally useful could you elaborate
> a bit more about the detals of the performance issues of not bursting writes
> for the descriptor on this device.

For this particular Myricom device, performance penalty stems from the
use of slow path in the firmware.  They are not about how effectively
we use PCI Express or latency or bandwidth.  Small MWr packets end up
casuing slow path processing via the firmware in this device.

There are HPC low latency NICs that use WC for different reasons.  To
reduce latency as much as possible, some of these copy small packets
to the NIC memory via PIO (BAR0, and so on), instead of DMA.  They
want WC mapping to minimize PCI Express packets/transactions.

I do not know about video adapters and their use for WC.

> Even if that is done a conversion over to this work around seems it may require
> device specific nitpicks. For instance I note in myri10ge_submit_req() for
> small writes you just do a reverse write and do the first set last, then
> finally the last 32 bits are rewritten, I guess to trigger something?

Right.  The device polls the last word to start sending, DMA, etc.

> > This approach has worked for this device for many years.  I cannot say
> > whether it works for other devices, though.
> 
> I think it should but the more interesting question would be exactly
> *why* it was needed for this device, who determined that, and why?

Hopefully, the above text answers some of your questions.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16  1:01                           ` Andy Walls
@ 2015-04-16 19:19                             ` Andy Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-16 19:19 UTC (permalink / raw)
  To: Andy Walls
  Cc: Jean-Christophe Plagniol-Villard, Roland Dreier,
	Linux Fbdev development list, Juergen Gross, X86 ML,
	Mauro Carvalho Chehab, Mike Marciniszyn, Suresh Siddha,
	Ville Syrjälä,
	Borislav Petkov, Sean Hefty, Rickard Strandqvist, Hal Rosenstock,
	linux-kernel, H. Peter Anvin, Mel Gorman, Vlastimil Babka,
	Luis R. Rodriguez, Toshi Kani, Thomas Gleixner, Dave Hansen,
	Davidlohr Bueso, linux-media, Ingo Molnar

On Apr 15, 2015 6:54 PM, "Andy Walls" <awalls@md.metrocast.net> wrote:
>
> On Wed, 2015-04-15 at 17:58 -0700, Andy Lutomirski wrote:
> > On Wed, Apr 15, 2015 at 4:59 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> > > On Wed, 2015-04-15 at 16:42 -0700, Andy Lutomirski wrote:
> > >> On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> > >
> > >> >
> > >>
> > >> IMO the right solution would be to avoid ioremapping the whole bar at
> > >> startup.  Instead ioremap pieces once the driver learns what they are.
> > >> This wouldn't have any of these problems -- you'd ioremap() register
> > >> regions and you'd ioremap_wc() the framebuffer once you find it.  If
> > >> there are regions of unknown purpose, just don't map them all.
> > >>
> > >> Would this be feasible?
> > >
> > > Feasible? Maybe.
> > >
> > > Worth the time and effort for end-of-life, convential PCI hardware so I
> > > can have an optimally performing X display on a Standard Def Analog TV
> > > screen?   Nope. I don't have that level of nostalgia.
> > >
> >
> > The point is actually to let us unexport or delete mtrr_add.
>
> Understood.
>
>
> >   We can
> > either severely regress performance on ivtv on PAT-capable hardware if
> > we naively switch it to arch_phys_wc_add or we can do something else.
> > The something else remains to be determined.
>
> Maybe ioremap the decoder register area as UC, and ioremap the rest of
> the decoder region to WC. (Does that suck up too many PAT resources?

PAT resources are unlimited.

> Then add PCI reads following any sort of singleton PCI writes in the WC
> region.  I assume PCI rules about write postings before reads still
> apply when WC is set.
>

I think we need sfence, too, but that's easy.  We also lose the write
sizes.  That is, adjacent writes get combined.  Maybe that's okay.

> > >
> > > We sort of know where some things are in the MMIO space due to
> > > experimentation and past efforts examining the firmware binary.
> > >
> > > Documentation/video4linux/cx2341x/fw-*.txt documents some things.  The
> > > driver code actually codifies a little bit more knowledge.
> > >
> > > The driver code for doing transfers between host and card is complex and
> > > fragile with some streams that use DMA, other streams that use PIO,
> > > digging VBI data straight out of card memory, and scatter-gather being
> > > broken on newer firmwares.  Playing around with ioremapping will be hard
> > > to get right and likely cause something in the code to break for the
> > > primary use case of the ivtv supported cards.
> >
> > Ick.
>
> Yeah.
>
> > If the only thing that really wants WC is the esoteric framebuffer
> > thing,
>
> That appears to be it.
>
> >  could we just switch to arch_phys_wc_add and assume that no one
> > will care about the regression on new CPUs with ivtv cards?
>
> That's on the table in my mind.  Not sure if it is the friendliest thing
> to do to users.  Quite honestly though, modern graphics cards have much
> better ouput resolution and performance.  Anyone with a modern system
> really should be using one.  (i.e. MythTV gave up on support for PVR-350
> output for video playback years ago in May 2010.)
>
>
> BTW, my 2005 system with multiple conventional PCI slots in it shows a
> 'pat' flag in /proc/cpuinfo.  (AMD Athlon(tm) 64 X2 Dual Core Processor
> 4200+)  I didn't know it was considered "new". :)

Tons of CPUs have that ability, but we often turn it off due to errata
on older CPUs.

--Andy

>
> Regards,
> Andy
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16  4:18                       ` Hyong-Youb Kim
@ 2015-04-16 18:54                         ` Luis R. Rodriguez
  2015-04-17  8:00                           ` Hyong-Youb Kim
  0 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-16 18:54 UTC (permalink / raw)
  To: Hyong-Youb Kim
  Cc: Andy Walls, Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	Johannes Berg, Felix Fietkau, Benjamin Poirier, dave.hansen,
	plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Thu, Apr 16, 2015 at 01:18:37PM +0900, Hyong-Youb Kim wrote:
> On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote:
> > 
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices. I see one ethernet driver that does
> > this, myri10ge, and am curious how and why they ended up deciding this
> > and if they have run into any issues. I wonder if this is a reasonable
> > comrpomise for these 2 remaining corner cases.
> > 
> 
> For myri10ge, it a performance thing.  Descriptor rings are in NIC
> memory BAR0, not in host memory.  Say, to send a packet, the driver
> writes the send descriptor to the ioremap'd NIC memory.  It is a
> multi-word descriptor.  So, to send it as one PCIE MWr transaction,
> the driver maps the whole BAR0 as WC and does "copy descriptor; wmb".

Interesting, so you burst write multi-word descriptor writes using
write-combining here for the Ethernet device.

> Without WC, descriptors would end up as multiple 4B or 8B MWr packets
> to the NIC, which has a pretty big performance impact on this
> particular NIC.

How big are the descriptors?

> Most registers that do not want WC are actually in BAR2, which is not
> mapped as WC.  For registers that are in BAR0, we do "write to the
> register; wmb".  If we want to wait till the NIC has seen the write,
> we do "write; wmb; read".

Interesting, thanks, yeah using this as a work around to the problem sounds
plausible however it still would require likely making just as many changes to
the ivtv and ipath driver as to just do a proper split. I do wonder however if
this sort of work around can be generalized somehow though so that others could
use, if this sort of thing is going to become prevalent. If so then this would
serve two purposes: work around for the corner cases of MTRR use on Linux and
also these sorts of device constraints.

In order to determine if this is likely to be generally useful could you elaborate
a bit more about the detals of the performance issues of not bursting writes
for the descriptor on this device.

Even if that is done a conversion over to this work around seems it may require
device specific nitpicks. For instance I note in myri10ge_submit_req() for
small writes you just do a reverse write and do the first set last, then
finally the last 32 bits are rewritten, I guess to trigger something?

> This approach has worked for this device for many years.  I cannot say
> whether it works for other devices, though.

I think it should but the more interesting question would be exactly
*why* it was needed for this device, who determined that, and why?

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 23:58                     ` Luis R. Rodriguez
  2015-04-16  1:07                       ` Andy Walls
@ 2015-04-16  4:18                       ` Hyong-Youb Kim
  2015-04-16 18:54                         ` Luis R. Rodriguez
  1 sibling, 1 reply; 40+ messages in thread
From: Hyong-Youb Kim @ 2015-04-16  4:18 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Walls, Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	dave.hansen, plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote:
> 
> An alternative... is to just ioremap_wc() the entire region, including
> MMIO registers for these old devices. I see one ethernet driver that does
> this, myri10ge, and am curious how and why they ended up deciding this
> and if they have run into any issues. I wonder if this is a reasonable
> comrpomise for these 2 remaining corner cases.
> 

For myri10ge, it a performance thing.  Descriptor rings are in NIC
memory BAR0, not in host memory.  Say, to send a packet, the driver
writes the send descriptor to the ioremap'd NIC memory.  It is a
multi-word descriptor.  So, to send it as one PCIE MWr transaction,
the driver maps the whole BAR0 as WC and does "copy descriptor; wmb".

Without WC, descriptors would end up as multiple 4B or 8B MWr packets
to the NIC, which has a pretty big performance impact on this
particular NIC.

Most registers that do not want WC are actually in BAR2, which is not
mapped as WC.  For registers that are in BAR0, we do "write to the
register; wmb".  If we want to wait till the NIC has seen the write,
we do "write; wmb; read".

This approach has worked for this device for many years.  I cannot say
whether it works for other devices, though.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 23:58                     ` Luis R. Rodriguez
@ 2015-04-16  1:07                       ` Andy Walls
  2015-04-21 22:02                         ` Luis R. Rodriguez
  2015-04-16  4:18                       ` Hyong-Youb Kim
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Walls @ 2015-04-16  1:07 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Hyong-Youb Kim, netdev, Andy Lutomirski, Toshi Kani,
	H. Peter Anvin, Ingo Molnar, linux-kernel, Hal Rosenstock,
	Sean Hefty, Suresh Siddha, Rickard Strandqvist, Mike Marciniszyn,
	Roland Dreier, Juergen Gross, Mauro Carvalho Chehab,
	Borislav Petkov, Mel Gorman, Vlastimil Babka, Davidlohr Bueso,
	dave.hansen, plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

On Thu, 2015-04-16 at 01:58 +0200, Luis R. Rodriguez wrote:
> Hey Andy, thanks for your review,  adding Hyong-Youb Kim for  review of the
> full range ioremap_wc() idea below.
> 
> On Wed, Apr 15, 2015 at 06:38:51PM -0400, Andy Walls wrote:
> > Hi All,
> > 
> > On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> > > From the beginning it seems only framebuffer devices used MTRR/WC,
> > [snip]
> > >  The ivtv device is a good example of the worst type of
> > > situations and these days. So perhap __arch_phys_wc_add() and a
> > > ioremap_ucminus() might be something more than transient unless hardware folks
> > > get a good memo or already know how to just Do The Right Thing (TM).
> > 
> > Just to reiterate a subtle point, use of the ivtvfb is *optional*.  A
> > user may or may not load it.  When the user does load the ivtvfb driver,
> > the ivtv driver has already been initialized and may have functions of
> > the card already in use by userspace.
> 
> I suspected this and its why I note that a rewrite to address a clean
> split with separate ioremap seems rather difficult in this case.
> 
> > Hopefully no one is trying to use the OSD as framebuffer and the video
> > decoder/output engine for video display at the same time. 
> 
> Worst case concern I have also is the implications of having overlapping
> ioremap() calls (as proposed in my last reply) for different memory types
> and having the different virtual memory addresse used by different parts
> of the driver. Its not clear to me what the hardware implications of this
> are.
> 
> >  But the video
> > decoder/output device nodes may already be open for performing ioctl()
> > functions so unmapping the decoder IO space out from under them, when
> > loading the ivtvfb driver module, might not be a good thing. 
> 
> Using overlapping ioremap() calls with different memory types would address
> this concern provided hardware won't barf both on the device and CPU. Hardware
> folks could provide feedback or an ivtvfb user could test the patch supplied
> on both non-PAT and PAT systems. Even so, who knows,  this might work on some
> systems while not on others, only hardware folks would know.

The CX2341[56] firmware+hardware has a track record for being really
picky about sytem hardware.  It's primary symptoms are for the DMA
engine or Mailbox protocol to get hung up.  So yeah, it could barf
easily on some users.

> An alternative... is to just ioremap_wc() the entire region, including
> MMIO registers for these old devices.

That's my thought; as long as implementing PCI write then read can force
writes to be posted and that setting that many pages as WC doesn't cause
some sort of PAT resource exhaustion. (I know very little about PAT).

>  I see one ethernet driver that does
> this, myri10ge, and am curious how and why they ended up deciding this
> and if they have run into any issues. I wonder if this is a reasonable
> comrpomise for these 2 remaining corner cases.
> 
>   Luis

Regards,
Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-16  0:58                         ` Andy Lutomirski
@ 2015-04-16  1:01                           ` Andy Walls
  2015-04-16 19:19                             ` Andy Lutomirski
  2015-04-21 17:35                           ` Luis R. Rodriguez
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Walls @ 2015-04-16  1:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, 2015-04-15 at 17:58 -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 4:59 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> > On Wed, 2015-04-15 at 16:42 -0700, Andy Lutomirski wrote:
> >> On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> >
> >> >
> >>
> >> IMO the right solution would be to avoid ioremapping the whole bar at
> >> startup.  Instead ioremap pieces once the driver learns what they are.
> >> This wouldn't have any of these problems -- you'd ioremap() register
> >> regions and you'd ioremap_wc() the framebuffer once you find it.  If
> >> there are regions of unknown purpose, just don't map them all.
> >>
> >> Would this be feasible?
> >
> > Feasible? Maybe.
> >
> > Worth the time and effort for end-of-life, convential PCI hardware so I
> > can have an optimally performing X display on a Standard Def Analog TV
> > screen?   Nope. I don't have that level of nostalgia.
> >
> 
> The point is actually to let us unexport or delete mtrr_add.

Understood.


>   We can
> either severely regress performance on ivtv on PAT-capable hardware if
> we naively switch it to arch_phys_wc_add or we can do something else.
> The something else remains to be determined.

Maybe ioremap the decoder register area as UC, and ioremap the rest of
the decoder region to WC. (Does that suck up too many PAT resources?)
Then add PCI reads following any sort of singleton PCI writes in the WC
region.  I assume PCI rules about write postings before reads still
apply when WC is set.

> >
> > We sort of know where some things are in the MMIO space due to
> > experimentation and past efforts examining the firmware binary.
> >
> > Documentation/video4linux/cx2341x/fw-*.txt documents some things.  The
> > driver code actually codifies a little bit more knowledge.
> >
> > The driver code for doing transfers between host and card is complex and
> > fragile with some streams that use DMA, other streams that use PIO,
> > digging VBI data straight out of card memory, and scatter-gather being
> > broken on newer firmwares.  Playing around with ioremapping will be hard
> > to get right and likely cause something in the code to break for the
> > primary use case of the ivtv supported cards.
> 
> Ick.

Yeah.

> If the only thing that really wants WC is the esoteric framebuffer
> thing,

That appears to be it.

>  could we just switch to arch_phys_wc_add and assume that no one
> will care about the regression on new CPUs with ivtv cards?

That's on the table in my mind.  Not sure if it is the friendliest thing
to do to users.  Quite honestly though, modern graphics cards have much
better ouput resolution and performance.  Anyone with a modern system
really should be using one.  (i.e. MythTV gave up on support for PVR-350
output for video playback years ago in May 2010.)


BTW, my 2005 system with multiple conventional PCI slots in it shows a
'pat' flag in /proc/cpuinfo.  (AMD Athlon(tm) 64 X2 Dual Core Processor
4200+)  I didn't know it was considered "new". :)

Regards,
Andy



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 23:59                       ` Andy Walls
@ 2015-04-16  0:58                         ` Andy Lutomirski
  2015-04-16  1:01                           ` Andy Walls
  2015-04-21 17:35                           ` Luis R. Rodriguez
  0 siblings, 2 replies; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-16  0:58 UTC (permalink / raw)
  To: Andy Walls
  Cc: Luis R. Rodriguez, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, Apr 15, 2015 at 4:59 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> On Wed, 2015-04-15 at 16:42 -0700, Andy Lutomirski wrote:
>> On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:
>
>> >
>>
>> IMO the right solution would be to avoid ioremapping the whole bar at
>> startup.  Instead ioremap pieces once the driver learns what they are.
>> This wouldn't have any of these problems -- you'd ioremap() register
>> regions and you'd ioremap_wc() the framebuffer once you find it.  If
>> there are regions of unknown purpose, just don't map them all.
>>
>> Would this be feasible?
>
> Feasible? Maybe.
>
> Worth the time and effort for end-of-life, convential PCI hardware so I
> can have an optimally performing X display on a Standard Def Analog TV
> screen?   Nope. I don't have that level of nostalgia.
>

The point is actually to let us unexport or delete mtrr_add.  We can
either severely regress performance on ivtv on PAT-capable hardware if
we naively switch it to arch_phys_wc_add or we can do something else.
The something else remains to be determined.

>
> We sort of know where some things are in the MMIO space due to
> experimentation and past efforts examining the firmware binary.
>
> Documentation/video4linux/cx2341x/fw-*.txt documents some things.  The
> driver code actually codifies a little bit more knowledge.
>
> The driver code for doing transfers between host and card is complex and
> fragile with some streams that use DMA, other streams that use PIO,
> digging VBI data straight out of card memory, and scatter-gather being
> broken on newer firmwares.  Playing around with ioremapping will be hard
> to get right and likely cause something in the code to break for the
> primary use case of the ivtv supported cards.

Ick.

If the only thing that really wants WC is the esoteric framebuffer
thing, could we just switch to arch_phys_wc_add and assume that no one
will care about the regression on new CPUs with ivtv cards?


--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 23:42                     ` Andy Lutomirski
@ 2015-04-15 23:59                       ` Andy Walls
  2015-04-16  0:58                         ` Andy Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Andy Walls @ 2015-04-15 23:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luis R. Rodriguez, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, 2015-04-15 at 16:42 -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:

> >
> 
> IMO the right solution would be to avoid ioremapping the whole bar at
> startup.  Instead ioremap pieces once the driver learns what they are.
> This wouldn't have any of these problems -- you'd ioremap() register
> regions and you'd ioremap_wc() the framebuffer once you find it.  If
> there are regions of unknown purpose, just don't map them all.
> 
> Would this be feasible?

Feasible? Maybe.

Worth the time and effort for end-of-life, convential PCI hardware so I
can have an optimally performing X display on a Standard Def Analog TV
screen?   Nope. I don't have that level of nostalgia.


We sort of know where some things are in the MMIO space due to
experimentation and past efforts examining the firmware binary.

Documentation/video4linux/cx2341x/fw-*.txt documents some things.  The
driver code actually codifies a little bit more knowledge.

The driver code for doing transfers between host and card is complex and
fragile with some streams that use DMA, other streams that use PIO,
digging VBI data straight out of card memory, and scatter-gather being
broken on newer firmwares.  Playing around with ioremapping will be hard
to get right and likely cause something in the code to break for the
primary use case of the ivtv supported cards. 

Regards,
Andy



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 22:38                   ` Andy Walls
  2015-04-15 23:42                     ` Andy Lutomirski
@ 2015-04-15 23:58                     ` Luis R. Rodriguez
  2015-04-16  1:07                       ` Andy Walls
  2015-04-16  4:18                       ` Hyong-Youb Kim
  1 sibling, 2 replies; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-15 23:58 UTC (permalink / raw)
  To: Andy Walls, Hyong-Youb Kim, netdev
  Cc: Andy Lutomirski, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, dave.hansen,
	plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

Hey Andy, thanks for your review,  adding Hyong-Youb Kim for  review of the
full range ioremap_wc() idea below.

On Wed, Apr 15, 2015 at 06:38:51PM -0400, Andy Walls wrote:
> Hi All,
> 
> On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> > From the beginning it seems only framebuffer devices used MTRR/WC,
> [snip]
> >  The ivtv device is a good example of the worst type of
> > situations and these days. So perhap __arch_phys_wc_add() and a
> > ioremap_ucminus() might be something more than transient unless hardware folks
> > get a good memo or already know how to just Do The Right Thing (TM).
> 
> Just to reiterate a subtle point, use of the ivtvfb is *optional*.  A
> user may or may not load it.  When the user does load the ivtvfb driver,
> the ivtv driver has already been initialized and may have functions of
> the card already in use by userspace.

I suspected this and its why I note that a rewrite to address a clean
split with separate ioremap seems rather difficult in this case.

> Hopefully no one is trying to use the OSD as framebuffer and the video
> decoder/output engine for video display at the same time. 

Worst case concern I have also is the implications of having overlapping
ioremap() calls (as proposed in my last reply) for different memory types
and having the different virtual memory addresse used by different parts
of the driver. Its not clear to me what the hardware implications of this
are.

>  But the video
> decoder/output device nodes may already be open for performing ioctl()
> functions so unmapping the decoder IO space out from under them, when
> loading the ivtvfb driver module, might not be a good thing. 

Using overlapping ioremap() calls with different memory types would address
this concern provided hardware won't barf both on the device and CPU. Hardware
folks could provide feedback or an ivtvfb user could test the patch supplied
on both non-PAT and PAT systems. Even so, who knows,  this might work on some
systems while not on others, only hardware folks would know.

An alternative... is to just ioremap_wc() the entire region, including
MMIO registers for these old devices. I see one ethernet driver that does
this, myri10ge, and am curious how and why they ended up deciding this
and if they have run into any issues. I wonder if this is a reasonable
comrpomise for these 2 remaining corner cases.

  Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-15 22:38                   ` Andy Walls
@ 2015-04-15 23:42                     ` Andy Lutomirski
  2015-04-15 23:59                       ` Andy Walls
  2015-04-15 23:58                     ` Luis R. Rodriguez
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2015-04-15 23:42 UTC (permalink / raw)
  To: Andy Walls
  Cc: Luis R. Rodriguez, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, Dave Hansen,
	Jean-Christophe Plagniol-Villard, Thomas Gleixner,
	Ville Syrjälä,
	Linux Fbdev development list, linux-media, X86 ML

On Wed, Apr 15, 2015 at 3:38 PM, Andy Walls <awalls@md.metrocast.net> wrote:
> Hi All,
>
> On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
> [snip]
>>  I only saw a few drivers using overlapping ioremap*()
>> calls though on my MTRR review and they are all old devices so likely mostly
>> used on non-PAT systems, but there might be other corner cases elsewhere.
>>
>> Lets recap, as I see it we have a few options with all this in mind on our
>> quest to bury MTRR (and later make strong UC default):
>>
>> 1) Let drivers do their own get_vm_area() calls as you note and handle the
>>    cut and splicing of ioremap areas
>>
>> 2) Provide an API to split and splice ioremap ranges
>>
>> 3) Try to avoid these types of situations and let drivers simply try to
>>    work towards a proper clean non-overlapping different ioremap() ranges
>>
>> Let me know if you think of others but please keep in mind the UC- to strong
>> UC converstion we want to do later as well. We have ruled out using
>> set_memor_wc() now.
>>
>> I prefer option 3), its technically possible but only for *new* drivers
>> and we'd need either some hard work to split the ioremap() areas or provide
>> a a set of *transient* APIs as I had originally proposed to phase / hope for
>> final transition. There are 3 drivers to address:
>>
>> [snip]
>
> Just some background for folks:
> The ivtv driver supports cards that perform Standard Definition PAL,
> NTSC, and SECAM TV capture + hardware MPEG-2 encoding and MPEG-2
> decoding + TV output.
>
> Of the supported cards only *one* card type, the PVR-350 based on the
> CX23415 chip, can perform the MPEG-2 or YUV video decoding and output,
> with an OSD overlay.  PVR-350's are legacy PCI cards that have been end
> of life since 2088 or earlier.  Despite that, there are still used units
> available on Amazon and eBay.
>
> The ivtvfb driver module is an *optional* companion driver module that
> provides a framebuffer interface for the user to output an X display, FB
> console, or whatever to a standard definition analog PAL, NTSC, or SECAM
> TV or VCR.  It does this by co-opting the OSD overlay of the video
> decoding output engine and having it take up the whole screen.
>
>
>
>> c) ivtv: the driver does not have the PCI space mapped out separately, and
>> in fact it actually does not do the math for the framebuffer, instead it lets
>> the device's own CPU do that and assume where its at, see
>> ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
>> but not a setter. Its not clear if the firmware would make a split easy.
>
> The CX2341[56]'s firmware + hardware machine are notorious for bugs and
> being hard to work with.  It would be difficult to determine with any
> certainty that the firmware returned predictable OSD framebuffer
> addresses from one user's system to the next.
>
>
>> We'd need ioremap_ucminus() here too and __arch_phys_wc_add().
>
> Yes.
>
> As a driver writer/maintainer, IMO the name ioremap_ucminus() is jargon,
> without a clear meaning from the name, and maybe too x86 PAT specific?
> The pat.txt file under Documentation didn't explain what UC- meant; I
> had to go searching old LKML emails to understand it was a unchached
> region that could be overridden with write combine regions.
>
> To me names along, these lines would be better:
>    ioremap_nocache_weak(), or
>    ioremap_nocache_mutable(), or
>    ioremap_nocache()  (with ioremap_nocache_strong() or something for
> the UC version)
>

IMO the right solution would be to avoid ioremapping the whole bar at
startup.  Instead ioremap pieces once the driver learns what they are.
This wouldn't have any of these problems -- you'd ioremap() register
regions and you'd ioremap_wc() the framebuffer once you find it.  If
there are regions of unknown purpose, just don't map them all.

Would this be feasible?

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
  2015-04-13 17:49                 ` Luis R. Rodriguez
@ 2015-04-15 22:38                   ` Andy Walls
  2015-04-15 23:42                     ` Andy Lutomirski
  2015-04-15 23:58                     ` Luis R. Rodriguez
  0 siblings, 2 replies; 40+ messages in thread
From: Andy Walls @ 2015-04-15 22:38 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Andy Lutomirski, Toshi Kani, H. Peter Anvin, Ingo Molnar,
	linux-kernel, Hal Rosenstock, Sean Hefty, Suresh Siddha,
	Rickard Strandqvist, Mike Marciniszyn, Roland Dreier,
	Juergen Gross, Mauro Carvalho Chehab, Borislav Petkov,
	Mel Gorman, Vlastimil Babka, Davidlohr Bueso, dave.hansen,
	plagnioj, tglx, Ville Syrjälä,
	linux-fbdev, linux-media, x86

Hi All,

On Mon, 2015-04-13 at 19:49 +0200, Luis R. Rodriguez wrote:
[snip]
>  I only saw a few drivers using overlapping ioremap*()
> calls though on my MTRR review and they are all old devices so likely mostly
> used on non-PAT systems, but there might be other corner cases elsewhere.
> 
> Lets recap, as I see it we have a few options with all this in mind on our
> quest to bury MTRR (and later make strong UC default):
> 
> 1) Let drivers do their own get_vm_area() calls as you note and handle the
>    cut and splicing of ioremap areas
> 
> 2) Provide an API to split and splice ioremap ranges
> 
> 3) Try to avoid these types of situations and let drivers simply try to
>    work towards a proper clean non-overlapping different ioremap() ranges
> 
> Let me know if you think of others but please keep in mind the UC- to strong
> UC converstion we want to do later as well. We have ruled out using
> set_memor_wc() now.
> 
> I prefer option 3), its technically possible but only for *new* drivers
> and we'd need either some hard work to split the ioremap() areas or provide
> a a set of *transient* APIs as I had originally proposed to phase / hope for
> final transition. There are 3 drivers to address:
> 
> [snip]

Just some background for folks:
The ivtv driver supports cards that perform Standard Definition PAL,
NTSC, and SECAM TV capture + hardware MPEG-2 encoding and MPEG-2
decoding + TV output.

Of the supported cards only *one* card type, the PVR-350 based on the
CX23415 chip, can perform the MPEG-2 or YUV video decoding and output,
with an OSD overlay.  PVR-350's are legacy PCI cards that have been end
of life since 2088 or earlier.  Despite that, there are still used units
available on Amazon and eBay.

The ivtvfb driver module is an *optional* companion driver module that
provides a framebuffer interface for the user to output an X display, FB
console, or whatever to a standard definition analog PAL, NTSC, or SECAM
TV or VCR.  It does this by co-opting the OSD overlay of the video
decoding output engine and having it take up the whole screen.


 
> c) ivtv: the driver does not have the PCI space mapped out separately, and
> in fact it actually does not do the math for the framebuffer, instead it lets
> the device's own CPU do that and assume where its at, see
> ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
> but not a setter. Its not clear if the firmware would make a split easy.

The CX2341[56]'s firmware + hardware machine are notorious for bugs and
being hard to work with.  It would be difficult to determine with any
certainty that the firmware returned predictable OSD framebuffer
addresses from one user's system to the next.


> We'd need ioremap_ucminus() here too and __arch_phys_wc_add().

Yes.

As a driver writer/maintainer, IMO the name ioremap_ucminus() is jargon,
without a clear meaning from the name, and maybe too x86 PAT specific?
The pat.txt file under Documentation didn't explain what UC- meant; I
had to go searching old LKML emails to understand it was a unchached
region that could be overridden with write combine regions.

To me names along, these lines would be better:
   ioremap_nocache_weak(), or
   ioremap_nocache_mutable(), or
   ioremap_nocache()  (with ioremap_nocache_strong() or something for
the UC version)


> From the beginning it seems only framebuffer devices used MTRR/WC,
[snip]
>  The ivtv device is a good example of the worst type of
> situations and these days. So perhap __arch_phys_wc_add() and a
> ioremap_ucminus() might be something more than transient unless hardware folks
> get a good memo or already know how to just Do The Right Thing (TM).

Just to reiterate a subtle point, use of the ivtvfb is *optional*.  A
user may or may not load it.  When the user does load the ivtvfb driver,
the ivtv driver has already been initialized and may have functions of
the card already in use by userspace.

Hopefully no one is trying to use the OSD as framebuffer and the video
decoder/output engine for video display at the same time.  But the video
decoder/output device nodes may already be open for performing ioctl()
functions so unmapping the decoder IO space out from under them, when
loading the ivtvfb driver module, might not be a good thing. 

Regards,
Andy


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR
       [not found]               ` <CALCETrXd19C6pARde3pv-4pt-i52APtw5xs20itwROPq9VmCfg@mail.gmail.com>
@ 2015-04-13 17:49                 ` Luis R. Rodriguez
  2015-04-15 22:38                   ` Andy Walls
  0 siblings, 1 reply; 40+ messages in thread
From: Luis R. Rodriguez @ 2015-04-13 17:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Toshi Kani, H. Peter Anvin, Ingo Molnar, linux-kernel,
	Hal Rosenstock, Sean Hefty, Suresh Siddha, Rickard Strandqvist,
	Mike Marciniszyn, Roland Dreier, Juergen Gross,
	Mauro Carvalho Chehab, Andy Walls, Borislav Petkov, Mel Gorman,
	Vlastimil Babka, Davidlohr Bueso, dave.hansen, plagnioj, tglx,
	Ville Syrjälä,
	linux-fbdev, linux-media, x86

Cc'ing a few others as we ended up talking about the cruxes of my
unposted v2 series as I confirmed that set_memor_wc() would not work
as an alternative to my originally proposed __arch_phys_wc_add() to
force MTRR as a last resort on a few set of last remaining drivers.
This also discusses overlapping ioremap() calls and what we'd need
for a default shift from UC- to strong UC.

On Fri, Apr 10, 2015 at 11:25:22PM -0700, Andy Lutomirski wrote:
> On Apr 10, 2015 6:29 PM, "Luis R. Rodriguez" <mcgrof@suse.com> wrote:
> >
> > On Fri, Apr 10, 2015 at 02:22:51PM -0700, Andy Lutomirski wrote:
> > > On Fri, Apr 10, 2015 at 1:58 PM, Toshi Kani <toshi.kani@hp.com> wrote:
> > > > On Fri, 2015-04-10 at 23:05 +0200, Luis R. Rodriguez wrote:
> > > >> On Fri, Apr 10, 2015 at 01:49:39PM -0600, Toshi Kani wrote:
> > > >> > On Fri, 2015-04-10 at 12:34 -0700, Luis R. Rodriguez wrote:
> > > >> > > On Fri, Apr 10, 2015 at 12:14 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > > >> > > > On Fri, Apr 10, 2015 at 10:17 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > > >> > > >>
> > > >> > > >> Andy, I'm ready to post a v2 series based on review of the first iteration of
> > > >> > > >> the bury-MTRR series however I ran into a snag in vetting for the ioremap_uc()
> > > >> > > >> followed by a set_memory_wc() strategy as a measure to both avoid when possible
> > > >> > > >> overlapping ioremap*() calls and/or to avoid the power of 2 MTRR size
> > > >> > > >> implications on having to use multiple MTRRs.
> > > >> > > >>
> > > >> > > >> I tested the strategy by just making my thinkpad's i915 driver use iremap_uc()
> > > >> > > >> which I add, and then use set_memory_wc(). To start off with I should note
> > > >> > > >> set_memory_*() helpers are x86 specific right now, this is not a problem for
> > > >> > > >> the series but its worth noting as we're replacing MTRR strategies which
> > > >> > > >> are x86 specific, but I am having issues getting set_memory_wc() take effect
> > > >> > > >> on my intel graphics card.
> > > >> > > >>
> > > >> > > >> I've reviewed if this is OK in code and I see no issues other than set_memory_*()
> > > >> > > >> helpers seem to be desirable for RAM, not IO memory, so was wondering if we need
> > > >> > > >> to add other helpers which can address IO memory or if this should work? The diff
> > > >> > > >> for the drivers is below, the actual commit for adding ioremap_uc() folllows
> > > >> > > >> with its commit log. Feedback / review on both is welcome as well as if you
> > > >> > > >> could help me figure out why this test-patch for i915 fails.
> > > >> > > >
> > > >> > > > I think they should work for IO memory, but I'm not really an authority here.
> > > >> > > >
> > > >> > > > Adding some people who have looked at the code recently.
> > > >> > >
> > > >> > > I was avoiding reviewing the cpa code but since this failed I just had
> > > >> > > to review it and I see nothing prevent it from being used on IO memory
> > > >> > > but -- memtype_rb_check_conflict() does prevent an overlap to be set
> > > >> > > on an *existing range* -- and since ioremap_uc() was used earlier the
> > > >> > > first reserve_memtype() with _PAGE_CACHE_MODE_WC by set_memory_wc() I
> > > >> > > believe should fail then. Please correct me if I'm wrong, I don't see
> > > >> > > the "conflicting memory types" print though, so not sure if it was
> > > >> > > because of that.
> > > >> > >
> > > >> > > I only started looking at this though but shouldn't this also mean we
> > > >> > > can't use overlapping ioremap() calls too? I thought that worked,
> > > >> > > because at least some drivers are using that strategy.
> > > >> >
> > > >> > set_memory_*() does not work with I/O memory ranges with the following
> > > >> > reasons:
> > > >> >
> > > >> > 1. __pa(addr) returns a fake address since there is no direct map.
> > > >> > 2. reserve_memtype() tracks regular memory and I/O memory differently.
> > > >> > For regular memory, set_memory_*() can modify WB with a new type since
> > > >> > reserve_memtype() does not track WB.  For I/O memory, reserve_memtype()
> > > >> > detects a conflict when a given type is different from a tracked type.
> > > >>
> > > >> Interesting, but I also just realized I had messed up my test patch too,
> > > >> I checked for (!ret) instead of (ret). This works now.
> > > >>
> > > >> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > >> > > index dccdc8a..dd9501b 100644
> > > >> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > >> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > >> > > @@ -1958,12 +1958,22 @@ static int ggtt_probe_common(struct drm_device *dev,
> > > >> > >         gtt_phys_addr = pci_resource_start(dev->pdev, 0) +
> > > >> > >                 (pci_resource_len(dev->pdev, 0) / 2);
> > > >> > >
> > > >> > > -       dev_priv->gtt.gsm = ioremap_wc(gtt_phys_addr, gtt_size);
> > > >> > > +       dev_priv->gtt.gsm = ioremap_uc(gtt_phys_addr, gtt_size);
> > > >> > >         if (!dev_priv->gtt.gsm) {
> > > >> > >                 DRM_ERROR("Failed to map the gtt page table\n");
> > > >> > >                 return -ENOMEM;
> > > >> > >         }
> > > >> > >
> > > >> > > +       printk("mcgrof:set_memory_wc() ggtt_probe_common()\n");
> > > >> > > +
> > > >> > > +       ret = set_memory_wc((unsigned long) dev_priv->gtt.gsm, gtt_size >> PAGE_SHIFT);
> > > >> > > +       if (!ret) {
> > > >>
> > > >> Mess up here.
> > > >>
> > > >> > > +               DRM_ERROR("mcgrof: failed set_memory_wc()\n");
> > > >> > > +               iounmap(dev_priv->gtt.gsm);
> > > >> > > +               return ret;
> > > >> > > +       }
> > > >> > > +
> > > >> > > +
> > > >> > >         ret = setup_scratch_page(dev);
> > > >> > >         if (ret) {
> > > >> > >                 DRM_ERROR("Scratch setup failed\n");
> > > >>
> > > >> as I read the code though reserve_memtype() should not allow for a change
> > > >> of an existing type as you note though (and therefore prevent also
> > > >> overlapping ioremap() calls on PAT), but since this is going through now
> > > >> I am not sure at if this is by chance somehow... ?
> > > >
> > > > Unless you fixed issue #1 above, set_memory_wc() passes a bogus address
> > > > to reserve_memtype().  Hence, it won't cause any conflict.
> > >
> > > Presumably fixing #1 would be okay (slow_virt_to_phys or whatever)
> > > since this stuff is already slow.
> >
> > That wouldn't cut it I think. That would still fail as the types
> > don't match. That means we would have to free_memtype() and
> > then alloc a new reserve_memtype(), but I don't think that's
> > enought still. For instance a new ioremap() does:
> >
> > 1) sanity checks on regions of memory
> > 2) get_vm_area_caller() for the range
> > 3) kernel_map_sync_memtype()
> > 4) ioremap_page_range()
> > 5) mmiotrace_ioremap()
> >
> > Correct me if I'm wrong but I think we'd need to do all this as well.
> > If so we'd be cutting and splicing an ioremap() range. This code would
> > seem tricky to get right. Maybe its best to not support this then and
> > simply have to live with having drivers do their own splititn gof
> > ioremap() calls.
> >
> > This *might* implicate some constraints on burrying MTRR though so...
> > please let me know what you think.
> 
> To throw out more ideas, what if the drivers instead did their own
> get_vm_area vmap calls and then mapped the two consecutive regions
> separately at consecutive addresses?

Instead of having an API do it for them? They'd have to implement quite a bit.
Driver writers also tend to not have the best of judgement and *typically* just
copy and paste code, so we'd have to either annotate it well as a work around
for a few drivers (in my series there are 3 that need this in order for us to
bury MTRR) and not share the code or provide an API for these cases.

Note that I belive these findings should also mean, if I understand things
correctly, that when PAT is used overlapping ioremap*() calls may work with
the caveat that the type of cache used will depend on the offset used, one
cannot hope that using either offset will yield the same caching effects.
There might even be other caveats with this but that will depend on the
hardware and not sure what those effects are. The reason is that for the
the cache effects requiring the appropriate offset is reserve_memtype() on IO
memory will use a fake address and this can in turn mean that even though the
same "pa" address was used we could end up with it not really checking
conflicts with the original reserve_memtype(). Some parts of this complexity
were described above. I only saw a few drivers using overlapping ioremap*()
calls though on my MTRR review and they are all old devices so likely mostly
used on non-PAT systems, but there might be other corner cases elsewhere.

Lets recap, as I see it we have a few options with all this in mind on our
quest to bury MTRR (and later make strong UC default):

1) Let drivers do their own get_vm_area() calls as you note and handle the
   cut and splicing of ioremap areas

2) Provide an API to split and splice ioremap ranges

3) Try to avoid these types of situations and let drivers simply try to
   work towards a proper clean non-overlapping different ioremap() ranges

Let me know if you think of others but please keep in mind the UC- to strong
UC converstion we want to do later as well. We have ruled out using
set_memor_wc() now.

I prefer option 3), its technically possible but only for *new* drivers
and we'd need either some hard work to split the ioremap() areas or provide
a a set of *transient* APIs as I had originally proposed to phase / hope for
final transition. There are 3 drivers to address:

a) atyfb: fortunately I believe I have finished the split for the atyfb
driver, ioremap_uc() would be used on only the MMIO region while ioremap_wc()
on the framebuffer (with newly corrected fixed size). This would go in as
an example of what work was required to do a split.

b) ipath:  while a split on ipath is possible the changes are quite
significant. Apart from changing the driver to use different offset bases
in different regions the driver also has a debug userspace filemap for
the entire register map, the code there would need to be modified to
use the right virtual address base depending on the virtual address accessed.
The qib driver already has logic which could be mimic'd for this fortunatley,
but still - this is considerable work. One side hack I was hoping for
was that overlapping ioremap*() calls with different page attribute
types would work even if the 2nd returned __iomem address was not used,
based on my review and Toshi's feedback on reserve_memtype() this would
_not work_, only using the right virtual address would ensure the right
caching technique is used. In fact since this is not really all clear
I would not be surprised if there are other caveats. Sticking to the original
__arch_phys_wc_add() to force MTRR use here might be best but note that
even if this is done an eventual change from ioremap_nocache() to default
from UC- to strong UC would cause a regression on the desired WC area, in
this case on the PIO buffers. We may need an API to keep UC- for drivers that
know that need it -- or we may need to really do tha hard work to try to
convert this driver to split the iorenmap*() areas. If we do not want to do the
work to split this driver's ioremap*() space we could live with having
__arch_phys_wc_add() and another API to force UC- even if later the default is
strong UC.

c) ivtv: the driver does not have the PCI space mapped out separately, and
in fact it actually does not do the math for the framebuffer, instead it lets
the device's own CPU do that and assume where its at, see
ivtvfb_get_framebuffer() and CX2341X_OSD_GET_FRAMEBUFFER, it has a get
but not a setter. Its not clear if the firmware would make a split easy.
We'd need ioremap_ucminus() here too and __arch_phys_wc_add().

>From the beginning it seems only framebuffer devices used MTRR/WC, lately it
seems infiniband drivers also find good use for for it for PIO TX buffers to
blast some sort of data, in the future I would not be surprised if other
devices found use for it. It may be true that the existing drivers that
requires the above type of work are corner cases -- but I wouldn't hold my
breath for that. The ivtv device is a good example of the worst type of
situations and these days. So perhap __arch_phys_wc_add() and a
ioremap_ucminus() might be something more than transient unless hardware folks
get a good memo or already know how to just Do The Right Thing (TM).

 Luis

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-04-22 20:59 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-15 20:42 ioremap_uc() followed by set_memory_wc() - burrying MTRR Andy Lutomirski
2015-04-15 20:56 ` H. Peter Anvin
2015-04-15 22:15 ` Luis R. Rodriguez
2015-04-15 22:50 ` Andy Walls
2015-04-15 23:52   ` Andy Lutomirski
2015-04-16  0:33     ` Andy Walls
2015-04-21 22:46 ` Luis R. Rodriguez
2015-04-21 22:57   ` Jason Gunthorpe
2015-04-21 23:39     ` Luis R. Rodriguez
2015-04-22  5:39       ` Jason Gunthorpe
2015-04-22 15:23         ` Luis R. Rodriguez
2015-04-22 15:54           ` Luis R. Rodriguez
2015-04-22 15:59             ` Luis R. Rodriguez
2015-04-22 16:17           ` Jason Gunthorpe
2015-04-22 16:51             ` Luis R. Rodriguez
2015-04-22 18:53             ` Doug Ledford
2015-04-22 19:05               ` Luis R. Rodriguez
2015-04-22 19:10                 ` Doug Ledford
2015-04-22 19:14                   ` Luis R. Rodriguez
2015-04-22 20:46               ` Jason Gunthorpe
2015-04-22 20:58                 ` Doug Ledford
2015-04-22 16:53           ` Andy Lutomirski
2015-04-22 17:07             ` Luis R. Rodriguez
     [not found] <20150410171750.GA5622@wotan.suse.de>
     [not found] ` <CALCETrUG=RiG8S9Gpiqm_0CxvxurxLTNKyuyPoFNX46EAauA+g@mail.gmail.com>
     [not found]   ` <CAB=NE6XgNgu7i2OiDxFVJLWiEjbjBY17-dV7L3yi2+yzgMhEbw@mail.gmail.com>
     [not found]     ` <1428695379.6646.69.camel@misato.fc.hp.com>
     [not found]       ` <20150410210538.GB5622@wotan.suse.de>
     [not found]         ` <1428699490.21794.5.camel@misato.fc.hp.com>
     [not found]           ` <CALCETrUP688aNjckygqO=AXXrNYvLQX6F0=b5fjmsCqqZU78+Q@mail.gmail.com>
     [not found]             ` <20150411012938.GC5622@wotan.suse.de>
     [not found]               ` <CALCETrXd19C6pARde3pv-4pt-i52APtw5xs20itwROPq9VmCfg@mail.gmail.com>
2015-04-13 17:49                 ` Luis R. Rodriguez
2015-04-15 22:38                   ` Andy Walls
2015-04-15 23:42                     ` Andy Lutomirski
2015-04-15 23:59                       ` Andy Walls
2015-04-16  0:58                         ` Andy Lutomirski
2015-04-16  1:01                           ` Andy Walls
2015-04-16 19:19                             ` Andy Lutomirski
2015-04-21 17:35                           ` Luis R. Rodriguez
2015-04-15 23:58                     ` Luis R. Rodriguez
2015-04-16  1:07                       ` Andy Walls
2015-04-21 22:02                         ` Luis R. Rodriguez
2015-04-21 22:08                           ` Luis R. Rodriguez
2015-04-21 22:09                             ` Andy Lutomirski
     [not found]                               ` <5536d47a.95968c0a.1d12.ffffbf85@mx.google.com>
2015-04-21 23:14                                 ` Luis R. Rodriguez
2015-04-16  4:18                       ` Hyong-Youb Kim
2015-04-16 18:54                         ` Luis R. Rodriguez
2015-04-17  8:00                           ` Hyong-Youb Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).