* Write GFX_FLSH_CNT after updating GGTT entries
@ 2015-11-19 10:20 Zhi Wang
  2015-11-19 10:35 ` Ville Syrjälä
  0 siblings, 1 reply; 8+ messages in thread
From: Zhi Wang @ 2015-11-19 10:20 UTC (permalink / raw)
  To: intel-gfx, Tian, Kevin, Gao, Ping A, Zhiyuan Lv

Hi Gurus:
     I'm curious about the register GFX_FLSH_CNT (0x101008) in
i915_gem_gtt.c. Does this register exist in recent generations? After
digging into the b-spec, it looks like only BXT and CHV have this register.
Do the desktop platforms also have this register, and does it need to be
written after updating the GGTT MMIOs?

BTW: It looks like the Windows driver doesn't use this MMIO... So whose
behavior is the right behavior?

Thanks,
Zhi.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-19 10:20 Write GFX_FLSH_CNT after updating GGTT entries Zhi Wang
@ 2015-11-19 10:35 ` Ville Syrjälä
  2015-11-19 13:04   ` Zhi Wang
  2015-11-20  9:23   ` Tian, Kevin
  0 siblings, 2 replies; 8+ messages in thread
From: Ville Syrjälä @ 2015-11-19 10:35 UTC (permalink / raw)
  To: Zhi Wang; +Cc: intel-gfx, Gao, Ping A

On Thu, Nov 19, 2015 at 06:20:23PM +0800, Zhi Wang wrote:
> Hi Gurus:
>      I'm curious about the register GFX_FLSH_CNT (0x101008) in
> i915_gem_gtt.c. Does this register exist in recent generations? After
> digging into the b-spec, it looks like only BXT and CHV have this register.
> Do the desktop platforms also have this register, and does it need to be
> written after updating the GGTT MMIOs?
> 
> BTW: It looks like the Windows driver doesn't use this MMIO... So whose
> behavior is the right behavior?

As I understand it that register flushes the CPU GTT TLBs, and we need
to do it because of the WC mapping we have for the GTT PTEs. If we used
UC mapping we wouldn't need it since there's supposedly an automagic
TLB flush that happens on PTE writes.
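
The ordering described above can be sketched with a small user-space toy
model. This is not i915 code: the toy_* names, the two-array "device", and
the GFX_FLSH_CNT_EN bit are assumptions made for illustration, and only the
0x101008 offset comes from this thread. The model assumes that PTE stores
through a WC mapping do not invalidate cached translations on their own, and
that a write to the flush register does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GFX_FLSH_CNT     0x101008u  /* offset quoted in this thread */
#define GFX_FLSH_CNT_EN  (1u << 0)  /* assumed enable bit, illustration only */
#define TOY_NUM_PTES     16

/* Toy device: 'ptes' is the GGTT as written by the CPU, 'tlb' is what the
 * hardware keeps translating with until a flush is requested. */
struct toy_gpu {
	uint64_t ptes[TOY_NUM_PTES];
	uint64_t tlb[TOY_NUM_PTES];
	unsigned int flushes;
};

/* Stand-in for a store through a WC mapping: updates the PTE only. */
void toy_write_pte(struct toy_gpu *gpu, unsigned int idx, uint64_t pte)
{
	assert(idx < TOY_NUM_PTES);
	gpu->ptes[idx] = pte;
}

/* Stand-in for an MMIO register write; only the flush register is modelled. */
void toy_mmio_write32(struct toy_gpu *gpu, uint32_t reg, uint32_t val)
{
	if (reg == GFX_FLSH_CNT && (val & GFX_FLSH_CNT_EN)) {
		memcpy(gpu->tlb, gpu->ptes, sizeof(gpu->tlb));
		gpu->flushes++;
	}
}

/* Write a run of PTEs, then request a single flush for the whole batch. */
void toy_ggtt_insert(struct toy_gpu *gpu, unsigned int first,
		     const uint64_t *entries, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		toy_write_pte(gpu, first + i, entries[i]);

	/* A real driver would first have to make sure the WC writes have
	 * landed, for example by reading back the last PTE. */
	toy_mmio_write32(gpu, GFX_FLSH_CNT, GFX_FLSH_CNT_EN);
}
```

In this model a PTE written with toy_write_pte() alone stays invisible in
'tlb' until the next toy_ggtt_insert() call triggers a flush, which is the
stale-translation hazard the register write is meant to avoid.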

BSpec is bad at finding some registers via bxml. Using dtsearch and
looking for both 0x<offset> and <offset>h is the method I use to track
such things down.

-- 
Ville Syrjälä
Intel OTC

* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-19 10:35 ` Ville Syrjälä
@ 2015-11-19 13:04   ` Zhi Wang
  2015-11-19 13:26     ` Ville Syrjälä
  2015-11-20  9:23   ` Tian, Kevin
  1 sibling, 1 reply; 8+ messages in thread
From: Zhi Wang @ 2015-11-19 13:04 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, Gao, Ping A

Hi Ville:

Thanks for the answer! :) Learned a lot.

I think this scenario should be typical for general PCI devices (perhaps
a dedicated video card). How do other PCI devices handle this kind of WC
MMIO write without GFX_FLSH_CNT? Do they only support UC mappings?

Thanks,
Zhi.


* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-19 13:04   ` Zhi Wang
@ 2015-11-19 13:26     ` Ville Syrjälä
  2015-11-19 13:28       ` Zhi Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Ville Syrjälä @ 2015-11-19 13:26 UTC (permalink / raw)
  To: Zhi Wang; +Cc: intel-gfx, Gao, Ping A

On Thu, Nov 19, 2015 at 09:04:11PM +0800, Zhi Wang wrote:
> Hi Ville:
> 
> Thanks for the answer! :) Learned a lot.
> 
> I think this scenario should be typical for general PCI devices (perhaps
> a dedicated video card). How do other PCI devices handle this kind of WC
> MMIO write without GFX_FLSH_CNT? Do they only support UC mappings?

The standard rule is that you can't use WC, except for prefetchable BARs.
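
As a rough illustration of that rule, the sketch below decodes the low bits
of a memory BAR and picks a default mapping type. The bit positions follow
the standard PCI BAR layout; the toy_* names and the UC/WC enum are
hypothetical and do not correspond to any kernel API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a PCI Base Address Register. */
#define BAR_SPACE_IO       0x1u  /* bit 0: 1 = I/O space, 0 = memory space */
#define BAR_MEM_TYPE_MASK  0x6u  /* bits 2:1: 00 = 32-bit, 10 = 64-bit */
#define BAR_MEM_TYPE_64    0x4u
#define BAR_MEM_PREFETCH   0x8u  /* bit 3: prefetchable (memory BARs only) */

enum toy_map_type { TOY_MAP_UC, TOY_MAP_WC };

bool toy_bar_is_memory(uint32_t bar_lo)
{
	return (bar_lo & BAR_SPACE_IO) == 0;
}

bool toy_bar_is_64bit(uint32_t bar_lo)
{
	return toy_bar_is_memory(bar_lo) &&
	       (bar_lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}

bool toy_bar_is_prefetchable(uint32_t bar_lo)
{
	return toy_bar_is_memory(bar_lo) && (bar_lo & BAR_MEM_PREFETCH) != 0;
}

/* The "standard rule": WC only where the BAR advertises prefetchability. */
enum toy_map_type toy_default_map_type(uint32_t bar_lo)
{
	return toy_bar_is_prefetchable(bar_lo) ? TOY_MAP_WC : TOY_MAP_UC;
}
```

For example, a 64-bit non-prefetchable memory BAR at 0xf7400000 would have a
low dword of 0xf7400004, and toy_default_map_type() returns TOY_MAP_UC for
it. A driver that maps such a BAR as WC anyway is stepping outside this
default and has to handle the consequences itself.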

-- 
Ville Syrjälä
Intel OTC

* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-19 13:26     ` Ville Syrjälä
@ 2015-11-19 13:28       ` Zhi Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Zhi Wang @ 2015-11-19 13:28 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx, Gao, Ping A

Thanks!

OK. Then I checked my box:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company Device 3396
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Memory at f7400000 (64-bit, non-prefetchable) [size=4M]

So this would be our own magic for using WC on a non-prefetchable BAR,
specifically for the GGTT MMIOs?

Thanks,
Zhi.


* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-19 10:35 ` Ville Syrjälä
  2015-11-19 13:04   ` Zhi Wang
@ 2015-11-20  9:23   ` Tian, Kevin
  2015-11-20  9:40     ` Chris Wilson
  1 sibling, 1 reply; 8+ messages in thread
From: Tian, Kevin @ 2015-11-20  9:23 UTC (permalink / raw)
  To: ville.syrjala, Wang, Zhi A; +Cc: intel-gfx, Gao, Ping A

> From: Ville Syrjälä [mailto:ville.syrjala@linux.intel.com]
> Sent: Thursday, November 19, 2015 6:35 PM
> 
> As I understand it that register flushes the CPU GTT TLBs, and we need
> to do it because of the WC mapping we have for the GTT PTEs. If we used
> UC mapping we wouldn't need it since there's supposedly an automagic
> TLB flush that happens on PTE writes.
> 
> BSpec is bad at finding some registers via bxml. Using dtsearch and
> looking for both 0x<offset> and <offset>h is the method I use to track
> such things down.
> 

Curious how much gain is observed by using WC vs. using UC on GTT
entries?

Thanks
Kevin

* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-20  9:23   ` Tian, Kevin
@ 2015-11-20  9:40     ` Chris Wilson
  2015-11-20  9:53       ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2015-11-20  9:40 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: intel-gfx, Gao, Ping A

On Fri, Nov 20, 2015 at 09:23:12AM +0000, Tian, Kevin wrote:
> Curious how much gain is observed by using WC vs. using UC on GTT
> entries?

Think back to when everything went through the GGTT, when writing the
PTEs was slower than allocating a bunch of pages, and applications would
insist on submitting new objects with every batch.

It was very easy to have workloads where UC GGTT updates were the
rate-limiting step. A WC update is ~8x faster, and sufficient to move the
bottleneck elsewhere.
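
One way to picture why WC helps is to count bus transactions in a toy model.
The sketch below is only that: it assumes every UC store is issued on its
own, and that sequential WC stores are merged into full buffers before being
sent. The function names, the PTE size, and the buffer size are hypothetical
parameters, and the model makes no attempt to reproduce or explain the
measured ~8x figure.

```c
#include <assert.h>
#include <stddef.h>

/* UC: every PTE store becomes its own bus transaction. */
size_t toy_uc_transactions(size_t num_ptes)
{
	return num_ptes;
}

/* WC: sequential stores are merged until a buffer fills, then sent as one
 * burst. Assumes perfectly sequential, aligned writes. */
size_t toy_wc_transactions(size_t num_ptes, size_t pte_bytes,
			   size_t wc_buf_bytes)
{
	assert(pte_bytes > 0 && wc_buf_bytes > 0);

	size_t total_bytes = num_ptes * pte_bytes;

	/* Round up: a partially filled buffer still costs one transaction. */
	return (total_bytes + wc_buf_bytes - 1) / wc_buf_bytes;
}
```

With 1024 PTEs of 4 bytes each and a 64-byte buffer, the model gives 1024
transactions for UC and 64 for WC. Real hardware adds ordering, flushing,
and partial-buffer effects that this count ignores.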
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: Write GFX_FLSH_CNT after updating GGTT entries
  2015-11-20  9:40     ` Chris Wilson
@ 2015-11-20  9:53       ` Tian, Kevin
  0 siblings, 0 replies; 8+ messages in thread
From: Tian, Kevin @ 2015-11-20  9:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Gao, Ping A

> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Friday, November 20, 2015 5:40 PM
> 
> Think back to when everything went through the GGTT, when writing the
> PTEs was slower than allocating a bunch of pages, and applications would
> insist on submitting new objects with every batch.
> 
> It was very easy to have workloads where UC GGTT updates were the
> rate-limiting step. A WC update is ~8x faster, and sufficient to move the
> bottleneck elsewhere.
> -Chris

8x is clearly worth it. Today, with PPGTT, this optimization is
somewhat less important, though. :-)

Thanks
Kevin
