All of lore.kernel.org
 help / color / mirror / Atom feed
* TTM placement & caching issue/questions
@ 2014-09-04  0:12 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  0:12 UTC (permalink / raw)
  To: dri-devel; +Cc: Dave Airlie, linuxppc-dev, Alex Deucher

Hi folks !

I've been tracking down some problems with the recent DRI on powerpc and
stumbled upon something that doesn't look right, and not necessarily
only for us.

Now it's possible that I haven't fully understood the code here and I
also don't know to what extent some of that behaviour is necessary for
some platforms such as Intel GTT bits.

What I've observed with a simple/dumb (no DMA) driver like AST (but this
probably happens more generally) is that when evicting a BO from VRAM
into System memory, the TTM tries to preserve the existing caching
attributes of the VRAM object.

>From what I can tell, we end up with going from VRAM to System memory
type, and we eventually call ttm_bo_select_caching() to select the
caching option for the target.

This will, from what I can tell, try to use the same caching mode as the
original object:

	if ((cur_placement & caching) != 0)
		result |= (cur_placement & caching);

And cur_placement comes from bo->mem.placement which as far as I can
tell is based on the placement array which the drivers set up.

Now they tend to uniformly setup the placement for System memory as
TTM_PL_MASK_CACHING which enables all caching modes.

So I end up with, for example, my System memory BOs having
TTM_PL_FLAG_CACHED not set (though they also don't have
TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.

We don't seem to use the man->default_caching (which will have
TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
proposed placement and the existing caching mode.

Now this is a problem for several reason that I can think of:

 - On a number of powerpc platforms, such as all our server 64-bit one
for example, it's actually illegal to map system memory non-cached. The
system is fully cache coherent for all possible DMA originators (that we
care about at least) and mapping memory non-cachable while it's mapped
cachable in the linear mapping can cause nasty cache paradox which, when
detected by HW, can checkstop the system.

 - A similar issue exists, afaik, on ARM >= v7, so anything mapped
non-cachable must be removed from the linear mapping explicitly since
otherwise it can be speculatively prefetched into the cache.

 - I don't know about x86, but even then, it looks quite sub-optimal to
map the memory backing of the BOs and access it using a WC rather than a
cachable mapping attribute.

Now, some folks on IRC mentioned that there might be reasons for the
current behaviour as to not change the caching attributes when going
in/out of the GTT on Intel, I don't know how that relates and how that
works, but maybe that should be enforced by having a different placement
mask specifically on those chipsets.

Dave, should we change the various PCI drivers for generally coherent
devices such that the System memory type doesn't allow placements
without CACHED attribute ? Or at least on coherent platforms ? How do
detect that ? Should we have a TTM helper to establish the default
memory placement attributes that "normal PCI" drivers call to set that
up so we can have all the necessary arch ifdefs in one single place, at
least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?

Non-PCI and "special" drivers like Intel can use a different set of
placement attributes to represent the requirements of those specific
platforms (mostly thinking of embedded ARM here which under some
circumstances might actually require non-cached mappings).
Or am I missing another part of the puzzle ?

As it-is, things are broken for me even for dumb drivers, and I suspect
to a large extent with radeon and nouveau too, though in some case we
might get away with it most of the time ... until the machine locks up
for some unexplainable reason... This might cause problems on existing
distros such as RHEL7 with our radeon adapters even.

Any suggestion of what's the best approach to fix it ? I'm happy to
produce the patches but I'm not that familiar with the TTM so I would
like to make sure I'm the right track first :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* TTM placement & caching issue/questions
@ 2014-09-04  0:12 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  0:12 UTC (permalink / raw)
  To: dri-devel; +Cc: linuxppc-dev

Hi folks !

I've been tracking down some problems with the recent DRI on powerpc and
stumbled upon something that doesn't look right, and not necessarily
only for us.

Now it's possible that I haven't fully understood the code here and I
also don't know to what extent some of that behaviour is necessary for
some platforms such as Intel GTT bits.

What I've observed with a simple/dumb (no DMA) driver like AST (but this
probably happens more generally) is that when evicting a BO from VRAM
into System memory, the TTM tries to preserve the existing caching
attributes of the VRAM object.

>From what I can tell, we end up with going from VRAM to System memory
type, and we eventually call ttm_bo_select_caching() to select the
caching option for the target.

This will, from what I can tell, try to use the same caching mode as the
original object:

	if ((cur_placement & caching) != 0)
		result |= (cur_placement & caching);

And cur_placement comes from bo->mem.placement which as far as I can
tell is based on the placement array which the drivers set up.

Now they tend to uniformly setup the placement for System memory as
TTM_PL_MASK_CACHING which enables all caching modes.

So I end up with, for example, my System memory BOs having
TTM_PL_FLAG_CACHED not set (though they also don't have
TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.

We don't seem to use the man->default_caching (which will have
TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
proposed placement and the existing caching mode.

Now this is a problem for several reason that I can think of:

 - On a number of powerpc platforms, such as all our server 64-bit one
for example, it's actually illegal to map system memory non-cached. The
system is fully cache coherent for all possible DMA originators (that we
care about at least) and mapping memory non-cachable while it's mapped
cachable in the linear mapping can cause nasty cache paradox which, when
detected by HW, can checkstop the system.

 - A similar issue exists, afaik, on ARM >= v7, so anything mapped
non-cachable must be removed from the linear mapping explicitly since
otherwise it can be speculatively prefetched into the cache.

 - I don't know about x86, but even then, it looks quite sub-optimal to
map the memory backing of the BOs and access it using a WC rather than a
cachable mapping attribute.

Now, some folks on IRC mentioned that there might be reasons for the
current behaviour as to not change the caching attributes when going
in/out of the GTT on Intel, I don't know how that relates and how that
works, but maybe that should be enforced by having a different placement
mask specifically on those chipsets.

Dave, should we change the various PCI drivers for generally coherent
devices such that the System memory type doesn't allow placements
without CACHED attribute ? Or at least on coherent platforms ? How do
detect that ? Should we have a TTM helper to establish the default
memory placement attributes that "normal PCI" drivers call to set that
up so we can have all the necessary arch ifdefs in one single place, at
least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?

Non-PCI and "special" drivers like Intel can use a different set of
placement attributes to represent the requirements of those specific
platforms (mostly thinking of embedded ARM here which under some
circumstances might actually require non-cached mappings).
Or am I missing another part of the puzzle ?

As it-is, things are broken for me even for dumb drivers, and I suspect
to a large extent with radeon and nouveau too, though in some case we
might get away with it most of the time ... until the machine locks up
for some unexplainable reason... This might cause problems on existing
distros such as RHEL7 with our radeon adapters even.

Any suggestion of what's the best approach to fix it ? I'm happy to
produce the patches but I'm not that familiar with the TTM so I would
like to make sure I'm the right track first :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  0:12 ` Benjamin Herrenschmidt
  (?)
@ 2014-09-04  1:55 ` Jerome Glisse
  2014-09-04  2:07     ` Jerome Glisse
                     ` (2 more replies)
  -1 siblings, 3 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  1:55 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote:
> Hi folks !
> 
> I've been tracking down some problems with the recent DRI on powerpc and
> stumbled upon something that doesn't look right, and not necessarily
> only for us.
> 
> Now it's possible that I haven't fully understood the code here and I
> also don't know to what extent some of that behaviour is necessary for
> some platforms such as Intel GTT bits.
> 
> What I've observed with a simple/dumb (no DMA) driver like AST (but this
> probably happens more generally) is that when evicting a BO from VRAM
> into System memory, the TTM tries to preserve the existing caching
> attributes of the VRAM object.
> 
> From what I can tell, we end up with going from VRAM to System memory
> type, and we eventually call ttm_bo_select_caching() to select the
> caching option for the target.
> 
> This will, from what I can tell, try to use the same caching mode as the
> original object:
> 
> 	if ((cur_placement & caching) != 0)
> 		result |= (cur_placement & caching);
> 
> And cur_placement comes from bo->mem.placement which as far as I can
> tell is based on the placement array which the drivers set up.
> 
> Now they tend to uniformly setup the placement for System memory as
> TTM_PL_MASK_CACHING which enables all caching modes.
> 
> So I end up with, for example, my System memory BOs having
> TTM_PL_FLAG_CACHED not set (though they also don't have
> TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
> 
> We don't seem to use the man->default_caching (which will have
> TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
> proposed placement and the existing caching mode.
> 
> Now this is a problem for several reason that I can think of:
> 
>  - On a number of powerpc platforms, such as all our server 64-bit one
> for example, it's actually illegal to map system memory non-cached. The
> system is fully cache coherent for all possible DMA originators (that we
> care about at least) and mapping memory non-cachable while it's mapped
> cachable in the linear mapping can cause nasty cache paradox which, when
> detected by HW, can checkstop the system.
> 
>  - A similar issue exists, afaik, on ARM >= v7, so anything mapped
> non-cachable must be removed from the linear mapping explicitly since
> otherwise it can be speculatively prefetched into the cache.
> 
>  - I don't know about x86, but even then, it looks quite sub-optimal to
> map the memory backing of the BOs and access it using a WC rather than a
> cachable mapping attribute.
> 
> Now, some folks on IRC mentioned that there might be reasons for the
> current behaviour as to not change the caching attributes when going
> in/out of the GTT on Intel, I don't know how that relates and how that
> works, but maybe that should be enforced by having a different placement
> mask specifically on those chipsets.
> 
> Dave, should we change the various PCI drivers for generally coherent
> devices such that the System memory type doesn't allow placements
> without CACHED attribute ? Or at least on coherent platforms ? How do
> detect that ? Should we have a TTM helper to establish the default
> memory placement attributes that "normal PCI" drivers call to set that
> up so we can have all the necessary arch ifdefs in one single place, at
> least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
> 
> Non-PCI and "special" drivers like Intel can use a different set of
> placement attributes to represent the requirements of those specific
> platforms (mostly thinking of embedded ARM here which under some
> circumstances might actually require non-cached mappings).
> Or am I missing another part of the puzzle ?
> 
> As it-is, things are broken for me even for dumb drivers, and I suspect
> to a large extent with radeon and nouveau too, though in some case we
> might get away with it most of the time ... until the machine locks up
> for some unexplainable reason... This might cause problems on existing
> distros such as RHEL7 with our radeon adapters even.
> 
> Any suggestion of what's the best approach to fix it ? I'm happy to
> produce the patches but I'm not that familiar with the TTM so I would
> like to make sure I'm the right track first :-)

While i agree about the issue of incoherent double map of same page, i
think we have more issue. For instance lattely AMD have been pushing a
lot of patches to move things to use uncached memory for radeon and as
usual thoses patches comes with no comment to the motivations of those
changes. I am ccing the usual suspect ;)

What i understand is that uncached mapping for some frequently use buffer
give a significant performance boost (i am assuming this has to do with
all the snoop pci transaction overhead).

>From my understanding this is something that is allow on other OS and
any driver wishing to compete from performance point of view will want
that.


So i think we need to get a platform flags and or set_pages_array_wc|uc
needs to fail and this would fallback to cached mapping if the fallback
code still works. So if your arch properly return and error for those
cache changing function then you should be fine.

This also means that we need to fix ttm_tt_set_placement_caching so that
when it returns an error it switches to cached mapping. Which will always
work.


Cheers,
Jérôme

> 
> Cheers,
> Ben.
> 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  1:55 ` Jerome Glisse
@ 2014-09-04  2:07     ` Jerome Glisse
  2014-09-04  2:15   ` Benjamin Herrenschmidt
  2014-09-04  7:12   ` Michel Dänzer
  2 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 6133 bytes --]

On Wed, Sep 03, 2014 at 09:55:53PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote:
> > Hi folks !
> > 
> > I've been tracking down some problems with the recent DRI on powerpc and
> > stumbled upon something that doesn't look right, and not necessarily
> > only for us.
> > 
> > Now it's possible that I haven't fully understood the code here and I
> > also don't know to what extent some of that behaviour is necessary for
> > some platforms such as Intel GTT bits.
> > 
> > What I've observed with a simple/dumb (no DMA) driver like AST (but this
> > probably happens more generally) is that when evicting a BO from VRAM
> > into System memory, the TTM tries to preserve the existing caching
> > attributes of the VRAM object.
> > 
> > From what I can tell, we end up with going from VRAM to System memory
> > type, and we eventually call ttm_bo_select_caching() to select the
> > caching option for the target.
> > 
> > This will, from what I can tell, try to use the same caching mode as the
> > original object:
> > 
> > 	if ((cur_placement & caching) != 0)
> > 		result |= (cur_placement & caching);
> > 
> > And cur_placement comes from bo->mem.placement which as far as I can
> > tell is based on the placement array which the drivers set up.
> > 
> > Now they tend to uniformly setup the placement for System memory as
> > TTM_PL_MASK_CACHING which enables all caching modes.
> > 
> > So I end up with, for example, my System memory BOs having
> > TTM_PL_FLAG_CACHED not set (though they also don't have
> > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
> > 
> > We don't seem to use the man->default_caching (which will have
> > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
> > proposed placement and the existing caching mode.
> > 
> > Now this is a problem for several reason that I can think of:
> > 
> >  - On a number of powerpc platforms, such as all our server 64-bit one
> > for example, it's actually illegal to map system memory non-cached. The
> > system is fully cache coherent for all possible DMA originators (that we
> > care about at least) and mapping memory non-cachable while it's mapped
> > cachable in the linear mapping can cause nasty cache paradox which, when
> > detected by HW, can checkstop the system.
> > 
> >  - A similar issue exists, afaik, on ARM >= v7, so anything mapped
> > non-cachable must be removed from the linear mapping explicitly since
> > otherwise it can be speculatively prefetched into the cache.
> > 
> >  - I don't know about x86, but even then, it looks quite sub-optimal to
> > map the memory backing of the BOs and access it using a WC rather than a
> > cachable mapping attribute.
> > 
> > Now, some folks on IRC mentioned that there might be reasons for the
> > current behaviour as to not change the caching attributes when going
> > in/out of the GTT on Intel, I don't know how that relates and how that
> > works, but maybe that should be enforced by having a different placement
> > mask specifically on those chipsets.
> > 
> > Dave, should we change the various PCI drivers for generally coherent
> > devices such that the System memory type doesn't allow placements
> > without CACHED attribute ? Or at least on coherent platforms ? How do
> > detect that ? Should we have a TTM helper to establish the default
> > memory placement attributes that "normal PCI" drivers call to set that
> > up so we can have all the necessary arch ifdefs in one single place, at
> > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
> > 
> > Non-PCI and "special" drivers like Intel can use a different set of
> > placement attributes to represent the requirements of those specific
> > platforms (mostly thinking of embedded ARM here which under some
> > circumstances might actually require non-cached mappings).
> > Or am I missing another part of the puzzle ?
> > 
> > As it-is, things are broken for me even for dumb drivers, and I suspect
> > to a large extent with radeon and nouveau too, though in some case we
> > might get away with it most of the time ... until the machine locks up
> > for some unexplainable reason... This might cause problems on existing
> > distros such as RHEL7 with our radeon adapters even.
> > 
> > Any suggestion of what's the best approach to fix it ? I'm happy to
> > produce the patches but I'm not that familiar with the TTM so I would
> > like to make sure I'm the right track first :-)
> 
> While i agree about the issue of incoherent double map of same page, i
> think we have more issue. For instance lattely AMD have been pushing a
> lot of patches to move things to use uncached memory for radeon and as
> usual thoses patches comes with no comment to the motivations of those
> changes. I am ccing the usual suspect ;)
> 
> What i understand is that uncached mapping for some frequently use buffer
> give a significant performance boost (i am assuming this has to do with
> all the snoop pci transaction overhead).
> 
> From my understanding this is something that is allow on other OS and
> any driver wishing to compete from performance point of view will want
> that.
> 
> 
> So i think we need to get a platform flags and or set_pages_array_wc|uc
> needs to fail and this would fallback to cached mapping if the fallback
> code still works. So if your arch properly return and error for those
> cache changing function then you should be fine.
> 
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

So in the meantime the attached patch should work, it just silently ignore
the caching attribute request on non x86 instead of pretending that things
are setup as expected and then latter the radeon ou nouveau hw unsetting
the snoop bit.

It's not tested but i think it should work.


> 
> Cheers,
> Jérôme
> 
> > 
> > Cheers,
> > Ben.
> > 
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel

[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 1687 bytes --]

>From df0b5d1488daed05d7b4508759401a3e89cd4a38 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..de14b83 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,6 +141,15 @@ out_err:
 	return ret;
 }
 
+#else /* CONFIG_X86 */
+static int ttm_tt_set_caching(struct ttm_tt *ttm,
+			      enum ttm_caching_state c_state)
+{
+	ttm->caching_state = tt_cached;
+	return 0;
+}
+#endif /* CONFIG_X86 */
+
 int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
 {
 	enum ttm_caching_state state;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  2:07     ` Jerome Glisse
  0 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 6266 bytes --]

On Wed, Sep 03, 2014 at 09:55:53PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote:
> > Hi folks !
> > 
> > I've been tracking down some problems with the recent DRI on powerpc and
> > stumbled upon something that doesn't look right, and not necessarily
> > only for us.
> > 
> > Now it's possible that I haven't fully understood the code here and I
> > also don't know to what extent some of that behaviour is necessary for
> > some platforms such as Intel GTT bits.
> > 
> > What I've observed with a simple/dumb (no DMA) driver like AST (but this
> > probably happens more generally) is that when evicting a BO from VRAM
> > into System memory, the TTM tries to preserve the existing caching
> > attributes of the VRAM object.
> > 
> > From what I can tell, we end up with going from VRAM to System memory
> > type, and we eventually call ttm_bo_select_caching() to select the
> > caching option for the target.
> > 
> > This will, from what I can tell, try to use the same caching mode as the
> > original object:
> > 
> > 	if ((cur_placement & caching) != 0)
> > 		result |= (cur_placement & caching);
> > 
> > And cur_placement comes from bo->mem.placement which as far as I can
> > tell is based on the placement array which the drivers set up.
> > 
> > Now they tend to uniformly setup the placement for System memory as
> > TTM_PL_MASK_CACHING which enables all caching modes.
> > 
> > So I end up with, for example, my System memory BOs having
> > TTM_PL_FLAG_CACHED not set (though they also don't have
> > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
> > 
> > We don't seem to use the man->default_caching (which will have
> > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
> > proposed placement and the existing caching mode.
> > 
> > Now this is a problem for several reason that I can think of:
> > 
> >  - On a number of powerpc platforms, such as all our server 64-bit one
> > for example, it's actually illegal to map system memory non-cached. The
> > system is fully cache coherent for all possible DMA originators (that we
> > care about at least) and mapping memory non-cachable while it's mapped
> > cachable in the linear mapping can cause nasty cache paradox which, when
> > detected by HW, can checkstop the system.
> > 
> >  - A similar issue exists, afaik, on ARM >= v7, so anything mapped
> > non-cachable must be removed from the linear mapping explicitly since
> > otherwise it can be speculatively prefetched into the cache.
> > 
> >  - I don't know about x86, but even then, it looks quite sub-optimal to
> > map the memory backing of the BOs and access it using a WC rather than a
> > cachable mapping attribute.
> > 
> > Now, some folks on IRC mentioned that there might be reasons for the
> > current behaviour as to not change the caching attributes when going
> > in/out of the GTT on Intel, I don't know how that relates and how that
> > works, but maybe that should be enforced by having a different placement
> > mask specifically on those chipsets.
> > 
> > Dave, should we change the various PCI drivers for generally coherent
> > devices such that the System memory type doesn't allow placements
> > without CACHED attribute ? Or at least on coherent platforms ? How do
> > detect that ? Should we have a TTM helper to establish the default
> > memory placement attributes that "normal PCI" drivers call to set that
> > up so we can have all the necessary arch ifdefs in one single place, at
> > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
> > 
> > Non-PCI and "special" drivers like Intel can use a different set of
> > placement attributes to represent the requirements of those specific
> > platforms (mostly thinking of embedded ARM here which under some
> > circumstances might actually require non-cached mappings).
> > Or am I missing another part of the puzzle ?
> > 
> > As it-is, things are broken for me even for dumb drivers, and I suspect
> > to a large extent with radeon and nouveau too, though in some case we
> > might get away with it most of the time ... until the machine locks up
> > for some unexplainable reason... This might cause problems on existing
> > distros such as RHEL7 with our radeon adapters even.
> > 
> > Any suggestion of what's the best approach to fix it ? I'm happy to
> > produce the patches but I'm not that familiar with the TTM so I would
> > like to make sure I'm the right track first :-)
> 
> While i agree about the issue of incoherent double map of same page, i
> think we have more issue. For instance lattely AMD have been pushing a
> lot of patches to move things to use uncached memory for radeon and as
> usual thoses patches comes with no comment to the motivations of those
> changes. I am ccing the usual suspect ;)
> 
> What i understand is that uncached mapping for some frequently use buffer
> give a significant performance boost (i am assuming this has to do with
> all the snoop pci transaction overhead).
> 
> From my understanding this is something that is allow on other OS and
> any driver wishing to compete from performance point of view will want
> that.
> 
> 
> So i think we need to get a platform flags and or set_pages_array_wc|uc
> needs to fail and this would fallback to cached mapping if the fallback
> code still works. So if your arch properly return and error for those
> cache changing function then you should be fine.
> 
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

So in the meantime the attached patch should work, it just silently ignore
the caching attribute request on non x86 instead of pretending that things
are setup as expected and then latter the radeon ou nouveau hw unsetting
the snoop bit.

It's not tested but i think it should work.


> 
> Cheers,
> Jérôme
> 
> > 
> > Cheers,
> > Ben.
> > 
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel

[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 1744 bytes --]

>From df0b5d1488daed05d7b4508759401a3e89cd4a38 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..de14b83 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,6 +141,15 @@ out_err:
 	return ret;
 }
 
+#else /* CONFIG_X86 */
+static int ttm_tt_set_caching(struct ttm_tt *ttm,
+			      enum ttm_caching_state c_state)
+{
+	ttm->caching_state = tt_cached;
+	return 0;
+}
+#endif /* CONFIG_X86 */
+
 int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
 {
 	enum ttm_caching_state state;
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  1:55 ` Jerome Glisse
  2014-09-04  2:07     ` Jerome Glisse
@ 2014-09-04  2:15   ` Benjamin Herrenschmidt
  2014-09-04  7:12   ` Michel Dänzer
  2 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  2:15 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Wed, 2014-09-03 at 21:55 -0400, Jerome Glisse wrote:
> So i think we need to get a platform flags and or set_pages_array_wc|uc
> needs to fail and this would fallback to cached mapping if the fallback
> code still works. So if your arch properly return and error for those
> cache changing function then you should be fine.
> 
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

Can't I just filter the mem_type definitions in the mem_type_manager
with something along that totally untested patch ?

Or do I *also* need to make those set_page_array_* things fail ?

--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1308,6 +1308,24 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned 
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_filter_mem_type(struct ttm_bo_device *bdev, unsigned type,
+                                  struct ttm_mem_type_manager *man)
+{
+       /*
+        * On some architectures/patforms, we cannot allow non-cachable
+        * mappings of system memory. This can be a problem with AGP on
+        * old G5 systems vs. TTM_PL_TT but we don't really have a choice
+        * at this point on ppc64 at least and the AGP on these never
+        * worked reliably anyway.
+        */
+#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
+       if (type == TTM_PL_SYSTEM || type == TTM_PL_TT) {
+               man->available_caching &= TTM_PL_FLAG_CACHED;
+               man->default_caching &= man->available_caching;
+       }
+#endif
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
                        unsigned long p_size)
 {
@@ -1327,6 +1345,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned ty
                return ret;
        man->bdev = bdev;
 
+       ttm_bo_filter_mem_type(bdev, type, man);
+
        ret = 0;
        if (type != TTM_PL_SYSTEM) {
                ret = (*man->func->init)(man, p_size);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:07     ` Jerome Glisse
@ 2014-09-04  2:25       ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  2:25 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:

> So in the meantime the attached patch should work, it just silently ignore
> the caching attribute request on non x86 instead of pretending that things
> are setup as expected and then latter the radeon ou nouveau hw unsetting
> the snoop bit.
> 
> It's not tested but i think it should work.

I'm still getting placements with !CACHED going from bo_memcpy in
ttm_io_prot() though ... I'm looking at filtering the placement
attributes instead.

Ben.

> > 
> > Cheers,
> > Jérôme
> > 
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  2:25       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  2:25 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:

> So in the meantime the attached patch should work, it just silently ignore
> the caching attribute request on non x86 instead of pretending that things
> are setup as expected and then latter the radeon ou nouveau hw unsetting
> the snoop bit.
> 
> It's not tested but i think it should work.

I'm still getting placements with !CACHED going from bo_memcpy in
ttm_io_prot() though ... I'm looking at filtering the placement
attributes instead.

Ben.

> > 
> > Cheers,
> > Jérôme
> > 
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/dri-devel


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:25       ` Benjamin Herrenschmidt
@ 2014-09-04  2:31         ` Jerome Glisse
  -1 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> 
> > So in the meantime the attached patch should work, it just silently ignore
> > the caching attribute request on non x86 instead of pretending that things
> > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > the snoop bit.
> > 
> > It's not tested but i think it should work.
> 
> I'm still getting placements with !CACHED going from bo_memcpy in
> ttm_io_prot() though ... I'm looking at filtering the placement
> attributes instead.
> 
> Ben.

Ok so this one should do the trick.


> 
> > > 
> > > Cheers,
> > > Jérôme
> > > 
> > > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > 
> > > > _______________________________________________
> > > > dri-devel mailing list
> > > > dri-devel@lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 

[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4525 bytes --]

>From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..7cbdb48 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,52 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+	} else {
+		if (placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	/*
+	 * Some architecture force cached so we need to reflect the
+	 * new ttm->caching_state into the mem->placement flags.
+	 */
+	*placement &= ~TTM_PL_MASK_CACHING;
+	switch (bo->ttm->caching_state) {
+	case tt_wc:
+		*placement |= TTM_PL_FLAG_WC;
+		break;
+	case tt_uncached:
+		*placement |= TTM_PL_FLAG_UNCACHED;
+		break;
+	case tt_cached:
+	default:
+		*placement |= TTM_PL_FLAG_CACHED;
+		break;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  2:31         ` Jerome Glisse
  0 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> 
> > So in the meantime the attached patch should work, it just silently ignore
> > the caching attribute request on non x86 instead of pretending that things
> > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > the snoop bit.
> > 
> > It's not tested but i think it should work.
> 
> I'm still getting placements with !CACHED going from bo_memcpy in
> ttm_io_prot() though ... I'm looking at filtering the placement
> attributes instead.
> 
> Ben.

Ok so this one should do the trick.


> 
> > > 
> > > Cheers,
> > > Jérôme
> > > 
> > > > 
> > > > Cheers,
> > > > Ben.
> > > > 
> > > > 
> > > > _______________________________________________
> > > > dri-devel mailing list
> > > > dri-devel@lists.freedesktop.org
> > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 

[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4664 bytes --]

>From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..7cbdb48 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,52 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+	} else {
+		if (placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	/*
+	 * Some architecture force cached so we need to reflect the
+	 * new ttm->caching_state into the mem->placement flags.
+	 */
+	*placement &= ~TTM_PL_MASK_CACHING;
+	switch (bo->ttm->caching_state) {
+	case tt_wc:
+		*placement |= TTM_PL_FLAG_WC;
+		break;
+	case tt_uncached:
+		*placement |= TTM_PL_FLAG_UNCACHED;
+		break;
+	case tt_cached:
+	default:
+		*placement |= TTM_PL_FLAG_CACHED;
+		break;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:31         ` Jerome Glisse
@ 2014-09-04  2:32           ` Jerome Glisse
  -1 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 5913 bytes --]

On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > 
> > > So in the meantime the attached patch should work, it just silently ignore
> > > the caching attribute request on non x86 instead of pretending that things
> > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > the snoop bit.
> > > 
> > > It's not tested but i think it should work.
> > 
> > I'm still getting placements with !CACHED going from bo_memcpy in
> > ttm_io_prot() though ... I'm looking at filtering the placement
> > attributes instead.
> > 
> > Ben.
> 
> Ok so this one should do the trick.

And this one should build :)

> 
> 
> > 
> > > > 
> > > > Cheers,
> > > > Jérôme
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel@lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> > 

> From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
> Date: Wed, 3 Sep 2014 22:04:34 -0400
> Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> People interested in providing uncached or write combined mapping
> on there architecture need to do the ground work inside there arch
> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
>  include/drm/ttm/ttm_bo_driver.h     |  2 +-
>  4 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 72afe82..4dd5060 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
>  		return r;
>  	}
>  
> -	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
> +	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
>  	if (unlikely(r)) {
>  		goto out_cleanup;
>  	}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3da89d5..4dc21c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  				goto out_err;
>  		}
>  
> -		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
> +		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
>  		if (ret)
>  			goto out_err;
>  
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index bf080ab..7cbdb48 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
>  
>  	return ret;
>  }
> -#else /* CONFIG_X86 */
> -static inline int ttm_tt_set_page_caching(struct page *p,
> -					  enum ttm_caching_state c_old,
> -					  enum ttm_caching_state c_new)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_X86 */
>  
>  /*
>   * Change caching policy for the linear kernel map
> @@ -149,19 +141,52 @@ out_err:
>  	return ret;
>  }
>  
> -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>  {
>  	enum ttm_caching_state state;
>  
> -	if (placement & TTM_PL_FLAG_WC)
> +	if (*placement & TTM_PL_FLAG_WC)
>  		state = tt_wc;
> -	else if (placement & TTM_PL_FLAG_UNCACHED)
> +	else if (*placement & TTM_PL_FLAG_UNCACHED)
>  		state = tt_uncached;
>  	else
>  		state = tt_cached;
>  
>  	return ttm_tt_set_caching(ttm, state);
>  }
> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +	} else {
> +		if (placement & TTM_PL_FLAG_WC)
> +			ttm->caching_state = tt_wc;
> +		else if (placement & TTM_PL_FLAG_UNCACHED)
> +			ttm->caching_state = tt_uncached;
> +		else
> +			ttm->caching_state = tt_cached;
> +	}
> +	/*
> +	 * Some architecture force cached so we need to reflect the
> +	 * new ttm->caching_state into the mem->placement flags.
> +	 */
> +	*placement &= ~TTM_PL_MASK_CACHING;
> +	switch (bo->ttm->caching_state) {
> +	case tt_wc:
> +		*placement |= TTM_PL_FLAG_WC;
> +		break;
> +	case tt_uncached:
> +		*placement |= TTM_PL_FLAG_UNCACHED;
> +		break;
> +	case tt_cached:
> +	default:
> +		*placement |= TTM_PL_FLAG_CACHED;
> +		break;
> +	}
> +	return 0;
> +}
> +#endif /* CONFIG_X86 */
>  EXPORT_SYMBOL(ttm_tt_set_placement_caching);
>  
>  void ttm_tt_destroy(struct ttm_tt *ttm)
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 1d9f0f1..cbc5ad2 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
>   * hit RAM. This function may be very costly as it involves global TLB
>   * and cache flushes and potential page splitting / combining.
>   */
> -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
> +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
>  extern int ttm_tt_swapout(struct ttm_tt *ttm,
>  			  struct file *persistent_swap_storage);
>  
> -- 
> 1.9.3
> 


[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4527 bytes --]

>From 6ed69ceee96873ab351881c6b9f931cec39f396a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..efd49b9 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,52 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+	} else {
+		if (*placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	/*
+	 * Some architecture force cached so we need to reflect the
+	 * new ttm->caching_state into the mem->placement flags.
+	 */
+	*placement &= ~TTM_PL_MASK_CACHING;
+	switch (bo->ttm->caching_state) {
+	case tt_wc:
+		*placement |= TTM_PL_FLAG_WC;
+		break;
+	case tt_uncached:
+		*placement |= TTM_PL_FLAG_UNCACHED;
+		break;
+	case tt_cached:
+	default:
+		*placement |= TTM_PL_FLAG_CACHED;
+		break;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  2:32           ` Jerome Glisse
  0 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:32 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 6093 bytes --]

On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > 
> > > So in the meantime the attached patch should work, it just silently ignore
> > > the caching attribute request on non x86 instead of pretending that things
> > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > the snoop bit.
> > > 
> > > It's not tested but i think it should work.
> > 
> > I'm still getting placements with !CACHED going from bo_memcpy in
> > ttm_io_prot() though ... I'm looking at filtering the placement
> > attributes instead.
> > 
> > Ben.
> 
> Ok so this one should do the trick.

And this one should build :)

> 
> 
> > 
> > > > 
> > > > Cheers,
> > > > Jérôme
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel@lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> > 

> From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
> Date: Wed, 3 Sep 2014 22:04:34 -0400
> Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> People interested in providing uncached or write combined mapping
> on there architecture need to do the ground work inside there arch
> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
>  include/drm/ttm/ttm_bo_driver.h     |  2 +-
>  4 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 72afe82..4dd5060 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
>  		return r;
>  	}
>  
> -	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
> +	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
>  	if (unlikely(r)) {
>  		goto out_cleanup;
>  	}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3da89d5..4dc21c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  				goto out_err;
>  		}
>  
> -		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
> +		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
>  		if (ret)
>  			goto out_err;
>  
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index bf080ab..7cbdb48 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
>  
>  	return ret;
>  }
> -#else /* CONFIG_X86 */
> -static inline int ttm_tt_set_page_caching(struct page *p,
> -					  enum ttm_caching_state c_old,
> -					  enum ttm_caching_state c_new)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_X86 */
>  
>  /*
>   * Change caching policy for the linear kernel map
> @@ -149,19 +141,52 @@ out_err:
>  	return ret;
>  }
>  
> -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>  {
>  	enum ttm_caching_state state;
>  
> -	if (placement & TTM_PL_FLAG_WC)
> +	if (*placement & TTM_PL_FLAG_WC)
>  		state = tt_wc;
> -	else if (placement & TTM_PL_FLAG_UNCACHED)
> +	else if (*placement & TTM_PL_FLAG_UNCACHED)
>  		state = tt_uncached;
>  	else
>  		state = tt_cached;
>  
>  	return ttm_tt_set_caching(ttm, state);
>  }
> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +	} else {
> +		if (placement & TTM_PL_FLAG_WC)
> +			ttm->caching_state = tt_wc;
> +		else if (placement & TTM_PL_FLAG_UNCACHED)
> +			ttm->caching_state = tt_uncached;
> +		else
> +			ttm->caching_state = tt_cached;
> +	}
> +	/*
> +	 * Some architecture force cached so we need to reflect the
> +	 * new ttm->caching_state into the mem->placement flags.
> +	 */
> +	*placement &= ~TTM_PL_MASK_CACHING;
> +	switch (bo->ttm->caching_state) {
> +	case tt_wc:
> +		*placement |= TTM_PL_FLAG_WC;
> +		break;
> +	case tt_uncached:
> +		*placement |= TTM_PL_FLAG_UNCACHED;
> +		break;
> +	case tt_cached:
> +	default:
> +		*placement |= TTM_PL_FLAG_CACHED;
> +		break;
> +	}
> +	return 0;
> +}
> +#endif /* CONFIG_X86 */
>  EXPORT_SYMBOL(ttm_tt_set_placement_caching);
>  
>  void ttm_tt_destroy(struct ttm_tt *ttm)
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 1d9f0f1..cbc5ad2 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
>   * hit RAM. This function may be very costly as it involves global TLB
>   * and cache flushes and potential page splitting / combining.
>   */
> -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
> +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
>  extern int ttm_tt_swapout(struct ttm_tt *ttm,
>  			  struct file *persistent_swap_storage);
>  
> -- 
> 1.9.3
> 


[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4666 bytes --]

>From 6ed69ceee96873ab351881c6b9f931cec39f396a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..efd49b9 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,52 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+	} else {
+		if (*placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	/*
+	 * Some architecture force cached so we need to reflect the
+	 * new ttm->caching_state into the mem->placement flags.
+	 */
+	*placement &= ~TTM_PL_MASK_CACHING;
+	switch (bo->ttm->caching_state) {
+	case tt_wc:
+		*placement |= TTM_PL_FLAG_WC;
+		break;
+	case tt_uncached:
+		*placement |= TTM_PL_FLAG_UNCACHED;
+		break;
+	case tt_cached:
+	default:
+		*placement |= TTM_PL_FLAG_CACHED;
+		break;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:31         ` Jerome Glisse
@ 2014-09-04  2:36           ` Jerome Glisse
  -1 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 5924 bytes --]

On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > 
> > > So in the meantime the attached patch should work, it just silently ignore
> > > the caching attribute request on non x86 instead of pretending that things
> > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > the snoop bit.
> > > 
> > > It's not tested but i think it should work.
> > 
> > I'm still getting placements with !CACHED going from bo_memcpy in
> > ttm_io_prot() though ... I'm looking at filtering the placement
> > attributes instead.
> > 
> > Ben.
> 
> Ok so this one should do the trick.

Ok final version ... famous last word.


> 
> 
> > 
> > > > 
> > > > Cheers,
> > > > Jérôme
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel@lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> > 

> From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
> Date: Wed, 3 Sep 2014 22:04:34 -0400
> Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> People interested in providing uncached or write combined mapping
> on there architecture need to do the ground work inside there arch
> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
>  include/drm/ttm/ttm_bo_driver.h     |  2 +-
>  4 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 72afe82..4dd5060 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
>  		return r;
>  	}
>  
> -	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
> +	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
>  	if (unlikely(r)) {
>  		goto out_cleanup;
>  	}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3da89d5..4dc21c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  				goto out_err;
>  		}
>  
> -		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
> +		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
>  		if (ret)
>  			goto out_err;
>  
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index bf080ab..7cbdb48 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
>  
>  	return ret;
>  }
> -#else /* CONFIG_X86 */
> -static inline int ttm_tt_set_page_caching(struct page *p,
> -					  enum ttm_caching_state c_old,
> -					  enum ttm_caching_state c_new)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_X86 */
>  
>  /*
>   * Change caching policy for the linear kernel map
> @@ -149,19 +141,52 @@ out_err:
>  	return ret;
>  }
>  
> -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>  {
>  	enum ttm_caching_state state;
>  
> -	if (placement & TTM_PL_FLAG_WC)
> +	if (*placement & TTM_PL_FLAG_WC)
>  		state = tt_wc;
> -	else if (placement & TTM_PL_FLAG_UNCACHED)
> +	else if (*placement & TTM_PL_FLAG_UNCACHED)
>  		state = tt_uncached;
>  	else
>  		state = tt_cached;
>  
>  	return ttm_tt_set_caching(ttm, state);
>  }
> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +	} else {
> +		if (placement & TTM_PL_FLAG_WC)
> +			ttm->caching_state = tt_wc;
> +		else if (placement & TTM_PL_FLAG_UNCACHED)
> +			ttm->caching_state = tt_uncached;
> +		else
> +			ttm->caching_state = tt_cached;
> +	}
> +	/*
> +	 * Some architecture force cached so we need to reflect the
> +	 * new ttm->caching_state into the mem->placement flags.
> +	 */
> +	*placement &= ~TTM_PL_MASK_CACHING;
> +	switch (bo->ttm->caching_state) {
> +	case tt_wc:
> +		*placement |= TTM_PL_FLAG_WC;
> +		break;
> +	case tt_uncached:
> +		*placement |= TTM_PL_FLAG_UNCACHED;
> +		break;
> +	case tt_cached:
> +	default:
> +		*placement |= TTM_PL_FLAG_CACHED;
> +		break;
> +	}
> +	return 0;
> +}
> +#endif /* CONFIG_X86 */
>  EXPORT_SYMBOL(ttm_tt_set_placement_caching);
>  
>  void ttm_tt_destroy(struct ttm_tt *ttm)
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 1d9f0f1..cbc5ad2 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
>   * hit RAM. This function may be very costly as it involves global TLB
>   * and cache flushes and potential page splitting / combining.
>   */
> -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
> +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
>  extern int ttm_tt_swapout(struct ttm_tt *ttm,
>  			  struct file *persistent_swap_storage);
>  
> -- 
> 1.9.3
> 


[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4186 bytes --]

>From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 32 +++++++++++++++++++++-----------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..a0df803 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,37 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+		*placement &= ~TTM_PL_MASK_CACHING;
+		*placement |= TTM_PL_FLAG_CACHED;
+	} else {
+		if (*placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  2:36           ` Jerome Glisse
  0 siblings, 0 replies; 33+ messages in thread
From: Jerome Glisse @ 2014-09-04  2:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

[-- Attachment #1: Type: text/plain, Size: 6105 bytes --]

On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > 
> > > So in the meantime the attached patch should work, it just silently ignore
> > > the caching attribute request on non x86 instead of pretending that things
> > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > the snoop bit.
> > > 
> > > It's not tested but i think it should work.
> > 
> > I'm still getting placements with !CACHED going from bo_memcpy in
> > ttm_io_prot() though ... I'm looking at filtering the placement
> > attributes instead.
> > 
> > Ben.
> 
> Ok so this one should do the trick.

Ok final version ... famous last word.


> 
> 
> > 
> > > > 
> > > > Cheers,
> > > > Jérôme
> > > > 
> > > > > 
> > > > > Cheers,
> > > > > Ben.
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel@lists.freedesktop.org
> > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> > 

> From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
> Date: Wed, 3 Sep 2014 22:04:34 -0400
> Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> People interested in providing uncached or write combined mapping
> on there architecture need to do the ground work inside there arch
> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 47 ++++++++++++++++++++++++++++---------
>  include/drm/ttm/ttm_bo_driver.h     |  2 +-
>  4 files changed, 39 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 72afe82..4dd5060 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
>  		return r;
>  	}
>  
> -	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
> +	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
>  	if (unlikely(r)) {
>  		goto out_cleanup;
>  	}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3da89d5..4dc21c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  				goto out_err;
>  		}
>  
> -		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
> +		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
>  		if (ret)
>  			goto out_err;
>  
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index bf080ab..7cbdb48 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
>  
>  	return ret;
>  }
> -#else /* CONFIG_X86 */
> -static inline int ttm_tt_set_page_caching(struct page *p,
> -					  enum ttm_caching_state c_old,
> -					  enum ttm_caching_state c_new)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_X86 */
>  
>  /*
>   * Change caching policy for the linear kernel map
> @@ -149,19 +141,52 @@ out_err:
>  	return ret;
>  }
>  
> -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>  {
>  	enum ttm_caching_state state;
>  
> -	if (placement & TTM_PL_FLAG_WC)
> +	if (*placement & TTM_PL_FLAG_WC)
>  		state = tt_wc;
> -	else if (placement & TTM_PL_FLAG_UNCACHED)
> +	else if (*placement & TTM_PL_FLAG_UNCACHED)
>  		state = tt_uncached;
>  	else
>  		state = tt_cached;
>  
>  	return ttm_tt_set_caching(ttm, state);
>  }
> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +	} else {
> +		if (placement & TTM_PL_FLAG_WC)
> +			ttm->caching_state = tt_wc;
> +		else if (placement & TTM_PL_FLAG_UNCACHED)
> +			ttm->caching_state = tt_uncached;
> +		else
> +			ttm->caching_state = tt_cached;
> +	}
> +	/*
> +	 * Some architecture force cached so we need to reflect the
> +	 * new ttm->caching_state into the mem->placement flags.
> +	 */
> +	*placement &= ~TTM_PL_MASK_CACHING;
> +	switch (bo->ttm->caching_state) {
> +	case tt_wc:
> +		*placement |= TTM_PL_FLAG_WC;
> +		break;
> +	case tt_uncached:
> +		*placement |= TTM_PL_FLAG_UNCACHED;
> +		break;
> +	case tt_cached:
> +	default:
> +		*placement |= TTM_PL_FLAG_CACHED;
> +		break;
> +	}
> +	return 0;
> +}
> +#endif /* CONFIG_X86 */
>  EXPORT_SYMBOL(ttm_tt_set_placement_caching);
>  
>  void ttm_tt_destroy(struct ttm_tt *ttm)
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 1d9f0f1..cbc5ad2 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
>   * hit RAM. This function may be very costly as it involves global TLB
>   * and cache flushes and potential page splitting / combining.
>   */
> -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
> +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
>  extern int ttm_tt_swapout(struct ttm_tt *ttm,
>  			  struct file *persistent_swap_storage);
>  
> -- 
> 1.9.3
> 


[-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --]
[-- Type: text/plain, Size: 4310 bytes --]

>From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
Date: Wed, 3 Sep 2014 22:04:34 -0400
Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

People interested in providing uncached or write combined mapping
on there architecture need to do the ground work inside there arch
specific code to allow to break the linear kernel mapping so that
page mapping attributes can be updated, in the meantime force cached
mapping for non x86 architecture.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 32 +++++++++++++++++++++-----------
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..a0df803 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,37 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+		*placement &= ~TTM_PL_MASK_CACHING;
+		*placement |= TTM_PL_FLAG_CACHED;
+	} else {
+		if (*placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
 			  struct file *persistent_swap_storage);
 
-- 
1.9.3


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:36           ` Jerome Glisse
  (?)
@ 2014-09-04  5:23           ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  5:23 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Wed, 2014-09-03 at 22:36 -0400, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > > 
> > > > So in the meantime the attached patch should work, it just silently ignore
> > > > the caching attribute request on non x86 instead of pretending that things
> > > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > > the snoop bit.
> > > > 
> > > > It's not tested but i think it should work.
> > > 
> > > I'm still getting placements with !CACHED going from bo_memcpy in
> > > ttm_io_prot() though ... I'm looking at filtering the placement
> > > attributes instead.
> > > 
> > > Ben.
> > 
> > Ok so this one should do the trick.
> 
> Ok final version ... famous last word.

Minus a couple of obvious typos that prevent if from building, it seems
to do the trick for me with the AST driver, no more bad mappings.

I'll still send a patch that catches the incorrect mapping attempts
inside ttm_io_prot() and warns to help future debugging and avoid
"random" behaviour. (I need to fix other things in the powerpc code
in there anyway).

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:36           ` Jerome Glisse
  (?)
  (?)
@ 2014-09-04  6:45           ` Gabriel Paubert
  -1 siblings, 0 replies; 33+ messages in thread
From: Gabriel Paubert @ 2014-09-04  6:45 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Michel Dänzer, dri-devel, linuxppc-dev, Alex Deucher,
	Christian Koenig

On Wed, Sep 03, 2014 at 10:36:57PM -0400, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > > 
> > > > So in the meantime the attached patch should work, it just silently ignore
> > > > the caching attribute request on non x86 instead of pretending that things
> > > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > > the snoop bit.
> > > > 
> > > > It's not tested but i think it should work.
> > > 
> > > I'm still getting placements with !CACHED going from bo_memcpy in
> > > ttm_io_prot() though ... I'm looking at filtering the placement
> > > attributes instead.
> > > 
> > > Ben.
> > 
> > Ok so this one should do the trick.
> 
> Ok final version ... famous last word.
[snipped older version]
> >From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com>
> Date: Wed, 3 Sep 2014 22:04:34 -0400
> Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> People interested in providing uncached or write combined mapping
> on there architecture need to do the ground work inside there arch

s/there/their/g

> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> ---
>  drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
>  drivers/gpu/drm/ttm/ttm_tt.c        | 32 +++++++++++++++++++++-----------
>  include/drm/ttm/ttm_bo_driver.h     |  2 +-
>  4 files changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 72afe82..4dd5060 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
>  		return r;
>  	}
>  
> -	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
> +	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
>  	if (unlikely(r)) {
>  		goto out_cleanup;
>  	}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3da89d5..4dc21c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>  				goto out_err;
>  		}
>  
> -		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
> +		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
>  		if (ret)
>  			goto out_err;
>  
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index bf080ab..a0df803 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
>  
>  	return ret;
>  }
> -#else /* CONFIG_X86 */
> -static inline int ttm_tt_set_page_caching(struct page *p,
> -					  enum ttm_caching_state c_old,
> -					  enum ttm_caching_state c_new)
> -{
> -	return 0;
> -}
> -#endif /* CONFIG_X86 */
>  
>  /*
>   * Change caching policy for the linear kernel map
> @@ -149,19 +141,37 @@ out_err:
>  	return ret;
>  }
>  
> -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>  {
>  	enum ttm_caching_state state;
>  
> -	if (placement & TTM_PL_FLAG_WC)
> +	if (*placement & TTM_PL_FLAG_WC)
>  		state = tt_wc;
> -	else if (placement & TTM_PL_FLAG_UNCACHED)
> +	else if (*placement & TTM_PL_FLAG_UNCACHED)
>  		state = tt_uncached;
>  	else
>  		state = tt_cached;
>  
>  	return ttm_tt_set_caching(ttm, state);
>  }
> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +		*placement &= ~TTM_PL_MASK_CACHING;
> +		*placement |= TTM_PL_FLAG_CACHED;
> +	} else {
> +		if (*placement & TTM_PL_FLAG_WC)
> +			ttm->caching_state = tt_wc;
> +		else if (placement & TTM_PL_FLAG_UNCACHED)
> +			ttm->caching_state = tt_uncached;
> +		else
> +			ttm->caching_state = tt_cached;
> +	}
> +	return 0;
> +}
> +#endif /* CONFIG_X86 */
>  EXPORT_SYMBOL(ttm_tt_set_placement_caching);
>  
>  void ttm_tt_destroy(struct ttm_tt *ttm)
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index 1d9f0f1..cbc5ad2 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
>   * hit RAM. This function may be very costly as it involves global TLB
>   * and cache flushes and potential page splitting / combining.
>   */
> -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
> +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
>  extern int ttm_tt_swapout(struct ttm_tt *ttm,
>  			  struct file *persistent_swap_storage);
>  
> -- 
> 1.9.3
> 

> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  1:55 ` Jerome Glisse
  2014-09-04  2:07     ` Jerome Glisse
  2014-09-04  2:15   ` Benjamin Herrenschmidt
@ 2014-09-04  7:12   ` Michel Dänzer
  2 siblings, 0 replies; 33+ messages in thread
From: Michel Dänzer @ 2014-09-04  7:12 UTC (permalink / raw)
  To: Jerome Glisse, Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Christian Koenig, dri-devel

On 04.09.2014 10:55, Jerome Glisse wrote:
>
> While i agree about the issue of incoherent double map of same page, i
> think we have more issue. For instance lattely AMD have been pushing a
> lot of patches to move things to use uncached memory for radeon and as
> usual thoses patches comes with no comment to the motivations of those
> changes.

That would have been a fair review comment...


> What i understand is that uncached mapping for some frequently use buffer
> give a significant performance boost (i am assuming this has to do with
> all the snoop pci transaction overhead).

Exactly, although it's a win even if the data is written by the CPU only 
once and read by the GPU only once.


> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

GTT with AGP being one exception.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  2:36           ` Jerome Glisse
                             ` (2 preceding siblings ...)
  (?)
@ 2014-09-04  7:19           ` Michel Dänzer
  2014-09-04  7:54             ` Benjamin Herrenschmidt
  -1 siblings, 1 reply; 33+ messages in thread
From: Michel Dänzer @ 2014-09-04  7:19 UTC (permalink / raw)
  To: Jerome Glisse, Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On 04.09.2014 11:36, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
>> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
>>>
>>>> So in the meantime the attached patch should work, it just silently ignore
>>>> the caching attribute request on non x86 instead of pretending that things
>>>> are setup as expected and then latter the radeon ou nouveau hw unsetting
>>>> the snoop bit.
>>>>
>>>> It's not tested but i think it should work.
>>>
>>> I'm still getting placements with !CACHED going from bo_memcpy in
>>> ttm_io_prot() though ... I'm looking at filtering the placement
>>> attributes instead.
>>>
>>> Ben.
>>
>> Ok so this one should do the trick.
>
> Ok final version ... famous last word.

[...]

> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +		*placement &= ~TTM_PL_MASK_CACHING;
> +		*placement |= TTM_PL_FLAG_CACHED;

NAK, this will break AGP on PowerMacs.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  0:12 ` Benjamin Herrenschmidt
@ 2014-09-04  7:44   ` Thomas Hellstrom
  -1 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04  7:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel

Hi!

Let me try to bring some clarity and suggestions into this.

On 09/04/2014 02:12 AM, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> I've been tracking down some problems with the recent DRI on powerpc and
> stumbled upon something that doesn't look right, and not necessarily
> only for us.
>
> Now it's possible that I haven't fully understood the code here and I
> also don't know to what extent some of that behaviour is necessary for
> some platforms such as Intel GTT bits.
>
> What I've observed with a simple/dumb (no DMA) driver like AST (but this
> probably happens more generally) is that when evicting a BO from VRAM
> into System memory, the TTM tries to preserve the existing caching
> attributes of the VRAM object.
>
> >From what I can tell, we end up with going from VRAM to System memory
> type, and we eventually call ttm_bo_select_caching() to select the
> caching option for the target.
>
> This will, from what I can tell, try to use the same caching mode as the
> original object:
>
> 	if ((cur_placement & caching) != 0)
> 		result |= (cur_placement & caching);
>
> And cur_placement comes from bo->mem.placement which as far as I can
> tell is based on the placement array which the drivers set up.

This originates from the fact that when evicting GTT memory, on x86 it's
unnecessary and undesirable to switch caching mode when going to system.

>
> Now they tend to uniformly setup the placement for System memory as
> TTM_PL_MASK_CACHING which enables all caching modes.
>
> So I end up with, for example, my System memory BOs having
> TTM_PL_FLAG_CACHED not set (though they also don't have
> TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
>
> We don't seem to use the man->default_caching (which will have
> TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
> proposed placement and the existing caching mode.
>
> Now this is a problem for several reason that I can think of:
>
>  - On a number of powerpc platforms, such as all our server 64-bit one
> for example, it's actually illegal to map system memory non-cached. The
> system is fully cache coherent for all possible DMA originators (that we
> care about at least) and mapping memory non-cachable while it's mapped
> cachable in the linear mapping can cause nasty cache paradox which, when
> detected by HW, can checkstop the system.
>
>  - A similar issue exists, afaik, on ARM >= v7, so anything mapped
> non-cachable must be removed from the linear mapping explicitly since
> otherwise it can be speculatively prefetched into the cache.
>
>  - I don't know about x86, but even then, it looks quite sub-optimal to
> map the memory backing of the BOs and access it using a WC rather than a
> cachable mapping attribute.

Last time I tested, (and it seems like Michel is on the same track),
writing with the CPU to write-combined memory was substantially faster
than writing to cached memory, with the additional side-effect that CPU
caches are left unpolluted.

Moreover (although only tested on Intel's embedded chipsets), texturing
from cpu-cache-coherent PCI memory was a real GPU performance hog
compared to texturing from non-snooped memory. Hence, whenever a buffer
could be classified as GPU-read-only (or almost at least), it should be
placed in write-combined memory.

>
> Now, some folks on IRC mentioned that there might be reasons for the
> current behaviour as to not change the caching attributes when going
> in/out of the GTT on Intel, I don't know how that relates and how that
> works, but maybe that should be enforced by having a different placement
> mask specifically on those chipsets.
>
> Dave, should we change the various PCI drivers for generally coherent
> devices such that the System memory type doesn't allow placements
> without CACHED attribute ? Or at least on coherent platforms ? How do
> detect that ? Should we have a TTM helper to establish the default
> memory placement attributes that "normal PCI" drivers call to set that
> up so we can have all the necessary arch ifdefs in one single place, at
> least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
>
> Non-PCI and "special" drivers like Intel can use a different set of
> placement attributes to represent the requirements of those specific
> platforms (mostly thinking of embedded ARM here which under some
> circumstances might actually require non-cached mappings).
> Or am I missing another part of the puzzle ?
>
> As it-is, things are broken for me even for dumb drivers, and I suspect
> to a large extent with radeon and nouveau too, though in some case we
> might get away with it most of the time ... until the machine locks up
> for some unexplainable reason... This might cause problems on existing
> distros such as RHEL7 with our radeon adapters even.
>
> Any suggestion of what's the best approach to fix it ? I'm happy to
> produce the patches but I'm not that familiar with the TTM so I would
> like to make sure I'm the right track first :-)

I dislike the approach of rewriting placements. In some cases I think it
won't even work, because placements are declared 'static const'

What I'd suggest is instead to intercept the driver response from
init_mem_type() and filter out undesired caching modes from
available_caching and default_caching, perhaps also looking at whether
the memory type is mappable or not. This should have the additional
benefit of working everywhere, and if a caching mode is selected that's
not available on the platform, you'll simply get an error. (I guess?)

/Thomas


>
> Cheers,
> Ben.
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  7:44   ` Thomas Hellstrom
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04  7:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel

Hi!

Let me try to bring some clarity and suggestions into this.

On 09/04/2014 02:12 AM, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> I've been tracking down some problems with the recent DRI on powerpc and
> stumbled upon something that doesn't look right, and not necessarily
> only for us.
>
> Now it's possible that I haven't fully understood the code here and I
> also don't know to what extent some of that behaviour is necessary for
> some platforms such as Intel GTT bits.
>
> What I've observed with a simple/dumb (no DMA) driver like AST (but this
> probably happens more generally) is that when evicting a BO from VRAM
> into System memory, the TTM tries to preserve the existing caching
> attributes of the VRAM object.
>
> >From what I can tell, we end up with going from VRAM to System memory
> type, and we eventually call ttm_bo_select_caching() to select the
> caching option for the target.
>
> This will, from what I can tell, try to use the same caching mode as the
> original object:
>
> 	if ((cur_placement & caching) != 0)
> 		result |= (cur_placement & caching);
>
> And cur_placement comes from bo->mem.placement which as far as I can
> tell is based on the placement array which the drivers set up.

This originates from the fact that when evicting GTT memory, on x86 it's
unnecessary and undesirable to switch caching mode when going to system.

>
> Now they tend to uniformly setup the placement for System memory as
> TTM_PL_MASK_CACHING which enables all caching modes.
>
> So I end up with, for example, my System memory BOs having
> TTM_PL_FLAG_CACHED not set (though they also don't have
> TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
>
> We don't seem to use the man->default_caching (which will have
> TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
> proposed placement and the existing caching mode.
>
> Now this is a problem for several reason that I can think of:
>
>  - On a number of powerpc platforms, such as all our server 64-bit one
> for example, it's actually illegal to map system memory non-cached. The
> system is fully cache coherent for all possible DMA originators (that we
> care about at least) and mapping memory non-cachable while it's mapped
> cachable in the linear mapping can cause nasty cache paradox which, when
> detected by HW, can checkstop the system.
>
>  - A similar issue exists, afaik, on ARM >= v7, so anything mapped
> non-cachable must be removed from the linear mapping explicitly since
> otherwise it can be speculatively prefetched into the cache.
>
>  - I don't know about x86, but even then, it looks quite sub-optimal to
> map the memory backing of the BOs and access it using a WC rather than a
> cachable mapping attribute.

Last time I tested, (and it seems like Michel is on the same track),
writing with the CPU to write-combined memory was substantially faster
than writing to cached memory, with the additional side-effect that CPU
caches are left unpolluted.

Moreover (although only tested on Intel's embedded chipsets), texturing
from cpu-cache-coherent PCI memory was a real GPU performance hog
compared to texturing from non-snooped memory. Hence, whenever a buffer
could be classified as GPU-read-only (or almost at least), it should be
placed in write-combined memory.

>
> Now, some folks on IRC mentioned that there might be reasons for the
> current behaviour as to not change the caching attributes when going
> in/out of the GTT on Intel, I don't know how that relates and how that
> works, but maybe that should be enforced by having a different placement
> mask specifically on those chipsets.
>
> Dave, should we change the various PCI drivers for generally coherent
> devices such that the System memory type doesn't allow placements
> without CACHED attribute ? Or at least on coherent platforms ? How do
> detect that ? Should we have a TTM helper to establish the default
> memory placement attributes that "normal PCI" drivers call to set that
> up so we can have all the necessary arch ifdefs in one single place, at
> least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
>
> Non-PCI and "special" drivers like Intel can use a different set of
> placement attributes to represent the requirements of those specific
> platforms (mostly thinking of embedded ARM here which under some
> circumstances might actually require non-cached mappings).
> Or am I missing another part of the puzzle ?
>
> As it-is, things are broken for me even for dumb drivers, and I suspect
> to a large extent with radeon and nouveau too, though in some case we
> might get away with it most of the time ... until the machine locks up
> for some unexplainable reason... This might cause problems on existing
> distros such as RHEL7 with our radeon adapters even.
>
> Any suggestion of what's the best approach to fix it ? I'm happy to
> produce the patches but I'm not that familiar with the TTM so I would
> like to make sure I'm the right track first :-)

I dislike the approach of rewriting placements. In some cases I think it
won't even work, because placements are declared 'static const'

What I'd suggest is instead to intercept the driver response from
init_mem_type() and filter out undesired caching modes from
available_caching and default_caching, perhaps also looking at whether
the memory type is mappable or not. This should have the additional
benefit of working everywhere, and if a caching mode is selected that's
not available on the platform, you'll simply get an error. (I guess?)

/Thomas


>
> Cheers,
> Ben.
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:19           ` Michel Dänzer
@ 2014-09-04  7:54             ` Benjamin Herrenschmidt
  2014-09-04  7:59               ` Michel Dänzer
  0 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  7:54 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: linuxppc-dev, Michel Dänzer, dri-devel, Alex Deucher,
	Jerome Glisse, Christian Koenig

On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
> > +#else /* CONFIG_X86 */
> > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
> *placement)
> > +{
> > +     if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> > +             ttm->caching_state = tt_cached;
> > +             *placement &= ~TTM_PL_MASK_CACHING;
> > +             *placement |= TTM_PL_FLAG_CACHED;
> 
> NAK, this will break AGP on PowerMacs.

 ... which doesn't work reliably anyway with DRI2 :-)

The problem is ... with DRI1 I think we had tricks to take out the
AGP from the linear mapping but that want away, didn't we ?

In any case, we are playing with fire on these by allowing the
cache paradox. It just happens that those old CPUs aren't *that*
aggressive at speculative prefetch and we probably rarely hit the
lockups that they would cause...

Michel, what do you recommend we do then ? The patch I sent to
double check in ttm_io_prot() has a specific hack to avoid warning
on PowerMac for the above reason, but we need to fix Jerome if we
want to keep that broken-by-design Mac AGP functionality going :-)

Maybe we could add a similar ifdef in the above ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:54             ` Benjamin Herrenschmidt
@ 2014-09-04  7:59               ` Michel Dänzer
  2014-09-04  7:59                 ` Michel Dänzer
  2014-09-04  8:07                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 33+ messages in thread
From: Michel Dänzer @ 2014-09-04  7:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>> +#else /* CONFIG_X86 */
>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
>> *placement)
>>> +{
>>> +     if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>> +             ttm->caching_state = tt_cached;
>>> +             *placement &= ~TTM_PL_MASK_CACHING;
>>> +             *placement |= TTM_PL_FLAG_CACHED;
>>
>> NAK, this will break AGP on PowerMacs.
>
>   ... which doesn't work reliably anyway with DRI2 :-)

Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm 
not alone, at least with AGP 1x it seems to work quite well for most 
people. So I don't see the justification for intentionally breaking it 
completely for all of us.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:59               ` Michel Dänzer
@ 2014-09-04  7:59                 ` Michel Dänzer
  2014-09-04  8:07                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 33+ messages in thread
From: Michel Dänzer @ 2014-09-04  7:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On 04.09.2014 16:59, Michel Dänzer wrote:
> On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
>> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>>> +#else /* CONFIG_X86 */
>>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
>>> *placement)
>>>> +{
>>>> +     if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>>> +             ttm->caching_state = tt_cached;
>>>> +             *placement &= ~TTM_PL_MASK_CACHING;
>>>> +             *placement |= TTM_PL_FLAG_CACHED;
>>>
>>> NAK, this will break AGP on PowerMacs.
>>
>>   ... which doesn't work reliably anyway with DRI2 :-)
>
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm
> not alone, at least with AGP 1x it seems to work quite well for most
> people. So I don't see the justification for intentionally breaking it
> completely for all of us.

Even more so because PCI GART is unusably slow in general.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:44   ` Thomas Hellstrom
  (?)
@ 2014-09-04  8:06   ` Benjamin Herrenschmidt
  2014-09-04  8:46       ` Thomas Hellstrom
  -1 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  8:06 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel

On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:

> > This will, from what I can tell, try to use the same caching mode as the
> > original object:
> >
> > 	if ((cur_placement & caching) != 0)
> > 		result |= (cur_placement & caching);
> >
> > And cur_placement comes from bo->mem.placement which as far as I can
> > tell is based on the placement array which the drivers set up.
> 
> This originates from the fact that when evicting GTT memory, on x86 it's
> unnecessary and undesirable to switch caching mode when going to system.

But that's what I don't quite understand. We have two different mappings
here. The VRAM and the memory object. We wouldn't be "switching"... we
are creating a temporary mapping for the memory object in order to do
the memcpy, but we seem to be doing it by using the caching attributes
of the VRAM object.... or am I missing something ? I don't see how that
makes sense so I suppose I'm missing something here :-)

> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.

That's very strange indeed. It's certainly an x86 specific artifact,
even if we were allowed by our hypervisor to map memory non-cachable
(the HW somewhat can), we tend to have a higher throughput by going
cachable, but that could be due to the way the PowerBus works (it's
basically very biased toward cachable transactions).

> I dislike the approach of rewriting placements. In some cases I think it
> won't even work, because placements are declared 'static const'
> 
> What I'd suggest is instead to intercept the driver response from
> init_mem_type() and filter out undesired caching modes from
> available_caching and default_caching, 

This was my original intent but Jerome seems to have different ideas
(see his proposed patches). I'm happy to revive mine as well and post it
as an alternative after I've tested it a bit more (tomorrow).

> perhaps also looking at whether
> the memory type is mappable or not. This should have the additional
> benefit of working everywhere, and if a caching mode is selected that's
> not available on the platform, you'll simply get an error. (I guess?)

You mean that if not mappable we don't bother filtering ?

The rule is really for me pretty simple:

   - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable

   - If it's PCIe memory space (VRAM, registers, ...) it MUST be
non-cachable.

Cheers,
Ben.

> /Thomas
> 
> 
> >
> > Cheers,
> > Ben.
> >
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:59               ` Michel Dänzer
  2014-09-04  7:59                 ` Michel Dänzer
@ 2014-09-04  8:07                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  8:07 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig,
	dri-devel

On Thu, 2014-09-04 at 16:59 +0900, Michel Dänzer wrote:
> 
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm 
> not alone, at least with AGP 1x it seems to work quite well for most 
> people. So I don't see the justification for intentionally breaking it 
> completely for all of us.

Oh I wasn't arguing for breaking it, just jesting. We need to keep it
working. It's amazing how well broken stuff actually work though :-)

I mean, it's architecturally broken and if we get a collision between
the cache and the NCU, the chip will crash. We just get lucky I suppose.

Anyway, I'll try a different approach tomorrow see how it goes.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  8:06   ` Benjamin Herrenschmidt
@ 2014-09-04  8:46       ` Thomas Hellstrom
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04  8:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel

On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:
>
>>> This will, from what I can tell, try to use the same caching mode as the
>>> original object:
>>>
>>> 	if ((cur_placement & caching) != 0)
>>> 		result |= (cur_placement & caching);
>>>
>>> And cur_placement comes from bo->mem.placement which as far as I can
>>> tell is based on the placement array which the drivers set up.
>> This originates from the fact that when evicting GTT memory, on x86 it's
>> unnecessary and undesirable to switch caching mode when going to system.
> But that's what I don't quite understand. We have two different mappings
> here. The VRAM and the memory object. We wouldn't be "switching"... we
> are creating a temporary mapping for the memory object in order to do
> the memcpy, but we seem to be doing it by using the caching attributes
> of the VRAM object.... or am I missing something ? I don't see how that
> makes sense so I suppose I'm missing something here :-)

Well, the intention when TTM was written was that the driver writer
should be smart enough that when he wanted a move from unached VRAM to
system, he'd request cached system in the placement flags in the first
place.  If TTM somehow overrides such a request, that's a bug in TTM.

If the move, for example, is a result of an eviction, then the driver
evict_flags() function should ideally look at the current placement and
decide about a suitable placement based on that: vram-to-system moves
should generally request cacheable memory if the next access is expected
by the CPU. Probably write-combined otherwise.
If the move is the result of a TTM swapout, TTM will automatically
select cachable system, and for most other moves, I think the driver
writer is in full control.

>
>> Last time I tested, (and it seems like Michel is on the same track),
>> writing with the CPU to write-combined memory was substantially faster
>> than writing to cached memory, with the additional side-effect that CPU
>> caches are left unpolluted.
> That's very strange indeed. It's certainly an x86 specific artifact,
> even if we were allowed by our hypervisor to map memory non-cachable
> (the HW somewhat can), we tend to have a higher throughput by going
> cachable, but that could be due to the way the PowerBus works (it's
> basically very biased toward cachable transactions).
>
>> I dislike the approach of rewriting placements. In some cases I think it
>> won't even work, because placements are declared 'static const'
>>
>> What I'd suggest is instead to intercept the driver response from
>> init_mem_type() and filter out undesired caching modes from
>> available_caching and default_caching, 
> This was my original intent but Jerome seems to have different ideas
> (see his proposed patches). I'm happy to revive mine as well and post it
> as an alternative after I've tested it a bit more (tomorrow).
>
>> perhaps also looking at whether
>> the memory type is mappable or not. This should have the additional
>> benefit of working everywhere, and if a caching mode is selected that's
>> not available on the platform, you'll simply get an error. (I guess?)
> You mean that if not mappable we don't bother filtering ?
>
> The rule is really for me pretty simple:
>
>    - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable
>
>    - If it's PCIe memory space (VRAM, registers, ...) it MUST be
> non-cachable.

Yes, something along these lines. I guess checking for VRAM or
TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick

/Thomas

>
> Cheers,
> Ben.
>
>> /Thomas
>>
>>
>>> Cheers,
>>> Ben.
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/dri-devel&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=C9AHL1VngKBOxe2UrNP2eCZo6FLqdlr6Y90rpfE5rUs%3D%0A&s=73da0633bafc5d54bf116bc861d48d13c39cf8f41832adfb739709e98ec05553
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  8:46       ` Thomas Hellstrom
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04  8:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel

On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:
>
>>> This will, from what I can tell, try to use the same caching mode as the
>>> original object:
>>>
>>> 	if ((cur_placement & caching) != 0)
>>> 		result |= (cur_placement & caching);
>>>
>>> And cur_placement comes from bo->mem.placement which as far as I can
>>> tell is based on the placement array which the drivers set up.
>> This originates from the fact that when evicting GTT memory, on x86 it's
>> unnecessary and undesirable to switch caching mode when going to system.
> But that's what I don't quite understand. We have two different mappings
> here. The VRAM and the memory object. We wouldn't be "switching"... we
> are creating a temporary mapping for the memory object in order to do
> the memcpy, but we seem to be doing it by using the caching attributes
> of the VRAM object.... or am I missing something ? I don't see how that
> makes sense so I suppose I'm missing something here :-)

Well, the intention when TTM was written was that the driver writer
should be smart enough that when he wanted a move from unached VRAM to
system, he'd request cached system in the placement flags in the first
place.  If TTM somehow overrides such a request, that's a bug in TTM.

If the move, for example, is a result of an eviction, then the driver
evict_flags() function should ideally look at the current placement and
decide about a suitable placement based on that: vram-to-system moves
should generally request cacheable memory if the next access is expected
by the CPU. Probably write-combined otherwise.
If the move is the result of a TTM swapout, TTM will automatically
select cachable system, and for most other moves, I think the driver
writer is in full control.

>
>> Last time I tested, (and it seems like Michel is on the same track),
>> writing with the CPU to write-combined memory was substantially faster
>> than writing to cached memory, with the additional side-effect that CPU
>> caches are left unpolluted.
> That's very strange indeed. It's certainly an x86 specific artifact,
> even if we were allowed by our hypervisor to map memory non-cachable
> (the HW somewhat can), we tend to have a higher throughput by going
> cachable, but that could be due to the way the PowerBus works (it's
> basically very biased toward cachable transactions).
>
>> I dislike the approach of rewriting placements. In some cases I think it
>> won't even work, because placements are declared 'static const'
>>
>> What I'd suggest is instead to intercept the driver response from
>> init_mem_type() and filter out undesired caching modes from
>> available_caching and default_caching, 
> This was my original intent but Jerome seems to have different ideas
> (see his proposed patches). I'm happy to revive mine as well and post it
> as an alternative after I've tested it a bit more (tomorrow).
>
>> perhaps also looking at whether
>> the memory type is mappable or not. This should have the additional
>> benefit of working everywhere, and if a caching mode is selected that's
>> not available on the platform, you'll simply get an error. (I guess?)
> You mean that if not mappable we don't bother filtering ?
>
> The rule is really for me pretty simple:
>
>    - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable
>
>    - If it's PCIe memory space (VRAM, registers, ...) it MUST be
> non-cachable.

Yes, something along these lines. I guess checking for VRAM or
TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick

/Thomas

>
> Cheers,
> Ben.
>
>> /Thomas
>>
>>
>>> Cheers,
>>> Ben.
>>>
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/dri-devel&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=C9AHL1VngKBOxe2UrNP2eCZo6FLqdlr6Y90rpfE5rUs%3D%0A&s=73da0633bafc5d54bf116bc861d48d13c39cf8f41832adfb739709e98ec05553
>

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  7:44   ` Thomas Hellstrom
@ 2014-09-04  9:34     ` Daniel Vetter
  -1 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2014-09-04  9:34 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: dri-devel, Michel Danzer, linuxppc-dev

On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.
> 
> Moreover (although only tested on Intel's embedded chipsets), texturing
> from cpu-cache-coherent PCI memory was a real GPU performance hog
> compared to texturing from non-snooped memory. Hence, whenever a buffer
> could be classified as GPU-read-only (or almost at least), it should be
> placed in write-combined memory.

Just a quick comment since this explicitly referes to intel chips: On
desktop/laptop chips with the big shared l3/l4 caches it's the other way
round. Cached uploads are substantially faster than wc and not using
coherent access is a severe perf hit for texturing. I guess the hw guys
worked really hard to hide the snooping costs so that the gpu can benefit
from the massive bandwidth these caches can provide.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04  9:34     ` Daniel Vetter
  0 siblings, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2014-09-04  9:34 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: dri-devel, Michel Danzer, linuxppc-dev

On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.
> 
> Moreover (although only tested on Intel's embedded chipsets), texturing
> from cpu-cache-coherent PCI memory was a real GPU performance hog
> compared to texturing from non-snooped memory. Hence, whenever a buffer
> could be classified as GPU-read-only (or almost at least), it should be
> placed in write-combined memory.

Just a quick comment since this explicitly referes to intel chips: On
desktop/laptop chips with the big shared l3/l4 caches it's the other way
round. Cached uploads are substantially faster than wc and not using
coherent access is a severe perf hit for texturing. I guess the hw guys
worked really hard to hide the snooping costs so that the gpu can benefit
from the massive bandwidth these caches can provide.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  9:34     ` Daniel Vetter
  (?)
@ 2014-09-04  9:43     ` Benjamin Herrenschmidt
  2014-09-04 10:23         ` Thomas Hellstrom
  -1 siblings, 1 reply; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04  9:43 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: linuxppc-dev, Thomas Hellstrom, dri-devel, Michel Danzer

On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> > Last time I tested, (and it seems like Michel is on the same track),
> > writing with the CPU to write-combined memory was substantially faster
> > than writing to cached memory, with the additional side-effect that CPU
> > caches are left unpolluted.
> > 
> > Moreover (although only tested on Intel's embedded chipsets), texturing
> > from cpu-cache-coherent PCI memory was a real GPU performance hog
> > compared to texturing from non-snooped memory. Hence, whenever a buffer
> > could be classified as GPU-read-only (or almost at least), it should be
> > placed in write-combined memory.
> 
> Just a quick comment since this explicitly referes to intel chips: On
> desktop/laptop chips with the big shared l3/l4 caches it's the other way
> round. Cached uploads are substantially faster than wc and not using
> coherent access is a severe perf hit for texturing. I guess the hw guys
> worked really hard to hide the snooping costs so that the gpu can benefit
> from the massive bandwidth these caches can provide.

This is similar to modern POWER chips as well. We have pretty big L3's
(though not technically shared they are in a separate quadrant and we
have a shared L4 in the memory buffer) and our fabric is generally
optimized for cachable/coherent access performance. In fact, we only
have so many credits for NC accesses on the bus...

What that tells me is that when setting up the desired cachability
attributes for the mapping of a memory object, we need to consider these
things here:

  - The hard requirement of the HW (non-coherent GPUs require NC, AGP
does in some cases, etc...) which I think is basically already handled
using the placement attributes set by the GPU driver for the memory type

  - The optimal attributes (and platform hard requirements) for fast
memory accesses to an object by the processor.  From what I read here,
this can be NC+WC on older Intel, cachable on newer, etc...)

  - The optimal attributes for fast GPU DMA accesses to the object in
system memory. Here too, this is fairly platform/chipset dependent.

Do we have flags in the DRM that tell us whether an object in memory is
more likely to be used by the GPU via DMA vs by the CPU via MMIO ? On
powerpc (except in the old AGP case), I wouldn't care about require
cachable in both case, but I can see the low latency crowd wanting the
former to be non-cachable while the dumb GPUs like AST who don't do DMA
would benefit greatly from the latter...

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
  2014-09-04  9:43     ` Benjamin Herrenschmidt
@ 2014-09-04 10:23         ` Thomas Hellstrom
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04 10:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, dri-devel, Daniel Vetter

On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
>> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
>>> Last time I tested, (and it seems like Michel is on the same track),
>>> writing with the CPU to write-combined memory was substantially faster
>>> than writing to cached memory, with the additional side-effect that CPU
>>> caches are left unpolluted.
>>>
>>> Moreover (although only tested on Intel's embedded chipsets), texturing
>>> from cpu-cache-coherent PCI memory was a real GPU performance hog
>>> compared to texturing from non-snooped memory. Hence, whenever a buffer
>>> could be classified as GPU-read-only (or almost at least), it should be
>>> placed in write-combined memory.
>> Just a quick comment since this explicitly referes to intel chips: On
>> desktop/laptop chips with the big shared l3/l4 caches it's the other way
>> round. Cached uploads are substantially faster than wc and not using
>> coherent access is a severe perf hit for texturing. I guess the hw guys
>> worked really hard to hide the snooping costs so that the gpu can benefit
>> from the massive bandwidth these caches can provide.
> This is similar to modern POWER chips as well. We have pretty big L3's
> (though not technically shared they are in a separate quadrant and we
> have a shared L4 in the memory buffer) and our fabric is generally
> optimized for cachable/coherent access performance. In fact, we only
> have so many credits for NC accesses on the bus...
>

Thanks both of you for the update. I haven't dealt with real hardware
for a while..

/Thomas

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
@ 2014-09-04 10:23         ` Thomas Hellstrom
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Hellstrom @ 2014-09-04 10:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, dri-devel, Daniel Vetter

On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
>> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
>>> Last time I tested, (and it seems like Michel is on the same track),
>>> writing with the CPU to write-combined memory was substantially faster
>>> than writing to cached memory, with the additional side-effect that CPU
>>> caches are left unpolluted.
>>>
>>> Moreover (although only tested on Intel's embedded chipsets), texturing
>>> from cpu-cache-coherent PCI memory was a real GPU performance hog
>>> compared to texturing from non-snooped memory. Hence, whenever a buffer
>>> could be classified as GPU-read-only (or almost at least), it should be
>>> placed in write-combined memory.
>> Just a quick comment since this explicitly referes to intel chips: On
>> desktop/laptop chips with the big shared l3/l4 caches it's the other way
>> round. Cached uploads are substantially faster than wc and not using
>> coherent access is a severe perf hit for texturing. I guess the hw guys
>> worked really hard to hide the snooping costs so that the gpu can benefit
>> from the massive bandwidth these caches can provide.
> This is similar to modern POWER chips as well. We have pretty big L3's
> (though not technically shared they are in a separate quadrant and we
> have a shared L4 in the memory buffer) and our fabric is generally
> optimized for cachable/coherent access performance. In fact, we only
> have so many credits for NC accesses on the bus...
>

Thanks both of you for the update. I haven't dealt with real hardware
for a while..

/Thomas

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: TTM placement & caching issue/questions
       [not found] <ED4D93630842CD4385F644DC5158EE9171B05E72@NTOVMAIL03.ad.otto.de>
@ 2014-09-05  7:40 ` Jochen Rollwagen
  0 siblings, 0 replies; 33+ messages in thread
From: Jochen Rollwagen @ 2014-09-05  7:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 566 bytes --]


Sorry for chiming in here, i’ve passively followed the discussion.

If you produce patches for a 3.14 kernel that solve this issue:

http://lists.freedesktop.org/archives/dri-devel/2013-November/049185.html

(thereby enabling AGP mode on mac mini G4’s, should still be quite a few 
of them around)

you can register me as a happy tester. My "uptime" with a powerpc AGP 
system is about two minutes :-) (although the system behaviour has 
changed somewhat with more recent kernels from a complete lockup to an 
endless loop).

Cheers

Jochen




[-- Attachment #1.2: Type: text/html, Size: 3021 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-09-05  7:40 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-04  0:12 TTM placement & caching issue/questions Benjamin Herrenschmidt
2014-09-04  0:12 ` Benjamin Herrenschmidt
2014-09-04  1:55 ` Jerome Glisse
2014-09-04  2:07   ` Jerome Glisse
2014-09-04  2:07     ` Jerome Glisse
2014-09-04  2:25     ` Benjamin Herrenschmidt
2014-09-04  2:25       ` Benjamin Herrenschmidt
2014-09-04  2:31       ` Jerome Glisse
2014-09-04  2:31         ` Jerome Glisse
2014-09-04  2:32         ` Jerome Glisse
2014-09-04  2:32           ` Jerome Glisse
2014-09-04  2:36         ` Jerome Glisse
2014-09-04  2:36           ` Jerome Glisse
2014-09-04  5:23           ` Benjamin Herrenschmidt
2014-09-04  6:45           ` Gabriel Paubert
2014-09-04  7:19           ` Michel Dänzer
2014-09-04  7:54             ` Benjamin Herrenschmidt
2014-09-04  7:59               ` Michel Dänzer
2014-09-04  7:59                 ` Michel Dänzer
2014-09-04  8:07                 ` Benjamin Herrenschmidt
2014-09-04  2:15   ` Benjamin Herrenschmidt
2014-09-04  7:12   ` Michel Dänzer
2014-09-04  7:44 ` Thomas Hellstrom
2014-09-04  7:44   ` Thomas Hellstrom
2014-09-04  8:06   ` Benjamin Herrenschmidt
2014-09-04  8:46     ` Thomas Hellstrom
2014-09-04  8:46       ` Thomas Hellstrom
2014-09-04  9:34   ` Daniel Vetter
2014-09-04  9:34     ` Daniel Vetter
2014-09-04  9:43     ` Benjamin Herrenschmidt
2014-09-04 10:23       ` Thomas Hellstrom
2014-09-04 10:23         ` Thomas Hellstrom
     [not found] <ED4D93630842CD4385F644DC5158EE9171B05E72@NTOVMAIL03.ad.otto.de>
2014-09-05  7:40 ` Jochen Rollwagen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.