* TTM placement & caching issue/questions
@ 2014-09-04 0:12 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 33+ messages in thread
From: Benjamin Herrenschmidt @ 2014-09-04 0:12 UTC (permalink / raw)
To: dri-devel; +Cc: Dave Airlie, linuxppc-dev, Alex Deucher
Hi folks !
I've been tracking down some problems with the recent DRI on powerpc and
stumbled upon something that doesn't look right, and not necessarily
only for us.
Now it's possible that I haven't fully understood the code here and I
also don't know to what extent some of that behaviour is necessary for
some platforms such as Intel GTT bits.
What I've observed with a simple/dumb (no DMA) driver like AST (but this
probably happens more generally) is that when evicting a BO from VRAM
into System memory, the TTM tries to preserve the existing caching
attributes of the VRAM object.
>From what I can tell, we end up with going from VRAM to System memory
type, and we eventually call ttm_bo_select_caching() to select the
caching option for the target.
This will, from what I can tell, try to use the same caching mode as the
original object:
if ((cur_placement & caching) != 0)
result |= (cur_placement & caching);
And cur_placement comes from bo->mem.placement which as far as I can
tell is based on the placement array which the drivers set up.
Now they tend to uniformly setup the placement for System memory as
TTM_PL_MASK_CACHING which enables all caching modes.
So I end up with, for example, my System memory BOs having
TTM_PL_FLAG_CACHED not set (though they also don't have
TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.
We don't seem to use the man->default_caching (which will have
TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
proposed placement and the existing caching mode.
Now this is a problem for several reason that I can think of:
- On a number of powerpc platforms, such as all our server 64-bit one
for example, it's actually illegal to map system memory non-cached. The
system is fully cache coherent for all possible DMA originators (that we
care about at least) and mapping memory non-cachable while it's mapped
cachable in the linear mapping can cause nasty cache paradox which, when
detected by HW, can checkstop the system.
- A similar issue exists, afaik, on ARM >= v7, so anything mapped
non-cachable must be removed from the linear mapping explicitly since
otherwise it can be speculatively prefetched into the cache.
- I don't know about x86, but even then, it looks quite sub-optimal to
map the memory backing of the BOs and access it using a WC rather than a
cachable mapping attribute.
Now, some folks on IRC mentioned that there might be reasons for the
current behaviour as to not change the caching attributes when going
in/out of the GTT on Intel, I don't know how that relates and how that
works, but maybe that should be enforced by having a different placement
mask specifically on those chipsets.
Dave, should we change the various PCI drivers for generally coherent
devices such that the System memory type doesn't allow placements
without CACHED attribute ? Or at least on coherent platforms ? How do
detect that ? Should we have a TTM helper to establish the default
memory placement attributes that "normal PCI" drivers call to set that
up so we can have all the necessary arch ifdefs in one single place, at
least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?
Non-PCI and "special" drivers like Intel can use a different set of
placement attributes to represent the requirements of those specific
platforms (mostly thinking of embedded ARM here which under some
circumstances might actually require non-cached mappings).
Or am I missing another part of the puzzle ?
As it-is, things are broken for me even for dumb drivers, and I suspect
to a large extent with radeon and nouveau too, though in some case we
might get away with it most of the time ... until the machine locks up
for some unexplainable reason... This might cause problems on existing
distros such as RHEL7 with our radeon adapters even.
Any suggestion of what's the best approach to fix it ? I'm happy to
produce the patches but I'm not that familiar with the TTM so I would
like to make sure I'm the right track first :-)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 33+ messages in thread
* TTM placement & caching issue/questions @ 2014-09-04 0:12 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 0:12 UTC (permalink / raw) To: dri-devel; +Cc: linuxppc-dev Hi folks ! I've been tracking down some problems with the recent DRI on powerpc and stumbled upon something that doesn't look right, and not necessarily only for us. Now it's possible that I haven't fully understood the code here and I also don't know to what extent some of that behaviour is necessary for some platforms such as Intel GTT bits. What I've observed with a simple/dumb (no DMA) driver like AST (but this probably happens more generally) is that when evicting a BO from VRAM into System memory, the TTM tries to preserve the existing caching attributes of the VRAM object. >From what I can tell, we end up with going from VRAM to System memory type, and we eventually call ttm_bo_select_caching() to select the caching option for the target. This will, from what I can tell, try to use the same caching mode as the original object: if ((cur_placement & caching) != 0) result |= (cur_placement & caching); And cur_placement comes from bo->mem.placement which as far as I can tell is based on the placement array which the drivers set up. Now they tend to uniformly setup the placement for System memory as TTM_PL_MASK_CACHING which enables all caching modes. So I end up with, for example, my System memory BOs having TTM_PL_FLAG_CACHED not set (though they also don't have TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. We don't seem to use the man->default_caching (which will have TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the proposed placement and the existing caching mode. Now this is a problem for several reason that I can think of: - On a number of powerpc platforms, such as all our server 64-bit one for example, it's actually illegal to map system memory non-cached. The system is fully cache coherent for all possible DMA originators (that we care about at least) and mapping memory non-cachable while it's mapped cachable in the linear mapping can cause nasty cache paradox which, when detected by HW, can checkstop the system. - A similar issue exists, afaik, on ARM >= v7, so anything mapped non-cachable must be removed from the linear mapping explicitly since otherwise it can be speculatively prefetched into the cache. - I don't know about x86, but even then, it looks quite sub-optimal to map the memory backing of the BOs and access it using a WC rather than a cachable mapping attribute. Now, some folks on IRC mentioned that there might be reasons for the current behaviour as to not change the caching attributes when going in/out of the GTT on Intel, I don't know how that relates and how that works, but maybe that should be enforced by having a different placement mask specifically on those chipsets. Dave, should we change the various PCI drivers for generally coherent devices such that the System memory type doesn't allow placements without CACHED attribute ? Or at least on coherent platforms ? How do detect that ? Should we have a TTM helper to establish the default memory placement attributes that "normal PCI" drivers call to set that up so we can have all the necessary arch ifdefs in one single place, at least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? Non-PCI and "special" drivers like Intel can use a different set of placement attributes to represent the requirements of those specific platforms (mostly thinking of embedded ARM here which under some circumstances might actually require non-cached mappings). Or am I missing another part of the puzzle ? As it-is, things are broken for me even for dumb drivers, and I suspect to a large extent with radeon and nouveau too, though in some case we might get away with it most of the time ... until the machine locks up for some unexplainable reason... This might cause problems on existing distros such as RHEL7 with our radeon adapters even. Any suggestion of what's the best approach to fix it ? I'm happy to produce the patches but I'm not that familiar with the TTM so I would like to make sure I'm the right track first :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 0:12 ` Benjamin Herrenschmidt (?) @ 2014-09-04 1:55 ` Jerome Glisse 2014-09-04 2:07 ` Jerome Glisse ` (2 more replies) -1 siblings, 3 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 1:55 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote: > Hi folks ! > > I've been tracking down some problems with the recent DRI on powerpc and > stumbled upon something that doesn't look right, and not necessarily > only for us. > > Now it's possible that I haven't fully understood the code here and I > also don't know to what extent some of that behaviour is necessary for > some platforms such as Intel GTT bits. > > What I've observed with a simple/dumb (no DMA) driver like AST (but this > probably happens more generally) is that when evicting a BO from VRAM > into System memory, the TTM tries to preserve the existing caching > attributes of the VRAM object. > > From what I can tell, we end up with going from VRAM to System memory > type, and we eventually call ttm_bo_select_caching() to select the > caching option for the target. > > This will, from what I can tell, try to use the same caching mode as the > original object: > > if ((cur_placement & caching) != 0) > result |= (cur_placement & caching); > > And cur_placement comes from bo->mem.placement which as far as I can > tell is based on the placement array which the drivers set up. > > Now they tend to uniformly setup the placement for System memory as > TTM_PL_MASK_CACHING which enables all caching modes. > > So I end up with, for example, my System memory BOs having > TTM_PL_FLAG_CACHED not set (though they also don't have > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. > > We don't seem to use the man->default_caching (which will have > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the > proposed placement and the existing caching mode. > > Now this is a problem for several reason that I can think of: > > - On a number of powerpc platforms, such as all our server 64-bit one > for example, it's actually illegal to map system memory non-cached. The > system is fully cache coherent for all possible DMA originators (that we > care about at least) and mapping memory non-cachable while it's mapped > cachable in the linear mapping can cause nasty cache paradox which, when > detected by HW, can checkstop the system. > > - A similar issue exists, afaik, on ARM >= v7, so anything mapped > non-cachable must be removed from the linear mapping explicitly since > otherwise it can be speculatively prefetched into the cache. > > - I don't know about x86, but even then, it looks quite sub-optimal to > map the memory backing of the BOs and access it using a WC rather than a > cachable mapping attribute. > > Now, some folks on IRC mentioned that there might be reasons for the > current behaviour as to not change the caching attributes when going > in/out of the GTT on Intel, I don't know how that relates and how that > works, but maybe that should be enforced by having a different placement > mask specifically on those chipsets. > > Dave, should we change the various PCI drivers for generally coherent > devices such that the System memory type doesn't allow placements > without CACHED attribute ? Or at least on coherent platforms ? How do > detect that ? Should we have a TTM helper to establish the default > memory placement attributes that "normal PCI" drivers call to set that > up so we can have all the necessary arch ifdefs in one single place, at > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? > > Non-PCI and "special" drivers like Intel can use a different set of > placement attributes to represent the requirements of those specific > platforms (mostly thinking of embedded ARM here which under some > circumstances might actually require non-cached mappings). > Or am I missing another part of the puzzle ? > > As it-is, things are broken for me even for dumb drivers, and I suspect > to a large extent with radeon and nouveau too, though in some case we > might get away with it most of the time ... until the machine locks up > for some unexplainable reason... This might cause problems on existing > distros such as RHEL7 with our radeon adapters even. > > Any suggestion of what's the best approach to fix it ? I'm happy to > produce the patches but I'm not that familiar with the TTM so I would > like to make sure I'm the right track first :-) While i agree about the issue of incoherent double map of same page, i think we have more issue. For instance lattely AMD have been pushing a lot of patches to move things to use uncached memory for radeon and as usual thoses patches comes with no comment to the motivations of those changes. I am ccing the usual suspect ;) What i understand is that uncached mapping for some frequently use buffer give a significant performance boost (i am assuming this has to do with all the snoop pci transaction overhead). >From my understanding this is something that is allow on other OS and any driver wishing to compete from performance point of view will want that. So i think we need to get a platform flags and or set_pages_array_wc|uc needs to fail and this would fallback to cached mapping if the fallback code still works. So if your arch properly return and error for those cache changing function then you should be fine. This also means that we need to fix ttm_tt_set_placement_caching so that when it returns an error it switches to cached mapping. Which will always work. Cheers, Jérôme > > Cheers, > Ben. > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 1:55 ` Jerome Glisse @ 2014-09-04 2:07 ` Jerome Glisse 2014-09-04 2:15 ` Benjamin Herrenschmidt 2014-09-04 7:12 ` Michel Dänzer 2 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:07 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 6133 bytes --] On Wed, Sep 03, 2014 at 09:55:53PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote: > > Hi folks ! > > > > I've been tracking down some problems with the recent DRI on powerpc and > > stumbled upon something that doesn't look right, and not necessarily > > only for us. > > > > Now it's possible that I haven't fully understood the code here and I > > also don't know to what extent some of that behaviour is necessary for > > some platforms such as Intel GTT bits. > > > > What I've observed with a simple/dumb (no DMA) driver like AST (but this > > probably happens more generally) is that when evicting a BO from VRAM > > into System memory, the TTM tries to preserve the existing caching > > attributes of the VRAM object. > > > > From what I can tell, we end up with going from VRAM to System memory > > type, and we eventually call ttm_bo_select_caching() to select the > > caching option for the target. > > > > This will, from what I can tell, try to use the same caching mode as the > > original object: > > > > if ((cur_placement & caching) != 0) > > result |= (cur_placement & caching); > > > > And cur_placement comes from bo->mem.placement which as far as I can > > tell is based on the placement array which the drivers set up. > > > > Now they tend to uniformly setup the placement for System memory as > > TTM_PL_MASK_CACHING which enables all caching modes. > > > > So I end up with, for example, my System memory BOs having > > TTM_PL_FLAG_CACHED not set (though they also don't have > > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. > > > > We don't seem to use the man->default_caching (which will have > > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the > > proposed placement and the existing caching mode. > > > > Now this is a problem for several reason that I can think of: > > > > - On a number of powerpc platforms, such as all our server 64-bit one > > for example, it's actually illegal to map system memory non-cached. The > > system is fully cache coherent for all possible DMA originators (that we > > care about at least) and mapping memory non-cachable while it's mapped > > cachable in the linear mapping can cause nasty cache paradox which, when > > detected by HW, can checkstop the system. > > > > - A similar issue exists, afaik, on ARM >= v7, so anything mapped > > non-cachable must be removed from the linear mapping explicitly since > > otherwise it can be speculatively prefetched into the cache. > > > > - I don't know about x86, but even then, it looks quite sub-optimal to > > map the memory backing of the BOs and access it using a WC rather than a > > cachable mapping attribute. > > > > Now, some folks on IRC mentioned that there might be reasons for the > > current behaviour as to not change the caching attributes when going > > in/out of the GTT on Intel, I don't know how that relates and how that > > works, but maybe that should be enforced by having a different placement > > mask specifically on those chipsets. > > > > Dave, should we change the various PCI drivers for generally coherent > > devices such that the System memory type doesn't allow placements > > without CACHED attribute ? Or at least on coherent platforms ? How do > > detect that ? Should we have a TTM helper to establish the default > > memory placement attributes that "normal PCI" drivers call to set that > > up so we can have all the necessary arch ifdefs in one single place, at > > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? > > > > Non-PCI and "special" drivers like Intel can use a different set of > > placement attributes to represent the requirements of those specific > > platforms (mostly thinking of embedded ARM here which under some > > circumstances might actually require non-cached mappings). > > Or am I missing another part of the puzzle ? > > > > As it-is, things are broken for me even for dumb drivers, and I suspect > > to a large extent with radeon and nouveau too, though in some case we > > might get away with it most of the time ... until the machine locks up > > for some unexplainable reason... This might cause problems on existing > > distros such as RHEL7 with our radeon adapters even. > > > > Any suggestion of what's the best approach to fix it ? I'm happy to > > produce the patches but I'm not that familiar with the TTM so I would > > like to make sure I'm the right track first :-) > > While i agree about the issue of incoherent double map of same page, i > think we have more issue. For instance lattely AMD have been pushing a > lot of patches to move things to use uncached memory for radeon and as > usual thoses patches comes with no comment to the motivations of those > changes. I am ccing the usual suspect ;) > > What i understand is that uncached mapping for some frequently use buffer > give a significant performance boost (i am assuming this has to do with > all the snoop pci transaction overhead). > > From my understanding this is something that is allow on other OS and > any driver wishing to compete from performance point of view will want > that. > > > So i think we need to get a platform flags and or set_pages_array_wc|uc > needs to fail and this would fallback to cached mapping if the fallback > code still works. So if your arch properly return and error for those > cache changing function then you should be fine. > > This also means that we need to fix ttm_tt_set_placement_caching so that > when it returns an error it switches to cached mapping. Which will always > work. So in the meantime the attached patch should work, it just silently ignore the caching attribute request on non x86 instead of pretending that things are setup as expected and then latter the radeon ou nouveau hw unsetting the snoop bit. It's not tested but i think it should work. > > Cheers, > Jérôme > > > > > Cheers, > > Ben. > > > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/dri-devel [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 1687 bytes --] >From df0b5d1488daed05d7b4508759401a3e89cd4a38 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/ttm/ttm_tt.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..de14b83 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,6 +141,15 @@ out_err: return ret; } +#else /* CONFIG_X86 */ +static int ttm_tt_set_caching(struct ttm_tt *ttm, + enum ttm_caching_state c_state) +{ + ttm->caching_state = tt_cached; + return 0; +} +#endif /* CONFIG_X86 */ + int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) { enum ttm_caching_state state; -- 1.9.3 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 2:07 ` Jerome Glisse 0 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:07 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 6266 bytes --] On Wed, Sep 03, 2014 at 09:55:53PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 10:12:27AM +1000, Benjamin Herrenschmidt wrote: > > Hi folks ! > > > > I've been tracking down some problems with the recent DRI on powerpc and > > stumbled upon something that doesn't look right, and not necessarily > > only for us. > > > > Now it's possible that I haven't fully understood the code here and I > > also don't know to what extent some of that behaviour is necessary for > > some platforms such as Intel GTT bits. > > > > What I've observed with a simple/dumb (no DMA) driver like AST (but this > > probably happens more generally) is that when evicting a BO from VRAM > > into System memory, the TTM tries to preserve the existing caching > > attributes of the VRAM object. > > > > From what I can tell, we end up with going from VRAM to System memory > > type, and we eventually call ttm_bo_select_caching() to select the > > caching option for the target. > > > > This will, from what I can tell, try to use the same caching mode as the > > original object: > > > > if ((cur_placement & caching) != 0) > > result |= (cur_placement & caching); > > > > And cur_placement comes from bo->mem.placement which as far as I can > > tell is based on the placement array which the drivers set up. > > > > Now they tend to uniformly setup the placement for System memory as > > TTM_PL_MASK_CACHING which enables all caching modes. > > > > So I end up with, for example, my System memory BOs having > > TTM_PL_FLAG_CACHED not set (though they also don't have > > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. > > > > We don't seem to use the man->default_caching (which will have > > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the > > proposed placement and the existing caching mode. > > > > Now this is a problem for several reason that I can think of: > > > > - On a number of powerpc platforms, such as all our server 64-bit one > > for example, it's actually illegal to map system memory non-cached. The > > system is fully cache coherent for all possible DMA originators (that we > > care about at least) and mapping memory non-cachable while it's mapped > > cachable in the linear mapping can cause nasty cache paradox which, when > > detected by HW, can checkstop the system. > > > > - A similar issue exists, afaik, on ARM >= v7, so anything mapped > > non-cachable must be removed from the linear mapping explicitly since > > otherwise it can be speculatively prefetched into the cache. > > > > - I don't know about x86, but even then, it looks quite sub-optimal to > > map the memory backing of the BOs and access it using a WC rather than a > > cachable mapping attribute. > > > > Now, some folks on IRC mentioned that there might be reasons for the > > current behaviour as to not change the caching attributes when going > > in/out of the GTT on Intel, I don't know how that relates and how that > > works, but maybe that should be enforced by having a different placement > > mask specifically on those chipsets. > > > > Dave, should we change the various PCI drivers for generally coherent > > devices such that the System memory type doesn't allow placements > > without CACHED attribute ? Or at least on coherent platforms ? How do > > detect that ? Should we have a TTM helper to establish the default > > memory placement attributes that "normal PCI" drivers call to set that > > up so we can have all the necessary arch ifdefs in one single place, at > > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? > > > > Non-PCI and "special" drivers like Intel can use a different set of > > placement attributes to represent the requirements of those specific > > platforms (mostly thinking of embedded ARM here which under some > > circumstances might actually require non-cached mappings). > > Or am I missing another part of the puzzle ? > > > > As it-is, things are broken for me even for dumb drivers, and I suspect > > to a large extent with radeon and nouveau too, though in some case we > > might get away with it most of the time ... until the machine locks up > > for some unexplainable reason... This might cause problems on existing > > distros such as RHEL7 with our radeon adapters even. > > > > Any suggestion of what's the best approach to fix it ? I'm happy to > > produce the patches but I'm not that familiar with the TTM so I would > > like to make sure I'm the right track first :-) > > While i agree about the issue of incoherent double map of same page, i > think we have more issue. For instance lattely AMD have been pushing a > lot of patches to move things to use uncached memory for radeon and as > usual thoses patches comes with no comment to the motivations of those > changes. I am ccing the usual suspect ;) > > What i understand is that uncached mapping for some frequently use buffer > give a significant performance boost (i am assuming this has to do with > all the snoop pci transaction overhead). > > From my understanding this is something that is allow on other OS and > any driver wishing to compete from performance point of view will want > that. > > > So i think we need to get a platform flags and or set_pages_array_wc|uc > needs to fail and this would fallback to cached mapping if the fallback > code still works. So if your arch properly return and error for those > cache changing function then you should be fine. > > This also means that we need to fix ttm_tt_set_placement_caching so that > when it returns an error it switches to cached mapping. Which will always > work. So in the meantime the attached patch should work, it just silently ignore the caching attribute request on non x86 instead of pretending that things are setup as expected and then latter the radeon ou nouveau hw unsetting the snoop bit. It's not tested but i think it should work. > > Cheers, > Jérôme > > > > > Cheers, > > Ben. > > > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/dri-devel [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 1744 bytes --] >From df0b5d1488daed05d7b4508759401a3e89cd4a38 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/ttm/ttm_tt.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..de14b83 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,6 +141,15 @@ out_err: return ret; } +#else /* CONFIG_X86 */ +static int ttm_tt_set_caching(struct ttm_tt *ttm, + enum ttm_caching_state c_state) +{ + ttm->caching_state = tt_cached; + return 0; +} +#endif /* CONFIG_X86 */ + int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) { enum ttm_caching_state state; -- 1.9.3 [-- Attachment #3: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:07 ` Jerome Glisse @ 2014-09-04 2:25 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 2:25 UTC (permalink / raw) To: Jerome Glisse Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > So in the meantime the attached patch should work, it just silently ignore > the caching attribute request on non x86 instead of pretending that things > are setup as expected and then latter the radeon ou nouveau hw unsetting > the snoop bit. > > It's not tested but i think it should work. I'm still getting placements with !CACHED going from bo_memcpy in ttm_io_prot() though ... I'm looking at filtering the placement attributes instead. Ben. > > > > Cheers, > > Jérôme > > > > > > > > Cheers, > > > Ben. > > > > > > > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 2:25 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 2:25 UTC (permalink / raw) To: Jerome Glisse Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > So in the meantime the attached patch should work, it just silently ignore > the caching attribute request on non x86 instead of pretending that things > are setup as expected and then latter the radeon ou nouveau hw unsetting > the snoop bit. > > It's not tested but i think it should work. I'm still getting placements with !CACHED going from bo_memcpy in ttm_io_prot() though ... I'm looking at filtering the placement attributes instead. Ben. > > > > Cheers, > > Jérôme > > > > > > > > Cheers, > > > Ben. > > > > > > > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:25 ` Benjamin Herrenschmidt @ 2014-09-04 2:31 ` Jerome Glisse -1 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 945 bytes --] On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > So in the meantime the attached patch should work, it just silently ignore > > the caching attribute request on non x86 instead of pretending that things > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > the snoop bit. > > > > It's not tested but i think it should work. > > I'm still getting placements with !CACHED going from bo_memcpy in > ttm_io_prot() though ... I'm looking at filtering the placement > attributes instead. > > Ben. Ok so this one should do the trick. > > > > > > > Cheers, > > > Jérôme > > > > > > > > > > > Cheers, > > > > Ben. > > > > > > > > > > > > _______________________________________________ > > > > dri-devel mailing list > > > > dri-devel@lists.freedesktop.org > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4525 bytes --] >From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 39 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..7cbdb48 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,52 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + } else { + if (placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + /* + * Some architecture force cached so we need to reflect the + * new ttm->caching_state into the mem->placement flags. + */ + *placement &= ~TTM_PL_MASK_CACHING; + switch (bo->ttm->caching_state) { + case tt_wc: + *placement |= TTM_PL_FLAG_WC; + break; + case tt_uncached: + *placement |= TTM_PL_FLAG_UNCACHED; + break; + case tt_cached: + default: + *placement |= TTM_PL_FLAG_CACHED; + break; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 2:31 ` Jerome Glisse 0 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 980 bytes --] On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > So in the meantime the attached patch should work, it just silently ignore > > the caching attribute request on non x86 instead of pretending that things > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > the snoop bit. > > > > It's not tested but i think it should work. > > I'm still getting placements with !CACHED going from bo_memcpy in > ttm_io_prot() though ... I'm looking at filtering the placement > attributes instead. > > Ben. Ok so this one should do the trick. > > > > > > > Cheers, > > > Jérôme > > > > > > > > > > > Cheers, > > > > Ben. > > > > > > > > > > > > _______________________________________________ > > > > dri-devel mailing list > > > > dri-devel@lists.freedesktop.org > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4664 bytes --] >From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 39 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..7cbdb48 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,52 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + } else { + if (placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + /* + * Some architecture force cached so we need to reflect the + * new ttm->caching_state into the mem->placement flags. + */ + *placement &= ~TTM_PL_MASK_CACHING; + switch (bo->ttm->caching_state) { + case tt_wc: + *placement |= TTM_PL_FLAG_WC; + break; + case tt_uncached: + *placement |= TTM_PL_FLAG_UNCACHED; + break; + case tt_cached: + default: + *placement |= TTM_PL_FLAG_CACHED; + break; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 [-- Attachment #3: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:31 ` Jerome Glisse @ 2014-09-04 2:32 ` Jerome Glisse -1 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:32 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 5913 bytes --] On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > So in the meantime the attached patch should work, it just silently ignore > > > the caching attribute request on non x86 instead of pretending that things > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > the snoop bit. > > > > > > It's not tested but i think it should work. > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > ttm_io_prot() though ... I'm looking at filtering the placement > > attributes instead. > > > > Ben. > > Ok so this one should do the trick. And this one should build :) > > > > > > > > > > > > Cheers, > > > > Jérôme > > > > > > > > > > > > > > Cheers, > > > > > Ben. > > > > > > > > > > > > > > > _______________________________________________ > > > > > dri-devel mailing list > > > > > dri-devel@lists.freedesktop.org > > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> > Date: Wed, 3 Sep 2014 22:04:34 -0400 > Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > People interested in providing uncached or write combined mapping > on there architecture need to do the ground work inside there arch > specific code to allow to break the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architecture. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > --- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 2 +- > drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- > include/drm/ttm/ttm_bo_driver.h | 2 +- > 4 files changed, 39 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c > index 72afe82..4dd5060 100644 > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, > return r; > } > > - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); > + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); > if (unlikely(r)) { > goto out_cleanup; > } > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 3da89d5..4dc21c3 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > goto out_err; > } > > - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); > + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); > if (ret) > goto out_err; > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c > index bf080ab..7cbdb48 100644 > --- a/drivers/gpu/drm/ttm/ttm_tt.c > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, > > return ret; > } > -#else /* CONFIG_X86 */ > -static inline int ttm_tt_set_page_caching(struct page *p, > - enum ttm_caching_state c_old, > - enum ttm_caching_state c_new) > -{ > - return 0; > -} > -#endif /* CONFIG_X86 */ > > /* > * Change caching policy for the linear kernel map > @@ -149,19 +141,52 @@ out_err: > return ret; > } > > -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > { > enum ttm_caching_state state; > > - if (placement & TTM_PL_FLAG_WC) > + if (*placement & TTM_PL_FLAG_WC) > state = tt_wc; > - else if (placement & TTM_PL_FLAG_UNCACHED) > + else if (*placement & TTM_PL_FLAG_UNCACHED) > state = tt_uncached; > else > state = tt_cached; > > return ttm_tt_set_caching(ttm, state); > } > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + } else { > + if (placement & TTM_PL_FLAG_WC) > + ttm->caching_state = tt_wc; > + else if (placement & TTM_PL_FLAG_UNCACHED) > + ttm->caching_state = tt_uncached; > + else > + ttm->caching_state = tt_cached; > + } > + /* > + * Some architecture force cached so we need to reflect the > + * new ttm->caching_state into the mem->placement flags. > + */ > + *placement &= ~TTM_PL_MASK_CACHING; > + switch (bo->ttm->caching_state) { > + case tt_wc: > + *placement |= TTM_PL_FLAG_WC; > + break; > + case tt_uncached: > + *placement |= TTM_PL_FLAG_UNCACHED; > + break; > + case tt_cached: > + default: > + *placement |= TTM_PL_FLAG_CACHED; > + break; > + } > + return 0; > +} > +#endif /* CONFIG_X86 */ > EXPORT_SYMBOL(ttm_tt_set_placement_caching); > > void ttm_tt_destroy(struct ttm_tt *ttm) > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 1d9f0f1..cbc5ad2 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); > * hit RAM. This function may be very costly as it involves global TLB > * and cache flushes and potential page splitting / combining. > */ > -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); > +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); > extern int ttm_tt_swapout(struct ttm_tt *ttm, > struct file *persistent_swap_storage); > > -- > 1.9.3 > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4527 bytes --] >From 6ed69ceee96873ab351881c6b9f931cec39f396a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 39 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..efd49b9 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,52 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + } else { + if (*placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + /* + * Some architecture force cached so we need to reflect the + * new ttm->caching_state into the mem->placement flags. + */ + *placement &= ~TTM_PL_MASK_CACHING; + switch (bo->ttm->caching_state) { + case tt_wc: + *placement |= TTM_PL_FLAG_WC; + break; + case tt_uncached: + *placement |= TTM_PL_FLAG_UNCACHED; + break; + case tt_cached: + default: + *placement |= TTM_PL_FLAG_CACHED; + break; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 2:32 ` Jerome Glisse 0 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:32 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 6093 bytes --] On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > So in the meantime the attached patch should work, it just silently ignore > > > the caching attribute request on non x86 instead of pretending that things > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > the snoop bit. > > > > > > It's not tested but i think it should work. > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > ttm_io_prot() though ... I'm looking at filtering the placement > > attributes instead. > > > > Ben. > > Ok so this one should do the trick. And this one should build :) > > > > > > > > > > > > Cheers, > > > > Jérôme > > > > > > > > > > > > > > Cheers, > > > > > Ben. > > > > > > > > > > > > > > > _______________________________________________ > > > > > dri-devel mailing list > > > > > dri-devel@lists.freedesktop.org > > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> > Date: Wed, 3 Sep 2014 22:04:34 -0400 > Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > People interested in providing uncached or write combined mapping > on there architecture need to do the ground work inside there arch > specific code to allow to break the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architecture. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > --- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 2 +- > drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- > include/drm/ttm/ttm_bo_driver.h | 2 +- > 4 files changed, 39 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c > index 72afe82..4dd5060 100644 > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, > return r; > } > > - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); > + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); > if (unlikely(r)) { > goto out_cleanup; > } > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 3da89d5..4dc21c3 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > goto out_err; > } > > - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); > + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); > if (ret) > goto out_err; > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c > index bf080ab..7cbdb48 100644 > --- a/drivers/gpu/drm/ttm/ttm_tt.c > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, > > return ret; > } > -#else /* CONFIG_X86 */ > -static inline int ttm_tt_set_page_caching(struct page *p, > - enum ttm_caching_state c_old, > - enum ttm_caching_state c_new) > -{ > - return 0; > -} > -#endif /* CONFIG_X86 */ > > /* > * Change caching policy for the linear kernel map > @@ -149,19 +141,52 @@ out_err: > return ret; > } > > -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > { > enum ttm_caching_state state; > > - if (placement & TTM_PL_FLAG_WC) > + if (*placement & TTM_PL_FLAG_WC) > state = tt_wc; > - else if (placement & TTM_PL_FLAG_UNCACHED) > + else if (*placement & TTM_PL_FLAG_UNCACHED) > state = tt_uncached; > else > state = tt_cached; > > return ttm_tt_set_caching(ttm, state); > } > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + } else { > + if (placement & TTM_PL_FLAG_WC) > + ttm->caching_state = tt_wc; > + else if (placement & TTM_PL_FLAG_UNCACHED) > + ttm->caching_state = tt_uncached; > + else > + ttm->caching_state = tt_cached; > + } > + /* > + * Some architecture force cached so we need to reflect the > + * new ttm->caching_state into the mem->placement flags. > + */ > + *placement &= ~TTM_PL_MASK_CACHING; > + switch (bo->ttm->caching_state) { > + case tt_wc: > + *placement |= TTM_PL_FLAG_WC; > + break; > + case tt_uncached: > + *placement |= TTM_PL_FLAG_UNCACHED; > + break; > + case tt_cached: > + default: > + *placement |= TTM_PL_FLAG_CACHED; > + break; > + } > + return 0; > +} > +#endif /* CONFIG_X86 */ > EXPORT_SYMBOL(ttm_tt_set_placement_caching); > > void ttm_tt_destroy(struct ttm_tt *ttm) > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 1d9f0f1..cbc5ad2 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); > * hit RAM. This function may be very costly as it involves global TLB > * and cache flushes and potential page splitting / combining. > */ > -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); > +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); > extern int ttm_tt_swapout(struct ttm_tt *ttm, > struct file *persistent_swap_storage); > > -- > 1.9.3 > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4666 bytes --] >From 6ed69ceee96873ab351881c6b9f931cec39f396a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 39 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..efd49b9 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,52 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + } else { + if (*placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + /* + * Some architecture force cached so we need to reflect the + * new ttm->caching_state into the mem->placement flags. + */ + *placement &= ~TTM_PL_MASK_CACHING; + switch (bo->ttm->caching_state) { + case tt_wc: + *placement |= TTM_PL_FLAG_WC; + break; + case tt_uncached: + *placement |= TTM_PL_FLAG_UNCACHED; + break; + case tt_cached: + default: + *placement |= TTM_PL_FLAG_CACHED; + break; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 [-- Attachment #3: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:31 ` Jerome Glisse @ 2014-09-04 2:36 ` Jerome Glisse -1 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:36 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 5924 bytes --] On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > So in the meantime the attached patch should work, it just silently ignore > > > the caching attribute request on non x86 instead of pretending that things > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > the snoop bit. > > > > > > It's not tested but i think it should work. > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > ttm_io_prot() though ... I'm looking at filtering the placement > > attributes instead. > > > > Ben. > > Ok so this one should do the trick. Ok final version ... famous last word. > > > > > > > > > > > > Cheers, > > > > Jérôme > > > > > > > > > > > > > > Cheers, > > > > > Ben. > > > > > > > > > > > > > > > _______________________________________________ > > > > > dri-devel mailing list > > > > > dri-devel@lists.freedesktop.org > > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> > Date: Wed, 3 Sep 2014 22:04:34 -0400 > Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > People interested in providing uncached or write combined mapping > on there architecture need to do the ground work inside there arch > specific code to allow to break the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architecture. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > --- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 2 +- > drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- > include/drm/ttm/ttm_bo_driver.h | 2 +- > 4 files changed, 39 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c > index 72afe82..4dd5060 100644 > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, > return r; > } > > - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); > + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); > if (unlikely(r)) { > goto out_cleanup; > } > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 3da89d5..4dc21c3 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > goto out_err; > } > > - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); > + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); > if (ret) > goto out_err; > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c > index bf080ab..7cbdb48 100644 > --- a/drivers/gpu/drm/ttm/ttm_tt.c > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, > > return ret; > } > -#else /* CONFIG_X86 */ > -static inline int ttm_tt_set_page_caching(struct page *p, > - enum ttm_caching_state c_old, > - enum ttm_caching_state c_new) > -{ > - return 0; > -} > -#endif /* CONFIG_X86 */ > > /* > * Change caching policy for the linear kernel map > @@ -149,19 +141,52 @@ out_err: > return ret; > } > > -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > { > enum ttm_caching_state state; > > - if (placement & TTM_PL_FLAG_WC) > + if (*placement & TTM_PL_FLAG_WC) > state = tt_wc; > - else if (placement & TTM_PL_FLAG_UNCACHED) > + else if (*placement & TTM_PL_FLAG_UNCACHED) > state = tt_uncached; > else > state = tt_cached; > > return ttm_tt_set_caching(ttm, state); > } > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + } else { > + if (placement & TTM_PL_FLAG_WC) > + ttm->caching_state = tt_wc; > + else if (placement & TTM_PL_FLAG_UNCACHED) > + ttm->caching_state = tt_uncached; > + else > + ttm->caching_state = tt_cached; > + } > + /* > + * Some architecture force cached so we need to reflect the > + * new ttm->caching_state into the mem->placement flags. > + */ > + *placement &= ~TTM_PL_MASK_CACHING; > + switch (bo->ttm->caching_state) { > + case tt_wc: > + *placement |= TTM_PL_FLAG_WC; > + break; > + case tt_uncached: > + *placement |= TTM_PL_FLAG_UNCACHED; > + break; > + case tt_cached: > + default: > + *placement |= TTM_PL_FLAG_CACHED; > + break; > + } > + return 0; > +} > +#endif /* CONFIG_X86 */ > EXPORT_SYMBOL(ttm_tt_set_placement_caching); > > void ttm_tt_destroy(struct ttm_tt *ttm) > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 1d9f0f1..cbc5ad2 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); > * hit RAM. This function may be very costly as it involves global TLB > * and cache flushes and potential page splitting / combining. > */ > -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); > +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); > extern int ttm_tt_swapout(struct ttm_tt *ttm, > struct file *persistent_swap_storage); > > -- > 1.9.3 > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4186 bytes --] >From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 32 +++++++++++++++++++++----------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 24 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..a0df803 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,37 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + *placement &= ~TTM_PL_MASK_CACHING; + *placement |= TTM_PL_FLAG_CACHED; + } else { + if (*placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 2:36 ` Jerome Glisse 0 siblings, 0 replies; 33+ messages in thread From: Jerome Glisse @ 2014-09-04 2:36 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel [-- Attachment #1: Type: text/plain, Size: 6105 bytes --] On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > So in the meantime the attached patch should work, it just silently ignore > > > the caching attribute request on non x86 instead of pretending that things > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > the snoop bit. > > > > > > It's not tested but i think it should work. > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > ttm_io_prot() though ... I'm looking at filtering the placement > > attributes instead. > > > > Ben. > > Ok so this one should do the trick. Ok final version ... famous last word. > > > > > > > > > > > > Cheers, > > > > Jérôme > > > > > > > > > > > > > > Cheers, > > > > > Ben. > > > > > > > > > > > > > > > _______________________________________________ > > > > > dri-devel mailing list > > > > > dri-devel@lists.freedesktop.org > > > > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > From def7a056d042220f91016d0a7c245ba8e96f90ba Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> > Date: Wed, 3 Sep 2014 22:04:34 -0400 > Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > People interested in providing uncached or write combined mapping > on there architecture need to do the ground work inside there arch > specific code to allow to break the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architecture. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > --- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 2 +- > drivers/gpu/drm/ttm/ttm_tt.c | 47 ++++++++++++++++++++++++++++--------- > include/drm/ttm/ttm_bo_driver.h | 2 +- > 4 files changed, 39 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c > index 72afe82..4dd5060 100644 > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, > return r; > } > > - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); > + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); > if (unlikely(r)) { > goto out_cleanup; > } > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 3da89d5..4dc21c3 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > goto out_err; > } > > - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); > + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); > if (ret) > goto out_err; > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c > index bf080ab..7cbdb48 100644 > --- a/drivers/gpu/drm/ttm/ttm_tt.c > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, > > return ret; > } > -#else /* CONFIG_X86 */ > -static inline int ttm_tt_set_page_caching(struct page *p, > - enum ttm_caching_state c_old, > - enum ttm_caching_state c_new) > -{ > - return 0; > -} > -#endif /* CONFIG_X86 */ > > /* > * Change caching policy for the linear kernel map > @@ -149,19 +141,52 @@ out_err: > return ret; > } > > -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > { > enum ttm_caching_state state; > > - if (placement & TTM_PL_FLAG_WC) > + if (*placement & TTM_PL_FLAG_WC) > state = tt_wc; > - else if (placement & TTM_PL_FLAG_UNCACHED) > + else if (*placement & TTM_PL_FLAG_UNCACHED) > state = tt_uncached; > else > state = tt_cached; > > return ttm_tt_set_caching(ttm, state); > } > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + } else { > + if (placement & TTM_PL_FLAG_WC) > + ttm->caching_state = tt_wc; > + else if (placement & TTM_PL_FLAG_UNCACHED) > + ttm->caching_state = tt_uncached; > + else > + ttm->caching_state = tt_cached; > + } > + /* > + * Some architecture force cached so we need to reflect the > + * new ttm->caching_state into the mem->placement flags. > + */ > + *placement &= ~TTM_PL_MASK_CACHING; > + switch (bo->ttm->caching_state) { > + case tt_wc: > + *placement |= TTM_PL_FLAG_WC; > + break; > + case tt_uncached: > + *placement |= TTM_PL_FLAG_UNCACHED; > + break; > + case tt_cached: > + default: > + *placement |= TTM_PL_FLAG_CACHED; > + break; > + } > + return 0; > +} > +#endif /* CONFIG_X86 */ > EXPORT_SYMBOL(ttm_tt_set_placement_caching); > > void ttm_tt_destroy(struct ttm_tt *ttm) > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 1d9f0f1..cbc5ad2 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); > * hit RAM. This function may be very costly as it involves global TLB > * and cache flushes and potential page splitting / combining. > */ > -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); > +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); > extern int ttm_tt_swapout(struct ttm_tt *ttm, > struct file *persistent_swap_storage); > > -- > 1.9.3 > [-- Attachment #2: 0001-drm-ttm-force-cached-mapping-on-non-x86-platform.patch --] [-- Type: text/plain, Size: 4310 bytes --] >From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> Date: Wed, 3 Sep 2014 22:04:34 -0400 Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit People interested in providing uncached or write combined mapping on there architecture need to do the ground work inside there arch specific code to allow to break the linear kernel mapping so that page mapping attributes can be updated, in the meantime force cached mapping for non x86 architecture. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 32 +++++++++++++++++++++----------- include/drm/ttm/ttm_bo_driver.h | 2 +- 4 files changed, 24 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 72afe82..4dd5060 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, return r; } - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); if (unlikely(r)) { goto out_cleanup; } diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 3da89d5..4dc21c3 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, goto out_err; } - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); if (ret) goto out_err; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index bf080ab..a0df803 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, return ret; } -#else /* CONFIG_X86 */ -static inline int ttm_tt_set_page_caching(struct page *p, - enum ttm_caching_state c_old, - enum ttm_caching_state c_new) -{ - return 0; -} -#endif /* CONFIG_X86 */ /* * Change caching policy for the linear kernel map @@ -149,19 +141,37 @@ out_err: return ret; } -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) { enum ttm_caching_state state; - if (placement & TTM_PL_FLAG_WC) + if (*placement & TTM_PL_FLAG_WC) state = tt_wc; - else if (placement & TTM_PL_FLAG_UNCACHED) + else if (*placement & TTM_PL_FLAG_UNCACHED) state = tt_uncached; else state = tt_cached; return ttm_tt_set_caching(ttm, state); } +#else /* CONFIG_X86 */ +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) +{ + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { + ttm->caching_state = tt_cached; + *placement &= ~TTM_PL_MASK_CACHING; + *placement |= TTM_PL_FLAG_CACHED; + } else { + if (*placement & TTM_PL_FLAG_WC) + ttm->caching_state = tt_wc; + else if (placement & TTM_PL_FLAG_UNCACHED) + ttm->caching_state = tt_uncached; + else + ttm->caching_state = tt_cached; + } + return 0; +} +#endif /* CONFIG_X86 */ EXPORT_SYMBOL(ttm_tt_set_placement_caching); void ttm_tt_destroy(struct ttm_tt *ttm) diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1d9f0f1..cbc5ad2 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); * hit RAM. This function may be very costly as it involves global TLB * and cache flushes and potential page splitting / combining. */ -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage); -- 1.9.3 [-- Attachment #3: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:36 ` Jerome Glisse (?) @ 2014-09-04 5:23 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 5:23 UTC (permalink / raw) To: Jerome Glisse Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Wed, 2014-09-03 at 22:36 -0400, Jerome Glisse wrote: > On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > > > So in the meantime the attached patch should work, it just silently ignore > > > > the caching attribute request on non x86 instead of pretending that things > > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > > the snoop bit. > > > > > > > > It's not tested but i think it should work. > > > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > > ttm_io_prot() though ... I'm looking at filtering the placement > > > attributes instead. > > > > > > Ben. > > > > Ok so this one should do the trick. > > Ok final version ... famous last word. Minus a couple of obvious typos that prevent if from building, it seems to do the trick for me with the AST driver, no more bad mappings. I'll still send a patch that catches the incorrect mapping attempts inside ttm_io_prot() and warns to help future debugging and avoid "random" behaviour. (I need to fix other things in the powerpc code in there anyway). Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:36 ` Jerome Glisse (?) (?) @ 2014-09-04 6:45 ` Gabriel Paubert -1 siblings, 0 replies; 33+ messages in thread From: Gabriel Paubert @ 2014-09-04 6:45 UTC (permalink / raw) To: Jerome Glisse Cc: Michel Dänzer, dri-devel, linuxppc-dev, Alex Deucher, Christian Koenig On Wed, Sep 03, 2014 at 10:36:57PM -0400, Jerome Glisse wrote: > On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > > > So in the meantime the attached patch should work, it just silently ignore > > > > the caching attribute request on non x86 instead of pretending that things > > > > are setup as expected and then latter the radeon ou nouveau hw unsetting > > > > the snoop bit. > > > > > > > > It's not tested but i think it should work. > > > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > > ttm_io_prot() though ... I'm looking at filtering the placement > > > attributes instead. > > > > > > Ben. > > > > Ok so this one should do the trick. > > Ok final version ... famous last word. [snipped older version] > >From 236038e18dc303bb9aa877922e01963d3fb0b7af Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Glisse?= <jglisse@redhat.com> > Date: Wed, 3 Sep 2014 22:04:34 -0400 > Subject: [PATCH] drm/ttm: force cached mapping on non x86 platform. > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > People interested in providing uncached or write combined mapping > on there architecture need to do the ground work inside there arch s/there/their/g > specific code to allow to break the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architecture. > > Signed-off-by: Jérôme Glisse <jglisse@redhat.com> > --- > drivers/gpu/drm/radeon/radeon_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 2 +- > drivers/gpu/drm/ttm/ttm_tt.c | 32 +++++++++++++++++++++----------- > include/drm/ttm/ttm_bo_driver.h | 2 +- > 4 files changed, 24 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c > index 72afe82..4dd5060 100644 > --- a/drivers/gpu/drm/radeon/radeon_ttm.c > +++ b/drivers/gpu/drm/radeon/radeon_ttm.c > @@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo, > return r; > } > > - r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement); > + r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement); > if (unlikely(r)) { > goto out_cleanup; > } > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 3da89d5..4dc21c3 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo, > goto out_err; > } > > - ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement); > + ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement); > if (ret) > goto out_err; > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c > index bf080ab..a0df803 100644 > --- a/drivers/gpu/drm/ttm/ttm_tt.c > +++ b/drivers/gpu/drm/ttm/ttm_tt.c > @@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p, > > return ret; > } > -#else /* CONFIG_X86 */ > -static inline int ttm_tt_set_page_caching(struct page *p, > - enum ttm_caching_state c_old, > - enum ttm_caching_state c_new) > -{ > - return 0; > -} > -#endif /* CONFIG_X86 */ > > /* > * Change caching policy for the linear kernel map > @@ -149,19 +141,37 @@ out_err: > return ret; > } > > -int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > { > enum ttm_caching_state state; > > - if (placement & TTM_PL_FLAG_WC) > + if (*placement & TTM_PL_FLAG_WC) > state = tt_wc; > - else if (placement & TTM_PL_FLAG_UNCACHED) > + else if (*placement & TTM_PL_FLAG_UNCACHED) > state = tt_uncached; > else > state = tt_cached; > > return ttm_tt_set_caching(ttm, state); > } > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + *placement &= ~TTM_PL_MASK_CACHING; > + *placement |= TTM_PL_FLAG_CACHED; > + } else { > + if (*placement & TTM_PL_FLAG_WC) > + ttm->caching_state = tt_wc; > + else if (placement & TTM_PL_FLAG_UNCACHED) > + ttm->caching_state = tt_uncached; > + else > + ttm->caching_state = tt_cached; > + } > + return 0; > +} > +#endif /* CONFIG_X86 */ > EXPORT_SYMBOL(ttm_tt_set_placement_caching); > > void ttm_tt_destroy(struct ttm_tt *ttm) > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 1d9f0f1..cbc5ad2 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm); > * hit RAM. This function may be very costly as it involves global TLB > * and cache flushes and potential page splitting / combining. > */ > -extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement); > +extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement); > extern int ttm_tt_swapout(struct ttm_tt *ttm, > struct file *persistent_swap_storage); > > -- > 1.9.3 > > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 2:36 ` Jerome Glisse ` (2 preceding siblings ...) (?) @ 2014-09-04 7:19 ` Michel Dänzer 2014-09-04 7:54 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 33+ messages in thread From: Michel Dänzer @ 2014-09-04 7:19 UTC (permalink / raw) To: Jerome Glisse, Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On 04.09.2014 11:36, Jerome Glisse wrote: > On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: >> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: >>> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: >>> >>>> So in the meantime the attached patch should work, it just silently ignore >>>> the caching attribute request on non x86 instead of pretending that things >>>> are setup as expected and then latter the radeon ou nouveau hw unsetting >>>> the snoop bit. >>>> >>>> It's not tested but i think it should work. >>> >>> I'm still getting placements with !CACHED going from bo_memcpy in >>> ttm_io_prot() though ... I'm looking at filtering the placement >>> attributes instead. >>> >>> Ben. >> >> Ok so this one should do the trick. > > Ok final version ... famous last word. [...] > +#else /* CONFIG_X86 */ > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement) > +{ > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > + ttm->caching_state = tt_cached; > + *placement &= ~TTM_PL_MASK_CACHING; > + *placement |= TTM_PL_FLAG_CACHED; NAK, this will break AGP on PowerMacs. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:19 ` Michel Dänzer @ 2014-09-04 7:54 ` Benjamin Herrenschmidt 2014-09-04 7:59 ` Michel Dänzer 0 siblings, 1 reply; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 7:54 UTC (permalink / raw) To: Michel Dänzer Cc: linuxppc-dev, Michel Dänzer, dri-devel, Alex Deucher, Jerome Glisse, Christian Koenig On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote: > > +#else /* CONFIG_X86 */ > > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t > *placement) > > +{ > > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > > + ttm->caching_state = tt_cached; > > + *placement &= ~TTM_PL_MASK_CACHING; > > + *placement |= TTM_PL_FLAG_CACHED; > > NAK, this will break AGP on PowerMacs. ... which doesn't work reliably anyway with DRI2 :-) The problem is ... with DRI1 I think we had tricks to take out the AGP from the linear mapping but that want away, didn't we ? In any case, we are playing with fire on these by allowing the cache paradox. It just happens that those old CPUs aren't *that* aggressive at speculative prefetch and we probably rarely hit the lockups that they would cause... Michel, what do you recommend we do then ? The patch I sent to double check in ttm_io_prot() has a specific hack to avoid warning on PowerMac for the above reason, but we need to fix Jerome if we want to keep that broken-by-design Mac AGP functionality going :-) Maybe we could add a similar ifdef in the above ? Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:54 ` Benjamin Herrenschmidt @ 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 8:07 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 33+ messages in thread From: Michel Dänzer @ 2014-09-04 7:59 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On 04.09.2014 16:54, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote: >>> +#else /* CONFIG_X86 */ >>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t >> *placement) >>> +{ >>> + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { >>> + ttm->caching_state = tt_cached; >>> + *placement &= ~TTM_PL_MASK_CACHING; >>> + *placement |= TTM_PL_FLAG_CACHED; >> >> NAK, this will break AGP on PowerMacs. > > ... which doesn't work reliably anyway with DRI2 :-) Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm not alone, at least with AGP 1x it seems to work quite well for most people. So I don't see the justification for intentionally breaking it completely for all of us. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:59 ` Michel Dänzer @ 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 8:07 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 33+ messages in thread From: Michel Dänzer @ 2014-09-04 7:59 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On 04.09.2014 16:59, Michel Dänzer wrote: > On 04.09.2014 16:54, Benjamin Herrenschmidt wrote: >> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote: >>>> +#else /* CONFIG_X86 */ >>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t >>> *placement) >>>> +{ >>>> + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { >>>> + ttm->caching_state = tt_cached; >>>> + *placement &= ~TTM_PL_MASK_CACHING; >>>> + *placement |= TTM_PL_FLAG_CACHED; >>> >>> NAK, this will break AGP on PowerMacs. >> >> ... which doesn't work reliably anyway with DRI2 :-) > > Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm > not alone, at least with AGP 1x it seems to work quite well for most > people. So I don't see the justification for intentionally breaking it > completely for all of us. Even more so because PCI GART is unusably slow in general. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 7:59 ` Michel Dänzer @ 2014-09-04 8:07 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 8:07 UTC (permalink / raw) To: Michel Dänzer Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Thu, 2014-09-04 at 16:59 +0900, Michel Dänzer wrote: > > Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm > not alone, at least with AGP 1x it seems to work quite well for most > people. So I don't see the justification for intentionally breaking it > completely for all of us. Oh I wasn't arguing for breaking it, just jesting. We need to keep it working. It's amazing how well broken stuff actually work though :-) I mean, it's architecturally broken and if we get a collision between the cache and the NCU, the chip will crash. We just get lucky I suppose. Anyway, I'll try a different approach tomorrow see how it goes. Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 1:55 ` Jerome Glisse 2014-09-04 2:07 ` Jerome Glisse @ 2014-09-04 2:15 ` Benjamin Herrenschmidt 2014-09-04 7:12 ` Michel Dänzer 2 siblings, 0 replies; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 2:15 UTC (permalink / raw) To: Jerome Glisse Cc: Alex Deucher, linuxppc-dev, Michel Dänzer, Christian Koenig, dri-devel On Wed, 2014-09-03 at 21:55 -0400, Jerome Glisse wrote: > So i think we need to get a platform flags and or set_pages_array_wc|uc > needs to fail and this would fallback to cached mapping if the fallback > code still works. So if your arch properly return and error for those > cache changing function then you should be fine. > > This also means that we need to fix ttm_tt_set_placement_caching so that > when it returns an error it switches to cached mapping. Which will always > work. Can't I just filter the mem_type definitions in the mem_type_manager with something along that totally untested patch ? Or do I *also* need to make those set_page_array_* things fail ? --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1308,6 +1308,24 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned } EXPORT_SYMBOL(ttm_bo_evict_mm); +static void ttm_bo_filter_mem_type(struct ttm_bo_device *bdev, unsigned type, + struct ttm_mem_type_manager *man) +{ + /* + * On some architectures/patforms, we cannot allow non-cachable + * mappings of system memory. This can be a problem with AGP on + * old G5 systems vs. TTM_PL_TT but we don't really have a choice + * at this point on ppc64 at least and the AGP on these never + * worked reliably anyway. + */ +#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE) + if (type == TTM_PL_SYSTEM || type == TTM_PL_TT) { + man->available_caching &= TTM_PL_FLAG_CACHED; + man->default_caching &= man->available_caching; + } +#endif +} + int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type, unsigned long p_size) { @@ -1327,6 +1345,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned ty return ret; man->bdev = bdev; + ttm_bo_filter_mem_type(bdev, type, man); + ret = 0; if (type != TTM_PL_SYSTEM) { ret = (*man->func->init)(man, p_size); ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 1:55 ` Jerome Glisse 2014-09-04 2:07 ` Jerome Glisse 2014-09-04 2:15 ` Benjamin Herrenschmidt @ 2014-09-04 7:12 ` Michel Dänzer 2 siblings, 0 replies; 33+ messages in thread From: Michel Dänzer @ 2014-09-04 7:12 UTC (permalink / raw) To: Jerome Glisse, Benjamin Herrenschmidt Cc: Alex Deucher, linuxppc-dev, Christian Koenig, dri-devel On 04.09.2014 10:55, Jerome Glisse wrote: > > While i agree about the issue of incoherent double map of same page, i > think we have more issue. For instance lattely AMD have been pushing a > lot of patches to move things to use uncached memory for radeon and as > usual thoses patches comes with no comment to the motivations of those > changes. That would have been a fair review comment... > What i understand is that uncached mapping for some frequently use buffer > give a significant performance boost (i am assuming this has to do with > all the snoop pci transaction overhead). Exactly, although it's a win even if the data is written by the CPU only once and read by the GPU only once. > This also means that we need to fix ttm_tt_set_placement_caching so that > when it returns an error it switches to cached mapping. Which will always > work. GTT with AGP being one exception. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 0:12 ` Benjamin Herrenschmidt @ 2014-09-04 7:44 ` Thomas Hellstrom -1 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 7:44 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel Hi! Let me try to bring some clarity and suggestions into this. On 09/04/2014 02:12 AM, Benjamin Herrenschmidt wrote: > Hi folks ! > > I've been tracking down some problems with the recent DRI on powerpc and > stumbled upon something that doesn't look right, and not necessarily > only for us. > > Now it's possible that I haven't fully understood the code here and I > also don't know to what extent some of that behaviour is necessary for > some platforms such as Intel GTT bits. > > What I've observed with a simple/dumb (no DMA) driver like AST (but this > probably happens more generally) is that when evicting a BO from VRAM > into System memory, the TTM tries to preserve the existing caching > attributes of the VRAM object. > > >From what I can tell, we end up with going from VRAM to System memory > type, and we eventually call ttm_bo_select_caching() to select the > caching option for the target. > > This will, from what I can tell, try to use the same caching mode as the > original object: > > if ((cur_placement & caching) != 0) > result |= (cur_placement & caching); > > And cur_placement comes from bo->mem.placement which as far as I can > tell is based on the placement array which the drivers set up. This originates from the fact that when evicting GTT memory, on x86 it's unnecessary and undesirable to switch caching mode when going to system. > > Now they tend to uniformly setup the placement for System memory as > TTM_PL_MASK_CACHING which enables all caching modes. > > So I end up with, for example, my System memory BOs having > TTM_PL_FLAG_CACHED not set (though they also don't have > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. > > We don't seem to use the man->default_caching (which will have > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the > proposed placement and the existing caching mode. > > Now this is a problem for several reason that I can think of: > > - On a number of powerpc platforms, such as all our server 64-bit one > for example, it's actually illegal to map system memory non-cached. The > system is fully cache coherent for all possible DMA originators (that we > care about at least) and mapping memory non-cachable while it's mapped > cachable in the linear mapping can cause nasty cache paradox which, when > detected by HW, can checkstop the system. > > - A similar issue exists, afaik, on ARM >= v7, so anything mapped > non-cachable must be removed from the linear mapping explicitly since > otherwise it can be speculatively prefetched into the cache. > > - I don't know about x86, but even then, it looks quite sub-optimal to > map the memory backing of the BOs and access it using a WC rather than a > cachable mapping attribute. Last time I tested, (and it seems like Michel is on the same track), writing with the CPU to write-combined memory was substantially faster than writing to cached memory, with the additional side-effect that CPU caches are left unpolluted. Moreover (although only tested on Intel's embedded chipsets), texturing from cpu-cache-coherent PCI memory was a real GPU performance hog compared to texturing from non-snooped memory. Hence, whenever a buffer could be classified as GPU-read-only (or almost at least), it should be placed in write-combined memory. > > Now, some folks on IRC mentioned that there might be reasons for the > current behaviour as to not change the caching attributes when going > in/out of the GTT on Intel, I don't know how that relates and how that > works, but maybe that should be enforced by having a different placement > mask specifically on those chipsets. > > Dave, should we change the various PCI drivers for generally coherent > devices such that the System memory type doesn't allow placements > without CACHED attribute ? Or at least on coherent platforms ? How do > detect that ? Should we have a TTM helper to establish the default > memory placement attributes that "normal PCI" drivers call to set that > up so we can have all the necessary arch ifdefs in one single place, at > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? > > Non-PCI and "special" drivers like Intel can use a different set of > placement attributes to represent the requirements of those specific > platforms (mostly thinking of embedded ARM here which under some > circumstances might actually require non-cached mappings). > Or am I missing another part of the puzzle ? > > As it-is, things are broken for me even for dumb drivers, and I suspect > to a large extent with radeon and nouveau too, though in some case we > might get away with it most of the time ... until the machine locks up > for some unexplainable reason... This might cause problems on existing > distros such as RHEL7 with our radeon adapters even. > > Any suggestion of what's the best approach to fix it ? I'm happy to > produce the patches but I'm not that familiar with the TTM so I would > like to make sure I'm the right track first :-) I dislike the approach of rewriting placements. In some cases I think it won't even work, because placements are declared 'static const' What I'd suggest is instead to intercept the driver response from init_mem_type() and filter out undesired caching modes from available_caching and default_caching, perhaps also looking at whether the memory type is mappable or not. This should have the additional benefit of working everywhere, and if a caching mode is selected that's not available on the platform, you'll simply get an error. (I guess?) /Thomas > > Cheers, > Ben. > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 7:44 ` Thomas Hellstrom 0 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 7:44 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel Hi! Let me try to bring some clarity and suggestions into this. On 09/04/2014 02:12 AM, Benjamin Herrenschmidt wrote: > Hi folks ! > > I've been tracking down some problems with the recent DRI on powerpc and > stumbled upon something that doesn't look right, and not necessarily > only for us. > > Now it's possible that I haven't fully understood the code here and I > also don't know to what extent some of that behaviour is necessary for > some platforms such as Intel GTT bits. > > What I've observed with a simple/dumb (no DMA) driver like AST (but this > probably happens more generally) is that when evicting a BO from VRAM > into System memory, the TTM tries to preserve the existing caching > attributes of the VRAM object. > > >From what I can tell, we end up with going from VRAM to System memory > type, and we eventually call ttm_bo_select_caching() to select the > caching option for the target. > > This will, from what I can tell, try to use the same caching mode as the > original object: > > if ((cur_placement & caching) != 0) > result |= (cur_placement & caching); > > And cur_placement comes from bo->mem.placement which as far as I can > tell is based on the placement array which the drivers set up. This originates from the fact that when evicting GTT memory, on x86 it's unnecessary and undesirable to switch caching mode when going to system. > > Now they tend to uniformly setup the placement for System memory as > TTM_PL_MASK_CACHING which enables all caching modes. > > So I end up with, for example, my System memory BOs having > TTM_PL_FLAG_CACHED not set (though they also don't have > TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC. > > We don't seem to use the man->default_caching (which will have > TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the > proposed placement and the existing caching mode. > > Now this is a problem for several reason that I can think of: > > - On a number of powerpc platforms, such as all our server 64-bit one > for example, it's actually illegal to map system memory non-cached. The > system is fully cache coherent for all possible DMA originators (that we > care about at least) and mapping memory non-cachable while it's mapped > cachable in the linear mapping can cause nasty cache paradox which, when > detected by HW, can checkstop the system. > > - A similar issue exists, afaik, on ARM >= v7, so anything mapped > non-cachable must be removed from the linear mapping explicitly since > otherwise it can be speculatively prefetched into the cache. > > - I don't know about x86, but even then, it looks quite sub-optimal to > map the memory backing of the BOs and access it using a WC rather than a > cachable mapping attribute. Last time I tested, (and it seems like Michel is on the same track), writing with the CPU to write-combined memory was substantially faster than writing to cached memory, with the additional side-effect that CPU caches are left unpolluted. Moreover (although only tested on Intel's embedded chipsets), texturing from cpu-cache-coherent PCI memory was a real GPU performance hog compared to texturing from non-snooped memory. Hence, whenever a buffer could be classified as GPU-read-only (or almost at least), it should be placed in write-combined memory. > > Now, some folks on IRC mentioned that there might be reasons for the > current behaviour as to not change the caching attributes when going > in/out of the GTT on Intel, I don't know how that relates and how that > works, but maybe that should be enforced by having a different placement > mask specifically on those chipsets. > > Dave, should we change the various PCI drivers for generally coherent > devices such that the System memory type doesn't allow placements > without CACHED attribute ? Or at least on coherent platforms ? How do > detect that ? Should we have a TTM helper to establish the default > memory placement attributes that "normal PCI" drivers call to set that > up so we can have all the necessary arch ifdefs in one single place, at > least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? > > Non-PCI and "special" drivers like Intel can use a different set of > placement attributes to represent the requirements of those specific > platforms (mostly thinking of embedded ARM here which under some > circumstances might actually require non-cached mappings). > Or am I missing another part of the puzzle ? > > As it-is, things are broken for me even for dumb drivers, and I suspect > to a large extent with radeon and nouveau too, though in some case we > might get away with it most of the time ... until the machine locks up > for some unexplainable reason... This might cause problems on existing > distros such as RHEL7 with our radeon adapters even. > > Any suggestion of what's the best approach to fix it ? I'm happy to > produce the patches but I'm not that familiar with the TTM so I would > like to make sure I'm the right track first :-) I dislike the approach of rewriting placements. In some cases I think it won't even work, because placements are declared 'static const' What I'd suggest is instead to intercept the driver response from init_mem_type() and filter out undesired caching modes from available_caching and default_caching, perhaps also looking at whether the memory type is mappable or not. This should have the additional benefit of working everywhere, and if a caching mode is selected that's not available on the platform, you'll simply get an error. (I guess?) /Thomas > > Cheers, > Ben. > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:44 ` Thomas Hellstrom (?) @ 2014-09-04 8:06 ` Benjamin Herrenschmidt 2014-09-04 8:46 ` Thomas Hellstrom -1 siblings, 1 reply; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 8:06 UTC (permalink / raw) To: Thomas Hellstrom; +Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote: > > This will, from what I can tell, try to use the same caching mode as the > > original object: > > > > if ((cur_placement & caching) != 0) > > result |= (cur_placement & caching); > > > > And cur_placement comes from bo->mem.placement which as far as I can > > tell is based on the placement array which the drivers set up. > > This originates from the fact that when evicting GTT memory, on x86 it's > unnecessary and undesirable to switch caching mode when going to system. But that's what I don't quite understand. We have two different mappings here. The VRAM and the memory object. We wouldn't be "switching"... we are creating a temporary mapping for the memory object in order to do the memcpy, but we seem to be doing it by using the caching attributes of the VRAM object.... or am I missing something ? I don't see how that makes sense so I suppose I'm missing something here :-) > Last time I tested, (and it seems like Michel is on the same track), > writing with the CPU to write-combined memory was substantially faster > than writing to cached memory, with the additional side-effect that CPU > caches are left unpolluted. That's very strange indeed. It's certainly an x86 specific artifact, even if we were allowed by our hypervisor to map memory non-cachable (the HW somewhat can), we tend to have a higher throughput by going cachable, but that could be due to the way the PowerBus works (it's basically very biased toward cachable transactions). > I dislike the approach of rewriting placements. In some cases I think it > won't even work, because placements are declared 'static const' > > What I'd suggest is instead to intercept the driver response from > init_mem_type() and filter out undesired caching modes from > available_caching and default_caching, This was my original intent but Jerome seems to have different ideas (see his proposed patches). I'm happy to revive mine as well and post it as an alternative after I've tested it a bit more (tomorrow). > perhaps also looking at whether > the memory type is mappable or not. This should have the additional > benefit of working everywhere, and if a caching mode is selected that's > not available on the platform, you'll simply get an error. (I guess?) You mean that if not mappable we don't bother filtering ? The rule is really for me pretty simple: - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable - If it's PCIe memory space (VRAM, registers, ...) it MUST be non-cachable. Cheers, Ben. > /Thomas > > > > > > Cheers, > > Ben. > > > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 8:06 ` Benjamin Herrenschmidt @ 2014-09-04 8:46 ` Thomas Hellstrom 0 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 8:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote: > >>> This will, from what I can tell, try to use the same caching mode as the >>> original object: >>> >>> if ((cur_placement & caching) != 0) >>> result |= (cur_placement & caching); >>> >>> And cur_placement comes from bo->mem.placement which as far as I can >>> tell is based on the placement array which the drivers set up. >> This originates from the fact that when evicting GTT memory, on x86 it's >> unnecessary and undesirable to switch caching mode when going to system. > But that's what I don't quite understand. We have two different mappings > here. The VRAM and the memory object. We wouldn't be "switching"... we > are creating a temporary mapping for the memory object in order to do > the memcpy, but we seem to be doing it by using the caching attributes > of the VRAM object.... or am I missing something ? I don't see how that > makes sense so I suppose I'm missing something here :-) Well, the intention when TTM was written was that the driver writer should be smart enough that when he wanted a move from unached VRAM to system, he'd request cached system in the placement flags in the first place. If TTM somehow overrides such a request, that's a bug in TTM. If the move, for example, is a result of an eviction, then the driver evict_flags() function should ideally look at the current placement and decide about a suitable placement based on that: vram-to-system moves should generally request cacheable memory if the next access is expected by the CPU. Probably write-combined otherwise. If the move is the result of a TTM swapout, TTM will automatically select cachable system, and for most other moves, I think the driver writer is in full control. > >> Last time I tested, (and it seems like Michel is on the same track), >> writing with the CPU to write-combined memory was substantially faster >> than writing to cached memory, with the additional side-effect that CPU >> caches are left unpolluted. > That's very strange indeed. It's certainly an x86 specific artifact, > even if we were allowed by our hypervisor to map memory non-cachable > (the HW somewhat can), we tend to have a higher throughput by going > cachable, but that could be due to the way the PowerBus works (it's > basically very biased toward cachable transactions). > >> I dislike the approach of rewriting placements. In some cases I think it >> won't even work, because placements are declared 'static const' >> >> What I'd suggest is instead to intercept the driver response from >> init_mem_type() and filter out undesired caching modes from >> available_caching and default_caching, > This was my original intent but Jerome seems to have different ideas > (see his proposed patches). I'm happy to revive mine as well and post it > as an alternative after I've tested it a bit more (tomorrow). > >> perhaps also looking at whether >> the memory type is mappable or not. This should have the additional >> benefit of working everywhere, and if a caching mode is selected that's >> not available on the platform, you'll simply get an error. (I guess?) > You mean that if not mappable we don't bother filtering ? > > The rule is really for me pretty simple: > > - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable > > - If it's PCIe memory space (VRAM, registers, ...) it MUST be > non-cachable. Yes, something along these lines. I guess checking for VRAM or TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick /Thomas > > Cheers, > Ben. > >> /Thomas >> >> >>> Cheers, >>> Ben. >>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/dri-devel&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=C9AHL1VngKBOxe2UrNP2eCZo6FLqdlr6Y90rpfE5rUs%3D%0A&s=73da0633bafc5d54bf116bc861d48d13c39cf8f41832adfb739709e98ec05553 > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 8:46 ` Thomas Hellstrom 0 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 8:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Jerome Glisse, linuxppc-dev, Michel Danzer, dri-devel On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote: > >>> This will, from what I can tell, try to use the same caching mode as the >>> original object: >>> >>> if ((cur_placement & caching) != 0) >>> result |= (cur_placement & caching); >>> >>> And cur_placement comes from bo->mem.placement which as far as I can >>> tell is based on the placement array which the drivers set up. >> This originates from the fact that when evicting GTT memory, on x86 it's >> unnecessary and undesirable to switch caching mode when going to system. > But that's what I don't quite understand. We have two different mappings > here. The VRAM and the memory object. We wouldn't be "switching"... we > are creating a temporary mapping for the memory object in order to do > the memcpy, but we seem to be doing it by using the caching attributes > of the VRAM object.... or am I missing something ? I don't see how that > makes sense so I suppose I'm missing something here :-) Well, the intention when TTM was written was that the driver writer should be smart enough that when he wanted a move from unached VRAM to system, he'd request cached system in the placement flags in the first place. If TTM somehow overrides such a request, that's a bug in TTM. If the move, for example, is a result of an eviction, then the driver evict_flags() function should ideally look at the current placement and decide about a suitable placement based on that: vram-to-system moves should generally request cacheable memory if the next access is expected by the CPU. Probably write-combined otherwise. If the move is the result of a TTM swapout, TTM will automatically select cachable system, and for most other moves, I think the driver writer is in full control. > >> Last time I tested, (and it seems like Michel is on the same track), >> writing with the CPU to write-combined memory was substantially faster >> than writing to cached memory, with the additional side-effect that CPU >> caches are left unpolluted. > That's very strange indeed. It's certainly an x86 specific artifact, > even if we were allowed by our hypervisor to map memory non-cachable > (the HW somewhat can), we tend to have a higher throughput by going > cachable, but that could be due to the way the PowerBus works (it's > basically very biased toward cachable transactions). > >> I dislike the approach of rewriting placements. In some cases I think it >> won't even work, because placements are declared 'static const' >> >> What I'd suggest is instead to intercept the driver response from >> init_mem_type() and filter out undesired caching modes from >> available_caching and default_caching, > This was my original intent but Jerome seems to have different ideas > (see his proposed patches). I'm happy to revive mine as well and post it > as an alternative after I've tested it a bit more (tomorrow). > >> perhaps also looking at whether >> the memory type is mappable or not. This should have the additional >> benefit of working everywhere, and if a caching mode is selected that's >> not available on the platform, you'll simply get an error. (I guess?) > You mean that if not mappable we don't bother filtering ? > > The rule is really for me pretty simple: > > - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable > > - If it's PCIe memory space (VRAM, registers, ...) it MUST be > non-cachable. Yes, something along these lines. I guess checking for VRAM or TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick /Thomas > > Cheers, > Ben. > >> /Thomas >> >> >>> Cheers, >>> Ben. >>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/dri-devel&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=C9AHL1VngKBOxe2UrNP2eCZo6FLqdlr6Y90rpfE5rUs%3D%0A&s=73da0633bafc5d54bf116bc861d48d13c39cf8f41832adfb739709e98ec05553 > _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 7:44 ` Thomas Hellstrom @ 2014-09-04 9:34 ` Daniel Vetter -1 siblings, 0 replies; 33+ messages in thread From: Daniel Vetter @ 2014-09-04 9:34 UTC (permalink / raw) To: Thomas Hellstrom; +Cc: dri-devel, Michel Danzer, linuxppc-dev On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: > Last time I tested, (and it seems like Michel is on the same track), > writing with the CPU to write-combined memory was substantially faster > than writing to cached memory, with the additional side-effect that CPU > caches are left unpolluted. > > Moreover (although only tested on Intel's embedded chipsets), texturing > from cpu-cache-coherent PCI memory was a real GPU performance hog > compared to texturing from non-snooped memory. Hence, whenever a buffer > could be classified as GPU-read-only (or almost at least), it should be > placed in write-combined memory. Just a quick comment since this explicitly referes to intel chips: On desktop/laptop chips with the big shared l3/l4 caches it's the other way round. Cached uploads are substantially faster than wc and not using coherent access is a severe perf hit for texturing. I guess the hw guys worked really hard to hide the snooping costs so that the gpu can benefit from the massive bandwidth these caches can provide. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 9:34 ` Daniel Vetter 0 siblings, 0 replies; 33+ messages in thread From: Daniel Vetter @ 2014-09-04 9:34 UTC (permalink / raw) To: Thomas Hellstrom; +Cc: dri-devel, Michel Danzer, linuxppc-dev On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: > Last time I tested, (and it seems like Michel is on the same track), > writing with the CPU to write-combined memory was substantially faster > than writing to cached memory, with the additional side-effect that CPU > caches are left unpolluted. > > Moreover (although only tested on Intel's embedded chipsets), texturing > from cpu-cache-coherent PCI memory was a real GPU performance hog > compared to texturing from non-snooped memory. Hence, whenever a buffer > could be classified as GPU-read-only (or almost at least), it should be > placed in write-combined memory. Just a quick comment since this explicitly referes to intel chips: On desktop/laptop chips with the big shared l3/l4 caches it's the other way round. Cached uploads are substantially faster than wc and not using coherent access is a severe perf hit for texturing. I guess the hw guys worked really hard to hide the snooping costs so that the gpu can benefit from the massive bandwidth these caches can provide. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 9:34 ` Daniel Vetter (?) @ 2014-09-04 9:43 ` Benjamin Herrenschmidt 2014-09-04 10:23 ` Thomas Hellstrom -1 siblings, 1 reply; 33+ messages in thread From: Benjamin Herrenschmidt @ 2014-09-04 9:43 UTC (permalink / raw) To: Daniel Vetter; +Cc: linuxppc-dev, Thomas Hellstrom, dri-devel, Michel Danzer On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote: > On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: > > Last time I tested, (and it seems like Michel is on the same track), > > writing with the CPU to write-combined memory was substantially faster > > than writing to cached memory, with the additional side-effect that CPU > > caches are left unpolluted. > > > > Moreover (although only tested on Intel's embedded chipsets), texturing > > from cpu-cache-coherent PCI memory was a real GPU performance hog > > compared to texturing from non-snooped memory. Hence, whenever a buffer > > could be classified as GPU-read-only (or almost at least), it should be > > placed in write-combined memory. > > Just a quick comment since this explicitly referes to intel chips: On > desktop/laptop chips with the big shared l3/l4 caches it's the other way > round. Cached uploads are substantially faster than wc and not using > coherent access is a severe perf hit for texturing. I guess the hw guys > worked really hard to hide the snooping costs so that the gpu can benefit > from the massive bandwidth these caches can provide. This is similar to modern POWER chips as well. We have pretty big L3's (though not technically shared they are in a separate quadrant and we have a shared L4 in the memory buffer) and our fabric is generally optimized for cachable/coherent access performance. In fact, we only have so many credits for NC accesses on the bus... What that tells me is that when setting up the desired cachability attributes for the mapping of a memory object, we need to consider these things here: - The hard requirement of the HW (non-coherent GPUs require NC, AGP does in some cases, etc...) which I think is basically already handled using the placement attributes set by the GPU driver for the memory type - The optimal attributes (and platform hard requirements) for fast memory accesses to an object by the processor. From what I read here, this can be NC+WC on older Intel, cachable on newer, etc...) - The optimal attributes for fast GPU DMA accesses to the object in system memory. Here too, this is fairly platform/chipset dependent. Do we have flags in the DRM that tell us whether an object in memory is more likely to be used by the GPU via DMA vs by the CPU via MMIO ? On powerpc (except in the old AGP case), I wouldn't care about require cachable in both case, but I can see the low latency crowd wanting the former to be non-cachable while the dumb GPUs like AST who don't do DMA would benefit greatly from the latter... Cheers, Ben. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions 2014-09-04 9:43 ` Benjamin Herrenschmidt @ 2014-09-04 10:23 ` Thomas Hellstrom 0 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 10:23 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, dri-devel, Daniel Vetter On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote: >> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: >>> Last time I tested, (and it seems like Michel is on the same track), >>> writing with the CPU to write-combined memory was substantially faster >>> than writing to cached memory, with the additional side-effect that CPU >>> caches are left unpolluted. >>> >>> Moreover (although only tested on Intel's embedded chipsets), texturing >>> from cpu-cache-coherent PCI memory was a real GPU performance hog >>> compared to texturing from non-snooped memory. Hence, whenever a buffer >>> could be classified as GPU-read-only (or almost at least), it should be >>> placed in write-combined memory. >> Just a quick comment since this explicitly referes to intel chips: On >> desktop/laptop chips with the big shared l3/l4 caches it's the other way >> round. Cached uploads are substantially faster than wc and not using >> coherent access is a severe perf hit for texturing. I guess the hw guys >> worked really hard to hide the snooping costs so that the gpu can benefit >> from the massive bandwidth these caches can provide. > This is similar to modern POWER chips as well. We have pretty big L3's > (though not technically shared they are in a separate quadrant and we > have a shared L4 in the memory buffer) and our fabric is generally > optimized for cachable/coherent access performance. In fact, we only > have so many credits for NC accesses on the bus... > Thanks both of you for the update. I haven't dealt with real hardware for a while.. /Thomas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: TTM placement & caching issue/questions @ 2014-09-04 10:23 ` Thomas Hellstrom 0 siblings, 0 replies; 33+ messages in thread From: Thomas Hellstrom @ 2014-09-04 10:23 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, dri-devel, Daniel Vetter On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote: >> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: >>> Last time I tested, (and it seems like Michel is on the same track), >>> writing with the CPU to write-combined memory was substantially faster >>> than writing to cached memory, with the additional side-effect that CPU >>> caches are left unpolluted. >>> >>> Moreover (although only tested on Intel's embedded chipsets), texturing >>> from cpu-cache-coherent PCI memory was a real GPU performance hog >>> compared to texturing from non-snooped memory. Hence, whenever a buffer >>> could be classified as GPU-read-only (or almost at least), it should be >>> placed in write-combined memory. >> Just a quick comment since this explicitly referes to intel chips: On >> desktop/laptop chips with the big shared l3/l4 caches it's the other way >> round. Cached uploads are substantially faster than wc and not using >> coherent access is a severe perf hit for texturing. I guess the hw guys >> worked really hard to hide the snooping costs so that the gpu can benefit >> from the massive bandwidth these caches can provide. > This is similar to modern POWER chips as well. We have pretty big L3's > (though not technically shared they are in a separate quadrant and we > have a shared L4 in the memory buffer) and our fabric is generally > optimized for cachable/coherent access performance. In fact, we only > have so many credits for NC accesses on the bus... > Thanks both of you for the update. I haven't dealt with real hardware for a while.. /Thomas _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <ED4D93630842CD4385F644DC5158EE9171B05E72@NTOVMAIL03.ad.otto.de>]
* Re: TTM placement & caching issue/questions [not found] <ED4D93630842CD4385F644DC5158EE9171B05E72@NTOVMAIL03.ad.otto.de> @ 2014-09-05 7:40 ` Jochen Rollwagen 0 siblings, 0 replies; 33+ messages in thread From: Jochen Rollwagen @ 2014-09-05 7:40 UTC (permalink / raw) To: dri-devel [-- Attachment #1.1: Type: text/plain, Size: 566 bytes --] Sorry for chiming in here, i’ve passively followed the discussion. If you produce patches for a 3.14 kernel that solve this issue: http://lists.freedesktop.org/archives/dri-devel/2013-November/049185.html (thereby enabling AGP mode on mac mini G4’s, should still be quite a few of them around) you can register me as a happy tester. My "uptime" with a powerpc AGP system is about two minutes :-) (although the system behaviour has changed somewhat with more recent kernels from a complete lockup to an endless loop). Cheers Jochen [-- Attachment #1.2: Type: text/html, Size: 3021 bytes --] [-- Attachment #2: Type: text/plain, Size: 159 bytes --] _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2014-09-05 7:40 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-09-04 0:12 TTM placement & caching issue/questions Benjamin Herrenschmidt 2014-09-04 0:12 ` Benjamin Herrenschmidt 2014-09-04 1:55 ` Jerome Glisse 2014-09-04 2:07 ` Jerome Glisse 2014-09-04 2:07 ` Jerome Glisse 2014-09-04 2:25 ` Benjamin Herrenschmidt 2014-09-04 2:25 ` Benjamin Herrenschmidt 2014-09-04 2:31 ` Jerome Glisse 2014-09-04 2:31 ` Jerome Glisse 2014-09-04 2:32 ` Jerome Glisse 2014-09-04 2:32 ` Jerome Glisse 2014-09-04 2:36 ` Jerome Glisse 2014-09-04 2:36 ` Jerome Glisse 2014-09-04 5:23 ` Benjamin Herrenschmidt 2014-09-04 6:45 ` Gabriel Paubert 2014-09-04 7:19 ` Michel Dänzer 2014-09-04 7:54 ` Benjamin Herrenschmidt 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 7:59 ` Michel Dänzer 2014-09-04 8:07 ` Benjamin Herrenschmidt 2014-09-04 2:15 ` Benjamin Herrenschmidt 2014-09-04 7:12 ` Michel Dänzer 2014-09-04 7:44 ` Thomas Hellstrom 2014-09-04 7:44 ` Thomas Hellstrom 2014-09-04 8:06 ` Benjamin Herrenschmidt 2014-09-04 8:46 ` Thomas Hellstrom 2014-09-04 8:46 ` Thomas Hellstrom 2014-09-04 9:34 ` Daniel Vetter 2014-09-04 9:34 ` Daniel Vetter 2014-09-04 9:43 ` Benjamin Herrenschmidt 2014-09-04 10:23 ` Thomas Hellstrom 2014-09-04 10:23 ` Thomas Hellstrom [not found] <ED4D93630842CD4385F644DC5158EE9171B05E72@NTOVMAIL03.ad.otto.de> 2014-09-05 7:40 ` Jochen Rollwagen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.