All of lore.kernel.org
 help / color / mirror / Atom feed
* Why can't ttm_tt_swapout() handle uncached or WC BOs?
@ 2020-09-17 12:20 Christian König
  2020-09-17 14:44 ` Michel Dänzer
  0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2020-09-17 12:20 UTC (permalink / raw)
  To: Michel Dänzer, Thomas Hellström, dri-devel, Daniel Vetter

Hi guys,

Michel once submitted a patch to fix triggering this BUG_ON in 
ttm_tt_swapout():

> BUG_ON(ttm->caching_state != tt_cached);

Now my question is does anybody know why we have that in the first place?

The only problematic thing I can see is calling copy_highpage() and that 
one is just doing a kmap_atomic()/kunmap_atomic() on the source and 
destination.

I can't see why it should be problematic for this temporary mapping to 
be cached instead of uncached or WC?

Does anybody has any idea?

Thanks,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-17 12:20 Why can't ttm_tt_swapout() handle uncached or WC BOs? Christian König
@ 2020-09-17 14:44 ` Michel Dänzer
  2020-09-17 14:49   ` Christian König
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Dänzer @ 2020-09-17 14:44 UTC (permalink / raw)
  To: christian.koenig, Thomas Hellström, Daniel Vetter; +Cc: dri-devel

On 2020-09-17 2:20 p.m., Christian König wrote:
> Hi guys,
> 
> Michel once submitted a patch to fix triggering this BUG_ON in 
> ttm_tt_swapout():
> 
>> BUG_ON(ttm->caching_state != tt_cached);
> 
> Now my question is does anybody know why we have that in the first place?
> 
> The only problematic thing I can see is calling copy_highpage() and that 
> one is just doing a kmap_atomic()/kunmap_atomic() on the source and 
> destination.
> 
> I can't see why it should be problematic for this temporary mapping to 
> be cached instead of uncached or WC?
> 
> Does anybody has any idea?

One thing is that AFAIK some (ARM?) CPUs can get very upset when there's 
both a cached and uncacheable mapping for the same physical page.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-17 14:44 ` Michel Dänzer
@ 2020-09-17 14:49   ` Christian König
  2020-09-23  3:17     ` Dave Airlie
  0 siblings, 1 reply; 8+ messages in thread
From: Christian König @ 2020-09-17 14:49 UTC (permalink / raw)
  To: Michel Dänzer, Thomas Hellström, Daniel Vetter; +Cc: dri-devel

Am 17.09.20 um 16:44 schrieb Michel Dänzer:
> On 2020-09-17 2:20 p.m., Christian König wrote:
>> Hi guys,
>>
>> Michel once submitted a patch to fix triggering this BUG_ON in 
>> ttm_tt_swapout():
>>
>>> BUG_ON(ttm->caching_state != tt_cached);
>>
>> Now my question is does anybody know why we have that in the first 
>> place?
>>
>> The only problematic thing I can see is calling copy_highpage() and 
>> that one is just doing a kmap_atomic()/kunmap_atomic() on the source 
>> and destination.
>>
>> I can't see why it should be problematic for this temporary mapping 
>> to be cached instead of uncached or WC?
>>
>> Does anybody has any idea?
>
> One thing is that AFAIK some (ARM?) CPUs can get very upset when 
> there's both a cached and uncacheable mapping for the same physical page.

Good point, but I already considered this.

If there is another mapping for that page left we are completely busted 
anyway since we are currently changing the underlying storage.

In other words nobody else should have a mapping because we are about to 
copy and then free up the memory.

Any other idea? It is the only place where we actually have to change 
the caching attributes.

Christian.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-17 14:49   ` Christian König
@ 2020-09-23  3:17     ` Dave Airlie
  2020-09-23  9:24       ` Hellstrom, Thomas
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Airlie @ 2020-09-23  3:17 UTC (permalink / raw)
  To: Christian König; +Cc: Michel Dänzer, Thomas Hellström, dri-devel

On Fri, 18 Sep 2020 at 00:49, Christian König <christian.koenig@amd.com> wrote:
>
> Am 17.09.20 um 16:44 schrieb Michel Dänzer:
> > On 2020-09-17 2:20 p.m., Christian König wrote:
> >> Hi guys,
> >>
> >> Michel once submitted a patch to fix triggering this BUG_ON in
> >> ttm_tt_swapout():
> >>
> >>> BUG_ON(ttm->caching_state != tt_cached);
> >>
> >> Now my question is does anybody know why we have that in the first
> >> place?
> >>
> >> The only problematic thing I can see is calling copy_highpage() and
> >> that one is just doing a kmap_atomic()/kunmap_atomic() on the source
> >> and destination.
> >>
> >> I can't see why it should be problematic for this temporary mapping
> >> to be cached instead of uncached or WC?
> >>
> >> Does anybody has any idea?
> >
> > One thing is that AFAIK some (ARM?) CPUs can get very upset when
> > there's both a cached and uncacheable mapping for the same physical page.
>
> Good point, but I already considered this.
>
> If there is another mapping for that page left we are completely busted
> anyway since we are currently changing the underlying storage.
>

It's not just ARM of course, x86 PAT also has many issues about
multiple mappings of the same backing page with different attributes.

The only mappings might be in the linear mapping, since not all pages
we get here will necessarily be high pages I assume and therefore
could have a linear mapping. If we've changed the linear mapping to
non-cached, then create a cached kmap I'm not 100% that won't cause
issues.

but this is a definite maze of twisty passages and I'd probably need
to sit down and break it a bit more.

Dave.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-23  3:17     ` Dave Airlie
@ 2020-09-23  9:24       ` Hellstrom, Thomas
  2020-09-23  9:29         ` Daniel Vetter
  2020-09-23 14:03         ` Christian König
  0 siblings, 2 replies; 8+ messages in thread
From: Hellstrom, Thomas @ 2020-09-23  9:24 UTC (permalink / raw)
  To: christian.koenig, airlied; +Cc: michel, dri-devel

On Wed, 2020-09-23 at 13:17 +1000, Dave Airlie wrote:
> On Fri, 18 Sep 2020 at 00:49, Christian König <
> christian.koenig@amd.com> wrote:
> > Am 17.09.20 um 16:44 schrieb Michel Dänzer:
> > > On 2020-09-17 2:20 p.m., Christian König wrote:
> > > > Hi guys,
> > > > 
> > > > Michel once submitted a patch to fix triggering this BUG_ON in
> > > > ttm_tt_swapout():
> > > > 
> > > > > BUG_ON(ttm->caching_state != tt_cached);
> > > > 
> > > > Now my question is does anybody know why we have that in the
> > > > first
> > > > place?
> > > > 
> > > > The only problematic thing I can see is calling copy_highpage()
> > > > and
> > > > that one is just doing a kmap_atomic()/kunmap_atomic() on the
> > > > source
> > > > and destination.
> > > > 
> > > > I can't see why it should be problematic for this temporary
> > > > mapping
> > > > to be cached instead of uncached or WC?
> > > > 
> > > > Does anybody has any idea?
> > > 
> > > One thing is that AFAIK some (ARM?) CPUs can get very upset when
> > > there's both a cached and uncacheable mapping for the same
> > > physical page.
> > 
> > Good point, but I already considered this.
> > 
> > If there is another mapping for that page left we are completely
> > busted
> > anyway since we are currently changing the underlying storage.
> > 
> 
> It's not just ARM of course, x86 PAT also has many issues about
> multiple mappings of the same backing page with different attributes.
> 
> The only mappings might be in the linear mapping, since not all pages
> we get here will necessarily be high pages I assume and therefore
> could have a linear mapping. If we've changed the linear mapping to
> non-cached, then create a cached kmap I'm not 100% that won't cause
> issues.
> 
> but this is a definite maze of twisty passages and I'd probably need
> to sit down and break it a bit more.
> 
> Dave.

This is a problem that goes back way far (12+) years that the x86
architecture is not allowed to do aliased mappings of pages, even
through mappable GTTs: There are two aspects:

1) Create a WC mapping of a page with data in the cache. When the cache
does a writeback, anything written through the WC mapping will get
overwritten.

2) Flushing the WB mappings first might not help, since at that time
there were some AMD processors (Athlons?) that were speculatively
prefetching data on the WB mapping into the cache at any time, and even
though it wasn't changed it did a writeback. Anything written through
WC in-between the prefetch and the writeback got overwritten. That was
a real and occuring problem at that time. AMD claimed it was not
violating specs.

So aliased mappings is probably at best very fragile. 

/Thomas




----------------------------------------------------------------------
Intel Sweden AB
Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden
Registration Number: 556189-6027

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-23  9:24       ` Hellstrom, Thomas
@ 2020-09-23  9:29         ` Daniel Vetter
  2020-09-23 14:03         ` Christian König
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Vetter @ 2020-09-23  9:29 UTC (permalink / raw)
  To: Hellstrom, Thomas; +Cc: michel, christian.koenig, dri-devel

On Wed, Sep 23, 2020 at 11:24 AM Hellstrom, Thomas
<thomas.hellstrom@intel.com> wrote:
>
> On Wed, 2020-09-23 at 13:17 +1000, Dave Airlie wrote:
> > On Fri, 18 Sep 2020 at 00:49, Christian König <
> > christian.koenig@amd.com> wrote:
> > > Am 17.09.20 um 16:44 schrieb Michel Dänzer:
> > > > On 2020-09-17 2:20 p.m., Christian König wrote:
> > > > > Hi guys,
> > > > >
> > > > > Michel once submitted a patch to fix triggering this BUG_ON in
> > > > > ttm_tt_swapout():
> > > > >
> > > > > > BUG_ON(ttm->caching_state != tt_cached);
> > > > >
> > > > > Now my question is does anybody know why we have that in the
> > > > > first
> > > > > place?
> > > > >
> > > > > The only problematic thing I can see is calling copy_highpage()
> > > > > and
> > > > > that one is just doing a kmap_atomic()/kunmap_atomic() on the
> > > > > source
> > > > > and destination.
> > > > >
> > > > > I can't see why it should be problematic for this temporary
> > > > > mapping
> > > > > to be cached instead of uncached or WC?
> > > > >
> > > > > Does anybody has any idea?
> > > >
> > > > One thing is that AFAIK some (ARM?) CPUs can get very upset when
> > > > there's both a cached and uncacheable mapping for the same
> > > > physical page.
> > >
> > > Good point, but I already considered this.
> > >
> > > If there is another mapping for that page left we are completely
> > > busted
> > > anyway since we are currently changing the underlying storage.
> > >
> >
> > It's not just ARM of course, x86 PAT also has many issues about
> > multiple mappings of the same backing page with different attributes.
> >
> > The only mappings might be in the linear mapping, since not all pages
> > we get here will necessarily be high pages I assume and therefore
> > could have a linear mapping. If we've changed the linear mapping to
> > non-cached, then create a cached kmap I'm not 100% that won't cause
> > issues.
> >
> > but this is a definite maze of twisty passages and I'd probably need
> > to sit down and break it a bit more.
> >
> > Dave.
>
> This is a problem that goes back way far (12+) years that the x86
> architecture is not allowed to do aliased mappings of pages, even
> through mappable GTTs: There are two aspects:
>
> 1) Create a WC mapping of a page with data in the cache. When the cache
> does a writeback, anything written through the WC mapping will get
> overwritten.
>
> 2) Flushing the WB mappings first might not help, since at that time
> there were some AMD processors (Athlons?) that were speculatively
> prefetching data on the WB mapping into the cache at any time, and even
> though it wasn't changed it did a writeback. Anything written through
> WC in-between the prefetch and the writeback got overwritten. That was
> a real and occuring problem at that time. AMD claimed it was not
> violating specs.
>
> So aliased mappings is probably at best very fragile.

Intel has an architectural guarantee (apparently) that clean
cachelines get dropped, never written back. We heavily rely on that
with all our cacheline flushing tricks for integrated i915.ko. Since
SoC keep being non-coherent for the gpu for perf reasons I don't
expect these kind of games to go away anytime soon (we even do these
cache flushes in userspace nowadays, to cut out the kernel).

For discrete it's a different story ofc.
-Daniel

>
> /Thomas
>
>
>
>
> ----------------------------------------------------------------------
> Intel Sweden AB
> Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden
> Registration Number: 556189-6027
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-23  9:24       ` Hellstrom, Thomas
  2020-09-23  9:29         ` Daniel Vetter
@ 2020-09-23 14:03         ` Christian König
  2020-09-23 16:03           ` Hellstrom, Thomas
  1 sibling, 1 reply; 8+ messages in thread
From: Christian König @ 2020-09-23 14:03 UTC (permalink / raw)
  To: Hellstrom, Thomas, airlied; +Cc: michel, dri-devel

Am 23.09.20 um 11:24 schrieb Hellstrom, Thomas:
> On Wed, 2020-09-23 at 13:17 +1000, Dave Airlie wrote:
>> On Fri, 18 Sep 2020 at 00:49, Christian König <
>> christian.koenig@amd.com> wrote:
>>> Am 17.09.20 um 16:44 schrieb Michel Dänzer:
>>>> On 2020-09-17 2:20 p.m., Christian König wrote:
>>>>> Hi guys,
>>>>>
>>>>> Michel once submitted a patch to fix triggering this BUG_ON in
>>>>> ttm_tt_swapout():
>>>>>
>>>>>> BUG_ON(ttm->caching_state != tt_cached);
>>>>> Now my question is does anybody know why we have that in the
>>>>> first
>>>>> place?
>>>>>
>>>>> The only problematic thing I can see is calling copy_highpage()
>>>>> and
>>>>> that one is just doing a kmap_atomic()/kunmap_atomic() on the
>>>>> source
>>>>> and destination.
>>>>>
>>>>> I can't see why it should be problematic for this temporary
>>>>> mapping
>>>>> to be cached instead of uncached or WC?
>>>>>
>>>>> Does anybody has any idea?
>>>> One thing is that AFAIK some (ARM?) CPUs can get very upset when
>>>> there's both a cached and uncacheable mapping for the same
>>>> physical page.
>>> Good point, but I already considered this.
>>>
>>> If there is another mapping for that page left we are completely
>>> busted
>>> anyway since we are currently changing the underlying storage.
>>>
>> It's not just ARM of course, x86 PAT also has many issues about
>> multiple mappings of the same backing page with different attributes.
>>
>> The only mappings might be in the linear mapping, since not all pages
>> we get here will necessarily be high pages I assume and therefore
>> could have a linear mapping. If we've changed the linear mapping to
>> non-cached, then create a cached kmap I'm not 100% that won't cause
>> issues.
>>
>> but this is a definite maze of twisty passages and I'd probably need
>> to sit down and break it a bit more.
>>
>> Dave.
> This is a problem that goes back way far (12+) years that the x86
> architecture is not allowed to do aliased mappings of pages, even
> through mappable GTTs: There are two aspects:
>
> 1) Create a WC mapping of a page with data in the cache. When the cache
> does a writeback, anything written through the WC mapping will get
> overwritten.
>
> 2) Flushing the WB mappings first might not help, since at that time
> there were some AMD processors (Athlons?) that were speculatively
> prefetching data on the WB mapping into the cache at any time, and even
> though it wasn't changed it did a writeback. Anything written through
> WC in-between the prefetch and the writeback got overwritten. That was
> a real and occuring problem at that time. AMD claimed it was not
> violating specs.
>
> So aliased mappings is probably at best very fragile.

All of this is correct, but I still can't see why ttm_bo_swapout() tries 
to change the caching?

See copy_highpage() will only create a temporary wb mapping for highmem 
pages which is destroyed immediately again. Otherwise it will use the 
linear mapping which should already have the correct caching attributes.

On the other hand on architectures which uses MTRR or similar approach 
where you can't change mapping on the fly this will totally blow up into 
your face.

Christian.

>
> /Thomas
>
>
>
>
> ----------------------------------------------------------------------
> Intel Sweden AB
> Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden
> Registration Number: 556189-6027
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Why can't ttm_tt_swapout() handle uncached or WC BOs?
  2020-09-23 14:03         ` Christian König
@ 2020-09-23 16:03           ` Hellstrom, Thomas
  0 siblings, 0 replies; 8+ messages in thread
From: Hellstrom, Thomas @ 2020-09-23 16:03 UTC (permalink / raw)
  To: christian.koenig, airlied; +Cc: michel, dri-devel

On Wed, 2020-09-23 at 16:03 +0200, Christian König wrote:
> Am 23.09.20 um 11:24 schrieb Hellstrom, Thomas:
> > On Wed, 2020-09-23 at 13:17 +1000, Dave Airlie wrote:
> > > On Fri, 18 Sep 2020 at 00:49, Christian König <
> > > christian.koenig@amd.com> wrote:
> > > > Am 17.09.20 um 16:44 schrieb Michel Dänzer:
> > > > > On 2020-09-17 2:20 p.m., Christian König wrote:
> > > > > > Hi guys,
> > > > > > 
> > > > > > Michel once submitted a patch to fix triggering this BUG_ON
> > > > > > in
> > > > > > ttm_tt_swapout():
> > > > > > 
> > > > > > > BUG_ON(ttm->caching_state != tt_cached);
> > > > > > Now my question is does anybody know why we have that in
> > > > > > the
> > > > > > first
> > > > > > place?
> > > > > > 
> > > > > > The only problematic thing I can see is calling
> > > > > > copy_highpage()
> > > > > > and
> > > > > > that one is just doing a kmap_atomic()/kunmap_atomic() on
> > > > > > the
> > > > > > source
> > > > > > and destination.
> > > > > > 
> > > > > > I can't see why it should be problematic for this temporary
> > > > > > mapping
> > > > > > to be cached instead of uncached or WC?
> > > > > > 
> > > > > > Does anybody has any idea?
> > > > > One thing is that AFAIK some (ARM?) CPUs can get very upset
> > > > > when
> > > > > there's both a cached and uncacheable mapping for the same
> > > > > physical page.
> > > > Good point, but I already considered this.
> > > > 
> > > > If there is another mapping for that page left we are
> > > > completely
> > > > busted
> > > > anyway since we are currently changing the underlying storage.
> > > > 
> > > It's not just ARM of course, x86 PAT also has many issues about
> > > multiple mappings of the same backing page with different
> > > attributes.
> > > 
> > > The only mappings might be in the linear mapping, since not all
> > > pages
> > > we get here will necessarily be high pages I assume and therefore
> > > could have a linear mapping. If we've changed the linear mapping
> > > to
> > > non-cached, then create a cached kmap I'm not 100% that won't
> > > cause
> > > issues.
> > > 
> > > but this is a definite maze of twisty passages and I'd probably
> > > need
> > > to sit down and break it a bit more.
> > > 
> > > Dave.
> > This is a problem that goes back way far (12+) years that the x86
> > architecture is not allowed to do aliased mappings of pages, even
> > through mappable GTTs: There are two aspects:
> > 
> > 1) Create a WC mapping of a page with data in the cache. When the
> > cache
> > does a writeback, anything written through the WC mapping will get
> > overwritten.
> > 
> > 2) Flushing the WB mappings first might not help, since at that
> > time
> > there were some AMD processors (Athlons?) that were speculatively
> > prefetching data on the WB mapping into the cache at any time, and
> > even
> > though it wasn't changed it did a writeback. Anything written
> > through
> > WC in-between the prefetch and the writeback got overwritten. That
> > was
> > a real and occuring problem at that time. AMD claimed it was not
> > violating specs.
> > 
> > So aliased mappings is probably at best very fragile.
> 
> All of this is correct, but I still can't see why ttm_bo_swapout()
> tries 
> to change the caching?
> 
> See copy_highpage() will only create a temporary wb mapping for
> highmem 
> pages which is destroyed immediately again. Otherwise it will use
> the 
> linear mapping which should already have the correct caching
> attributes.
> 

IIRC there are a couple of historical reasons
- Generally avoiding aliased mappings with conflicting caching
- Reading out from WC is dead slow
- Not sure of the order of events, but I think swapping predates the WC
page pool, which meant the pages had to change caching anyway.
- A desire to be able insert the pages directly in the swap cache
rather than to copy shmem objects.

But today, we can probably do better. I guess, where available,
streaming WC-tailored memcpy and not changing the caching might be a
good choice?

/Thomas







----------------------------------------------------------------------
Intel Sweden AB
Registered Office: Isafjordsgatan 30B, 164 40 Kista, Stockholm, Sweden
Registration Number: 556189-6027

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-09-23 16:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-17 12:20 Why can't ttm_tt_swapout() handle uncached or WC BOs? Christian König
2020-09-17 14:44 ` Michel Dänzer
2020-09-17 14:49   ` Christian König
2020-09-23  3:17     ` Dave Airlie
2020-09-23  9:24       ` Hellstrom, Thomas
2020-09-23  9:29         ` Daniel Vetter
2020-09-23 14:03         ` Christian König
2020-09-23 16:03           ` Hellstrom, Thomas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.