* ttm_bo and multiple backing store segments
@ 2023-06-29 21:10 Welty, Brian
  2023-07-17 17:24 ` [Intel-xe] " Rodrigo Vivi
  0 siblings, 1 reply; 4+ messages in thread
From: Welty, Brian @ 2023-06-29 21:10 UTC (permalink / raw)
  To: Christian König, Thomas Hellström, dri-devel,
	Matthew Brost, intel-xe


Hi Christian / Thomas,

Wanted to ask if you have explored or thought about adding support in 
TTM such that a ttm_bo could have more than one underlying backing store 
segment (that is, to have a tree of ttm_resources)?
We are considering supporting such BOs for the Intel Xe driver.
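
To make the idea a bit more concrete, here is a minimal structural sketch.
None of this exists in TTM today; the names and layout are purely
illustrative assumptions:

#include <linux/rbtree.h>
#include <linux/types.h>
#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_resource.h>

/*
 * Hypothetical sketch only.  Today a ttm_buffer_object carries a single
 * struct ttm_resource *resource; the idea is to let one BO reference
 * several backing store segments, each of which can live in a different
 * memory region.
 */
struct ttm_bo_segment {
	struct rb_node node;		/* indexed by BO page offset */
	pgoff_t start_page;		/* first BO page this segment covers */
	pgoff_t num_pages;		/* number of pages it covers */
	struct ttm_resource *res;	/* backing store for this range */
};

struct ttm_bo_segments {		/* would hang off the ttm_bo */
	struct rb_root_cached tree;	/* ttm_bo_segment, sorted by offset */
};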

Some of the benefits:
  * devices with page fault support can fault (and migrate) backing store
    at finer granularity than the entire BO
  * BOs can support having multiple backing store segments, which can be
    in different memory domains/regions
  * BO eviction could operate on smaller granularity than entire BO

Or is the thinking that workloads should use SVM/HMM instead of 
GEM_CREATE if they want the above benefits?

Is this something you are open to seeing an RFC series that starts 
perhaps with just extending ttm_bo_validate() to see how this might 
shape up?
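
For concreteness, here is today's entry point and the sort of range-based
variant it could grow into (the _range form is purely illustrative, not a
concrete API proposal):

/* Existing entry point: (re)validates the whole BO against a placement. */
int ttm_bo_validate(struct ttm_buffer_object *bo,
		    struct ttm_placement *placement,
		    struct ttm_operation_ctx *ctx);

/*
 * Hypothetical extension: validate only the pages in
 * [start_page, start_page + num_pages), so that a fault handler could
 * migrate a single segment instead of the entire BO.
 */
int ttm_bo_validate_range(struct ttm_buffer_object *bo,
			  struct ttm_placement *placement,
			  struct ttm_operation_ctx *ctx,
			  pgoff_t start_page, pgoff_t num_pages);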

-Brian


* Re: [Intel-xe] ttm_bo and multiple backing store segments
  2023-06-29 21:10 ttm_bo and multiple backing store segments Welty, Brian
@ 2023-07-17 17:24 ` Rodrigo Vivi
  2023-07-19  9:02   ` Christian König
  0 siblings, 1 reply; 4+ messages in thread
From: Rodrigo Vivi @ 2023-07-17 17:24 UTC (permalink / raw)
  To: Welty, Brian
  Cc: Matthew Brost, Thomas Hellström, dri-devel, intel-xe,
	Christian König

On Thu, Jun 29, 2023 at 02:10:58PM -0700, Welty, Brian wrote:
> 
> Hi Christian / Thomas,
> 
> Wanted to ask if you have explored or thought about adding support in TTM
> such that a ttm_bo could have more than one underlying backing store segment
> (that is, to have a tree of ttm_resources)?
> We are considering supporting such BOs for the Intel Xe driver.

They are indeed the best ones to give an opinion here.
I just have some dummy questions and comments below.

> 
> Some of the benefits:
>  * devices with page fault support can fault (and migrate) backing store
>    at finer granularity than the entire BO

what advantage does this bring to each workload?
is it a performance gain on huge BOs?

>  * BOs can support having multiple backing store segments, which can be
>    in different memory domains/regions

what locking challenges would this bring?
is this more targeting gpu + cpu? or only for our multi-tile platforms?
and what's the advantage this is bringing to real use cases?
(probably the svm/hmm question below answers my questions, but...)

>  * BO eviction could operate on smaller granularity than entire BO

I believe all the previous doubts apply to this item as well...

> 
> Or is the thinking that workloads should use SVM/HMM instead of GEM_CREATE
> if they want the above benefits?
> 
> Is this something you are open to seeing an RFC series that starts perhaps
> with just extending ttm_bo_validate() to see how this might shape up?

Imho an RFC always helps... a piece of code showing the idea usually draws
more attention from devs than asking in plain text. But more text explaining
the reasoning behind it is also helpful, even with the RFC.

Thanks,
Rodrigo.

> 
> -Brian


* Re: [Intel-xe] ttm_bo and multiple backing store segments
  2023-07-17 17:24 ` [Intel-xe] " Rodrigo Vivi
@ 2023-07-19  9:02   ` Christian König
  2023-08-04  0:19     ` Welty, Brian
  0 siblings, 1 reply; 4+ messages in thread
From: Christian König @ 2023-07-19  9:02 UTC (permalink / raw)
  To: Rodrigo Vivi, Welty, Brian
  Cc: Matthew Brost, Thomas Hellström, dri-devel, intel-xe

Hi guys,

Massive sorry for the delayed response, this mail completely slipped 
under my radar.

Am 17.07.23 um 19:24 schrieb Rodrigo Vivi:
> On Thu, Jun 29, 2023 at 02:10:58PM -0700, Welty, Brian wrote:
>> Hi Christian / Thomas,
>>
>> Wanted to ask if you have explored or thought about adding support in TTM
>> such that a ttm_bo could have more than one underlying backing store segment
>> (that is, to have a tree of ttm_resources)?

We already use something similar on amdgpu where basically the VRAM 
resources are stitched together from multiple backing pages.

That is not exactly the same, but it comes close.
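
Roughly something like this (sketching from memory, so treat the exact
fields as an assumption rather than a quote of the real amdgpu code):

struct amdgpu_vram_mgr_resource {
	struct ttm_resource base;	/* still one resource per BO */
	struct list_head blocks;	/* buddy-allocator blocks backing it,
					 * so the VRAM behind one resource
					 * does not have to be contiguous */
	unsigned long flags;
};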

>> We are considering supporting such BOs for the Intel Xe driver.
> They are indeed the best ones to give an opinion here.
> I just have some dummy questions and comments below.
>
>> Some of the benefits:
>>   * devices with page fault support can fault (and migrate) backing store
>>     at finer granularity than the entire BO

We've considered that once as well and I even started hacking something 
together, but the problem was that at least at that point it wasn't 
doable because of limitations in the Linux memory management.

Basically the extended attributes used to control caching of pages were 
only definable per VMA! So when one piece of the BO would have been in 
uncached VRAM while another piece was in cached system memory, you 
immediately ran into problems.

I think that issue is fixed by now, but I'm not 100% sure.
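
(For illustration, a minimal sketch of the per-page alternative that I
believe the core mm allows nowadays; the interfaces below are written
from memory, so treat the details as an assumption rather than a
description of what TTM currently does.)

#include <linux/mm.h>

static vm_fault_t segment_fault_example(struct vm_fault *vmf,
					unsigned long pfn,
					bool segment_is_cached)
{
	/*
	 * The old blocker: only vma->vm_page_prot (one value per VMA) was
	 * honoured, so a BO mixing uncached VRAM and cached system pages
	 * could not be mapped correctly.  With vmf_insert_pfn_prot() the
	 * caching attribute can instead be chosen per inserted PFN, i.e.
	 * per backing store segment.
	 */
	pgprot_t prot = segment_is_cached ?
		vm_get_page_prot(vmf->vma->vm_flags) :
		pgprot_writecombine(vmf->vma->vm_page_prot);

	return vmf_insert_pfn_prot(vmf->vma, vmf->address, pfn, prot);
}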

In general I think it might be beneficial, but I'm not 100% sure if it's 
worth the additional complexity.

Regards,
Christian.

> what advantage does this bring to each workload?
> is it a performance gain on huge BOs?
>
>>   * BOs can support having multiple backing store segments, which can be
>>     in different memory domains/regions
> what locking challenges would this bring?
> is this more targeting gpu + cpu? or only for our multi-tile platforms?
> and what's the advantage this is bringing to real use cases?
> (probably the svm/hmm question below answers my questions, but...)
>
>>   * BO eviction could operate on smaller granularity than entire BO
> I believe all the previous doubts apply to this item as well...
>
>> Or is the thinking that workloads should use SVM/HMM instead of GEM_CREATE
>> if they want the above benefits?
>>
>> Is this something you are open to seeing an RFC series that starts perhaps
>> with just extending ttm_bo_validate() to see how this might shape up?
> Imho an RFC always helps... a piece of code showing the idea usually draws
> more attention from devs than asking in plain text. But more text explaining
> the reasoning behind it is also helpful, even with the RFC.
>
> Thanks,
> Rodrigo.
>
>> -Brian



* Re: [Intel-xe] ttm_bo and multiple backing store segments
  2023-07-19  9:02   ` Christian König
@ 2023-08-04  0:19     ` Welty, Brian
  0 siblings, 0 replies; 4+ messages in thread
From: Welty, Brian @ 2023-08-04  0:19 UTC (permalink / raw)
  To: Christian König, Rodrigo Vivi
  Cc: Matthew Brost, Thomas Hellström, dri-devel, intel-xe


Finally returning to this, thanks for the replies.

On 7/19/2023 2:02 AM, Christian König wrote:
> Hi guys,
> 
> Massive sorry for the delayed response, this mail completely slipped 
> under my radar.
> 
> Am 17.07.23 um 19:24 schrieb Rodrigo Vivi:
>> On Thu, Jun 29, 2023 at 02:10:58PM -0700, Welty, Brian wrote:
>>> Hi Christian / Thomas,
>>>
>>> Wanted to ask if you have explored or thought about adding support in 
>>> TTM
>>> such that a ttm_bo could have more than one underlying backing store 
>>> segment
>>> (that is, to have a tree of ttm_resources)?
> 
> We already use something similar on amdgpu where basically the VRAM 
> resources are stitched together from multiple backing pages.
> 
> That is not exactly the same, but it comes close.

I searched for a while for this in amdgpu but wasn't able to find it.
I didn't see any signs of it in amdgpu_vram_mgr.c.
Can you point me to where this code lives?  I wanted to review and 
compare the approach...


> 
>>> We are considering supporting such BOs for the Intel Xe driver.
>> They are indeed the best ones to give an opinion here.
>> I just have some dummy questions and comments below.
>>
>>> Some of the benefits:
>>>   * devices with page fault support can fault (and migrate) backing 
>>> store
>>>     at finer granularity than the entire BO
> 
> We've considered that once as well and I even started hacking something 
> together, but the problem was that at least at that point it wasn't 
> doable because of limitations in the Linux memory management.
> 
> Basically the extended attributes used to control caching of pages were 
> only definable per VMA! So when one piece of the BO would have been in 
> uncached VRAM while another piece was in cached system memory, you 
> immediately ran into problems.
> 
> I think that issue is fixed by now, but I'm not 100% sure.

Okay, thanks for mentioning it.  I haven't come across such an issue so far...

> 
> In general I think it might be beneficial, but I'm not 100% sure if it's 
> worth the additional complexity.

Agreed.  Well, up next is to put a small RFC together then...

> 
> Regards,
> Christian.
> 
>> what advantage does this bring to each workload?
>> is it a performance gain on huge BOs?

Replying to Rodrigo's comments for the rest here...
Yes, providing more rationale is needed. I'll see about beefing up
the description with the RFC patches...
Basically, all aspects of working with BO backing store can operate
at a smaller granularity, including being able to support a BO which
is larger than total VRAM.


>>
>>>   * BOs can support having multiple backing store segments, which can be
>>>     in different memory domains/regions
>> what locking challenges would this bring?

The intent would be to still have locking done at the BO level, and not 
attempt to introduce finer-grained locking.

>> is this more targeting gpu + cpu? or only for our multi-tile platforms?
>> and what's the advantage this is bringing to real use cases?

Right, it can be leveraged for both types of usage you mentioned.
With both GPU and CPU accessing a BO, the portion of the BO each is 
accessing can be placed locally.
And with Xe gt0 + gt1 accessing a BO, we can place segments of it in 
the tile local to each gt.
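
As a rough illustration of the multi-tile case (the region names below
are placeholders for "tile-local VRAM" and may not match the driver's
real identifiers):

#include <drm/ttm/ttm_placement.h>

/* Segment touched mostly by gt0: place it in tile 0 VRAM ... */
static const struct ttm_place seg0_place = {
	.mem_type = XE_PL_VRAM0,	/* placeholder region id */
};

/* ... while a segment touched mostly by gt1 goes to tile 1 VRAM. */
static const struct ttm_place seg1_place = {
	.mem_type = XE_PL_VRAM1,	/* placeholder region id */
};

Each segment would then carry its own placement built from entries like
these, instead of the whole BO sharing a single placement.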

>> (probably the svm/hmm question below answers my questions, but...)
>>
>>>   * BO eviction could operate on smaller granularity than entire BO
>> I believe all the previous doubts apply to this item as well...

Not sure what 'all the previous doubts' refers to...
Agree most of the value is lost if eviction is not updated to operate at 
finer granularity.  Will make sure to explore this.

>>
>>> Or is the thinking that workloads should use SVM/HMM instead of 
>>> GEM_CREATE
>>> if they want the above benefits?
>>>
>>> Is this something you are open to seeing an RFC series that starts 
>>> perhaps
>>> with just extending ttm_bo_validate() to see how this might shape up?
>> Imho an RFC always helps... a piece of code showing the idea usually draws
>> more attention from devs than asking in plain text. But more text explaining
>> the reasoning behind it is also helpful, even with the RFC.

Will work up a small RFC and see where we go with this...

Thanks,
-Brian


>>
>> Thanks,
>> Rodrigo.
>>
>>> -Brian
> 

