All of lore.kernel.org
 help / color / mirror / Atom feed
* i915 ttm_tt shmem backend
@ 2021-09-09 14:56 Matthew Auld
  2021-09-09 16:43 ` AW: " Koenig, Christian
  2021-09-10  7:46 ` Thomas Hellström
  0 siblings, 2 replies; 9+ messages in thread
From: Matthew Auld @ 2021-09-09 14:56 UTC (permalink / raw)
  To: Christian König, Christian König
  Cc: Thomas Hellström, ML dri-devel

Hi Christian,

We are looking into using shmem as a ttm_tt backend in i915 for cached
system memory objects. We would also like to make such objects visible
to the i915-gem shrinker, so that they may be swapped out or discarded
when under memory pressure.

One idea for handling this is roughly something like:
- Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
- Skip the ttm_pages_allocated accounting on such objects, similar to
how FLAG_SG is already handled.
- Skip all the page->mapping and page->index related bits, like in
tt_add_mapping, since it looks like these are set and used by shmem.
Not sure what functionally this might break, but looks like it's maybe
only driver specific?
- Skip calling into ttm_bo_swap_out/in and just have
ttm_populate/unpopulate handle this directly for such objects.
- Make such objects visible to the i915-gem shrinker.

Does this approach look acceptable?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* AW: i915 ttm_tt shmem backend
  2021-09-09 14:56 i915 ttm_tt shmem backend Matthew Auld
@ 2021-09-09 16:43 ` Koenig, Christian
  2021-09-09 16:56   ` Matthew Auld
  2021-09-10  7:46 ` Thomas Hellström
  1 sibling, 1 reply; 9+ messages in thread
From: Koenig, Christian @ 2021-09-09 16:43 UTC (permalink / raw)
  To: Matthew Auld, Christian König; +Cc: Thomas Hellström, ML dri-devel

[-- Attachment #1: Type: text/plain, Size: 1554 bytes --]

Hi Matthew,

this doesn't work, I've already tried something similar.

TTM uses the reverse lookup functionality when migrating BOs between system and device memory. And that doesn't seem to work with pages from a shmem file.

Regards,
Christian.

________________________________
Von: Matthew Auld <matthew.william.auld@gmail.com>
Gesendet: Donnerstag, 9. September 2021 16:56
An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML dri-devel <dri-devel@lists.freedesktop.org>
Betreff: i915 ttm_tt shmem backend

Hi Christian,

We are looking into using shmem as a ttm_tt backend in i915 for cached
system memory objects. We would also like to make such objects visible
to the i915-gem shrinker, so that they may be swapped out or discarded
when under memory pressure.

One idea for handling this is roughly something like:
- Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
- Skip the ttm_pages_allocated accounting on such objects, similar to
how FLAG_SG is already handled.
- Skip all the page->mapping and page->index related bits, like in
tt_add_mapping, since it looks like these are set and used by shmem.
Not sure what functionally this might break, but looks like it's maybe
only driver specific?
- Skip calling into ttm_bo_swap_out/in and just have
ttm_populate/unpopulate handle this directly for such objects.
- Make such objects visible to the i915-gem shrinker.

Does this approach look acceptable?

[-- Attachment #2: Type: text/html, Size: 2258 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-09 16:43 ` AW: " Koenig, Christian
@ 2021-09-09 16:56   ` Matthew Auld
  2021-09-10  7:53     ` Christian König
  2021-09-10  8:08     ` Thomas Hellström
  0 siblings, 2 replies; 9+ messages in thread
From: Matthew Auld @ 2021-09-09 16:56 UTC (permalink / raw)
  To: Koenig, Christian
  Cc: Christian König, Thomas Hellström, ML dri-devel

On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Hi Matthew,
>
> this doesn't work, I've already tried something similar.
>
> TTM uses the reverse lookup functionality when migrating BOs between system and device memory. And that doesn't seem to work with pages from a shmem file.

Hmm, what do you mean by reverse lookup functionality? Could you
please point out where that is in the TTM code?

>
> Regards,
> Christian.
>
> ________________________________
> Von: Matthew Auld <matthew.william.auld@gmail.com>
> Gesendet: Donnerstag, 9. September 2021 16:56
> An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML dri-devel <dri-devel@lists.freedesktop.org>
> Betreff: i915 ttm_tt shmem backend
>
> Hi Christian,
>
> We are looking into using shmem as a ttm_tt backend in i915 for cached
> system memory objects. We would also like to make such objects visible
> to the i915-gem shrinker, so that they may be swapped out or discarded
> when under memory pressure.
>
> One idea for handling this is roughly something like:
> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
> - Skip the ttm_pages_allocated accounting on such objects, similar to
> how FLAG_SG is already handled.
> - Skip all the page->mapping and page->index related bits, like in
> tt_add_mapping, since it looks like these are set and used by shmem.
> Not sure what functionally this might break, but looks like it's maybe
> only driver specific?
> - Skip calling into ttm_bo_swap_out/in and just have
> ttm_populate/unpopulate handle this directly for such objects.
> - Make such objects visible to the i915-gem shrinker.
>
> Does this approach look acceptable?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-09 14:56 i915 ttm_tt shmem backend Matthew Auld
  2021-09-09 16:43 ` AW: " Koenig, Christian
@ 2021-09-10  7:46 ` Thomas Hellström
  1 sibling, 0 replies; 9+ messages in thread
From: Thomas Hellström @ 2021-09-10  7:46 UTC (permalink / raw)
  To: Matthew Auld, Christian König, Christian König; +Cc: ML dri-devel

Hi,

On 9/9/21 4:56 PM, Matthew Auld wrote:
> Hi Christian,
>
> We are looking into using shmem as a ttm_tt backend in i915 for cached
> system memory objects. We would also like to make such objects visible
> to the i915-gem shrinker, so that they may be swapped out or discarded
> when under memory pressure.
>
> One idea for handling this is roughly something like:
> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
> - Skip the ttm_pages_allocated accounting on such objects, similar to
> how FLAG_SG is already handled.
> - Skip all the page->mapping and page->index related bits, like in
> tt_add_mapping, since it looks like these are set and used by shmem.
> Not sure what functionally this might break, but looks like it's maybe
> only driver specific?

IIrc the page->mapping and index is needed when doing dirty-tracking 
using mkwrite and by vmwgfx at some point when doing fb_defio on top of 
TTM buffers. I don't think vmwgfx does that anymore, but it still does 
dirty-tracking.

/Thomas



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-09 16:56   ` Matthew Auld
@ 2021-09-10  7:53     ` Christian König
  2021-09-10  8:08     ` Thomas Hellström
  1 sibling, 0 replies; 9+ messages in thread
From: Christian König @ 2021-09-10  7:53 UTC (permalink / raw)
  To: Matthew Auld, Koenig, Christian; +Cc: Thomas Hellström, ML dri-devel

Am 09.09.21 um 18:56 schrieb Matthew Auld:
> On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Hi Matthew,
>>
>> this doesn't work, I've already tried something similar.
>>
>> TTM uses the reverse lookup functionality when migrating BOs between system and device memory. And that doesn't seem to work with pages from a shmem file.
> Hmm, what do you mean by reverse lookup functionality? Could you
> please point out where that is in the TTM code?

When TTM moves a buffer it must make sure that the buffer is not 
accessed by the CPU while moving it.

For this the standard reverse lockup functionality of the Linux kernel 
is used to figure out in which page tables a page is mapped and mark 
those as invalid. Accessing the buffer object will then cause a page 
fault which in turn waits for the buffer move to finish.

But when you back the pages in a TT object with pages from a shmemfile 
this reverse lockup functionality doesn't work for some reason. I 
couldn't figure out what exactly was going wrong here and didn't looked 
deeper, I assumed it's because of not setting up page->mapping and 
page->index correctly. Thomas or Daniel might know more.

Apart from that your approach sounds like pretty much what I tried as well.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
>> ________________________________
>> Von: Matthew Auld <matthew.william.auld@gmail.com>
>> Gesendet: Donnerstag, 9. September 2021 16:56
>> An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig, Christian <Christian.Koenig@amd.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML dri-devel <dri-devel@lists.freedesktop.org>
>> Betreff: i915 ttm_tt shmem backend
>>
>> Hi Christian,
>>
>> We are looking into using shmem as a ttm_tt backend in i915 for cached
>> system memory objects. We would also like to make such objects visible
>> to the i915-gem shrinker, so that they may be swapped out or discarded
>> when under memory pressure.
>>
>> One idea for handling this is roughly something like:
>> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
>> - Skip the ttm_pages_allocated accounting on such objects, similar to
>> how FLAG_SG is already handled.
>> - Skip all the page->mapping and page->index related bits, like in
>> tt_add_mapping, since it looks like these are set and used by shmem.
>> Not sure what functionally this might break, but looks like it's maybe
>> only driver specific?
>> - Skip calling into ttm_bo_swap_out/in and just have
>> ttm_populate/unpopulate handle this directly for such objects.
>> - Make such objects visible to the i915-gem shrinker.
>>
>> Does this approach look acceptable?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-09 16:56   ` Matthew Auld
  2021-09-10  7:53     ` Christian König
@ 2021-09-10  8:08     ` Thomas Hellström
  2021-09-10  8:25       ` Christian König
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Hellström @ 2021-09-10  8:08 UTC (permalink / raw)
  To: Matthew Auld, Koenig, Christian; +Cc: Christian König, ML dri-devel

Perhaps some background and goal is worth mentioning here.


On Thu, 2021-09-09 at 17:56 +0100, Matthew Auld wrote:
> On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
> > 
> > Hi Matthew,
> > 
> > this doesn't work, I've already tried something similar.
> > 
> > TTM uses the reverse lookup functionality when migrating BOs
> > between system and device memory. And that doesn't seem to work
> > with pages from a shmem file.
> 
> Hmm, what do you mean by reverse lookup functionality? Could you
> please point out where that is in the TTM code?

I think this is in unmap_mapping_range() where, if we use VM_MIXEDMAP,
there is a reverse lookup on the PTEs that point to real pages. Now
that we move over to VM_PFNMAP, that problem should go away since core
vm never has a page to investigate. Probably this is why things works
on non-TTM i915 GEM.

@Christian: Some background here:
First I think that there might be things like the above that will pose
problems, and we may or may not be able to overcome those but more
importantly is that we agree with you that *if* we make it work, it is
something that you as a maintainer of TTM can accept from a design- and
maintainabiltiy point of view.

The approach would be similar to the buddy allocator, we adapt some
driver code to TTM in a way that it may be reused with other drivers,
and if other drivers are interested, we'd assist in moving to core TTM.
In essence it'd be a TTM shmem page pool with full shrinking ability
for cached pages only.

What we're really after here is the ability to shrink that doesn't
regress much w r t the elaborate shrinker that's in i915 today that is
power management aware and is also able to start shmem writebacks to
avoid shmem just caching the pages instead of giving them back to the
system (IIRC it was partly the lack of this that blocked earlier TTM
shrinking efforts).

And since it doesn't really matter whether the shrinker sits in core
TTM or in a driver, I think a future goal might be a set of TTM
shrinker helpers that makes sure we shrink the right TTM object, and
perhaps a simple implementation that is typically used by simple
drivers and other drivers can build on that for a more elaborate power-
management aware shrinker.

/Thomas



> 
> > 
> > Regards,
> > Christian.
> > 
> > ________________________________
> > Von: Matthew Auld <matthew.william.auld@gmail.com>
> > Gesendet: Donnerstag, 9. September 2021 16:56
> > An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig,
> > Christian <Christian.Koenig@amd.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML dri-
> > devel <dri-devel@lists.freedesktop.org>
> > Betreff: i915 ttm_tt shmem backend
> > 
> > Hi Christian,
> > 
> > We are looking into using shmem as a ttm_tt backend in i915 for
> > cached
> > system memory objects. We would also like to make such objects
> > visible
> > to the i915-gem shrinker, so that they may be swapped out or
> > discarded
> > when under memory pressure.
> > 
> > One idea for handling this is roughly something like:
> > - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
> > - Skip the ttm_pages_allocated accounting on such objects, similar
> > to
> > how FLAG_SG is already handled.
> > - Skip all the page->mapping and page->index related bits, like in
> > tt_add_mapping, since it looks like these are set and used by
> > shmem.
> > Not sure what functionally this might break, but looks like it's
> > maybe
> > only driver specific?
> > - Skip calling into ttm_bo_swap_out/in and just have
> > ttm_populate/unpopulate handle this directly for such objects.
> > - Make such objects visible to the i915-gem shrinker.
> > 
> > Does this approach look acceptable?



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-10  8:08     ` Thomas Hellström
@ 2021-09-10  8:25       ` Christian König
  2021-09-10  8:40         ` Thomas Hellström
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-09-10  8:25 UTC (permalink / raw)
  To: Thomas Hellström, Matthew Auld, Koenig, Christian; +Cc: ML dri-devel



Am 10.09.21 um 10:08 schrieb Thomas Hellström:
> Perhaps some background and goal is worth mentioning here.
>
>
> On Thu, 2021-09-09 at 17:56 +0100, Matthew Auld wrote:
>> On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
>> <Christian.Koenig@amd.com> wrote:
>>> Hi Matthew,
>>>
>>> this doesn't work, I've already tried something similar.
>>>
>>> TTM uses the reverse lookup functionality when migrating BOs
>>> between system and device memory. And that doesn't seem to work
>>> with pages from a shmem file.
>> Hmm, what do you mean by reverse lookup functionality? Could you
>> please point out where that is in the TTM code?
> I think this is in unmap_mapping_range() where, if we use VM_MIXEDMAP,
> there is a reverse lookup on the PTEs that point to real pages. Now
> that we move over to VM_PFNMAP, that problem should go away since core
> vm never has a page to investigate. Probably this is why things works
> on non-TTM i915 GEM.

Yeah, that was really likely the root problem. I didn't kept 
investigating after realizing that my approach wouldn't work.

> @Christian: Some background here:
> First I think that there might be things like the above that will pose
> problems, and we may or may not be able to overcome those but more
> importantly is that we agree with you that *if* we make it work, it is
> something that you as a maintainer of TTM can accept from a design- and
> maintainabiltiy point of view.
>
> The approach would be similar to the buddy allocator, we adapt some
> driver code to TTM in a way that it may be reused with other drivers,
> and if other drivers are interested, we'd assist in moving to core TTM.
> In essence it'd be a TTM shmem page pool with full shrinking ability
> for cached pages only.
>
> What we're really after here is the ability to shrink that doesn't
> regress much w r t the elaborate shrinker that's in i915 today that is
> power management aware and is also able to start shmem writebacks to
> avoid shmem just caching the pages instead of giving them back to the
> system (IIRC it was partly the lack of this that blocked earlier TTM
> shrinking efforts).
>
> And since it doesn't really matter whether the shrinker sits in core
> TTM or in a driver, I think a future goal might be a set of TTM
> shrinker helpers that makes sure we shrink the right TTM object, and
> perhaps a simple implementation that is typically used by simple
> drivers and other drivers can build on that for a more elaborate power-
> management aware shrinker.

That's understandable, but I think not necessary what we should aim for 
in the long term.

First of all I would really like to move more of the functionality from 
ttm_pool.c into the core memory management, especially handling of 
uncached and write combined memory.

That's essentially completely architecture dependent and currently 
implemented extremely awkward. Either Daniels suggestion of having a 
GFP_WC or Christophs approach of moving all this into the DMA API is the 
way to go here.

As long as i915 has no interest in USWC support implementing their own 
shmemfile backend sounds fine to me, but I have strong doubt that this 
will be of use to anybody else.

Christian.

>
> /Thomas
>
>
>
>>> Regards,
>>> Christian.
>>>
>>> ________________________________
>>> Von: Matthew Auld <matthew.william.auld@gmail.com>
>>> Gesendet: Donnerstag, 9. September 2021 16:56
>>> An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig,
>>> Christian <Christian.Koenig@amd.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML dri-
>>> devel <dri-devel@lists.freedesktop.org>
>>> Betreff: i915 ttm_tt shmem backend
>>>
>>> Hi Christian,
>>>
>>> We are looking into using shmem as a ttm_tt backend in i915 for
>>> cached
>>> system memory objects. We would also like to make such objects
>>> visible
>>> to the i915-gem shrinker, so that they may be swapped out or
>>> discarded
>>> when under memory pressure.
>>>
>>> One idea for handling this is roughly something like:
>>> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
>>> - Skip the ttm_pages_allocated accounting on such objects, similar
>>> to
>>> how FLAG_SG is already handled.
>>> - Skip all the page->mapping and page->index related bits, like in
>>> tt_add_mapping, since it looks like these are set and used by
>>> shmem.
>>> Not sure what functionally this might break, but looks like it's
>>> maybe
>>> only driver specific?
>>> - Skip calling into ttm_bo_swap_out/in and just have
>>> ttm_populate/unpopulate handle this directly for such objects.
>>> - Make such objects visible to the i915-gem shrinker.
>>>
>>> Does this approach look acceptable?
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-10  8:25       ` Christian König
@ 2021-09-10  8:40         ` Thomas Hellström
  2021-09-10  8:51           ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Hellström @ 2021-09-10  8:40 UTC (permalink / raw)
  To: Christian König, Matthew Auld, Koenig, Christian; +Cc: ML dri-devel

On Fri, 2021-09-10 at 10:25 +0200, Christian König wrote:
> 
> 
> Am 10.09.21 um 10:08 schrieb Thomas Hellström:
> > Perhaps some background and goal is worth mentioning here.
> > 
> > 
> > On Thu, 2021-09-09 at 17:56 +0100, Matthew Auld wrote:
> > > On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
> > > <Christian.Koenig@amd.com> wrote:
> > > > Hi Matthew,
> > > > 
> > > > this doesn't work, I've already tried something similar.
> > > > 
> > > > TTM uses the reverse lookup functionality when migrating BOs
> > > > between system and device memory. And that doesn't seem to work
> > > > with pages from a shmem file.
> > > Hmm, what do you mean by reverse lookup functionality? Could you
> > > please point out where that is in the TTM code?
> > I think this is in unmap_mapping_range() where, if we use
> > VM_MIXEDMAP,
> > there is a reverse lookup on the PTEs that point to real pages. Now
> > that we move over to VM_PFNMAP, that problem should go away since
> > core
> > vm never has a page to investigate. Probably this is why things
> > works
> > on non-TTM i915 GEM.
> 
> Yeah, that was really likely the root problem. I didn't kept 
> investigating after realizing that my approach wouldn't work.
> 
> > @Christian: Some background here:
> > First I think that there might be things like the above that will
> > pose
> > problems, and we may or may not be able to overcome those but more
> > importantly is that we agree with you that *if* we make it work, it
> > is
> > something that you as a maintainer of TTM can accept from a design-
> > and
> > maintainabiltiy point of view.
> > 
> > The approach would be similar to the buddy allocator, we adapt some
> > driver code to TTM in a way that it may be reused with other
> > drivers,
> > and if other drivers are interested, we'd assist in moving to core
> > TTM.
> > In essence it'd be a TTM shmem page pool with full shrinking
> > ability
> > for cached pages only.
> > 
> > What we're really after here is the ability to shrink that doesn't
> > regress much w r t the elaborate shrinker that's in i915 today that
> > is
> > power management aware and is also able to start shmem writebacks
> > to
> > avoid shmem just caching the pages instead of giving them back to
> > the
> > system (IIRC it was partly the lack of this that blocked earlier
> > TTM
> > shrinking efforts).
> > 
> > And since it doesn't really matter whether the shrinker sits in
> > core
> > TTM or in a driver, I think a future goal might be a set of TTM
> > shrinker helpers that makes sure we shrink the right TTM object,
> > and
> > perhaps a simple implementation that is typically used by simple
> > drivers and other drivers can build on that for a more elaborate
> > power-
> > management aware shrinker.
> 
> That's understandable, but I think not necessary what we should aim
> for 
> in the long term.
> 
> First of all I would really like to move more of the functionality
> from 
> ttm_pool.c into the core memory management, especially handling of 
> uncached and write combined memory.
> 
> That's essentially completely architecture dependent and currently 
> implemented extremely awkward. Either Daniels suggestion of having a 
> GFP_WC or Christophs approach of moving all this into the DMA API is
> the 
> way to go here.
> 
> As long as i915 has no interest in USWC support implementing their
> own 
> shmemfile backend sounds fine to me, but I have strong doubt that
> this 
> will be of use to anybody else.

OK. Sounds fine. In situations where we use WC system memory we will
use what's in TTM today. BTW on the shrinking approach for WC pages,
does the Christoph's DMA API solution envision some kind of support for
this?

/Thomas

> 
> Christian.
> 
> > 
> > /Thomas
> > 
> > 
> > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > ________________________________
> > > > Von: Matthew Auld <matthew.william.auld@gmail.com>
> > > > Gesendet: Donnerstag, 9. September 2021 16:56
> > > > An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig,
> > > > Christian <Christian.Koenig@amd.com>
> > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML
> > > > dri-
> > > > devel <dri-devel@lists.freedesktop.org>
> > > > Betreff: i915 ttm_tt shmem backend
> > > > 
> > > > Hi Christian,
> > > > 
> > > > We are looking into using shmem as a ttm_tt backend in i915 for
> > > > cached
> > > > system memory objects. We would also like to make such objects
> > > > visible
> > > > to the i915-gem shrinker, so that they may be swapped out or
> > > > discarded
> > > > when under memory pressure.
> > > > 
> > > > One idea for handling this is roughly something like:
> > > > - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
> > > > - Skip the ttm_pages_allocated accounting on such objects,
> > > > similar
> > > > to
> > > > how FLAG_SG is already handled.
> > > > - Skip all the page->mapping and page->index related bits, like
> > > > in
> > > > tt_add_mapping, since it looks like these are set and used by
> > > > shmem.
> > > > Not sure what functionally this might break, but looks like
> > > > it's
> > > > maybe
> > > > only driver specific?
> > > > - Skip calling into ttm_bo_swap_out/in and just have
> > > > ttm_populate/unpopulate handle this directly for such objects.
> > > > - Make such objects visible to the i915-gem shrinker.
> > > > 
> > > > Does this approach look acceptable?
> > 
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: i915 ttm_tt shmem backend
  2021-09-10  8:40         ` Thomas Hellström
@ 2021-09-10  8:51           ` Christian König
  0 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2021-09-10  8:51 UTC (permalink / raw)
  To: Thomas Hellström, Matthew Auld, Koenig, Christian; +Cc: ML dri-devel

Am 10.09.21 um 10:40 schrieb Thomas Hellström:
> On Fri, 2021-09-10 at 10:25 +0200, Christian König wrote:
>>
>> Am 10.09.21 um 10:08 schrieb Thomas Hellström:
>>> Perhaps some background and goal is worth mentioning here.
>>>
>>>
>>> On Thu, 2021-09-09 at 17:56 +0100, Matthew Auld wrote:
>>>> On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
>>>> <Christian.Koenig@amd.com> wrote:
>>>>> Hi Matthew,
>>>>>
>>>>> this doesn't work, I've already tried something similar.
>>>>>
>>>>> TTM uses the reverse lookup functionality when migrating BOs
>>>>> between system and device memory. And that doesn't seem to work
>>>>> with pages from a shmem file.
>>>> Hmm, what do you mean by reverse lookup functionality? Could you
>>>> please point out where that is in the TTM code?
>>> I think this is in unmap_mapping_range() where, if we use
>>> VM_MIXEDMAP,
>>> there is a reverse lookup on the PTEs that point to real pages. Now
>>> that we move over to VM_PFNMAP, that problem should go away since
>>> core
>>> vm never has a page to investigate. Probably this is why things
>>> works
>>> on non-TTM i915 GEM.
>> Yeah, that was really likely the root problem. I didn't kept
>> investigating after realizing that my approach wouldn't work.
>>
>>> @Christian: Some background here:
>>> First I think that there might be things like the above that will
>>> pose
>>> problems, and we may or may not be able to overcome those but more
>>> importantly is that we agree with you that *if* we make it work, it
>>> is
>>> something that you as a maintainer of TTM can accept from a design-
>>> and
>>> maintainabiltiy point of view.
>>>
>>> The approach would be similar to the buddy allocator, we adapt some
>>> driver code to TTM in a way that it may be reused with other
>>> drivers,
>>> and if other drivers are interested, we'd assist in moving to core
>>> TTM.
>>> In essence it'd be a TTM shmem page pool with full shrinking
>>> ability
>>> for cached pages only.
>>>
>>> What we're really after here is the ability to shrink that doesn't
>>> regress much w r t the elaborate shrinker that's in i915 today that
>>> is
>>> power management aware and is also able to start shmem writebacks
>>> to
>>> avoid shmem just caching the pages instead of giving them back to
>>> the
>>> system (IIRC it was partly the lack of this that blocked earlier
>>> TTM
>>> shrinking efforts).
>>>
>>> And since it doesn't really matter whether the shrinker sits in
>>> core
>>> TTM or in a driver, I think a future goal might be a set of TTM
>>> shrinker helpers that makes sure we shrink the right TTM object,
>>> and
>>> perhaps a simple implementation that is typically used by simple
>>> drivers and other drivers can build on that for a more elaborate
>>> power-
>>> management aware shrinker.
>> That's understandable, but I think not necessary what we should aim
>> for
>> in the long term.
>>
>> First of all I would really like to move more of the functionality
>> from
>> ttm_pool.c into the core memory management, especially handling of
>> uncached and write combined memory.
>>
>> That's essentially completely architecture dependent and currently
>> implemented extremely awkward. Either Daniels suggestion of having a
>> GFP_WC or Christophs approach of moving all this into the DMA API is
>> the
>> way to go here.
>>
>> As long as i915 has no interest in USWC support implementing their
>> own
>> shmemfile backend sounds fine to me, but I have strong doubt that
>> this
>> will be of use to anybody else.
> OK. Sounds fine. In situations where we use WC system memory we will
> use what's in TTM today. BTW on the shrinking approach for WC pages,
> does the Christoph's DMA API solution envision some kind of support for
> this?

Not Christoph DMA API solution, but what I have in mind for the TTM 
shrinker should work.

Essentially a shmemfile per device should help in solving most of the 
issues we ran into.

Christian.

>
> /Thomas
>
>> Christian.
>>
>>> /Thomas
>>>
>>>
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> ________________________________
>>>>> Von: Matthew Auld <matthew.william.auld@gmail.com>
>>>>> Gesendet: Donnerstag, 9. September 2021 16:56
>>>>> An: Christian König <ckoenig.leichtzumerken@gmail.com>; Koenig,
>>>>> Christian <Christian.Koenig@amd.com>
>>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>; ML
>>>>> dri-
>>>>> devel <dri-devel@lists.freedesktop.org>
>>>>> Betreff: i915 ttm_tt shmem backend
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> We are looking into using shmem as a ttm_tt backend in i915 for
>>>>> cached
>>>>> system memory objects. We would also like to make such objects
>>>>> visible
>>>>> to the i915-gem shrinker, so that they may be swapped out or
>>>>> discarded
>>>>> when under memory pressure.
>>>>>
>>>>> One idea for handling this is roughly something like:
>>>>> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
>>>>> - Skip the ttm_pages_allocated accounting on such objects,
>>>>> similar
>>>>> to
>>>>> how FLAG_SG is already handled.
>>>>> - Skip all the page->mapping and page->index related bits, like
>>>>> in
>>>>> tt_add_mapping, since it looks like these are set and used by
>>>>> shmem.
>>>>> Not sure what functionally this might break, but looks like
>>>>> it's
>>>>> maybe
>>>>> only driver specific?
>>>>> - Skip calling into ttm_bo_swap_out/in and just have
>>>>> ttm_populate/unpopulate handle this directly for such objects.
>>>>> - Make such objects visible to the i915-gem shrinker.
>>>>>
>>>>> Does this approach look acceptable?
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-10  8:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-09 14:56 i915 ttm_tt shmem backend Matthew Auld
2021-09-09 16:43 ` AW: " Koenig, Christian
2021-09-09 16:56   ` Matthew Auld
2021-09-10  7:53     ` Christian König
2021-09-10  8:08     ` Thomas Hellström
2021-09-10  8:25       ` Christian König
2021-09-10  8:40         ` Thomas Hellström
2021-09-10  8:51           ` Christian König
2021-09-10  7:46 ` Thomas Hellström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.