dri-devel.lists.freedesktop.org archive mirror
* unified LRU for ttm and svm
@ 2023-10-19 16:51 Zeng, Oak
From: Zeng, Oak @ 2023-10-19 16:51 UTC
  To: Christian König, Hellstrom,  Thomas
  Cc: Brost, Matthew, Philip Yang, Felix Kuehling, Welty, Brian,
	dri-devel, Vishwanathapura, Niranjana, intel-xe



Hello all,

As a follow-up to this thread, https://www.spinics.net/lists/dri-devel/msg410740.html, I looked further into the idea of a shared LRU list for both ttm/bo and svm (to achieve mutual eviction between them). I came up with a rough design that I would like to align on with you before I go too far.

As illustrated in the diagram below:


  1.  There will be a global drm_lru_manager to maintain the shared LRU lists. Each memory type will have its own list, i.e., system memory has a list and gpu memory has a list. On a system with multiple gpu memory regions, we can have multiple GPU LRU lists.
  2.  Move the LRU operation functions (such as the bulk_move related ones) from ttm_resource_manager to drm_lru_manager.
  3.  drm_lru_manager should be initialized during device initialization. The ttm layer or svm layer can hold a weak reference to it for convenience.
  4.  Abstract a drm_lru_entity: this is supposed to be embedded in the ttm_resource and svm_resource structs, as illustrated. Since ttm_resource and svm_resource are quite different in nature (ttm_resource is coupled with a bo, while svm_resource is struct page/pfn based), we can't provide a unified eviction function for them. So an evict_func pointer is introduced in drm_lru_entity [Note 1].
  5.  lru_lock: currently the lru_lock is in the ttm_device structure. Ideally it would be moved to drm_lru_manager. But besides the lru list, lru_lock also protects other ttm-specific things such as ttm_device's pinned list. The current plan is to move lru_lock to xe_device/amdgpu_device, and ttm_device or svm can hold a weak reference to it for convenience.
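
To make points 1 and 4 concrete, here is a rough user-space sketch of how an embedded drm_lru_entity with an evict_func pointer could let one manager evict both kinds of resource. Only drm_lru_manager, drm_lru_entity and evict_func come from the design above; the list helpers, the LRU_MEM_* enum, the *_sketch structs, drm_lru_add() and drm_lru_evict_first() are hypothetical names made up for illustration, and locking (point 5) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list, standing in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical memory types: one LRU list per type. */
enum { LRU_MEM_SYSTEM, LRU_MEM_VRAM, LRU_MEM_TYPES };

struct drm_lru_entity {
	struct list_node link;
	/* Owner-specific eviction; returns 0 on success. */
	int (*evict_func)(struct drm_lru_entity *entity);
};

struct drm_lru_manager {
	struct list_node lru[LRU_MEM_TYPES];
};

/* Simplified stand-ins for ttm_resource and svm_resource. */
struct ttm_resource_sketch { int bo_id; int evicted; struct drm_lru_entity lru; };
struct svm_resource_sketch { unsigned long pfn; int evicted; struct drm_lru_entity lru; };

static int ttm_evict(struct drm_lru_entity *e)
{
	/* Safe cast: this callback is only installed on ttm resources. */
	container_of(e, struct ttm_resource_sketch, lru)->evicted = 1;
	return 0;
}

static int svm_evict(struct drm_lru_entity *e)
{
	container_of(e, struct svm_resource_sketch, lru)->evicted = 1;
	return 0;
}

static void drm_lru_manager_init(struct drm_lru_manager *m)
{
	for (int i = 0; i < LRU_MEM_TYPES; i++)
		list_init(&m->lru[i]);
}

/* Append at the tail, so the head is always the least recently used. */
static void drm_lru_add(struct drm_lru_manager *m, struct drm_lru_entity *e, int mem_type)
{
	list_add_tail(&e->link, &m->lru[mem_type]);
}

/* Evict the least recently used entity, whoever owns it; -1 if the list is empty. */
static int drm_lru_evict_first(struct drm_lru_manager *m, int mem_type)
{
	struct list_node *head = &m->lru[mem_type];
	struct drm_lru_entity *e;

	if (head->next == head)
		return -1;
	e = container_of(head->next, struct drm_lru_entity, link);
	list_del(&e->link);
	return e->evict_func(e);
}
```

In this sketch the manager never needs to know whether an entry belongs to ttm or svm; each owner's callback recovers its own struct from the embedded entity.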

[cid:image001.png@01DA0285.844FA910]


Note 1: I have been considering a structure like the one below. Each hmm/svm resource page is backed by a struct page, and struct page already has an lru member. So theoretically the LRU list can be built as shown below, and we would not need to introduce the drm_lru_entity struct. The difficulty is that, without modifying the linux struct page, we can't cast an lru node to struct page or struct ttm_resource, since we don't know whether a given node is used by ttm or svm. This is why I had to introduce drm_lru_entity to hold an evict_func above. But let me know if you have a better idea.
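
The casting problem in Note 1 can be shown with a small user-space sketch. The two structs here are simplified stand-ins (not the real struct page or struct ttm_resource layouts), and node_to_page()/node_to_ttm() are hypothetical helpers. Both conversions compile for any node on a mixed list, but only one is valid for a given node, and a bare list node carries nothing that says which.

```c
#include <assert.h>
#include <stddef.h>

/* Bare list node, standing in for the kernel's struct list_head. */
struct list_node { struct list_node *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real layouts differ. */
struct page_sketch {
	unsigned long flags;
	struct list_node lru;
};

struct ttm_resource_sketch {
	unsigned long start;
	unsigned long size;
	struct list_node lru;
};

/* Valid only if the node really lives inside a page_sketch. */
static struct page_sketch *node_to_page(struct list_node *n)
{
	return container_of(n, struct page_sketch, lru);
}

/* Valid only if the node really lives inside a ttm_resource_sketch. */
static struct ttm_resource_sketch *node_to_ttm(struct list_node *n)
{
	return container_of(n, struct ttm_resource_sketch, lru);
}
```

Because the node sits at a different offset in each container, picking the wrong helper yields a bogus pointer rather than an error, which is the reason for carrying an evict_func (or some other discriminator) next to the list node.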

[cid:image002.png@01DA0289.9AD5D110]

Thanks,
Oak

