* [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker
@ 2024-04-16 10:07 Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Somalapuram Amaranath,
	Christian König, dri-devel, Matthew Brost

This series implements TTM shrinker / eviction helpers and an xe bo
shrinker. It builds on two previous series, *and obsoletes these*. First

https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg484425.html

for patches 1-4, which IMO could still be reviewed and pushed as a
separate series.

Second, the previous TTM shrinker series

https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/

where the comment about layering
https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9

is now addressed. This version also implements shmem objects for backup
rather than the direct swap-cache insertions used in the previous
series. It turns out that with per-page backup / shrinking, shmem objects
appear to work just as well as direct swap-cache insertions, and the
mechanism introduced in the previous TTM shrinker series to avoid
running out of swap entries isn't really needed.

In any case, patches 1-4 are better described in their separate series.
(The RFC tag is removed for those.)

Patch 5 could in theory be skipped, but it makes it easy to add or test
multiple backup backends, like direct swap-cache insertion or even files
on fast dedicated nvme storage, for example.

Patch 6 introduces helpers in the ttm_pool code for page-by-page shrinking
and recovery. It avoids having to temporarily allocate a huge amount of
memory to be able to shrink a buffer object. It also introduces the
possibility to immediately write back pages if needed, since that tends
to be a bit delayed when left to kswapd.

Patch 7 adds a simple error injection to the above code to help increase
test coverage.

Patch 8 introduces a LRU walk helper for eviction and shrinking. It's
currently xe-only but not xe-specific and can easily be moved to TTM when
used by more than one driver or when eviction is implemented using it.

Patch 9 introduces a helper callback for shrinking (also ready to be
moved to TTM) and an xe-specific shrinker implementation. It also
adds a kunit test to test the shrinker functionality by trying to
allocate twice the available amount of RAM as buffer objects. If there
is no swap-space available, the buffer objects are marked
purgeable.
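
A rough sketch of what registering such a shrinker involves (this is an
illustration of the core shrinker API only, not the actual xe
implementation; xe_shrinker_count()/xe_shrinker_scan() and shrinker_data
are hypothetical names):

	struct shrinker *shrink = shrinker_alloc(0, "drm-xe_bo");

	if (!shrink)
		return -ENOMEM;

	shrink->count_objects = xe_shrinker_count;	/* # of shrinkable pages */
	shrink->scan_objects = xe_shrinker_scan;	/* back up bos, return # freed */
	shrink->private_data = shrinker_data;		/* driver / shrinker state */
	shrinker_register(shrink);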

v2:
- Squash obsolete revision history in the patch commit messages.
- Address a couple of review comments from Christian
- Don't store the mem_type in the TTM managers but in the
  resource cursor.
- Rename introduced TTM *back_up* function names to *backup*
- Add ttm pool recovery fault injection.
- Shrinker xe kunit test
- Various bugfixes

Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: <dri-devel@lists.freedesktop.org>

Thomas Hellström (9):
  drm/ttm: Allow TTM LRU list nodes of different types
  drm/ttm: Use LRU hitches
  drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist
    moves
  drm/ttm: Allow continued swapout after -ENOSPC failure
  drm/ttm: Add a virtual base class for graphics memory backup
  drm/ttm/pool: Provide a helper to shrink pages.
  drm/ttm: Use fault-injection to test error paths
  drm/xe, drm/ttm: Provide a generic LRU walker helper
  drm/xe: Add a shrinker for xe bos

 drivers/gpu/drm/Kconfig                |  10 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   4 +
 drivers/gpu/drm/ttm/Makefile           |   2 +-
 drivers/gpu/drm/ttm/ttm_backup_shmem.c | 137 ++++++++
 drivers/gpu/drm/ttm/ttm_bo.c           |   1 +
 drivers/gpu/drm/ttm/ttm_device.c       |  33 +-
 drivers/gpu/drm/ttm/ttm_pool.c         | 412 ++++++++++++++++++++++++-
 drivers/gpu/drm/ttm/ttm_resource.c     | 229 +++++++++++---
 drivers/gpu/drm/ttm/ttm_tt.c           |  37 +++
 drivers/gpu/drm/xe/Makefile            |   2 +
 drivers/gpu/drm/xe/tests/xe_bo.c       | 118 +++++++
 drivers/gpu/drm/xe/tests/xe_bo_test.c  |   1 +
 drivers/gpu/drm/xe/tests/xe_bo_test.h  |   1 +
 drivers/gpu/drm/xe/xe_bo.c             | 145 ++++++++-
 drivers/gpu/drm/xe/xe_bo.h             |   4 +
 drivers/gpu/drm/xe/xe_device.c         |   8 +
 drivers/gpu/drm/xe/xe_device_types.h   |   2 +
 drivers/gpu/drm/xe/xe_shrinker.c       | 226 ++++++++++++++
 drivers/gpu/drm/xe/xe_shrinker.h       |  18 ++
 drivers/gpu/drm/xe/xe_ttm_helpers.c    | 224 ++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_helpers.h    |  50 +++
 drivers/gpu/drm/xe/xe_vm.c             |   4 +
 include/drm/ttm/ttm_backup.h           | 136 ++++++++
 include/drm/ttm/ttm_pool.h             |   5 +
 include/drm/ttm/ttm_resource.h         |  99 +++++-
 include/drm/ttm/ttm_tt.h               |  20 ++
 26 files changed, 1839 insertions(+), 89 deletions(-)
 create mode 100644 drivers/gpu/drm/ttm/ttm_backup_shmem.c
 create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c
 create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h
 create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.c
 create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.h
 create mode 100644 include/drm/ttm/ttm_backup.h

-- 
2.44.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-17  1:15   ` Matthew Brost
  2024-04-16 10:07 ` [PATCH v2 2/9] drm/ttm: Use LRU hitches Thomas Hellström
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

To be able to handle list unlocking while traversing the LRU
list, we want the iterators not only to point to the next
position of the list traversal, but to insert themselves as
list nodes at that point to work around the fact that the
next node might otherwise disappear from the list while
the iterator is pointing to it.

These list nodes need to be easily distinguishable from other
list nodes so that others traversing the list can skip
over them.

So declare a struct ttm_lru_item, with a struct list_head member
and a type enum. This will slightly increase the size of a
struct ttm_resource.
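
For illustration only (not part of the patch), a walker over an LRU list
that may contain both resources and other item types would skip the
latter roughly like this, assuming a resource manager 'man' and a
priority 'prio':

	struct ttm_lru_item *lru;

	list_for_each_entry(lru, &man->lru[prio], link) {
		struct ttm_resource *res;

		/* Skip list nodes inserted by iterators. */
		if (!ttm_lru_item_is_res(lru))
			continue;

		res = ttm_lru_item_to_res(lru);
		/* ... operate on res under the lru_lock ... */
	}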

Changes in previous series:
- Update enum ttm_lru_item_type documentation.

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c   | 13 ++++--
 drivers/gpu/drm/ttm/ttm_resource.c | 70 ++++++++++++++++++++++--------
 include/drm/ttm/ttm_resource.h     | 51 +++++++++++++++++++++-
 3 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 76027960054f..f27406e851e5 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -270,17 +270,22 @@ EXPORT_SYMBOL(ttm_device_fini);
 static void ttm_device_clear_lru_dma_mappings(struct ttm_device *bdev,
 					      struct list_head *list)
 {
-	struct ttm_resource *res;
+	struct ttm_lru_item *lru;
 
 	spin_lock(&bdev->lru_lock);
-	while ((res = list_first_entry_or_null(list, typeof(*res), lru))) {
-		struct ttm_buffer_object *bo = res->bo;
+	while ((lru = list_first_entry_or_null(list, typeof(*lru), link))) {
+		struct ttm_buffer_object *bo;
+
+		if (!ttm_lru_item_is_res(lru))
+			continue;
+
+		bo = ttm_lru_item_to_res(lru)->bo;
 
 		/* Take ref against racing releases once lru_lock is unlocked */
 		if (!ttm_bo_get_unless_zero(bo))
 			continue;
 
-		list_del_init(&res->lru);
+		list_del_init(&bo->resource->lru.link);
 		spin_unlock(&bdev->lru_lock);
 
 		if (bo->ttm)
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index be8d286513f9..7aa5ca5c0e33 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -69,8 +69,8 @@ void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk)
 			dma_resv_assert_held(pos->last->bo->base.resv);
 
 			man = ttm_manager_type(pos->first->bo->bdev, i);
-			list_bulk_move_tail(&man->lru[j], &pos->first->lru,
-					    &pos->last->lru);
+			list_bulk_move_tail(&man->lru[j], &pos->first->lru.link,
+					    &pos->last->lru.link);
 		}
 	}
 }
@@ -83,14 +83,38 @@ ttm_lru_bulk_move_pos(struct ttm_lru_bulk_move *bulk, struct ttm_resource *res)
 	return &bulk->pos[res->mem_type][res->bo->priority];
 }
 
+/* Return the previous resource on the list (skip over non-resource list items) */
+static struct ttm_resource *ttm_lru_prev_res(struct ttm_resource *cur)
+{
+	struct ttm_lru_item *lru = &cur->lru;
+
+	do {
+		lru = list_prev_entry(lru, link);
+	} while (!ttm_lru_item_is_res(lru));
+
+	return ttm_lru_item_to_res(lru);
+}
+
+/* Return the next resource on the list (skip over non-resource list items) */
+static struct ttm_resource *ttm_lru_next_res(struct ttm_resource *cur)
+{
+	struct ttm_lru_item *lru = &cur->lru;
+
+	do {
+		lru = list_next_entry(lru, link);
+	} while (!ttm_lru_item_is_res(lru));
+
+	return ttm_lru_item_to_res(lru);
+}
+
 /* Move the resource to the tail of the bulk move range */
 static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
 				       struct ttm_resource *res)
 {
 	if (pos->last != res) {
 		if (pos->first == res)
-			pos->first = list_next_entry(res, lru);
-		list_move(&res->lru, &pos->last->lru);
+			pos->first = ttm_lru_next_res(res);
+		list_move(&res->lru.link, &pos->last->lru.link);
 		pos->last = res;
 	}
 }
@@ -121,11 +145,11 @@ static void ttm_lru_bulk_move_del(struct ttm_lru_bulk_move *bulk,
 		pos->first = NULL;
 		pos->last = NULL;
 	} else if (pos->first == res) {
-		pos->first = list_next_entry(res, lru);
+		pos->first = ttm_lru_next_res(res);
 	} else if (pos->last == res) {
-		pos->last = list_prev_entry(res, lru);
+		pos->last = ttm_lru_prev_res(res);
 	} else {
-		list_move(&res->lru, &pos->last->lru);
+		list_move(&res->lru.link, &pos->last->lru.link);
 	}
 }
 
@@ -154,7 +178,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
 	lockdep_assert_held(&bo->bdev->lru_lock);
 
 	if (bo->pin_count) {
-		list_move_tail(&res->lru, &bdev->pinned);
+		list_move_tail(&res->lru.link, &bdev->pinned);
 
 	} else	if (bo->bulk_move) {
 		struct ttm_lru_bulk_move_pos *pos =
@@ -165,7 +189,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
 		struct ttm_resource_manager *man;
 
 		man = ttm_manager_type(bdev, res->mem_type);
-		list_move_tail(&res->lru, &man->lru[bo->priority]);
+		list_move_tail(&res->lru.link, &man->lru[bo->priority]);
 	}
 }
 
@@ -196,9 +220,9 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
 	man = ttm_manager_type(bo->bdev, place->mem_type);
 	spin_lock(&bo->bdev->lru_lock);
 	if (bo->pin_count)
-		list_add_tail(&res->lru, &bo->bdev->pinned);
+		list_add_tail(&res->lru.link, &bo->bdev->pinned);
 	else
-		list_add_tail(&res->lru, &man->lru[bo->priority]);
+		list_add_tail(&res->lru.link, &man->lru[bo->priority]);
 	man->usage += res->size;
 	spin_unlock(&bo->bdev->lru_lock);
 }
@@ -220,7 +244,7 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
 	struct ttm_device *bdev = man->bdev;
 
 	spin_lock(&bdev->lru_lock);
-	list_del_init(&res->lru);
+	list_del_init(&res->lru.link);
 	man->usage -= res->size;
 	spin_unlock(&bdev->lru_lock);
 }
@@ -471,14 +495,16 @@ struct ttm_resource *
 ttm_resource_manager_first(struct ttm_resource_manager *man,
 			   struct ttm_resource_cursor *cursor)
 {
-	struct ttm_resource *res;
+	struct ttm_lru_item *lru;
 
 	lockdep_assert_held(&man->bdev->lru_lock);
 
 	for (cursor->priority = 0; cursor->priority < TTM_MAX_BO_PRIORITY;
 	     ++cursor->priority)
-		list_for_each_entry(res, &man->lru[cursor->priority], lru)
-			return res;
+		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
+			if (ttm_lru_item_is_res(lru))
+				return ttm_lru_item_to_res(lru);
+		}
 
 	return NULL;
 }
@@ -497,15 +523,21 @@ ttm_resource_manager_next(struct ttm_resource_manager *man,
 			  struct ttm_resource_cursor *cursor,
 			  struct ttm_resource *res)
 {
+	struct ttm_lru_item *lru = &res->lru;
+
 	lockdep_assert_held(&man->bdev->lru_lock);
 
-	list_for_each_entry_continue(res, &man->lru[cursor->priority], lru)
-		return res;
+	list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
+		if (ttm_lru_item_is_res(lru))
+			return ttm_lru_item_to_res(lru);
+	}
 
 	for (++cursor->priority; cursor->priority < TTM_MAX_BO_PRIORITY;
 	     ++cursor->priority)
-		list_for_each_entry(res, &man->lru[cursor->priority], lru)
-			return res;
+		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
+			if (ttm_lru_item_is_res(lru))
+				return ttm_lru_item_to_res(lru);
+		}
 
 	return NULL;
 }
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 69769355139f..4babc4ff10b0 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -49,6 +49,43 @@ struct io_mapping;
 struct sg_table;
 struct scatterlist;
 
+/**
+ * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
+ */
+enum ttm_lru_item_type {
+	/** @TTM_LRU_RESOURCE: The resource subclass */
+	TTM_LRU_RESOURCE,
+	/** @TTM_LRU_HITCH: The iterator hitch subclass */
+	TTM_LRU_HITCH
+};
+
+/**
+ * struct ttm_lru_item - The TTM lru list node base class
+ * @link: The list link
+ * @type: The subclass type
+ */
+struct ttm_lru_item {
+	struct list_head link;
+	enum ttm_lru_item_type type;
+};
+
+/**
+ * ttm_lru_item_init() - initialize a struct ttm_lru_item
+ * @item: The item to initialize
+ * @type: The subclass type
+ */
+static inline void ttm_lru_item_init(struct ttm_lru_item *item,
+				     enum ttm_lru_item_type type)
+{
+	item->type = type;
+	INIT_LIST_HEAD(&item->link);
+}
+
+static inline bool ttm_lru_item_is_res(const struct ttm_lru_item *item)
+{
+	return item->type == TTM_LRU_RESOURCE;
+}
+
 struct ttm_resource_manager_func {
 	/**
 	 * struct ttm_resource_manager_func member alloc
@@ -217,9 +254,21 @@ struct ttm_resource {
 	/**
 	 * @lru: Least recently used list, see &ttm_resource_manager.lru
 	 */
-	struct list_head lru;
+	struct ttm_lru_item lru;
 };
 
+/**
+ * ttm_lru_item_to_res() - Downcast a struct ttm_lru_item to a struct ttm_resource
+ * @item: The struct ttm_lru_item to downcast
+ *
+ * Return: Pointer to the embedding struct ttm_resource
+ */
+static inline struct ttm_resource *
+ttm_lru_item_to_res(struct ttm_lru_item *item)
+{
+	return container_of(item, struct ttm_resource, lru);
+}
+
 /**
  * struct ttm_resource_cursor
  *
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/9] drm/ttm: Use LRU hitches
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-26  1:23   ` Matthew Brost
  2024-04-16 10:07 ` [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

Have iterators insert themselves into the list they are iterating
over using hitch list nodes. Since only the iterator owner
can remove these list nodes from the list, it's safe to unlock
the list and when continuing, use them as a starting point. Due to
the way LRU bumping works in TTM, newly added items will not be
missed, and bumped items will be iterated over a second time before
reaching the end of the list.

The exception is lists with bulk move sublists. When bumping a
sublist, a hitch that is part of that sublist will also be moved
and we might miss items if restarting from it. This will be
addressed in a later patch.
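
As a usage sketch (not part of this patch), a walker can now drop the
LRU lock between items and resume from its hitch, assuming a TTM device
'bdev' and a resource manager 'man':

	struct ttm_resource_cursor cursor;
	struct ttm_resource *res;

	spin_lock(&bdev->lru_lock);
	ttm_resource_manager_for_each_res(man, &cursor, res) {
		struct ttm_buffer_object *bo = res->bo;

		/* Take a reference so the bo can't vanish once we unlock. */
		if (!ttm_bo_get_unless_zero(bo))
			continue;

		spin_unlock(&bdev->lru_lock);
		/* ... evict or shrink the bo without holding the lru_lock ... */
		ttm_bo_put(bo);
		spin_lock(&bdev->lru_lock);
		/* Iteration resumes from the hitch left on the list. */
	}
	ttm_resource_cursor_fini_locked(&cursor);
	spin_unlock(&bdev->lru_lock);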

Changes in previous series:
- Updated ttm_resource_cursor_fini() documentation.
v2:
- Don't reorder ttm_resource_manager_first() and _next().
  (Christian König).
- Use list_add instead of list_move
  (Christian König)

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c       |  1 +
 drivers/gpu/drm/ttm/ttm_device.c   |  9 ++-
 drivers/gpu/drm/ttm/ttm_resource.c | 94 ++++++++++++++++++++----------
 include/drm/ttm/ttm_resource.h     | 16 +++--
 4 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 6396dece0db1..43eda720657f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -621,6 +621,7 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 		if (locked)
 			dma_resv_unlock(res->bo->base.resv);
 	}
+	ttm_resource_cursor_fini_locked(&cursor);
 
 	if (!bo) {
 		if (busy_bo && !ttm_bo_get_unless_zero(busy_bo))
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index f27406e851e5..e8a6a1dab669 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -169,12 +169,17 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 			num_pages = PFN_UP(bo->base.size);
 			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
 			/* ttm_bo_swapout has dropped the lru_lock */
-			if (!ret)
+			if (!ret) {
+				ttm_resource_cursor_fini(&cursor);
 				return num_pages;
-			if (ret != -EBUSY)
+			}
+			if (ret != -EBUSY) {
+				ttm_resource_cursor_fini(&cursor);
 				return ret;
+			}
 		}
 	}
+	ttm_resource_cursor_fini_locked(&cursor);
 	spin_unlock(&bdev->lru_lock);
 	return 0;
 }
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7aa5ca5c0e33..22f8ae4ff4c0 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -32,6 +32,37 @@
 
 #include <drm/drm_util.h>
 
+/**
+ * ttm_resource_cursor_fini_locked() - Finalize the LRU list cursor usage
+ * @cursor: The struct ttm_resource_cursor to finalize.
+ *
+ * The function pulls the LRU list cursor off any lists it was previously
+ * attached to. Needs to be called with the LRU lock held. The function
+ * can be called multiple times after each other.
+ */
+void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor)
+{
+	lockdep_assert_held(&cursor->man->bdev->lru_lock);
+	list_del_init(&cursor->hitch.link);
+}
+
+/**
+ * ttm_resource_cursor_fini() - Finalize the LRU list cursor usage
+ * @cursor: The struct ttm_resource_cursor to finalize.
+ *
+ * The function pulls the LRU list cursor off any lists it was previously
+ * attached to. Needs to be called without the LRU list lock held. The
+ * function can be called multiple times after each other.
+ */
+void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
+{
+	spinlock_t *lru_lock = &cursor->man->bdev->lru_lock;
+
+	spin_lock(lru_lock);
+	ttm_resource_cursor_fini_locked(cursor);
+	spin_unlock(lru_lock);
+}
+
 /**
  * ttm_lru_bulk_move_init - initialize a bulk move structure
  * @bulk: the structure to init
@@ -484,61 +515,62 @@ void ttm_resource_manager_debug(struct ttm_resource_manager *man,
 EXPORT_SYMBOL(ttm_resource_manager_debug);
 
 /**
- * ttm_resource_manager_first
- *
+ * ttm_resource_manager_first() - Start iterating over the resources
+ * of a resource manager
  * @man: resource manager to iterate over
  * @cursor: cursor to record the position
  *
- * Returns the first resource from the resource manager.
+ * Initializes the cursor and starts iterating. When done iterating,
+ * the caller must explicitly call ttm_resource_cursor_fini().
+ *
+ * Return: The first resource from the resource manager.
  */
 struct ttm_resource *
 ttm_resource_manager_first(struct ttm_resource_manager *man,
 			   struct ttm_resource_cursor *cursor)
 {
-	struct ttm_lru_item *lru;
-
 	lockdep_assert_held(&man->bdev->lru_lock);
 
-	for (cursor->priority = 0; cursor->priority < TTM_MAX_BO_PRIORITY;
-	     ++cursor->priority)
-		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
-			if (ttm_lru_item_is_res(lru))
-				return ttm_lru_item_to_res(lru);
-		}
+	cursor->priority = 0;
+	cursor->man = man;
+	ttm_lru_item_init(&cursor->hitch, TTM_LRU_HITCH);
+	list_add(&cursor->hitch.link, &man->lru[cursor->priority]);
 
-	return NULL;
+	return ttm_resource_manager_next(cursor);
 }
 
 /**
- * ttm_resource_manager_next
- *
- * @man: resource manager to iterate over
+ * ttm_resource_manager_next() - Continue iterating over the resource manager
+ * resources
  * @cursor: cursor to record the position
- * @res: the current resource pointer
  *
- * Returns the next resource from the resource manager.
+ * Return: The next resource from the resource manager.
  */
 struct ttm_resource *
-ttm_resource_manager_next(struct ttm_resource_manager *man,
-			  struct ttm_resource_cursor *cursor,
-			  struct ttm_resource *res)
+ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
 {
-	struct ttm_lru_item *lru = &res->lru;
+	struct ttm_resource_manager *man = cursor->man;
+	struct ttm_lru_item *lru;
 
 	lockdep_assert_held(&man->bdev->lru_lock);
 
-	list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
-		if (ttm_lru_item_is_res(lru))
-			return ttm_lru_item_to_res(lru);
-	}
-
-	for (++cursor->priority; cursor->priority < TTM_MAX_BO_PRIORITY;
-	     ++cursor->priority)
-		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
-			if (ttm_lru_item_is_res(lru))
-				return ttm_lru_item_to_res(lru);
+	do {
+		lru = &cursor->hitch;
+		list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
+			if (ttm_lru_item_is_res(lru)) {
+				list_move(&cursor->hitch.link, &lru->link);
+				return ttm_lru_item_to_res(lru);
+			}
 		}
 
+		if (++cursor->priority >= TTM_MAX_BO_PRIORITY)
+			break;
+
+		list_move(&cursor->hitch.link, &man->lru[cursor->priority]);
+	} while (true);
+
+	list_del_init(&cursor->hitch.link);
+
 	return NULL;
 }
 
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 4babc4ff10b0..dfc01258c981 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -271,15 +271,23 @@ ttm_lru_item_to_res(struct ttm_lru_item *item)
 
 /**
  * struct ttm_resource_cursor
- *
+ * @man: The resource manager currently being iterated over
+ * @hitch: A hitch list node inserted before the next resource
+ * to iterate over.
  * @priority: the current priority
  *
  * Cursor to iterate over the resources in a manager.
  */
 struct ttm_resource_cursor {
+	struct ttm_resource_manager *man;
+	struct ttm_lru_item hitch;
 	unsigned int priority;
 };
 
+void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
+
+void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
+
 /**
  * struct ttm_lru_bulk_move_pos
  *
@@ -438,9 +446,7 @@ struct ttm_resource *
 ttm_resource_manager_first(struct ttm_resource_manager *man,
 			   struct ttm_resource_cursor *cursor);
 struct ttm_resource *
-ttm_resource_manager_next(struct ttm_resource_manager *man,
-			  struct ttm_resource_cursor *cursor,
-			  struct ttm_resource *res);
+ttm_resource_manager_next(struct ttm_resource_cursor *cursor);
 
 /**
  * ttm_resource_manager_for_each_res - iterate over all resources
@@ -452,7 +458,7 @@ ttm_resource_manager_next(struct ttm_resource_manager *man,
  */
 #define ttm_resource_manager_for_each_res(man, cursor, res)		\
 	for (res = ttm_resource_manager_first(man, cursor); res;	\
-	     res = ttm_resource_manager_next(man, cursor, res))
+	     res = ttm_resource_manager_next(cursor))
 
 struct ttm_kmap_iter *
 ttm_kmap_iter_iomap_init(struct ttm_kmap_iter_iomap *iter_io,
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 2/9] drm/ttm: Use LRU hitches Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-26  1:20   ` Matthew Brost
  2024-04-16 10:07 ` [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC failure Thomas Hellström
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

To address the problem with hitches moving when bulk move
sublists are lru-bumped, register the list cursors with the
ttm_lru_bulk_move structure when traversing its list, and
when lru-bumping the list, move the cursor hitch to the tail.
This also means it's mandatory for drivers to call
ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when
initializing and finalizing the bulk move structure, so add
those calls to the amdgpu and xe drivers.

Compared to v1 this is slightly more code but less fragile
and hopefully easier to understand.
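
As a usage sketch (assuming a driver-private 'vm' structure embedding
the bulk move and a struct ttm_device 'bdev'), the now mandatory
pairing looks like:

	ttm_lru_bulk_move_init(&vm->lru_bulk_move);

	/* ... attach bos with ttm_bo_set_bulk_move() and use the vm ... */

	ttm_lru_bulk_move_fini(bdev, &vm->lru_bulk_move);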

Changes in previous series:
- Completely rework the functionality
- Avoid a NULL pointer dereference assigning manager->mem_type
- Remove some leftover code causing build problems
v2:
- For hitch bulk tail moves, store the mem_type in the cursor
  instead of with the manager.

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  4 ++
 drivers/gpu/drm/ttm/ttm_resource.c     | 92 +++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.c             |  4 ++
 include/drm/ttm/ttm_resource.h         | 58 ++++++++++------
 4 files changed, 137 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4299ce386322..18bf174c8d47 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2368,6 +2368,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	if (r)
 		return r;
 
+	ttm_lru_bulk_move_init(&vm->lru_bulk_move);
+
 	vm->is_compute_context = false;
 
 	vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
@@ -2431,6 +2433,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 error_free_delayed:
 	dma_fence_put(vm->last_tlb_flush);
 	dma_fence_put(vm->last_unlocked);
+	ttm_lru_bulk_move_fini(&adev->mman.bdev, &vm->lru_bulk_move);
 	amdgpu_vm_fini_entities(vm);
 
 	return r;
@@ -2587,6 +2590,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 		}
 	}
 
+	ttm_lru_bulk_move_fini(&adev->mman.bdev, &vm->lru_bulk_move);
 }
 
 /**
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 22f8ae4ff4c0..2b93727c78e5 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -32,6 +32,49 @@
 
 #include <drm/drm_util.h>
 
+/* Detach the cursor from the bulk move list*/
+static void
+ttm_resource_cursor_clear_bulk(struct ttm_resource_cursor *cursor)
+{
+	cursor->bulk = NULL;
+	list_del_init(&cursor->bulk_link);
+}
+
+/* Move the cursor to the end of the bulk move list it's in */
+static void ttm_resource_cursor_move_bulk_tail(struct ttm_lru_bulk_move *bulk,
+					       struct ttm_resource_cursor *cursor)
+{
+	struct ttm_lru_bulk_move_pos *pos;
+
+	if (WARN_ON_ONCE(bulk != cursor->bulk)) {
+		list_del_init(&cursor->bulk_link);
+		return;
+	}
+
+	pos = &bulk->pos[cursor->mem_type][cursor->priority];
+	if (pos)
+		list_move(&cursor->hitch.link, &pos->last->lru.link);
+	ttm_resource_cursor_clear_bulk(cursor);
+}
+
+/* Move all cursors attached to a bulk move to its end */
+static void ttm_bulk_move_adjust_cursors(struct ttm_lru_bulk_move *bulk)
+{
+	struct ttm_resource_cursor *cursor, *next;
+
+	list_for_each_entry_safe(cursor, next, &bulk->cursor_list, bulk_link)
+		ttm_resource_cursor_move_bulk_tail(bulk, cursor);
+}
+
+/* Remove a cursor from an empty bulk move list */
+static void ttm_bulk_move_drop_cursors(struct ttm_lru_bulk_move *bulk)
+{
+	struct ttm_resource_cursor *cursor, *next;
+
+	list_for_each_entry_safe(cursor, next, &bulk->cursor_list, bulk_link)
+		ttm_resource_cursor_clear_bulk(cursor);
+}
+
 /**
  * ttm_resource_cursor_fini_locked() - Finalize the LRU list cursor usage
  * @cursor: The struct ttm_resource_cursor to finalize.
@@ -44,6 +87,7 @@ void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor)
 {
 	lockdep_assert_held(&cursor->man->bdev->lru_lock);
 	list_del_init(&cursor->hitch.link);
+	ttm_resource_cursor_clear_bulk(cursor);
 }
 
 /**
@@ -72,9 +116,27 @@ void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
 void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk)
 {
 	memset(bulk, 0, sizeof(*bulk));
+	INIT_LIST_HEAD(&bulk->cursor_list);
 }
 EXPORT_SYMBOL(ttm_lru_bulk_move_init);
 
+/**
+ * ttm_lru_bulk_move_fini - finalize a bulk move structure
+ * @bdev: The struct ttm_device
+ * @bulk: the structure to finalize
+ *
+ * Sanity checks that bulk moves don't have any
+ * resources left and hence no cursors attached.
+ */
+void ttm_lru_bulk_move_fini(struct ttm_device *bdev,
+			    struct ttm_lru_bulk_move *bulk)
+{
+	spin_lock(&bdev->lru_lock);
+	ttm_bulk_move_drop_cursors(bulk);
+	spin_unlock(&bdev->lru_lock);
+}
+EXPORT_SYMBOL(ttm_lru_bulk_move_fini);
+
 /**
  * ttm_lru_bulk_move_tail - bulk move range of resources to the LRU tail.
  *
@@ -87,6 +149,7 @@ void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk)
 {
 	unsigned i, j;
 
+	ttm_bulk_move_adjust_cursors(bulk);
 	for (i = 0; i < TTM_NUM_MEM_TYPES; ++i) {
 		for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j) {
 			struct ttm_lru_bulk_move_pos *pos = &bulk->pos[i][j];
@@ -418,6 +481,7 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;
+	man->mem_type = TTM_NUM_MEM_TYPES;
 
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
@@ -514,6 +578,29 @@ void ttm_resource_manager_debug(struct ttm_resource_manager *man,
 }
 EXPORT_SYMBOL(ttm_resource_manager_debug);
 
+static void
+ttm_resource_cursor_check_bulk(struct ttm_resource_cursor *cursor,
+			       struct ttm_lru_item *next_lru)
+{
+	struct ttm_resource *next = ttm_lru_item_to_res(next_lru);
+	struct ttm_lru_bulk_move *bulk = NULL;
+	struct ttm_buffer_object *bo = next->bo;
+
+	lockdep_assert_held(&cursor->man->bdev->lru_lock);
+	if (bo && bo->resource == next)
+		bulk = bo->bulk_move;
+
+	if (cursor->bulk != bulk) {
+		if (bulk) {
+			list_move_tail(&cursor->bulk_link, &bulk->cursor_list);
+			cursor->mem_type = next->mem_type;
+		} else {
+			list_del_init(&cursor->bulk_link);
+		}
+		cursor->bulk = bulk;
+	}
+}
+
 /**
  * ttm_resource_manager_first() - Start iterating over the resources
  * of a resource manager
@@ -534,6 +621,7 @@ ttm_resource_manager_first(struct ttm_resource_manager *man,
 	cursor->priority = 0;
 	cursor->man = man;
 	ttm_lru_item_init(&cursor->hitch, TTM_LRU_HITCH);
+	INIT_LIST_HEAD(&cursor->bulk_link);
 	list_add(&cursor->hitch.link, &man->lru[cursor->priority]);
 
 	return ttm_resource_manager_next(cursor);
@@ -558,6 +646,7 @@ ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
 		lru = &cursor->hitch;
 		list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
 			if (ttm_lru_item_is_res(lru)) {
+				ttm_resource_cursor_check_bulk(cursor, lru);
 				list_move(&cursor->hitch.link, &lru->link);
 				return ttm_lru_item_to_res(lru);
 			}
@@ -567,9 +656,10 @@ ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
 			break;
 
 		list_move(&cursor->hitch.link, &man->lru[cursor->priority]);
+		ttm_resource_cursor_clear_bulk(cursor);
 	} while (true);
 
-	list_del_init(&cursor->hitch.link);
+	ttm_resource_cursor_fini_locked(cursor);
 
 	return NULL;
 }
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2dbba55e7785..45e1975eed26 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1262,6 +1262,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
 
+	ttm_lru_bulk_move_init(&vm->lru_bulk_move);
+
 	INIT_LIST_HEAD(&vm->preempt.exec_queues);
 	vm->preempt.min_run_period_ms = 10;	/* FIXME: Wire up to uAPI */
 
@@ -1379,6 +1381,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 	mutex_destroy(&vm->snap_mutex);
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
+	ttm_lru_bulk_move_fini(&xe->ttm, &vm->lru_bulk_move);
 	kfree(vm);
 	if (!(flags & XE_VM_FLAG_MIGRATION))
 		xe_device_mem_access_put(xe);
@@ -1518,6 +1521,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 		XE_WARN_ON(vm->pt_root[id]);
 
 	trace_xe_vm_free(vm);
+	ttm_lru_bulk_move_fini(&xe->ttm, &vm->lru_bulk_move);
 	kfree(vm);
 }
 
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index dfc01258c981..ed09b49a001e 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -211,6 +211,9 @@ struct ttm_resource_manager {
 	 * bdev->lru_lock.
 	 */
 	uint64_t usage;
+
+	/** @mem_type: The memory type used for this manager. */
+	unsigned int mem_type;
 };
 
 /**
@@ -269,25 +272,6 @@ ttm_lru_item_to_res(struct ttm_lru_item *item)
 	return container_of(item, struct ttm_resource, lru);
 }
 
-/**
- * struct ttm_resource_cursor
- * @man: The resource manager currently being iterated over
- * @hitch: A hitch list node inserted before the next resource
- * to iterate over.
- * @priority: the current priority
- *
- * Cursor to iterate over the resources in a manager.
- */
-struct ttm_resource_cursor {
-	struct ttm_resource_manager *man;
-	struct ttm_lru_item hitch;
-	unsigned int priority;
-};
-
-void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
-
-void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
-
 /**
  * struct ttm_lru_bulk_move_pos
  *
@@ -303,8 +287,9 @@ struct ttm_lru_bulk_move_pos {
 
 /**
  * struct ttm_lru_bulk_move
- *
  * @pos: first/last lru entry for resources in the each domain/priority
+ * @cursor_list: The list of cursors currently traversing any of
+ * the sublists of @pos. Protected by the ttm device's lru_lock.
  *
  * Container for the current bulk move state. Should be used with
  * ttm_lru_bulk_move_init() and ttm_bo_set_bulk_move().
@@ -314,8 +299,39 @@ struct ttm_lru_bulk_move_pos {
  */
 struct ttm_lru_bulk_move {
 	struct ttm_lru_bulk_move_pos pos[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
+	struct list_head cursor_list;
+};
+
+/**
+ * struct ttm_resource_cursor
+ * @man: The resource manager currently being iterated over
+ * @hitch: A hitch list node inserted before the next resource
+ * to iterate over.
+ * @bulk_link: A list link for the list of cursors traversing the
+ * bulk sublist of @bulk. Protected by the ttm device's lru_lock.
+ * @bulk: Pointer to struct ttm_lru_bulk_move whose subrange @hitch is
+ * inserted to. NULL if none. Never dereference this pointer since
+ * the struct ttm_lru_bulk_move object pointed to might have been
+ * freed. The pointer is only for comparison.
+ * @mem_type: The memory type of the LRU list being traversed.
+ * This field is valid iff @bulk != NULL.
+ * @priority: the current priority
+ *
+ * Cursor to iterate over the resources in a manager.
+ */
+struct ttm_resource_cursor {
+	struct ttm_resource_manager *man;
+	struct ttm_lru_item hitch;
+	struct list_head bulk_link;
+	struct ttm_lru_bulk_move *bulk;
+	unsigned int mem_type;
+	unsigned int priority;
 };
 
+void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
+
+void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
+
 /**
  * struct ttm_kmap_iter_iomap - Specialization for a struct io_mapping +
  * struct sg_table backed struct ttm_resource.
@@ -404,6 +420,8 @@ ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
 
 void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
 void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk);
+void ttm_lru_bulk_move_fini(struct ttm_device *bdev,
+			    struct ttm_lru_bulk_move *bulk);
 
 void ttm_resource_add_bulk_move(struct ttm_resource *res,
 				struct ttm_buffer_object *bo);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC failure
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (2 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-17  1:44   ` Matthew Brost
  2024-04-16 10:07 ` [PATCH v2 5/9] drm/ttm: Add a virtual base class for graphics memory backup Thomas Hellström
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

The -ENOSPC failure from ttm_bo_swapout() meant that the lru_lock
was dropped and simply restarting the iteration meant we'd likely
hit the same error again on the same resource. Now that we can
restart the iteration even if the lock was dropped, do that.

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e8a6a1dab669..4a030b4bc848 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -168,15 +168,20 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 
 			num_pages = PFN_UP(bo->base.size);
 			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
-			/* ttm_bo_swapout has dropped the lru_lock */
-			if (!ret) {
-				ttm_resource_cursor_fini(&cursor);
-				return num_pages;
-			}
-			if (ret != -EBUSY) {
-				ttm_resource_cursor_fini(&cursor);
-				return ret;
+			/* Couldn't swap out, and retained the lru_lock */
+			if (ret == -EBUSY)
+				continue;
+			/* Couldn't swap out and dropped the lru_lock */
+			if (ret == -ENOSPC) {
+				spin_lock(&bdev->lru_lock);
+				continue;
 			}
+			/*
+			 * Dropped the lock and either succeeded or
+			 * hit an error that forces us to break.
+			 */
+			ttm_resource_cursor_fini(&cursor);
+			return ret ? ret : num_pages;
 		}
 	}
 	ttm_resource_cursor_fini_locked(&cursor);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/9] drm/ttm: Add a virtual base class for graphics memory backup
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (3 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC failure Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 6/9] drm/ttm/pool: Provide a helper to shrink pages Thomas Hellström
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

Initially intended for experimenting with different backup
solutions (shmem vs direct swap cache insertion), abstract
the backup destination using a virtual base class.

Also provide a sample implementation for shmem.

While one could perhaps skip the abstraction once a preferred
backup solution is settled on, this functionality may actually
come in handy for configurable dedicated graphics memory
backup to fast nvme files or similar, without affecting
swap-space. It could indeed be useful for VRAM backup on S4 and
other cases.
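
For illustration (a sketch only, assuming a struct page 'page' and an
index 'idx' to back up; not taken from a driver), usage of the interface
looks roughly like:

	struct ttm_backup *backup = ttm_backup_shmem_create(SZ_2M);
	unsigned long handle;

	if (IS_ERR(backup))
		return PTR_ERR(backup);

	/* A zero handle means backup failed (swp_entry_t convention). */
	handle = backup->ops->backup_page(backup, page, false, idx,
					  GFP_KERNEL, GFP_KERNEL);
	if (handle) {
		/* Copy the contents back and release the backed-up copy. */
		backup->ops->copy_backed_up_page(backup, page, handle, true);
		backup->ops->drop(backup, handle);
	}
	backup->ops->fini(backup);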

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/Makefile           |   2 +-
 drivers/gpu/drm/ttm/ttm_backup_shmem.c | 137 +++++++++++++++++++++++++
 include/drm/ttm/ttm_backup.h           | 136 ++++++++++++++++++++++++
 3 files changed, 274 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/ttm/ttm_backup_shmem.c
 create mode 100644 include/drm/ttm/ttm_backup.h

diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile
index dad298127226..5e980dd90e41 100644
--- a/drivers/gpu/drm/ttm/Makefile
+++ b/drivers/gpu/drm/ttm/Makefile
@@ -4,7 +4,7 @@
 
 ttm-y := ttm_tt.o ttm_bo.o ttm_bo_util.o ttm_bo_vm.o ttm_module.o \
 	ttm_execbuf_util.o ttm_range_manager.o ttm_resource.o ttm_pool.o \
-	ttm_device.o ttm_sys_manager.o
+	ttm_device.o ttm_sys_manager.o ttm_backup_shmem.o
 ttm-$(CONFIG_AGP) += ttm_agp_backend.o
 
 obj-$(CONFIG_DRM_TTM) += ttm.o
diff --git a/drivers/gpu/drm/ttm/ttm_backup_shmem.c b/drivers/gpu/drm/ttm/ttm_backup_shmem.c
new file mode 100644
index 000000000000..79c2f552863a
--- /dev/null
+++ b/drivers/gpu/drm/ttm/ttm_backup_shmem.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <drm/ttm/ttm_backup.h>
+#include <linux/page-flags.h>
+
+/**
+ * struct ttm_backup_shmem - A shmem based ttm_backup subclass.
+ * @backup: The base struct ttm_backup
+ * @filp: The associated shmem object
+ */
+struct ttm_backup_shmem {
+	struct ttm_backup backup;
+	struct file *filp;
+};
+
+static struct ttm_backup_shmem *to_backup_shmem(struct ttm_backup *backup)
+{
+	return container_of(backup, struct ttm_backup_shmem, backup);
+}
+
+static void ttm_backup_shmem_drop(struct ttm_backup *backup, unsigned long handle)
+{
+	handle -= 1;
+	shmem_truncate_range(file_inode(to_backup_shmem(backup)->filp), handle,
+			     handle + 1);
+}
+
+static int ttm_backup_shmem_copy_page(struct ttm_backup *backup, struct page *dst,
+				      unsigned long handle, bool killable)
+{
+	struct file *filp = to_backup_shmem(backup)->filp;
+	struct address_space *mapping = filp->f_mapping;
+	struct folio *from_folio;
+
+	handle -= 1;
+	from_folio = shmem_read_folio(mapping, handle);
+	if (IS_ERR(from_folio))
+		return PTR_ERR(from_folio);
+
+	/* Note: Use drm_memcpy_from_wc? */
+	copy_highpage(dst, folio_file_page(from_folio, handle));
+	folio_put(from_folio);
+
+	return 0;
+}
+
+static unsigned long
+ttm_backup_shmem_backup_page(struct ttm_backup *backup, struct page *page,
+			     bool writeback, pgoff_t i, gfp_t page_gfp,
+			     gfp_t alloc_gfp)
+{
+	struct file *filp = to_backup_shmem(backup)->filp;
+	struct address_space *mapping = filp->f_mapping;
+	unsigned long handle = 0;
+	struct folio *to_folio;
+	int ret;
+
+	to_folio = shmem_read_folio_gfp(mapping, i, alloc_gfp);
+	if (IS_ERR(to_folio))
+		return handle;
+
+	folio_mark_accessed(to_folio);
+	folio_lock(to_folio);
+	folio_mark_dirty(to_folio);
+	copy_highpage(folio_file_page(to_folio, i), page);
+	handle = i + 1;
+
+	if (writeback && !folio_mapped(to_folio) && folio_clear_dirty_for_io(to_folio)) {
+		struct writeback_control wbc = {
+			.sync_mode = WB_SYNC_NONE,
+			.nr_to_write = SWAP_CLUSTER_MAX,
+			.range_start = 0,
+			.range_end = LLONG_MAX,
+			.for_reclaim = 1,
+		};
+		folio_set_reclaim(to_folio);
+		ret = mapping->a_ops->writepage(folio_page(to_folio, 0), &wbc);
+		if (!folio_test_writeback(to_folio))
+			folio_clear_reclaim(to_folio);
+		/* If writepage succeeds, it unlocks the folio */
+		if (ret)
+			folio_unlock(to_folio);
+	} else {
+		folio_unlock(to_folio);
+	}
+
+	folio_put(to_folio);
+
+	return handle;
+}
+
+static void ttm_backup_shmem_fini(struct ttm_backup *backup)
+{
+	struct ttm_backup_shmem *sbackup = to_backup_shmem(backup);
+
+	fput(sbackup->filp);
+	kfree(sbackup);
+}
+
+static const struct ttm_backup_ops ttm_backup_shmem_ops = {
+	.drop = ttm_backup_shmem_drop,
+	.copy_backed_up_page = ttm_backup_shmem_copy_page,
+	.backup_page = ttm_backup_shmem_backup_page,
+	.fini = ttm_backup_shmem_fini,
+};
+
+/**
+ * ttm_backup_shmem_create() - Create a shmem-based struct backup.
+ * @size: The maximum size (in bytes) to back up.
+ *
+ * Create a backup utilizing shmem objects.
+ *
+ * Return: A pointer to a struct ttm_backup on success,
+ * an error pointer on error.
+ */
+struct ttm_backup *ttm_backup_shmem_create(loff_t size)
+{
+	struct ttm_backup_shmem *sbackup =
+		kzalloc(sizeof(*sbackup), GFP_KERNEL | __GFP_ACCOUNT);
+
+	if (!sbackup)
+		return ERR_PTR(-ENOMEM);
+
+	sbackup->filp = shmem_file_setup("ttm shmem backup", size, 0);
+	if (IS_ERR(sbackup->filp)) {
+		kfree(sbackup);
+		return ERR_CAST(sbackup->filp);
+	}
+
+	sbackup->backup.ops = &ttm_backup_shmem_ops;
+
+	return &sbackup->backup;
+}
+EXPORT_SYMBOL_GPL(ttm_backup_shmem_create);
diff --git a/include/drm/ttm/ttm_backup.h b/include/drm/ttm/ttm_backup.h
new file mode 100644
index 000000000000..88e8b97a6fdc
--- /dev/null
+++ b/include/drm/ttm/ttm_backup.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef _TTM_BACKUP_H_
+#define _TTM_BACKUP_H_
+
+#include <linux/mm_types.h>
+#include <linux/shmem_fs.h>
+
+struct ttm_backup;
+
+/**
+ * ttm_backup_handle_to_page_ptr() - Convert handle to struct page pointer
+ * @handle: The handle to convert.
+ *
+ * Converts an opaque handle received from the
+ * struct ttm_backup_ops::backup_page() function to an (invalid)
+ * struct page pointer suitable for a struct page array.
+ *
+ * Return: An (invalid) struct page pointer.
+ */
+static inline struct page *
+ttm_backup_handle_to_page_ptr(unsigned long handle)
+{
+	return (struct page *)(handle << 1 | 1);
+}
+
+/**
+ * ttm_backup_page_ptr_is_handle() - Whether a struct page pointer is a handle
+ * @page: The struct page pointer to check.
+ *
+ * Return: true if the struct page pointer is a handle returned from
+ * ttm_backup_handle_to_page_ptr(). False otherwise.
+ */
+static inline bool ttm_backup_page_ptr_is_handle(const struct page *page)
+{
+	return (unsigned long)page & 1;
+}
+
+/**
+ * ttm_backup_page_ptr_to_handle() - Convert a struct page pointer to a handle
+ * @page: The struct page pointer to convert
+ *
+ * Return: The handle that was previously used in
+ * ttm_backup_handle_to_page_ptr() to obtain a struct page pointer, suitable
+ * for use as argument in the struct ttm_backup_ops drop() or
+ * copy_backed_up_page() functions.
+ */
+static inline unsigned long
+ttm_backup_page_ptr_to_handle(const struct page *page)
+{
+	WARN_ON(!ttm_backup_page_ptr_is_handle(page));
+	return (unsigned long)page >> 1;
+}
+
+/** struct ttm_backup_ops - A struct ttm_backup backend operations */
+struct ttm_backup_ops {
+	/**
+	 * drop - release memory associated with a handle
+	 * @backup: The struct backup pointer used to obtain the handle
+	 * @handle: The handle obtained from the @backup_page function.
+	 */
+	void (*drop)(struct ttm_backup *backup, unsigned long handle);
+
+	/**
+	 * copy_backed_up_page - Copy the contents of a previously backed
+	 * up page
+	 * @backup: The struct backup pointer used to back up the page.
+	 * @dst: The struct page to copy into.
+	 * @handle: The handle returned when the page was backed up.
+	 * @intr: Try to perform waits interruptible or at least killable.
+	 *
+	 * Return: 0 on success, Negative error code on failure, notably
+	 * -EINTR if @intr was set to true and a signal is pending.
+	 */
+	int (*copy_backed_up_page)(struct ttm_backup *backup, struct page *dst,
+				   unsigned long handle, bool intr);
+
+	/**
+	 * backup_page - Backup a page
+	 * @backup: The struct backup pointer to use.
+	 * @page: The page to back up.
+	 * @writeback: Whether to perform immediate writeback of the page.
+	 * This may have performance implications.
+	 * @i: A unique integer for each page and each struct backup.
+	 * This is a hint allowing the backup backend to avoid managing
+	 * its address space separately.
+	 * @page_gfp: The gfp value used when the page was allocated.
+	 * This is used for accounting purposes.
+	 * @alloc_gfp: The gfp to be used when the backend needs to allocate
+	 * memory.
+	 *
+	 * Return: A handle on success. 0 on failure.
+	 * (This is following the swp_entry_t convention).
+	 *
+	 * Note: This function could be extended to back up a folio and
+	 * backends would then split the folio internally if needed.
+	 * Drawback is that the caller would then have to keep track of
+	 */
+	unsigned long (*backup_page)(struct ttm_backup *backup, struct page *page,
+				     bool writeback, pgoff_t i, gfp_t page_gfp,
+				     gfp_t alloc_gfp);
+	/**
+	 * fini - Free the struct backup resources after last use.
+	 * @backup: Pointer to the struct backup whose resources to free.
+	 *
+	 * After a call to @fini, it's illegal to use the @backup pointer.
+	 */
+	void (*fini)(struct ttm_backup *backup);
+};
+
+/**
+ * struct ttm_backup - Abstract a backup backend.
+ * @ops: The operations as described above.
+ *
+ * The struct ttm_backup is intended to be subclassed by the
+ * backend implementation.
+ */
+struct ttm_backup {
+	const struct ttm_backup_ops *ops;
+};
+
+/**
+ * ttm_backup_shmem_create() - Create a shmem-based struct backup.
+ * @size: The maximum size (in bytes) to back up.
+ *
+ * Create a backup utilizing shmem objects.
+ *
+ * Return: A pointer to a struct ttm_backup on success,
+ * an error pointer on error.
+ */
+struct ttm_backup *ttm_backup_shmem_create(loff_t size);
+
+#endif
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/9] drm/ttm/pool: Provide a helper to shrink pages.
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (4 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 5/9] drm/ttm: Add a virtual base class for graphics memory backup Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 7/9] drm/ttm: Use fault-injection to test error paths Thomas Hellström
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

Provide a helper to shrink ttm_tt page-vectors on a per-page
basis. A ttm_backup backend could then in theory get away with
allocating a single temporary page for each struct ttm_tt.

This is accomplished by splitting larger pages before trying to
back them up.

In the future we could allow ttm_backup to handle backing up
large pages as well, but currently there's no benefit in
doing that, since the shmem backup backend would have to
split those anyway to avoid allocating too much temporary
memory, and if the backend instead inserts pages into the
swap-cache, those are split on reclaim by the core.

Due to potential backup and recovery errors, allow partially swapped-out
struct ttm_tt's, but mark them as swapped out to stop them from being
swapped out a second time. More details in the ttm_pool.c
DOC section.
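
For illustration only (not part of the patch), after a partial backup a
ttm_tt pages vector can hold a mix of resident page pointers and backup
handles, distinguished with the helpers from the previous patch:

	pgoff_t i, backed_up = 0, resident = 0;

	for (i = 0; i < ttm->num_pages; ++i) {
		struct page *p = ttm->pages[i];

		if (ttm_backup_page_ptr_is_handle(p))
			backed_up++;	/* ttm_backup_page_ptr_to_handle(p) gives the handle */
		else if (p)
			resident++;	/* page still resident in the pool allocation */
	}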

v2:
- A couple of cleanups and error fixes in ttm_pool_back_up_tt.
- s/back_up/backup/
- Add a writeback parameter to the exported interface.

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 397 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/ttm/ttm_tt.c   |  37 +++
 include/drm/ttm/ttm_pool.h     |   5 +
 include/drm/ttm/ttm_tt.h       |  20 ++
 4 files changed, 446 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 6e1fd6985ffc..38e50cf81b0a 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -41,6 +41,7 @@
 #include <asm/set_memory.h>
 #endif
 
+#include <drm/ttm/ttm_backup.h>
 #include <drm/ttm/ttm_pool.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/ttm/ttm_bo.h>
@@ -58,6 +59,32 @@ struct ttm_pool_dma {
 	unsigned long vaddr;
 };
 
+/**
+ * struct ttm_pool_tt_restore - State representing restore from backup
+ * @alloced_pages: Total number of already allocated pages for the ttm_tt.
+ * @restored_pages: Number of (sub) pages restored from swap for this
+ *		     chunk of 1 << @order pages.
+ * @first_page: The ttm page ptr representing @old_pages[0].
+ * @caching_divide: Page pointer where subsequent pages are cached.
+ * @old_pages: Backup copy of page pointers that were replaced by the new
+ *	       page allocation.
+ * @pool: The pool used for page allocation while restoring.
+ * @order: The order of the last page allocated while restoring.
+ *
+ * Recovery from backup might fail when we've recovered less than the
+ * full ttm_tt. In order not to lose any data (yet), keep information
+ * around that allows us to restart a failed ttm backup recovery.
+ */
+struct ttm_pool_tt_restore {
+	pgoff_t alloced_pages;
+	pgoff_t restored_pages;
+	struct page **first_page;
+	struct page **caching_divide;
+	struct ttm_pool *pool;
+	unsigned int order;
+	struct page *old_pages[];
+};
+
 static unsigned long page_pool_size;
 
 MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
@@ -354,11 +381,102 @@ static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
 	return p->private;
 }
 
+/*
+ * To be able to insert single pages into backup directly,
+ * we need to split multi-order page allocations and make them look
+ * like single-page allocations.
+ */
+static void ttm_pool_split_for_swap(struct ttm_pool *pool, struct page *p)
+{
+	unsigned int order = ttm_pool_page_order(pool, p);
+	pgoff_t nr;
+
+	if (!order)
+		return;
+
+	split_page(p, order);
+	nr = 1UL << order;
+	while (nr--)
+		(p++)->private = 0;
+}
+
+/**
+ * DOC: Partial backup and restoration of a struct ttm_tt.
+ *
+ * Swapout using ttm_backup::ops::backup_page() and swapin using
+ * ttm_backup::ops::copy_backed_up_page() may fail.
+ * The former most likely due to lack of swap-space or memory, the latter due
+ * to lack of memory or because of signal interruption during waits.
+ *
+ * Backup failure is easily handled by using a ttm_tt pages vector that holds
+ * both swap entries and page pointers. This has to be taken into account when
+ * restoring such a ttm_tt from backup, and when freeing it while backed up.
+ * When restoring, for simplicity, new pages are actually allocated from the
+ * pool and the contents of any old pages are copied in and then the old pages
+ * are released.
+ *
+ * For restoration failures, the struct ttm_pool_tt_restore holds sufficient state
+ * to be able to resume an interrupted restore, and that structure is freed once
+ * the restoration is complete. If the struct ttm_tt is destroyed while there
+ * is a valid struct ttm_pool_tt_restore attached, that is also properly taken
+ * care of.
+ */
+
+static bool ttm_pool_restore_valid(const struct ttm_pool_tt_restore *restore)
+{
+	return restore && restore->restored_pages < (1 << restore->order);
+}
+
+static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore,
+			       struct ttm_backup *backup,
+			       struct ttm_operation_ctx *ctx)
+{
+	unsigned int i, nr = 1 << restore->order;
+	int ret = 0;
+
+	if (!ttm_pool_restore_valid(restore))
+		return 0;
+
+	for (i = restore->restored_pages; i < nr; ++i) {
+		struct page *p = restore->old_pages[i];
+
+		if (ttm_backup_page_ptr_is_handle(p)) {
+			unsigned long handle = ttm_backup_page_ptr_to_handle(p);
+
+			if (handle == 0)
+				continue;
+
+			ret = backup->ops->copy_backed_up_page
+				(backup, restore->first_page[i],
+				 handle, ctx->interruptible);
+			if (ret)
+				break;
+
+			backup->ops->drop(backup, handle);
+		} else if (p) {
+			/*
+			 * We could probably avoid splitting the old page
+			 * using clever logic, but ATM we don't care.
+			 */
+			ttm_pool_split_for_swap(restore->pool, p);
+			copy_highpage(restore->first_page[i], p);
+			__free_pages(p, 0);
+		}
+
+		restore->restored_pages++;
+		restore->old_pages[i] = NULL;
+		cond_resched();
+	}
+
+	return ret;
+}
+
 /* Called when we got a page, either from a pool or newly allocated */
 static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 				   struct page *p, dma_addr_t **dma_addr,
 				   unsigned long *num_pages,
-				   struct page ***pages)
+				   struct page ***pages,
+				   struct ttm_pool_tt_restore *restore)
 {
 	unsigned int i;
 	int r;
@@ -369,6 +487,16 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 			return r;
 	}
 
+	if (restore) {
+		memcpy(restore->old_pages, *pages,
+		       (1 << order) * sizeof(*restore->old_pages));
+		memset(*pages, 0, (1 << order) * sizeof(**pages));
+		restore->order = order;
+		restore->restored_pages = 0;
+		restore->first_page = *pages;
+		restore->alloced_pages += 1UL << order;
+	}
+
 	*num_pages -= 1 << order;
 	for (i = 1 << order; i; --i, ++(*pages), ++p)
 		**pages = p;
@@ -394,22 +522,39 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
 				pgoff_t start_page, pgoff_t end_page)
 {
 	struct page **pages = &tt->pages[start_page];
+	struct ttm_backup *backup = tt->backup;
 	unsigned int order;
 	pgoff_t i, nr;
 
 	for (i = start_page; i < end_page; i += nr, pages += nr) {
 		struct ttm_pool_type *pt = NULL;
+		struct page *p = *pages;
+
+		if (ttm_backup_page_ptr_is_handle(p)) {
+			unsigned long handle = ttm_backup_page_ptr_to_handle(p);
+
+			nr = 1;
+			if (handle != 0)
+				backup->ops->drop(backup, handle);
+			continue;
+		}
+
+		if (pool) {
+			order = ttm_pool_page_order(pool, p);
+			nr = (1UL << order);
+			if (tt->dma_address)
+				ttm_pool_unmap(pool, tt->dma_address[i], nr);
 
-		order = ttm_pool_page_order(pool, *pages);
-		nr = (1UL << order);
-		if (tt->dma_address)
-			ttm_pool_unmap(pool, tt->dma_address[i], nr);
+			pt = ttm_pool_select_type(pool, caching, order);
+		} else {
+			order = p->private;
+			nr = (1UL << order);
+		}
 
-		pt = ttm_pool_select_type(pool, caching, order);
 		if (pt)
-			ttm_pool_type_give(pt, *pages);
+			ttm_pool_type_give(pt, p);
 		else
-			ttm_pool_free_page(pool, caching, order, *pages);
+			ttm_pool_free_page(pool, caching, order, p);
 	}
 }
 
@@ -453,9 +598,37 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_PAGE_ORDER, __fls(num_pages));
-	     num_pages;
-	     order = min_t(unsigned int, order, __fls(num_pages))) {
+	order = min_t(unsigned int, MAX_PAGE_ORDER, __fls(num_pages));
+
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP) {
+		if (!tt->restore) {
+			gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+
+			if (ctx->gfp_retry_mayfail)
+				gfp |= __GFP_RETRY_MAYFAIL;
+
+			tt->restore =
+				kvzalloc(struct_size(tt->restore, old_pages,
+						     (size_t)1 << order), gfp);
+			/* RFC: Possibly loop on -ENOMEM and reduce order. */
+			if (!tt->restore)
+				return -ENOMEM;
+		} else if (ttm_pool_restore_valid(tt->restore)) {
+			struct ttm_pool_tt_restore *restore = tt->restore;
+
+			num_pages -= restore->alloced_pages;
+			order = min_t(unsigned int, order, __fls(num_pages));
+			pages += restore->alloced_pages;
+			r = ttm_pool_restore_tt(restore, tt->backup, ctx);
+			if (r)
+				return r;
+			caching = restore->caching_divide;
+		}
+
+		tt->restore->pool = pool;
+	}
+
+	for (; num_pages; order = min_t(unsigned int, order, __fls(num_pages))) {
 		struct ttm_pool_type *pt;
 
 		page_caching = tt->caching;
@@ -472,11 +645,19 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				r = ttm_pool_page_allocated(pool, order, p,
 							    &dma_addr,
 							    &num_pages,
-							    &pages);
+							    &pages,
+							    tt->restore);
 				if (r)
 					goto error_free_page;
 
 				caching = pages;
+				if (ttm_pool_restore_valid(tt->restore)) {
+					r = ttm_pool_restore_tt(tt->restore, tt->backup,
+								ctx);
+					if (r)
+						goto error_free_all;
+				}
+
 				if (num_pages < (1 << order))
 					break;
 
@@ -496,9 +677,17 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				caching = pages;
 			}
 			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
-						    &num_pages, &pages);
+						    &num_pages, &pages,
+						    tt->restore);
 			if (r)
 				goto error_free_page;
+
+			if (ttm_pool_restore_valid(tt->restore)) {
+				r = ttm_pool_restore_tt(tt->restore, tt->backup, ctx);
+				if (r)
+					goto error_free_all;
+			}
+
 			if (PageHighMem(p))
 				caching = pages;
 		}
@@ -517,12 +706,26 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	if (r)
 		goto error_free_all;
 
+	if (tt->restore) {
+		kvfree(tt->restore);
+		tt->restore = NULL;
+	}
+
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP)
+		tt->page_flags &= ~(TTM_TT_FLAG_PRIV_BACKED_UP |
+				    TTM_TT_FLAG_SWAPPED);
+
 	return 0;
 
 error_free_page:
 	ttm_pool_free_page(pool, page_caching, order, p);
 
 error_free_all:
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP) {
+		tt->restore->caching_divide = caching;
+		return r;
+	}
+
 	num_pages = tt->num_pages - num_pages;
 	caching_divide = caching - tt->pages;
 	ttm_pool_free_range(pool, tt, tt->caching, 0, caching_divide);
@@ -549,6 +752,174 @@ void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 }
 EXPORT_SYMBOL(ttm_pool_free);
 
+/**
+ * ttm_pool_release_backed_up() - Release content of a swapped-out struct ttm_tt
+ * @tt: The struct ttm_tt.
+ *
+ * Release handles with associated content or any remaining pages of
+ * a backed-up struct ttm_tt.
+ */
+void ttm_pool_release_backed_up(struct ttm_tt *tt)
+{
+	struct ttm_backup *backup = tt->backup;
+	struct ttm_pool_tt_restore *restore;
+	pgoff_t i, start_page = 0;
+	unsigned long handle;
+
+	if (!(tt->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP))
+		return;
+
+	restore = tt->restore;
+
+	if (ttm_pool_restore_valid(restore)) {
+		pgoff_t nr = 1UL << restore->order;
+
+		for (i = restore->restored_pages; i < nr; ++i) {
+			struct page *p = restore->old_pages[i];
+
+			if (ttm_backup_page_ptr_is_handle(p)) {
+				handle = ttm_backup_page_ptr_to_handle(p);
+				if (handle == 0)
+					continue;
+
+				backup->ops->drop(backup, handle);
+			} else if (p) {
+				ttm_pool_split_for_swap(restore->pool, p);
+				__free_pages(p, 0);
+			}
+		}
+	}
+
+	if (restore) {
+		pgoff_t mid = restore->caching_divide - tt->pages;
+
+		start_page = restore->alloced_pages;
+		/* Pages that might be dma-mapped and non-cached */
+		ttm_pool_free_range(restore->pool, tt, tt->caching,
+				    0, mid);
+		/* Pages that might be dma-mapped but cached */
+		ttm_pool_free_range(restore->pool, tt, ttm_cached,
+				    mid, restore->alloced_pages);
+	}
+
+	/* Shrunken pages. Cached and not dma-mapped. */
+	ttm_pool_free_range(NULL, tt, ttm_cached, start_page, tt->num_pages);
+
+	if (restore) {
+		kvfree(restore);
+		tt->restore = NULL;
+	}
+
+	tt->page_flags &= ~(TTM_TT_FLAG_PRIV_BACKED_UP | TTM_TT_FLAG_SWAPPED);
+}
+
+/**
+ * ttm_pool_backup_tt() - Back up or purge a struct ttm_tt
+ * @pool: The pool used when allocating the struct ttm_tt.
+ * @ttm: The struct ttm_tt.
+ * @purge: Don't back up but release pages directly to system.
+ * @writeback: If !@purge, try to write out directly to the
+ * underlying persistent media.
+ *
+ * Back up or purge a struct ttm_tt. If @purge is true, then
+ * all pages will be freed directly to the system rather than to the pool
+ * they were allocated from, making the function behave similarly to
+ * ttm_pool_free(). If @purge is false the pages will be backed up instead,
+ * exchanged for handles.
+ * A subsequent call to ttm_pool_alloc() will then read back the content and
+ * a subsequent call to ttm_pool_release_backed_up() will drop it.
+ * If backup of a page fails for whatever reason, @ttm will still be
+ * partially backed up, retaining those pages for which backup failed.
+ *
+ * Return: Number of pages actually backed up or freed, or negative
+ * error code on error.
+ */
+long ttm_pool_backup_tt(struct ttm_pool *pool, struct ttm_tt *ttm, bool purge,
+			bool writeback)
+{
+	struct ttm_backup *backup = ttm->backup;
+	struct page *page;
+	unsigned long handle;
+	gfp_t alloc_gfp;
+	gfp_t gfp;
+	int ret = 0;
+	pgoff_t shrunken = 0;
+	pgoff_t i, num_pages;
+
+	if ((!get_nr_swap_pages() && !purge) ||
+	    pool->use_dma_alloc ||
+	    (ttm->page_flags & TTM_TT_FLAG_PRIV_BACKED_UP))
+		return -EBUSY;
+
+#ifdef CONFIG_X86
+	/* Anything returned to the system needs to be cached. */
+	if (ttm->caching != ttm_cached)
+		set_pages_array_wb(ttm->pages, ttm->num_pages);
+#endif
+
+	if (ttm->dma_address || purge) {
+		for (i = 0; i < ttm->num_pages; i += num_pages) {
+			unsigned int order;
+
+			page = ttm->pages[i];
+			if (unlikely(!page)) {
+				num_pages = 1;
+				continue;
+			}
+
+			order = ttm_pool_page_order(pool, page);
+			num_pages = 1UL << order;
+			if (ttm->dma_address)
+				ttm_pool_unmap(pool, ttm->dma_address[i],
+					       num_pages);
+			if (purge) {
+				shrunken += num_pages;
+				page->private = 0;
+				__free_pages(page, order);
+				memset(ttm->pages + i, 0,
+				       num_pages * sizeof(*ttm->pages));
+			}
+		}
+	}
+
+	if (purge)
+		return shrunken;
+
+	if (pool->use_dma32)
+		gfp = GFP_DMA32;
+	else
+		gfp = GFP_HIGHUSER;
+
+	alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN | __GFP_RETRY_MAYFAIL;
+
+	for (i = 0; i < ttm->num_pages; ++i) {
+		page = ttm->pages[i];
+		if (unlikely(!page))
+			continue;
+
+		ttm_pool_split_for_swap(pool, page);
+
+		handle = backup->ops->backup_page(backup, page, writeback, i,
+						  gfp, alloc_gfp);
+		if (handle) {
+			ttm->pages[i] = ttm_backup_handle_to_page_ptr(handle);
+			put_page(page);
+			shrunken++;
+		} else {
+			/* We allow partially shrunken tts */
+			ret = -ENOMEM;
+			break;
+		}
+		cond_resched();
+	}
+
+	if (shrunken)
+		ttm->page_flags |= (TTM_TT_FLAG_PRIV_BACKED_UP |
+				    TTM_TT_FLAG_SWAPPED);
+
+	return shrunken ? shrunken : ret;
+}
+
 /**
  * ttm_pool_init - Initialize a pool
  *
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 578a7c37f00b..a14ba80eec44 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -39,6 +39,7 @@
 #include <drm/drm_cache.h>
 #include <drm/drm_device.h>
 #include <drm/drm_util.h>
+#include <drm/ttm/ttm_backup.h>
 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_tt.h>
 
@@ -157,6 +158,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 	ttm->swap_storage = NULL;
 	ttm->sg = bo->sg;
 	ttm->caching = caching;
+	ttm->restore = NULL;
 }
 
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
@@ -181,6 +183,12 @@ void ttm_tt_fini(struct ttm_tt *ttm)
 		fput(ttm->swap_storage);
 	ttm->swap_storage = NULL;
 
+	ttm_pool_release_backed_up(ttm);
+	if (ttm->backup) {
+		ttm->backup->ops->fini(ttm->backup);
+		ttm->backup = NULL;
+	}
+
 	if (ttm->pages)
 		kvfree(ttm->pages);
 	else
@@ -251,6 +259,35 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
 	return ret;
 }
 
+/**
+ * ttm_tt_backup() - Helper to back up a struct ttm_tt.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt.
+ * @purge: Don't back up but release pages directly to system,
+ * bypassing any pooling.
+ * @writeback: If !@purge, try to write out directly to the
+ * underlying persistent media.
+ *
+ * Helper for a TTM driver to use from the bo_shrink() method to shrink
+ * a struct ttm_tt, after it has done the necessary unbinding. This function
+ * will update the page accounting and call ttm_pool_backup_tt() to free pages
+ * or back them up.
+ *
+ * Return: Number of pages freed or swapped out, or negative error code on
+ * error.
+ */
+long ttm_tt_backup(struct ttm_device *bdev, struct ttm_tt *tt, bool purge,
+		   bool writeback)
+{
+	long ret = ttm_pool_backup_tt(&bdev->pool, tt, purge, writeback);
+
+	if (ret > 0)
+		tt->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED;
+
+	return ret;
+}
+EXPORT_SYMBOL(ttm_tt_backup);
+
 /**
  * ttm_tt_swapout - swap out tt object
  *
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 160d954a261e..4e4db369952b 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -89,6 +89,11 @@ void ttm_pool_fini(struct ttm_pool *pool);
 
 int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m);
 
+void ttm_pool_release_backed_up(struct ttm_tt *tt);
+
+long ttm_pool_backup_tt(struct ttm_pool *pool, struct ttm_tt *ttm,
+			bool purge, bool writeback);
+
 int ttm_pool_mgr_init(unsigned long num_pages);
 void ttm_pool_mgr_fini(void);
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 2b9d856ff388..6b990f1e7dd0 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -32,11 +32,13 @@
 #include <drm/ttm/ttm_caching.h>
 #include <drm/ttm/ttm_kmap_iter.h>
 
+struct ttm_backup;
 struct ttm_device;
 struct ttm_tt;
 struct ttm_resource;
 struct ttm_buffer_object;
 struct ttm_operation_ctx;
+struct ttm_pool_tt_restore;
 
 /**
  * struct ttm_tt - This is a structure holding the pages, caching- and aperture
@@ -85,6 +87,9 @@ struct ttm_tt {
 	 * fault handling abuses the DMA api a bit and dma_map_attrs can't be
 	 * used to assure pgprot always matches.
 	 *
+	 * TTM_TT_FLAG_PRIV_BACKED_UP: TTM internal only. This is set if the
+	 * struct ttm_tt has been (possibly partially) backed up.
+	 *
 	 * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is
 	 * set by TTM after ttm_tt_populate() has successfully returned, and is
 	 * then unset when TTM calls ttm_tt_unpopulate().
@@ -96,6 +101,7 @@ struct ttm_tt {
 #define TTM_TT_FLAG_DECRYPTED		BIT(4)
 
 #define TTM_TT_FLAG_PRIV_POPULATED	BIT(5)
+#define TTM_TT_FLAG_PRIV_BACKED_UP	BIT(6)
 	uint32_t page_flags;
 	/** @num_pages: Number of pages in the page array. */
 	uint32_t num_pages;
@@ -105,11 +111,21 @@ struct ttm_tt {
 	dma_addr_t *dma_address;
 	/** @swap_storage: Pointer to shmem struct file for swap storage. */
 	struct file *swap_storage;
+	/**
+	 * @backup: Pointer to backup struct for backed up tts.
+	 * RFC: Could possibly be unified with @swap_storage.
+	 */
+	struct ttm_backup *backup;
 	/**
 	 * @caching: The current caching state of the pages, see enum
 	 * ttm_caching.
 	 */
 	enum ttm_caching caching;
+	/**
+	 * @restore: Partial restoration from backup state.
+	 * RFC: Incorporate in struct ttm_backup?
+	 */
+	struct ttm_pool_tt_restore *restore;
 };
 
 /**
@@ -230,6 +246,10 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages);
 struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
 					    struct ttm_tt *tt);
 unsigned long ttm_tt_pages_limit(void);
+
+long ttm_tt_backup(struct ttm_device *bdev, struct ttm_tt *tt, bool purge,
+		   bool writeback);
+
 #if IS_ENABLED(CONFIG_AGP)
 #include <linux/agp_backend.h>
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 7/9] drm/ttm: Use fault-injection to test error paths
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (5 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 6/9] drm/ttm/pool: Provide a helper to shrink pages Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 8/9] drm/xe, drm/ttm: Provide a generic LRU walker helper Thomas Hellström
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

Use fault-injection to test partial TTM swapout and interrupted swapin.
Return -EINTR for swapin to test the caller's ability to handle and
restart the swapin, and on swapout perform a partial swapout to test the
swapin and release_backed_up functionality.
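
With the option enabled, roughly every hundredth interruptible swapin
fails with -EINTR, so callers must treat that as a restartable condition.
A hedged sketch of what a caller (for example the xe kunit readback in
patch 9) is then expected to do; the retry loop and placement here are
illustrative only:

	int ret;

	do {
		xe_bo_lock(bo, false);
		/* Interruptible ctx, so the injected -EINTR can surface. */
		ret = ttm_bo_validate(&bo->ttm, &tt_placement, &ctx);
		xe_bo_unlock(bo);
	} while (ret == -EINTR && !signal_pending(current));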

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/Kconfig        | 10 ++++++++++
 drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 959b19a04101..032f069235d5 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -258,6 +258,16 @@ config DRM_GPUVM
 	  GPU-VM representation providing helpers to manage a GPUs virtual
 	  address space
 
+config DRM_TTM_BACKUP_FAULT_INJECT
+	bool "Enable fault injection during TTM backup"
+	depends on DRM_TTM
+	default n
+	help
+	  Inject recoverable failures during TTM backup and recovery of
+	  backed-up objects. For DRM driver developers only.
+
+	  If in doubt, choose N.
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 38e50cf81b0a..d32a1f2e5e50 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -431,6 +431,7 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore,
 			       struct ttm_backup *backup,
 			       struct ttm_operation_ctx *ctx)
 {
+	static unsigned long __maybe_unused swappedin;
 	unsigned int i, nr = 1 << restore->order;
 	int ret = 0;
 
@@ -446,6 +447,13 @@ static int ttm_pool_restore_tt(struct ttm_pool_tt_restore *restore,
 			if (handle == 0)
 				continue;
 
+			if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT) &&
+			    ctx->interruptible &&
+			    ++swappedin % 100 == 0) {
+				ret = -EINTR;
+				break;
+			}
+
 			ret = backup->ops->copy_backed_up_page
 				(backup, restore->first_page[i],
 				 handle, ctx->interruptible);
@@ -892,7 +900,14 @@ long ttm_pool_backup_tt(struct ttm_pool *pool, struct ttm_tt *ttm, bool purge,
 
 	alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN | __GFP_RETRY_MAYFAIL;
 
-	for (i = 0; i < ttm->num_pages; ++i) {
+	num_pages = ttm->num_pages;
+
+	/* Simulate a partial backup failure by shrinking only half of the pages. */
+
+	if (IS_ENABLED(CONFIG_DRM_TTM_BACKUP_FAULT_INJECT))
+		num_pages = DIV_ROUND_UP(num_pages, 2);
+
+	for (i = 0; i < num_pages; ++i) {
 		page = ttm->pages[i];
 		if (unlikely(!page))
 			continue;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 8/9] drm/xe, drm/ttm: Provide a generic LRU walker helper
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (6 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 7/9] drm/ttm: Use fault-injection to test error paths Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-16 10:07 ` [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos Thomas Hellström
  2024-04-16 11:55 ` [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Christian König
  9 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Christian König,
	Somalapuram Amaranath, dri-devel, Matthew Brost

Export the needed functions from TTM and provide a generic LRU
walker in xe, in the spirit of drm_gem_lru_scan() but building
on the restartable TTM LRU functionality.

The LRU walker optionally supports locking objects as part of
a drm_exec locking transaction, and can thus be used for both
exhaustive eviction and shrinking. In fact, it also supports direct
shrinking in the case where we fail to populate system memory
objects and want to retry by shrinking purgeable or evictable
local objects, something a shrinker is not capable of doing.

The LRU walker helper can easily be moved to TTM when / if used
by other drivers.
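
For reference, a minimal sketch of how a walker user is wired up,
assuming a hypothetical my_evict_one() that returns the number of pages
it processed; the trylock-only mode (no drm_exec) is shown:

static long my_process_bo(struct xe_ttm_lru_walk *walk,
			  struct ttm_buffer_object *bo)
{
	/* bo is locked and referenced here; return pages processed or < 0. */
	return my_evict_one(bo);	/* hypothetical */
}

static const struct xe_ttm_lru_walk_ops my_ops = {
	.process_bo = my_process_bo,
};

static long my_walk(struct xe_device *xe, struct ttm_operation_ctx *ctx,
		    long target)
{
	struct xe_ttm_lru_walk walk = { .ops = &my_ops, .ctx = ctx };
	struct ttm_resource_manager *man =
		ttm_manager_type(&xe->ttm, XE_PL_TT);

	/* Trylock-only walk (walk.exec left NULL), stop after @target pages. */
	return xe_ttm_lru_walk_for_evict(&walk, &xe->ttm, man, XE_PL_TT,
					 target);
}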

Cc: Christian König <christian.koenig@amd.com>
Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_resource.c  |   3 +
 drivers/gpu/drm/xe/Makefile         |   1 +
 drivers/gpu/drm/xe/xe_ttm_helpers.c | 149 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_helpers.h |  47 +++++++++
 4 files changed, 200 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.c
 create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.h

diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 2b93727c78e5..cf5d4f3fa5ad 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -106,6 +106,7 @@ void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
 	ttm_resource_cursor_fini_locked(cursor);
 	spin_unlock(lru_lock);
 }
+EXPORT_SYMBOL(ttm_resource_cursor_fini);
 
 /**
  * ttm_lru_bulk_move_init - initialize a bulk move structure
@@ -626,6 +627,7 @@ ttm_resource_manager_first(struct ttm_resource_manager *man,
 
 	return ttm_resource_manager_next(cursor);
 }
+EXPORT_SYMBOL(ttm_resource_manager_first);
 
 /**
  * ttm_resource_manager_next() - Continue iterating over the resource manager
@@ -663,6 +665,7 @@ ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
 
 	return NULL;
 }
+EXPORT_SYMBOL(ttm_resource_manager_next);
 
 static void ttm_kmap_iter_iomap_map_local(struct ttm_kmap_iter *iter,
 					  struct iosys_map *dmap,
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index c46d145606f6..50613f0fe635 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -135,6 +135,7 @@ xe-y += xe_bb.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
 	xe_trace.o \
+	xe_ttm_helpers.o \
 	xe_ttm_sys_mgr.o \
 	xe_ttm_stolen_mgr.o \
 	xe_ttm_vram_mgr.o \
diff --git a/drivers/gpu/drm/xe/xe_ttm_helpers.c b/drivers/gpu/drm/xe/xe_ttm_helpers.c
new file mode 100644
index 000000000000..92c951dee30d
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ttm_helpers.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <drm/drm_exec.h>
+
+#include "xe_ttm_helpers.h"
+
+#include <drm/ttm/ttm_bo.h>
+#include <drm/ttm/ttm_device.h>
+
+static bool xe_ttm_lru_walk_trylock(struct xe_ttm_lru_walk *walk,
+				    struct ttm_buffer_object *bo,
+				    bool *needs_unlock)
+{
+	struct ttm_operation_ctx *ctx = walk->ctx;
+
+	*needs_unlock = false;
+
+	if (!walk->exec && dma_resv_trylock(bo->base.resv)) {
+		*needs_unlock = true;
+		return true;
+	}
+
+	if (bo->base.resv == ctx->resv && ctx->allow_res_evict) {
+		dma_resv_assert_held(bo->base.resv);
+		return true;
+	}
+
+	return false;
+}
+
+static void xe_ttm_lru_walk_unlock(struct ttm_buffer_object *bo, bool locked)
+{
+	if (locked)
+		dma_resv_unlock(bo->base.resv);
+}
+
+/**
+ * xe_ttm_lru_walk_for_evict() - Perform a LRU list walk, with actions taken on
+ * valid items.
+ * @walk: describe the walks and actions taken
+ * @bdev: The TTM device.
+ * @man: The struct ttm_resource manager whose LRU lists we're walking.
+ * @mem_type: The memory type associated with @man.
+ * @target: The end condition for the walk.
+ *
+ * The LRU lists of @man are walked, and for each struct ttm_resource
+ * encountered, the corresponding ttm_buffer_object is locked and referenced,
+ * and the LRU lock is dropped. The LRU lock may be dropped before locking and,
+ * in that case, it's verified that the item actually remains on the LRU list
+ * after locking, and that the buffer object hasn't changed.
+ *
+ * With a locked object, the actions indicated by @walk->allow_bo() and
+ * @walk->process_bo() are performed (RFC: Combine these?). After that, the
+ * bo is unlocked, the refcount dropped and the next struct ttm_resource is
+ * processed. Here we rely on TTM's restartable LRU list implementation.
+ *
+ * Typically @walk->process_bo() returns the number of pages evicted. When
+ * the total exceeds @target, or when the LRU list has been walked
+ * in full, iteration terminates. It's also terminated on error.
+ *
+ * Buffer object dma_resv locking:
+ * This locking is performed using the combined interpretation of @walk->exec and
+ * @walk->ctx according to the following.
+ * 1) Sleeping locks: Sleeping locks are used exclusively if @walk->exec is true.
+ * The buffer objects are not unlocked. That is the caller's responsibility.
+ * 2) Assuming bo is already locked: This assumption is made iff @walk->exec is false,
+ * @walk->ctx->allow_res_evict is true and bo->base.resv == @walk->ctx->resv.
+ * This is for cases where it is desired to evict bos sharing a reservation lock
+ * that is already held by the process. These bo locks are not unlocked during
+ * the walk.
+ * 3) Trylocking. Trylocking is done in all other cases. If trylocking fails, the
+ * iteration skips the current item and continues. Trylocks are always unlocked
+ * by the walk.
+ *
+ * Note that the way dma_resv individualization is done, locking needs to be done
+ * either with the LRU lock held (trylocking only) or with a reference on the
+ * object.
+ *
+ * Return: (Typically) The number of pages evicted or negative error code on error.
+ */
+long xe_ttm_lru_walk_for_evict(struct xe_ttm_lru_walk *walk, struct ttm_device *bdev,
+			       struct ttm_resource_manager *man, unsigned int mem_type,
+			       long target)
+{
+	struct drm_exec *exec = walk->exec;
+	struct ttm_resource_cursor cursor;
+	struct ttm_resource *res;
+	long sofar = 0;
+	long lret;
+	int ret;
+
+	spin_lock(&bdev->lru_lock);
+	ttm_resource_manager_for_each_res(man, &cursor, res) {
+		struct ttm_buffer_object *bo = res->bo;
+		bool bo_needs_unlock = false;
+		bool bo_locked = false;
+
+		if (!bo || bo->resource != res)
+			continue;
+
+		if (xe_ttm_lru_walk_trylock(walk, bo, &bo_needs_unlock))
+			bo_locked = true;
+		else if (!exec)
+			continue;
+
+		if (!ttm_bo_get_unless_zero(bo)) {
+			xe_ttm_lru_walk_unlock(bo, bo_needs_unlock);
+			continue;
+		}
+
+		spin_unlock(&bdev->lru_lock);
+
+		if (!bo_locked) {
+			ret = drm_exec_lock_obj(exec, &bo->base);
+			if (ret)
+				ttm_bo_put(bo);
+		}
+
+		lret = 0;
+
+		/*
+		 * Note that in between the release of the lru lock and the
+		 * drm_exec_lock_obj, the bo may have switched resource,
+		 * and also memory type. In that case we just skip it.
+		 */
+		if (bo->resource == res && res->mem_type == mem_type)
+			lret = walk->ops->process_bo(walk, bo);
+
+		xe_ttm_lru_walk_unlock(bo, bo_needs_unlock);
+		ttm_bo_put(bo);
+		if (lret < 0) {
+			sofar = lret;
+			goto out;
+		}
+
+		sofar += lret;
+		if (sofar >= target)
+			goto out;
+
+		spin_lock(&bdev->lru_lock);
+	}
+	spin_unlock(&bdev->lru_lock);
+out:
+	ttm_resource_cursor_fini(&cursor);
+	return sofar;
+}
diff --git a/drivers/gpu/drm/xe/xe_ttm_helpers.h b/drivers/gpu/drm/xe/xe_ttm_helpers.h
new file mode 100644
index 000000000000..080eecb65a80
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_ttm_helpers.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef _XE_TTM_HELPERS_H_
+#define _XE_TTM_HELPERS_H_
+
+#include <linux/types.h>
+
+struct drm_exec;
+struct ttm_device;
+struct ttm_buffer_object;
+struct ttm_operation_ctx;
+struct ttm_resource_manager;
+
+struct xe_ttm_lru_walk;
+
+/** struct xe_ttm_lru_walk_ops - Operations for a LRU walk. */
+struct xe_ttm_lru_walk_ops {
+	/**
+	 * process_bo - Process this bo.
+	 * @walk: struct xe_ttm_lru_walk describing the walk.
+	 * @bo: A locked and referenced buffer object.
+	 *
+	 * Return: Negative error code on error, Number of processed pages on
+	 * success. 0 also indicates success.
+	 */
+	long (*process_bo)(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo);
+};
+
+/**
+ * struct xe_ttm_lru_walk - Structure describing a LRU walk.
+ * @ops: Pointer to the ops structure.
+ * @ctx: Pointer to the struct ttm_operation_ctx.
+ * @exec: The struct drm_exec context for the WW transaction if any.
+ */
+struct xe_ttm_lru_walk {
+	const struct xe_ttm_lru_walk_ops *ops;
+	struct ttm_operation_ctx *ctx;
+	struct drm_exec *exec;
+};
+
+long xe_ttm_lru_walk_for_evict(struct xe_ttm_lru_walk *walk, struct ttm_device *bdev,
+			       struct ttm_resource_manager *man, unsigned int mem_type,
+			       long target);
+#endif
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (7 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 8/9] drm/xe, drm/ttm: Provide a generic LRU walker helper Thomas Hellström
@ 2024-04-16 10:07 ` Thomas Hellström
  2024-04-16 21:32   ` kernel test robot
  2024-04-17 13:45   ` kernel test robot
  2024-04-16 11:55 ` [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Christian König
  9 siblings, 2 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 10:07 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Matthew Brost, Somalapuram Amaranath,
	Christian König, dri-devel

Rather than relying on the TTM watermark accounting, add a shrinker
for xe_bos in TT or system memory.

Leverage the newly added TTM per-page shrinking and shmem backup
support.

Although xe doesn't fully support WONTNEED (purgeable) bos yet,
introduce and add shrinker support for purgeable ttm_tts.
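
To illustrate how purgeable bos are meant to plug into the new shrinker
accounting, a hedged sketch of a madvise-style helper follows. It is
hypothetical (xe does not expose such an ioctl yet) and would have to
live in xe_bo.c where struct xe_ttm_tt is visible; the kunit test below
uses the same pattern:

static void my_bo_mark_purgeable(struct xe_bo *bo)
{
	xe_bo_lock(bo, false);
	if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm)) {
		struct xe_ttm_tt *xe_tt =
			container_of(bo->ttm.ttm, struct xe_ttm_tt, ttm);

		if (!xe_tt->purgeable) {
			long num_pages = xe_tt->ttm.num_pages;

			xe_tt->purgeable = true;
			/* Move pages from the shrinkable to the purgeable count. */
			xe_shrinker_mod_pages(xe_bo_device(bo)->mem.shrinker,
					      -num_pages, num_pages);
		}
	}
	xe_bo_unlock(bo);
}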

v2:
- Cleanups, bugfixes and a KUNIT shrinker test.
- Add writeback support, and activate it when called from kswapd.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/Makefile           |   1 +
 drivers/gpu/drm/xe/tests/xe_bo.c      | 118 ++++++++++++++
 drivers/gpu/drm/xe/tests/xe_bo_test.c |   1 +
 drivers/gpu/drm/xe/tests/xe_bo_test.h |   1 +
 drivers/gpu/drm/xe/xe_bo.c            | 145 +++++++++++++++--
 drivers/gpu/drm/xe/xe_bo.h            |   4 +
 drivers/gpu/drm/xe/xe_device.c        |   8 +
 drivers/gpu/drm/xe/xe_device_types.h  |   2 +
 drivers/gpu/drm/xe/xe_shrinker.c      | 226 ++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_shrinker.h      |  18 ++
 drivers/gpu/drm/xe/xe_ttm_helpers.c   |  75 +++++++++
 drivers/gpu/drm/xe/xe_ttm_helpers.h   |   3 +
 12 files changed, 585 insertions(+), 17 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c
 create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 50613f0fe635..99ce034d7e2a 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -130,6 +130,7 @@ xe-y += xe_bb.o \
 	xe_ring_ops.o \
 	xe_sa.o \
 	xe_sched_job.o \
+	xe_shrinker.o \
 	xe_step.o \
 	xe_sync.o \
 	xe_tile.o \
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index 9f3c02826464..7576d362020f 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -6,6 +6,8 @@
 #include <kunit/test.h>
 #include <kunit/visibility.h>
 
+#include <uapi/linux/sysinfo.h>
+
 #include "tests/xe_bo_test.h"
 #include "tests/xe_pci_test.h"
 #include "tests/xe_test.h"
@@ -350,3 +352,119 @@ void xe_bo_evict_kunit(struct kunit *test)
 	xe_call_for_each_device(evict_test_run_device);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_bo_evict_kunit);
+
+struct xe_bo_link {
+	struct list_head link;
+	struct xe_bo *bo;
+};
+
+#define XE_BO_SHRINK_SIZE ((unsigned long)SZ_64M)
+
+/*
+ * Try to create system bos corresponding to twice the amount
+ * of available system memory to test shrinker functionality.
+ * If no swap space is available to accommodate the
+ * memory overcommit, mark bos purgeable.
+ */
+static int shrink_test_run_device(struct xe_device *xe)
+{
+	struct kunit *test = xe_cur_kunit();
+	LIST_HEAD(bos);
+	struct xe_bo_link *link, *next;
+	struct sysinfo si;
+	size_t total, alloced;
+	unsigned int interrupted = 0, successful = 0;
+
+	si_meminfo(&si);
+	total = si.freeram * si.mem_unit;
+
+	kunit_info(test, "Free ram is %lu bytes. Will allocate twice that.\n",
+		   total);
+
+	total <<= 1;
+	for (alloced = 0; alloced < total ; alloced += XE_BO_SHRINK_SIZE) {
+		struct xe_bo *bo;
+		unsigned int mem_type;
+
+		link = kzalloc(sizeof(*link), GFP_KERNEL);
+		if (!link) {
+			KUNIT_FAIL(test, "Unexpected link allocation failure\n");
+			break;
+		}
+
+		INIT_LIST_HEAD(&link->link);
+
+		/* We can create bos using WC caching here. But it is slower. */
+		bo = xe_bo_create_user(xe, NULL, NULL, XE_BO_SHRINK_SIZE,
+				       DRM_XE_GEM_CPU_CACHING_WB,
+				       ttm_bo_type_device,
+				       XE_BO_FLAG_SYSTEM);
+		if (IS_ERR(bo)) {
+			if (bo != ERR_PTR(-ENOMEM) && bo != ERR_PTR(-ENOSPC) &&
+			    bo != ERR_PTR(-EINTR) && bo != ERR_PTR(-ERESTARTSYS))
+				KUNIT_FAIL(test, "Error creating bo: %pe\n", bo);
+			kfree(link);
+			break;
+		}
+		link->bo = bo;
+		list_add_tail(&link->link, &bos);
+		xe_bo_lock(bo, false);
+
+		/*
+		 * If we're low on swap entries, we can't shrink unless the bo
+		 * is marked purgeable.
+		 */
+		if (get_nr_swap_pages() < (XE_BO_SHRINK_SIZE >> PAGE_SHIFT) * 128) {
+			struct xe_ttm_tt *xe_tt =
+				container_of(bo->ttm.ttm, typeof(*xe_tt), ttm);
+			long num_pages = xe_tt->ttm.num_pages;
+
+			xe_tt->purgeable = true;
+			xe_shrinker_mod_pages(xe->mem.shrinker, -num_pages,
+					      num_pages);
+		}
+
+		mem_type = bo->ttm.resource->mem_type;
+		xe_bo_unlock(bo);
+		if (mem_type != XE_PL_TT)
+			KUNIT_FAIL(test, "Bo in incorrect memory type: %u\n",
+				   bo->ttm.resource->mem_type);
+		cond_resched();
+		if (signal_pending(current))
+			break;
+	}
+
+	/* Read back and destroy bos */
+	list_for_each_entry_safe_reverse(link, next, &bos, link) {
+		static struct ttm_operation_ctx ctx = {.interruptible = true};
+		struct xe_bo *bo = link->bo;
+		int ret;
+
+		if (!signal_pending(current)) {
+			xe_bo_lock(bo, false);
+			ret = ttm_bo_validate(&bo->ttm, &tt_placement, &ctx);
+			xe_bo_unlock(bo);
+			if (ret && ret != -EINTR)
+				KUNIT_FAIL(test, "Validation failed: %pe\n",
+					   ERR_PTR(ret));
+			else if (ret)
+				interrupted++;
+			else
+				successful++;
+		}
+		xe_bo_put(link->bo);
+		list_del(&link->link);
+		kfree(link);
+		cond_resched();
+	}
+	kunit_info(test, "Readbacks interrupted: %u successful: %u\n",
+		   interrupted, successful);
+
+	return 0;
+}
+
+void xe_bo_shrink_kunit(struct kunit *test)
+{
+	xe_call_for_each_device(shrink_test_run_device);
+}
+EXPORT_SYMBOL_IF_KUNIT(xe_bo_shrink_kunit);
diff --git a/drivers/gpu/drm/xe/tests/xe_bo_test.c b/drivers/gpu/drm/xe/tests/xe_bo_test.c
index a324cde77db8..317fa923e287 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo_test.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo_test.c
@@ -10,6 +10,7 @@
 static struct kunit_case xe_bo_tests[] = {
 	KUNIT_CASE(xe_ccs_migrate_kunit),
 	KUNIT_CASE(xe_bo_evict_kunit),
+	KUNIT_CASE_SLOW(xe_bo_shrink_kunit),
 	{}
 };
 
diff --git a/drivers/gpu/drm/xe/tests/xe_bo_test.h b/drivers/gpu/drm/xe/tests/xe_bo_test.h
index 0113ab45066a..7f44d14a45c5 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo_test.h
+++ b/drivers/gpu/drm/xe/tests/xe_bo_test.h
@@ -10,5 +10,6 @@ struct kunit;
 
 void xe_ccs_migrate_kunit(struct kunit *test);
 void xe_bo_evict_kunit(struct kunit *test);
+void xe_bo_shrink_kunit(struct kunit *test);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index fdeb3691d3f6..fba743f33d8c 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -10,6 +10,7 @@
 #include <drm/drm_drv.h>
 #include <drm/drm_gem_ttm_helper.h>
 #include <drm/drm_managed.h>
+#include <drm/ttm/ttm_backup.h>
 #include <drm/ttm/ttm_device.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/ttm/ttm_tt.h>
@@ -24,7 +25,9 @@
 #include "xe_migrate.h"
 #include "xe_preempt_fence.h"
 #include "xe_res_cursor.h"
+#include "xe_shrinker.h"
 #include "xe_trace.h"
+#include "xe_ttm_helpers.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
 
@@ -263,11 +266,15 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo,
 	}
 }
 
+/* struct xe_ttm_tt - Subclassed ttm_tt for xe */
 struct xe_ttm_tt {
 	struct ttm_tt ttm;
-	struct device *dev;
+	/** @xe: The xe device */
+	struct xe_device *xe;
 	struct sg_table sgt;
 	struct sg_table *sg;
+	/** @purgeable: Whether the bo is purgeable (WONTNEED) */
+	bool purgeable;
 };
 
 static int xe_tt_map_sg(struct ttm_tt *tt)
@@ -276,7 +283,8 @@ static int xe_tt_map_sg(struct ttm_tt *tt)
 	unsigned long num_pages = tt->num_pages;
 	int ret;
 
-	XE_WARN_ON(tt->page_flags & TTM_TT_FLAG_EXTERNAL);
+	XE_WARN_ON((tt->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+		   !(tt->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE));
 
 	if (xe_tt->sg)
 		return 0;
@@ -284,13 +292,13 @@ static int xe_tt_map_sg(struct ttm_tt *tt)
 	ret = sg_alloc_table_from_pages_segment(&xe_tt->sgt, tt->pages,
 						num_pages, 0,
 						(u64)num_pages << PAGE_SHIFT,
-						xe_sg_segment_size(xe_tt->dev),
+						xe_sg_segment_size(xe_tt->xe->drm.dev),
 						GFP_KERNEL);
 	if (ret)
 		return ret;
 
 	xe_tt->sg = &xe_tt->sgt;
-	ret = dma_map_sgtable(xe_tt->dev, xe_tt->sg, DMA_BIDIRECTIONAL,
+	ret = dma_map_sgtable(xe_tt->xe->drm.dev, xe_tt->sg, DMA_BIDIRECTIONAL,
 			      DMA_ATTR_SKIP_CPU_SYNC);
 	if (ret) {
 		sg_free_table(xe_tt->sg);
@@ -309,21 +317,41 @@ struct sg_table *xe_bo_sg(struct xe_bo *bo)
 	return xe_tt->sg;
 }
 
+/*
+ * Account ttm pages against the device shrinker's shrinkable and
+ * purgeable counts.
+ */
+static void xe_ttm_tt_account(struct ttm_tt *tt, bool add)
+{
+	struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
+	long num_pages = tt->num_pages;
+
+	if (!add)
+		num_pages = -num_pages;
+
+	if (xe_tt->purgeable)
+		xe_shrinker_mod_pages(xe_tt->xe->mem.shrinker, 0, num_pages);
+	else
+		xe_shrinker_mod_pages(xe_tt->xe->mem.shrinker, num_pages, 0);
+}
+
 static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 				       u32 page_flags)
 {
 	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
 	struct xe_device *xe = xe_bo_device(bo);
-	struct xe_ttm_tt *tt;
+	struct xe_ttm_tt *xe_tt;
+	struct ttm_tt *tt;
 	unsigned long extra_pages;
 	enum ttm_caching caching;
 	int err;
 
-	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
-	if (!tt)
+	xe_tt = kzalloc(sizeof(*xe_tt), GFP_KERNEL);
+	if (!xe_tt)
 		return NULL;
 
-	tt->dev = xe->drm.dev;
+	tt = &xe_tt->ttm;
+	xe_tt->xe = xe;
 
 	extra_pages = 0;
 	if (xe_bo_needs_ccs_pages(bo))
@@ -351,25 +379,37 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_FLAG_PAGETABLE))
 		caching = ttm_write_combined;
 
-	err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages);
+	if (ttm_bo->type != ttm_bo_type_sg)
+		page_flags |= TTM_TT_FLAG_EXTERNAL | TTM_TT_FLAG_EXTERNAL_MAPPABLE;
+
+	err = ttm_tt_init(tt, &bo->ttm, page_flags, caching, extra_pages);
 	if (err) {
-		kfree(tt);
+		kfree(xe_tt);
 		return NULL;
 	}
 
-	return &tt->ttm;
+	tt->backup = ttm_backup_shmem_create(tt->num_pages << PAGE_SHIFT);
+	if (IS_ERR(tt->backup)) {
+		ttm_tt_fini(tt);
+		kfree(xe_tt);
+		return NULL;
+	}
+
+	return tt;
 }
 
 static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt,
 			      struct ttm_operation_ctx *ctx)
 {
+	struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
 	int err;
 
 	/*
 	 * dma-bufs are not populated with pages, and the dma-
 	 * addresses are set up when moved to XE_PL_TT.
 	 */
-	if (tt->page_flags & TTM_TT_FLAG_EXTERNAL)
+	if ((tt->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+	    !(tt->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE))
 		return 0;
 
 	err = ttm_pool_alloc(&ttm_dev->pool, tt, ctx);
@@ -378,27 +418,88 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, struct ttm_tt *tt,
 
 	/* A follow up may move this xe_bo_move when BO is moved to XE_PL_TT */
 	err = xe_tt_map_sg(tt);
-	if (err)
+	if (err) {
 		ttm_pool_free(&ttm_dev->pool, tt);
+		return err;
+	}
 
-	return err;
+	xe_tt->purgeable = false;
+	xe_ttm_tt_account(tt, true);
+
+	return 0;
 }
 
 static void xe_ttm_tt_unpopulate(struct ttm_device *ttm_dev, struct ttm_tt *tt)
 {
 	struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
+	struct xe_device *xe = xe_tt->xe;
 
-	if (tt->page_flags & TTM_TT_FLAG_EXTERNAL)
+	if ((tt->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+	    !(tt->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE))
 		return;
 
 	if (xe_tt->sg) {
-		dma_unmap_sgtable(xe_tt->dev, xe_tt->sg,
+		dma_unmap_sgtable(xe->drm.dev, xe_tt->sg,
 				  DMA_BIDIRECTIONAL, 0);
 		sg_free_table(xe_tt->sg);
 		xe_tt->sg = NULL;
 	}
 
-	return ttm_pool_free(&ttm_dev->pool, tt);
+	ttm_pool_free(&ttm_dev->pool, tt);
+	xe_ttm_tt_account(tt, false);
+}
+
+/**
+ * xe_bo_shrink() - Try to shrink an xe bo.
+ * @walk: The walk parameters.
+ * @bo: The TTM buffer object
+ * @purge: Only consider purgeable bos.
+ * @writeback: Try to write back to persistent storage.
+ *
+ * Try to shrink or purge a bo, and if it succeeds, unmap dma.
+ * Note that we need to be able to handle also non xe bos
+ * (ghost bos), but only if the struct ttm_tt is embedded in
+ * a struct xe_ttm_tt.
+ *
+ * Return: The number of pages shrunken or purged, or negative error
+ * code on failure.
+ */
+long xe_bo_shrink(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo,
+		  bool purge, bool writeback)
+{
+	struct ttm_tt *tt = bo->ttm;
+	struct xe_ttm_tt *xe_tt = container_of(tt, struct xe_ttm_tt, ttm);
+	struct ttm_place place = {.mem_type = bo->resource->mem_type};
+	struct xe_device *xe = xe_tt->xe;
+	long lret;
+
+	if (!tt || !ttm_tt_is_populated(tt) ||
+	    !(tt->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE) ||
+	    (purge && !xe_tt->purgeable))
+		return 0L;
+
+	if (!ttm_bo_eviction_valuable(bo, &place))
+		return 0L;
+
+	/*
+	 * Unmap DMA only *after* a successful shrink. This should be ok
+	 * since bo gpu access is stopped anyway.
+	 */
+	lret = xe_ttm_bo_try_shrink(walk, bo, xe_tt->purgeable, writeback);
+	if (lret > 0) {
+		xe_assert(xe, !ttm_tt_is_populated(tt));
+
+		xe_ttm_tt_account(tt, false);
+
+		if (xe_tt->sg) {
+			dma_unmap_sgtable(xe->drm.dev, xe_tt->sg,
+					  DMA_BIDIRECTIONAL, 0);
+			sg_free_table(xe_tt->sg);
+			xe_tt->sg = NULL;
+		}
+	}
+
+	return lret;
 }
 
 static void xe_ttm_tt_destroy(struct ttm_device *ttm_dev, struct ttm_tt *tt)
@@ -1201,6 +1302,7 @@ struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
 		.no_wait_gpu = false,
+		.gfp_retry_mayfail = true,
 	};
 	struct ttm_placement *placement;
 	uint32_t alignment;
@@ -1644,6 +1746,8 @@ int xe_bo_pin_external(struct xe_bo *bo)
 	}
 
 	ttm_bo_pin(&bo->ttm);
+	if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm))
+		xe_ttm_tt_account(bo->ttm.ttm, false);
 
 	/*
 	 * FIXME: If we always use the reserve / unreserve functions for locking
@@ -1702,6 +1806,8 @@ int xe_bo_pin(struct xe_bo *bo)
 	}
 
 	ttm_bo_pin(&bo->ttm);
+	if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm))
+		xe_ttm_tt_account(bo->ttm.ttm, false);
 
 	/*
 	 * FIXME: If we always use the reserve / unreserve functions for locking
@@ -1737,6 +1843,9 @@ void xe_bo_unpin_external(struct xe_bo *bo)
 	}
 
 	ttm_bo_unpin(&bo->ttm);
+	if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm))
+		xe_ttm_tt_account(bo->ttm.ttm, true);
+
 
 	/*
 	 * FIXME: If we always use the reserve / unreserve functions for locking
@@ -1766,6 +1875,8 @@ void xe_bo_unpin(struct xe_bo *bo)
 	}
 
 	ttm_bo_unpin(&bo->ttm);
+	if (bo->ttm.ttm && ttm_tt_is_populated(bo->ttm.ttm))
+		xe_ttm_tt_account(bo->ttm.ttm, true);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index a885b14bf595..ed3c61b474da 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -63,6 +63,7 @@
 #define XE_BO_PROPS_INVALID	(-1)
 
 struct sg_table;
+struct xe_ttm_lru_walk;
 
 struct xe_bo *xe_bo_alloc(void);
 void xe_bo_free(struct xe_bo *bo);
@@ -314,6 +315,9 @@ static inline unsigned int xe_sg_segment_size(struct device *dev)
 
 #define i915_gem_object_flush_if_display(obj)		((void)(obj))
 
+long xe_bo_shrink(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo,
+		  bool purge, bool writeback);
+
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
 /**
  * xe_bo_is_mem_type - Whether the bo currently resides in the given
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d85a2ba0a057..b59128b68c4e 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -39,6 +39,7 @@
 #include "xe_pcode.h"
 #include "xe_pm.h"
 #include "xe_query.h"
+#include "xe_shrinker.h"
 #include "xe_sriov.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
@@ -236,6 +237,9 @@ static void xe_device_destroy(struct drm_device *dev, void *dummy)
 	if (xe->unordered_wq)
 		destroy_workqueue(xe->unordered_wq);
 
+	if (!IS_ERR_OR_NULL(xe->mem.shrinker))
+		xe_shrinker_destroy(xe->mem.shrinker);
+
 	ttm_device_fini(&xe->ttm);
 }
 
@@ -265,6 +269,10 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
 	if (err)
 		goto err;
 
+	xe->mem.shrinker = xe_shrinker_create(xe);
+	if (IS_ERR(xe->mem.shrinker))
+		return ERR_CAST(xe->mem.shrinker);
+
 	xe->info.devid = pdev->device;
 	xe->info.revid = pdev->revision;
 	xe->info.force_execlist = xe_modparam.force_execlist;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 60ced5f90c2b..d3b6e3d416e1 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -315,6 +315,8 @@ struct xe_device {
 		struct xe_mem_region vram;
 		/** @mem.sys_mgr: system TTM manager */
 		struct ttm_resource_manager sys_mgr;
+		/** @mem.shrinker: system memory shrinker. */
+		struct xe_shrinker *shrinker;
 	} mem;
 
 	/** @sriov: device level virtualization data */
diff --git a/drivers/gpu/drm/xe/xe_shrinker.c b/drivers/gpu/drm/xe/xe_shrinker.c
new file mode 100644
index 000000000000..99f39cf8de89
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_shrinker.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <linux/shrinker.h>
+#include <linux/swap.h>
+
+#include <drm/ttm/ttm_bo.h>
+#include <drm/ttm/ttm_tt.h>
+
+#include "xe_bo.h"
+#include "xe_shrinker.h"
+#include "xe_ttm_helpers.h"
+
+/**
+ * struct xe_shrinker - per-device shrinker
+ * @xe: Back pointer to the device.
+ * @lock: Lock protecting accounting.
+ * @shrinkable_pages: Number of pages that are currently shrinkable.
+ * @purgeable_pages: Number of pages that are currently purgeable.
+ * @shrink: Pointer to the mm shrinker.
+ */
+struct xe_shrinker {
+	struct xe_device *xe;
+	rwlock_t lock;
+	long shrinkable_pages;
+	long purgeable_pages;
+	struct shrinker *shrink;
+};
+
+/**
+ * struct xe_shrink_lru_walk - lru_walk subclass for shrinker
+ * @walk: The embedded base class.
+ * @xe: Pointer to the xe device.
+ * @purge: Purgeable-only request from the shrinker.
+ * @writeback: Try to write back to persistent storage.
+ */
+struct xe_shrink_lru_walk {
+	struct xe_ttm_lru_walk walk;
+	struct xe_device *xe;
+	bool purge;
+	bool writeback;
+};
+
+static struct xe_shrinker *to_xe_shrinker(struct shrinker *shrink)
+{
+	return shrink->private_data;
+}
+
+static struct xe_shrink_lru_walk *
+to_xe_shrink_lru_walk(struct xe_ttm_lru_walk *walk)
+{
+	return container_of(walk, struct xe_shrink_lru_walk, walk);
+}
+
+/**
+ * xe_shrinker_mod_pages() - Modify shrinker page accounting
+ * @shrinker: Pointer to the struct xe_shrinker.
+ * @shrinkable: Shrinkable pages delta. May be negative.
+ * @purgeable: Purgeable page delta. May be negative.
+ *
+ * Modifies the shrinkable and purgeable pages accounting.
+ */
+void
+xe_shrinker_mod_pages(struct xe_shrinker *shrinker, long shrinkable, long purgeable)
+{
+	write_lock(&shrinker->lock);
+	shrinker->shrinkable_pages += shrinkable;
+	shrinker->purgeable_pages += purgeable;
+	write_unlock(&shrinker->lock);
+}
+
+static long xe_shrinker_process_bo(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo)
+{
+	struct xe_shrink_lru_walk *shrink_walk = to_xe_shrink_lru_walk(walk);
+
+	return xe_bo_shrink(walk, bo, shrink_walk->purge, shrink_walk->writeback);
+}
+
+static long xe_shrinker_walk(struct xe_shrink_lru_walk *shrink_walk, long target)
+{
+	struct xe_device *xe = shrink_walk->xe;
+	struct ttm_resource_manager *man;
+	unsigned int mem_type;
+	long sofar = 0;
+	long lret;
+
+	for (mem_type = XE_PL_SYSTEM; mem_type <= XE_PL_TT; ++mem_type) {
+		man = ttm_manager_type(&xe->ttm, mem_type);
+		if (!man || !man->use_tt)
+			continue;
+
+		lret = xe_ttm_lru_walk_for_evict(&shrink_walk->walk,
+						 &xe->ttm, man,
+						 mem_type, target);
+		if (lret < 0)
+			return lret;
+
+		sofar += lret;
+		if (sofar >= target)
+			break;
+	}
+
+	return sofar;
+}
+
+static unsigned long
+xe_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	struct xe_shrinker *shrinker = to_xe_shrinker(shrink);
+	unsigned long num_pages;
+
+	num_pages = get_nr_swap_pages();
+	read_lock(&shrinker->lock);
+	num_pages = min_t(unsigned long, num_pages, shrinker->shrinkable_pages);
+	num_pages += shrinker->purgeable_pages;
+	read_unlock(&shrinker->lock);
+
+	return num_pages ? num_pages : SHRINK_EMPTY;
+}
+
+static const struct xe_ttm_lru_walk_ops xe_shrink_ops = {
+	.process_bo = xe_shrinker_process_bo,
+};
+
+static unsigned long xe_shrinker_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	struct xe_shrinker *shrinker = to_xe_shrinker(shrink);
+	bool is_kswapd = current_is_kswapd();
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = !is_kswapd,
+	};
+	unsigned long nr_to_scan, freed = 0;
+	struct xe_shrink_lru_walk shrink_walk = {
+		.walk = {
+			.ops = &xe_shrink_ops,
+			.ctx = &ctx,
+		},
+		.xe = shrinker->xe,
+		.purge = true,
+		.writeback = is_kswapd,
+	};
+	bool purgeable;
+	long ret;
+
+	sc->nr_scanned = 0;
+	nr_to_scan = sc->nr_to_scan;
+
+	read_lock(&shrinker->lock);
+	purgeable = !!shrinker->purgeable_pages;
+	read_unlock(&shrinker->lock);
+
+	while (purgeable && freed < nr_to_scan) {
+		ret = xe_shrinker_walk(&shrink_walk, nr_to_scan);
+		if (ret <= 0)
+			break;
+
+		freed += ret;
+	}
+
+	sc->nr_scanned = freed;
+	if (freed < nr_to_scan)
+		nr_to_scan -= freed;
+	else
+		nr_to_scan = 0;
+	if (!nr_to_scan)
+		return freed ? freed : SHRINK_STOP;
+
+	shrink_walk.purge = false;
+	nr_to_scan = sc->nr_to_scan;
+	while (freed < nr_to_scan) {
+		ret = xe_shrinker_walk(&shrink_walk, nr_to_scan);
+		if (ret <= 0)
+			break;
+
+		freed += ret;
+	}
+
+	sc->nr_scanned = freed;
+
+	return freed ? freed : SHRINK_STOP;
+}
+
+/**
+ * xe_shrinker_create() - Create an xe per-device shrinker
+ * @xe: Pointer to the xe device.
+ *
+ * Return: A pointer to the created shrinker on success,
+ * negative error code on failure.
+ */
+struct xe_shrinker *xe_shrinker_create(struct xe_device *xe)
+{
+	struct xe_shrinker *shrinker = kzalloc(sizeof(*shrinker), GFP_KERNEL);
+
+	if (!shrinker)
+		return ERR_PTR(-ENOMEM);
+
+	shrinker->shrink = shrinker_alloc(0, "xe system shrinker");
+	if (!shrinker->shrink) {
+		kfree(shrinker);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	shrinker->xe = xe;
+	rwlock_init(&shrinker->lock);
+	shrinker->shrink->count_objects = xe_shrinker_count;
+	shrinker->shrink->scan_objects = xe_shrinker_scan;
+	shrinker->shrink->private_data = shrinker;
+	shrinker_register(shrinker->shrink);
+
+	return shrinker;
+}
+
+/**
+ * xe_shrinker_destroy() - Destroy an xe per-device shrinker
+ * @shrinker: Pointer to the shrinker to destroy.
+ */
+void xe_shrinker_destroy(struct xe_shrinker *shrinker)
+{
+	xe_assert(shrinker->xe, !shrinker->shrinkable_pages);
+	xe_assert(shrinker->xe, !shrinker->purgeable_pages);
+	shrinker_free(shrinker->shrink);
+	kfree(shrinker);
+}
diff --git a/drivers/gpu/drm/xe/xe_shrinker.h b/drivers/gpu/drm/xe/xe_shrinker.h
new file mode 100644
index 000000000000..28a038f4fcbf
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_shrinker.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#ifndef _XE_SHRINKER_H_
+#define _XE_SHRINKER_H_
+
+struct xe_shrinker;
+struct xe_device;
+
+void xe_shrinker_mod_pages(struct xe_shrinker *shrinker, long shrinkable, long purgeable);
+
+struct xe_shrinker *xe_shrinker_create(struct xe_device *xe);
+
+void xe_shrinker_destroy(struct xe_shrinker *shrinker);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_ttm_helpers.c b/drivers/gpu/drm/xe/xe_ttm_helpers.c
index 92c951dee30d..b898eba5d1f8 100644
--- a/drivers/gpu/drm/xe/xe_ttm_helpers.c
+++ b/drivers/gpu/drm/xe/xe_ttm_helpers.c
@@ -9,6 +9,81 @@
 
 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_device.h>
+#include <drm/ttm/ttm_placement.h>
+#include <drm/ttm/ttm_tt.h>
+
+/**
+ * xe_ttm_bo_try_shrink - LRU walk helper to shrink a ttm buffer object.
+ * @walk: The struct xe_ttm_lru_walk that describes the walk.
+ * @bo: The buffer object.
+ * @purge: Whether to attempt to purge the bo content since it's no
+ * longer needed.
+ * @writeback: If !@purge, attempt to write out to persistent storage.
+ *
+ * The function uses the ttm_tt_backup() functionality to back up or
+ * purge a struct ttm_tt. If the bo is not in system memory, it's first
+ * moved there.
+ *
+ * Return: The number of pages shrunken or purged, or
+ * negative error code on failure.
+ */
+long xe_ttm_bo_try_shrink(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo,
+			  bool purge, bool writeback)
+{
+	static const struct ttm_place sys_placement_flags = {
+		.fpfn = 0,
+		.lpfn = 0,
+		.mem_type = TTM_PL_SYSTEM,
+		.flags = 0,
+	};
+	static struct ttm_placement sys_placement = {
+		.num_placement = 1,
+		.placement = &sys_placement_flags,
+	};
+	struct ttm_operation_ctx *ctx = walk->ctx;
+	struct ttm_tt *tt = bo->ttm;
+	long lret;
+
+	dma_resv_assert_held(bo->base.resv);
+
+	if (!tt || !ttm_tt_is_populated(tt))
+		return 0;
+
+	if (bo->resource->mem_type != TTM_PL_SYSTEM) {
+		int ret = ttm_bo_validate(bo, &sys_placement, ctx);
+
+		if (ret) {
+			if (ret == -EINTR || ret == -EDEADLK ||
+			    ret == -ERESTARTSYS)
+				return ret;
+			return 0;
+		}
+	}
+
+	if (ctx->no_wait_gpu &&
+	    !dma_resv_test_signaled(bo->base.resv,
+				    DMA_RESV_USAGE_BOOKKEEP))
+		return 0;
+
+	lret = dma_resv_wait_timeout(bo->base.resv,
+				     DMA_RESV_USAGE_BOOKKEEP,
+				     ctx->interruptible,
+				     MAX_SCHEDULE_TIMEOUT);
+	if (lret < 0) {
+		if (lret == -ERESTARTSYS)
+			return lret;
+		return 0;
+	}
+
+	if (bo->deleted)
+		lret = ttm_tt_backup(bo->bdev, tt, true, writeback);
+	else
+		lret = ttm_tt_backup(bo->bdev, tt, purge, writeback);
+	if (lret < 0 && lret != -EINTR)
+		return 0;
+
+	return lret;
+}
 
 static bool xe_ttm_lru_walk_trylock(struct xe_ttm_lru_walk *walk,
 				    struct ttm_buffer_object *bo,
diff --git a/drivers/gpu/drm/xe/xe_ttm_helpers.h b/drivers/gpu/drm/xe/xe_ttm_helpers.h
index 080eecb65a80..b4f0822aec71 100644
--- a/drivers/gpu/drm/xe/xe_ttm_helpers.h
+++ b/drivers/gpu/drm/xe/xe_ttm_helpers.h
@@ -41,6 +41,9 @@ struct xe_ttm_lru_walk {
 	struct drm_exec *exec;
 };
 
+long xe_ttm_bo_try_shrink(struct xe_ttm_lru_walk *walk, struct ttm_buffer_object *bo,
+			  bool purge, bool writeback);
+
 long xe_ttm_lru_walk_for_evict(struct xe_ttm_lru_walk *walk, struct ttm_device *bdev,
 			       struct ttm_resource_manager *man, unsigned int mem_type,
 			       long target);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker
  2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
                   ` (8 preceding siblings ...)
  2024-04-16 10:07 ` [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos Thomas Hellström
@ 2024-04-16 11:55 ` Christian König
  2024-04-16 13:08   ` Thomas Hellström
  9 siblings, 1 reply; 21+ messages in thread
From: Christian König @ 2024-04-16 11:55 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Somalapuram Amaranath, dri-devel, Matthew Brost, Kuehling, Felix

While patches 1-4 look good from a high level, I still think the series 
needs some prerequisites and re-ordering.

First of all, make all the cleanups separate patches. In other words: 
making ttm_resource_manager_next() take only the cursor as argument, 
adding ttm_resource_cursor_fini()/ttm_resource_cursor_fini_locked() as one 
patch, and then ttm_lru_bulk_move_init()/ttm_lru_bulk_move_fini() as a second.

With that done I think we should first switch over TTM and all drivers 
using it to drm_exec as part of its context object.

Then I would switch over to using LRU hitches for both swapping and 
eviction.

And when that's finally done we can take a look into the partial shmem 
swapping :)

And Felix is really (and I mean *really*) looking forward to the partial 
shmem swapping as well.

Regards,
Christian.

Am 16.04.24 um 12:07 schrieb Thomas Hellström:
> This series implements TTM shrinker / eviction helpers and an xe bo
> shrinker. It builds on two previous series, *and obsoletes these*. First
>
> https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg484425.html
>
> for patch 1-4, which IMO still could be reviewed and pushed as a
> separate series.
>
> Second the previous TTM shrinker series
>
> https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/
>
> Where the comment about layering
> https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9
>
> now addressed, and this version also implements shmem objects for backup
> rather than direct swap-cache insertions, which was used in the previuos
> series. It turns out that with per-page backup / shrinking, shmem objects
> appears to work just as well as direct swap-cache insertions with the
> added benefit that was introduced in the previous TTM shrinker series to
> avoid running out of swap entries isn't really needed.
>
> In any case, patch 1-4 are better described in their separate series.
> (RFC is removed for those).
>
> Patch 5 could in theory be skipped but introduces a possibility to easily
> add or test multiple backup backends, like the direct swap-cache
> insertion or even files into fast dedicated nvme storage for for example.
>
> Patch 6 introduces helpers in the ttm_pool code for page-by-page shrinking
> and recovery. It avoids having to temporarily allocate a huge amount of
> memory to be able to shrink a buffer object. It also introduces the
> possibility to immediately write-back pages if needed, since that tends
> to be a bit delayed when left to kswapd.
>
> Patch 7 Adds a simple error injection to the above code to help increase
> test coverage.
>
> Patch 8 introduces a LRU walk helper for eviction and shrinking. It's
> currently xe-only but not xe-specific and can easily be moved to TTM when
> used by more than one driver or when eviction is implemented using it.
>
> Patch 9 introduces a helper callback for shrinking (Also ready to be
> moved to TTM) and an xe-specific shrinker implementation. It also
> adds a kunit test to test the shrinker functionality by trying to
> allocate twice the available amount of RAM as buffer objects. If there
> is no swap-space available, the buffer objects are marked
> purgeable.
>
> v2:
> - Squash obsolete revision history in the patch commit messages.
> - Fix a couple of review comments by Christian
> - Don't store the mem_type in the TTM managers but in the
>    resource cursor.
> - Rename introduced TTM *back_up* function names to *backup*
> - Add ttm pool recovery fault injection.
> - Shrinker xe kunit test
> - Various bugfixes
>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
>
> Thomas Hellström (8):
>    drm/ttm: Allow TTM LRU list nodes of different types
>    drm/ttm: Use LRU hitches
>    drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist
>      moves
>    drm/ttm: Allow continued swapout after -ENOSPC falure
>    drm/ttm: Add a virtual base class for graphics memory backup
>    drm/ttm/pool: Provide a helper to shrink pages.
>    drm/xe, drm/ttm: Provide a generic LRU walker helper
>    drm/xe: Add a shrinker for xe bos
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   4 +
>   drivers/gpu/drm/ttm/Makefile           |   2 +-
>   drivers/gpu/drm/ttm/ttm_backup_shmem.c | 137 +++++++++
>   drivers/gpu/drm/ttm/ttm_bo.c           |   1 +
>   drivers/gpu/drm/ttm/ttm_device.c       |  33 ++-
>   drivers/gpu/drm/ttm/ttm_pool.c         | 391 ++++++++++++++++++++++++-
>   drivers/gpu/drm/ttm/ttm_resource.c     | 231 ++++++++++++---
>   drivers/gpu/drm/ttm/ttm_tt.c           |  34 +++
>   drivers/gpu/drm/xe/Makefile            |   2 +
>   drivers/gpu/drm/xe/xe_bo.c             | 123 ++++++--
>   drivers/gpu/drm/xe/xe_bo.h             |   3 +
>   drivers/gpu/drm/xe/xe_device.c         |   8 +
>   drivers/gpu/drm/xe/xe_device_types.h   |   2 +
>   drivers/gpu/drm/xe/xe_shrinker.c       | 237 +++++++++++++++
>   drivers/gpu/drm/xe/xe_shrinker.h       |  18 ++
>   drivers/gpu/drm/xe/xe_ttm_helpers.c    | 224 ++++++++++++++
>   drivers/gpu/drm/xe/xe_ttm_helpers.h    |  63 ++++
>   drivers/gpu/drm/xe/xe_vm.c             |   4 +
>   include/drm/ttm/ttm_backup.h           | 136 +++++++++
>   include/drm/ttm/ttm_device.h           |   2 +
>   include/drm/ttm/ttm_pool.h             |   4 +
>   include/drm/ttm/ttm_resource.h         |  96 +++++-
>   include/drm/ttm/ttm_tt.h               |  19 ++
>   23 files changed, 1683 insertions(+), 91 deletions(-)
>   create mode 100644 drivers/gpu/drm/ttm/ttm_backup_shmem.c
>   create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c
>   create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h
>   create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.c
>   create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.h
>   create mode 100644 include/drm/ttm/ttm_backup.h
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker
  2024-04-16 11:55 ` [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Christian König
@ 2024-04-16 13:08   ` Thomas Hellström
  2024-04-16 13:24     ` Christian König
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellström @ 2024-04-16 13:08 UTC (permalink / raw)
  To: Christian König, intel-xe
  Cc: Somalapuram Amaranath, dri-devel, Matthew Brost, Kuehling, Felix

Hi, Christian,

On Tue, 2024-04-16 at 13:55 +0200, Christian König wrote:
> While patches 1-4 look good from a high level I still think it needs 
> some prerequisite and re-ordering.
> 
> First of all make all the cleanups separate patches. In other words
> that 
> ttm_resource_manager_next() takes only the cursor as argument, adding
> ttm_resource_cursor_fini()/ttm_resource_cursor_fini_locked() as one 
> patch and then ttm_lru_bulk_move_init()/ttm_lru_bulk_move_fini() as
> second.

Yes, I can take a look at that. I think the shortening of the argument
list of ttm_resource_manager_next() makes sense as a separate cleanup. 

The other two are needed because of the changes introduced in the
respective patches. I could of course add stubs of these functions
before the patches that currently introduce them if needed, but I don't
really see the point. What do you think?

> 
> With that done I think we should first switch over TTM and all
> drivers 
> using it to drm_exec as part of it's context object.

So are you ok with adding an optional drm_exec pointer in the
ttm_operation_ctx for this? (That was my plan moving forward).

However, when that has been added, I think it makes sense to leave it to
the driver author to port their validation loops and bo allocation over
to using drm_exec. While we made sure the drm_exec object was indeed
passed to the validation helper in the drm_gpuvm code, I'm not sure
everybody actually includes their validation and bo allocation (for
example page-table bos) in their drm_exec while_not_all_locked() loop,
and I think it's reasonable to make the "port the driver over" work an
optional but strongly recommended driver effort. If the driver sets
ctx->drm_exec to NULL, it will fall back to the current behaviour.
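
Roughly what I have in mind is sketched below; the member name and the
small locking helper are only my assumptions for illustration, nothing
that exists today:

	struct ttm_operation_ctx {
		bool interruptible;
		bool no_wait_gpu;
		/* ...existing members... */
		/* Optional. NULL means "keep today's locking behaviour". */
		struct drm_exec *drm_exec;
	};

	/* Hypothetical locking helper on the eviction / validation path. */
	static int ttm_bo_lock_for_evict(struct ttm_buffer_object *bo,
					 struct ttm_operation_ctx *ctx)
	{
		if (!ctx->drm_exec)
			return dma_resv_trylock(bo->base.resv) ? 0 : -EBUSY;

		/* With a drm_exec context the lock may sleep and be retried. */
		return drm_exec_lock_obj(ctx->drm_exec, &bo->base);
	}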

> 
> Then I would switch over to using LRU hitches for both swapping and 
> eviction.
> 
> And when that's finally done we can take a look into the partial
> shmem 
> swapping :)
> 
> And Felix is really (and mean *really*) looking forward to the
> partial 
> shmem swapping as well.

While the LRU walker helper introduced in patch 8 has drm_exec support,
shrinkers don't require it, since they are always trylocking. (However,
being able to "evict" system memory to swap directly in the validation
stage using drm_exec locking is probably something we should support.)

That's why I opted for implementing shrinking before exhaustive
eviction. But if you insist we can do it the other way around. Most of
what's needed is already in the patches.

/Thomas


> 
> Regards,
> Christian.
> 
> Am 16.04.24 um 12:07 schrieb Thomas Hellström:
> > This series implements TTM shrinker / eviction helpers and an xe bo
> > shrinker. It builds on two previous series, *and obsoletes these*.
> > First
> > 
> > https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg484425.html
> > 
> > for patch 1-4, which IMO still could be reviewed and pushed as a
> > separate series.
> > 
> > Second the previous TTM shrinker series
> > 
> > https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/
> > 
> > Where the comment about layering
> > https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9
> > 
> > now addressed, and this version also implements shmem objects for
> > backup
> > rather than direct swap-cache insertions, which was used in the
> > previuos
> > series. It turns out that with per-page backup / shrinking, shmem
> > objects
> > appears to work just as well as direct swap-cache insertions with
> > the
> > added benefit that was introduced in the previous TTM shrinker
> > series to
> > avoid running out of swap entries isn't really needed.
> > 
> > In any case, patch 1-4 are better described in their separate
> > series.
> > (RFC is removed for those).
> > 
> > Patch 5 could in theory be skipped but introduces a possibility to
> > easily
> > add or test multiple backup backends, like the direct swap-cache
> > insertion or even files into fast dedicated nvme storage for for
> > example.
> > 
> > Patch 6 introduces helpers in the ttm_pool code for page-by-page
> > shrinking
> > and recovery. It avoids having to temporarily allocate a huge
> > amount of
> > memory to be able to shrink a buffer object. It also introduces the
> > possibility to immediately write-back pages if needed, since that
> > tends
> > to be a bit delayed when left to kswapd.
> > 
> > Patch 7 Adds a simple error injection to the above code to help
> > increase
> > test coverage.
> > 
> > Patch 8 introduces a LRU walk helper for eviction and shrinking.
> > It's
> > currently xe-only but not xe-specific and can easily be moved to
> > TTM when
> > used by more than one driver or when eviction is implemented using
> > it.
> > 
> > Patch 9 introduces a helper callback for shrinking (Also ready to
> > be
> > moved to TTM) and an xe-specific shrinker implementation. It also
> > adds a kunit test to test the shrinker functionality by trying to
> > allocate twice the available amount of RAM as buffer objects. If
> > there
> > is no swap-space available, the buffer objects are marked
> > purgeable.
> > 
> > v2:
> > - Squash obsolete revision history in the patch commit messages.
> > - Fix a couple of review comments by Christian
> > - Don't store the mem_type in the TTM managers but in the
> >    resource cursor.
> > - Rename introduced TTM *back_up* function names to *backup*
> > - Add ttm pool recovery fault injection.
> > - Shrinker xe kunit test
> > - Various bugfixes
> > 
> > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: <dri-devel@lists.freedesktop.org>
> > 
> > Thomas Hellström (8):
> >    drm/ttm: Allow TTM LRU list nodes of different types
> >    drm/ttm: Use LRU hitches
> >    drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk
> > sublist
> >      moves
> >    drm/ttm: Allow continued swapout after -ENOSPC falure
> >    drm/ttm: Add a virtual base class for graphics memory backup
> >    drm/ttm/pool: Provide a helper to shrink pages.
> >    drm/xe, drm/ttm: Provide a generic LRU walker helper
> >    drm/xe: Add a shrinker for xe bos
> > 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   4 +
> >   drivers/gpu/drm/ttm/Makefile           |   2 +-
> >   drivers/gpu/drm/ttm/ttm_backup_shmem.c | 137 +++++++++
> >   drivers/gpu/drm/ttm/ttm_bo.c           |   1 +
> >   drivers/gpu/drm/ttm/ttm_device.c       |  33 ++-
> >   drivers/gpu/drm/ttm/ttm_pool.c         | 391
> > ++++++++++++++++++++++++-
> >   drivers/gpu/drm/ttm/ttm_resource.c     | 231 ++++++++++++---
> >   drivers/gpu/drm/ttm/ttm_tt.c           |  34 +++
> >   drivers/gpu/drm/xe/Makefile            |   2 +
> >   drivers/gpu/drm/xe/xe_bo.c             | 123 ++++++--
> >   drivers/gpu/drm/xe/xe_bo.h             |   3 +
> >   drivers/gpu/drm/xe/xe_device.c         |   8 +
> >   drivers/gpu/drm/xe/xe_device_types.h   |   2 +
> >   drivers/gpu/drm/xe/xe_shrinker.c       | 237 +++++++++++++++
> >   drivers/gpu/drm/xe/xe_shrinker.h       |  18 ++
> >   drivers/gpu/drm/xe/xe_ttm_helpers.c    | 224 ++++++++++++++
> >   drivers/gpu/drm/xe/xe_ttm_helpers.h    |  63 ++++
> >   drivers/gpu/drm/xe/xe_vm.c             |   4 +
> >   include/drm/ttm/ttm_backup.h           | 136 +++++++++
> >   include/drm/ttm/ttm_device.h           |   2 +
> >   include/drm/ttm/ttm_pool.h             |   4 +
> >   include/drm/ttm/ttm_resource.h         |  96 +++++-
> >   include/drm/ttm/ttm_tt.h               |  19 ++
> >   23 files changed, 1683 insertions(+), 91 deletions(-)
> >   create mode 100644 drivers/gpu/drm/ttm/ttm_backup_shmem.c
> >   create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c
> >   create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h
> >   create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.c
> >   create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.h
> >   create mode 100644 include/drm/ttm/ttm_backup.h
> > 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker
  2024-04-16 13:08   ` Thomas Hellström
@ 2024-04-16 13:24     ` Christian König
  0 siblings, 0 replies; 21+ messages in thread
From: Christian König @ 2024-04-16 13:24 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: Somalapuram Amaranath, dri-devel, Matthew Brost, Kuehling, Felix

Am 16.04.24 um 15:08 schrieb Thomas Hellström:
> Hi, Christian,
>
> On Tue, 2024-04-16 at 13:55 +0200, Christian König wrote:
>> While patches 1-4 look good from a high level I still think it needs
>> some prerequisite and re-ordering.
>>
>> First of all make all the cleanups separate patches. In other words
>> that
>> ttm_resource_manager_next() takes only the cursor as argument, adding
>> ttm_resource_cursor_fini()/ttm_resource_cursor_fini_locked() as one
>> patch and then ttm_lru_bulk_move_init()/ttm_lru_bulk_move_fini() as
>> second.
> Yes, I can take a look at that. I think the shortening of the argument
> list of ttm_resource_manager_next() makes sense as a separate cleanup.
>
> The other two are needed because of the changes introduced in the
> respective patches. I could of course add stubs of these functions
> before the patch that currently introduce them if needed, but don't
> really see the point. What do you think.

Na, stubs don't make sense.

I was under the impression that you would have something to do for them 
even without the LRU patch. If that's not the case then just skip that.

>
>> With that done I think we should first switch over TTM and all
>> drivers
>> using it to drm_exec as part of it's context object.
> So are you ok with adding an optional drm_exec pointer in the
> ttm_operation_ctx for this? (That was my plan moving forward).

Yeah, perfectly valid. My thinking was just to do that first.

>
> However, when that has been added, I think it makes sense to leave to
> the driver author to port their validation loops and bo allocation over
> to using drm_exec. While we made sure the drm_exec object was indeed
> passed to the validation helper in the drm_gpuvm code, I'm not sure
> everybody actually includes their validation and bo allocation (for
> example page-table-bos) in their drm_exec while_not_all_locked() loop,
> and I think it's reasonable to require the "port the driver over" to be
> an optional but strongly recommended driver effort. If the driver sets
> ctx->drm_exec to NULL, it will fallback to current behaviour.

For the intermediate case I think it makes sense to have this optional, 
but in the long run we should make it mandatory.

>
>> Then I would switch over to using LRU hitches for both swapping and
>> eviction.
>>
>> And when that's finally done we can take a look into the partial
>> shmem
>> swapping :)
>>
>> And Felix is really (and mean *really*) looking forward to the
>> partial
>> shmem swapping as well.
> While the LRU walker helper introduced in patch 8 has drm_exec support,
> shrinkers don't require it, since they are always trylocking. (However
> being able to "evict" system to swap directly in the validation stage
> using drm_exec locking is probably something we should support).
>
> That's why I opted for implementing shrinking before exhaustive
> eviction. But if you insist we can do it the other way around. Most of
> what's needed is already in the patches.

I don't care about the order in which things are implemented, but I 
think we should have at least some eviction prototype as well for testing.

Eviction is just the much easier use case to exercise.

Regards,
Christian.

>
> /Thomas
>
>
>> Regards,
>> Christian.
>>
>> Am 16.04.24 um 12:07 schrieb Thomas Hellström:
>>> This series implements TTM shrinker / eviction helpers and an xe bo
>>> shrinker. It builds on two previous series, *and obsoletes these*.
>>> First
>>>
>>> https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg484425.html
>>>
>>> for patch 1-4, which IMO still could be reviewed and pushed as a
>>> separate series.
>>>
>>> Second the previous TTM shrinker series
>>>
>>> https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/
>>>
>>> Where the comment about layering
>>> https://lore.kernel.org/linux-mm/b7491378-defd-4f1c-31e2-29e4c77e2d67@amd.com/T/#ma918844aa8a6efe8768fdcda0c6590d5c93850c9
>>>
>>> now addressed, and this version also implements shmem objects for
>>> backup
>>> rather than direct swap-cache insertions, which was used in the
>>> previuos
>>> series. It turns out that with per-page backup / shrinking, shmem
>>> objects
>>> appears to work just as well as direct swap-cache insertions with
>>> the
>>> added benefit that was introduced in the previous TTM shrinker
>>> series to
>>> avoid running out of swap entries isn't really needed.
>>>
>>> In any case, patch 1-4 are better described in their separate
>>> series.
>>> (RFC is removed for those).
>>>
>>> Patch 5 could in theory be skipped but introduces a possibility to
>>> easily
>>> add or test multiple backup backends, like the direct swap-cache
>>> insertion or even files into fast dedicated nvme storage for for
>>> example.
>>>
>>> Patch 6 introduces helpers in the ttm_pool code for page-by-page
>>> shrinking
>>> and recovery. It avoids having to temporarily allocate a huge
>>> amount of
>>> memory to be able to shrink a buffer object. It also introduces the
>>> possibility to immediately write-back pages if needed, since that
>>> tends
>>> to be a bit delayed when left to kswapd.
>>>
>>> Patch 7 Adds a simple error injection to the above code to help
>>> increase
>>> test coverage.
>>>
>>> Patch 8 introduces a LRU walk helper for eviction and shrinking.
>>> It's
>>> currently xe-only but not xe-specific and can easily be moved to
>>> TTM when
>>> used by more than one driver or when eviction is implemented using
>>> it.
>>>
>>> Patch 9 introduces a helper callback for shrinking (Also ready to
>>> be
>>> moved to TTM) and an xe-specific shrinker implementation. It also
>>> adds a kunit test to test the shrinker functionality by trying to
>>> allocate twice the available amount of RAM as buffer objects. If
>>> there
>>> is no swap-space available, the buffer objects are marked
>>> purgeable.
>>>
>>> v2:
>>> - Squash obsolete revision history in the patch commit messages.
>>> - Fix a couple of review comments by Christian
>>> - Don't store the mem_type in the TTM managers but in the
>>>     resource cursor.
>>> - Rename introduced TTM *back_up* function names to *backup*
>>> - Add ttm pool recovery fault injection.
>>> - Shrinker xe kunit test
>>> - Various bugfixes
>>>
>>> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: <dri-devel@lists.freedesktop.org>
>>>
>>> Thomas Hellström (8):
>>>     drm/ttm: Allow TTM LRU list nodes of different types
>>>     drm/ttm: Use LRU hitches
>>>     drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk
>>> sublist
>>>       moves
>>>     drm/ttm: Allow continued swapout after -ENOSPC falure
>>>     drm/ttm: Add a virtual base class for graphics memory backup
>>>     drm/ttm/pool: Provide a helper to shrink pages.
>>>     drm/xe, drm/ttm: Provide a generic LRU walker helper
>>>     drm/xe: Add a shrinker for xe bos
>>>
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |   4 +
>>>    drivers/gpu/drm/ttm/Makefile           |   2 +-
>>>    drivers/gpu/drm/ttm/ttm_backup_shmem.c | 137 +++++++++
>>>    drivers/gpu/drm/ttm/ttm_bo.c           |   1 +
>>>    drivers/gpu/drm/ttm/ttm_device.c       |  33 ++-
>>>    drivers/gpu/drm/ttm/ttm_pool.c         | 391
>>> ++++++++++++++++++++++++-
>>>    drivers/gpu/drm/ttm/ttm_resource.c     | 231 ++++++++++++---
>>>    drivers/gpu/drm/ttm/ttm_tt.c           |  34 +++
>>>    drivers/gpu/drm/xe/Makefile            |   2 +
>>>    drivers/gpu/drm/xe/xe_bo.c             | 123 ++++++--
>>>    drivers/gpu/drm/xe/xe_bo.h             |   3 +
>>>    drivers/gpu/drm/xe/xe_device.c         |   8 +
>>>    drivers/gpu/drm/xe/xe_device_types.h   |   2 +
>>>    drivers/gpu/drm/xe/xe_shrinker.c       | 237 +++++++++++++++
>>>    drivers/gpu/drm/xe/xe_shrinker.h       |  18 ++
>>>    drivers/gpu/drm/xe/xe_ttm_helpers.c    | 224 ++++++++++++++
>>>    drivers/gpu/drm/xe/xe_ttm_helpers.h    |  63 ++++
>>>    drivers/gpu/drm/xe/xe_vm.c             |   4 +
>>>    include/drm/ttm/ttm_backup.h           | 136 +++++++++
>>>    include/drm/ttm/ttm_device.h           |   2 +
>>>    include/drm/ttm/ttm_pool.h             |   4 +
>>>    include/drm/ttm/ttm_resource.h         |  96 +++++-
>>>    include/drm/ttm/ttm_tt.h               |  19 ++
>>>    23 files changed, 1683 insertions(+), 91 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/ttm/ttm_backup_shmem.c
>>>    create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c
>>>    create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h
>>>    create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.c
>>>    create mode 100644 drivers/gpu/drm/xe/xe_ttm_helpers.h
>>>    create mode 100644 include/drm/ttm/ttm_backup.h
>>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos
  2024-04-16 10:07 ` [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos Thomas Hellström
@ 2024-04-16 21:32   ` kernel test robot
  2024-04-17 13:45   ` kernel test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kernel test robot @ 2024-04-16 21:32 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: llvm, oe-kbuild-all, Thomas Hellström, Matthew Brost,
	Somalapuram Amaranath, Christian König, dri-devel

Hi Thomas,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.9-rc4 next-20240416]
[cannot apply to drm-misc/drm-misc-next drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Allow-TTM-LRU-list-nodes-of-different-types/20240416-181717
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20240416100730.6666-10-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos
config: i386-buildonly-randconfig-001-20240417 (https://download.01.org/0day-ci/archive/20240417/202404170528.tBjGQKCR-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240417/202404170528.tBjGQKCR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404170528.tBjGQKCR-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/gpu/drm/xe/xe_bo.c:2420:
>> drivers/gpu/drm/xe/tests/xe_bo.c:382:6: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     381 |         kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n",
         |                                       ~~~
         |                                       %zu
     382 |                    total);
         |                    ^~~~~
   include/kunit/test.h:546:39: note: expanded from macro 'kunit_info'
     546 |         kunit_printk(KERN_INFO, test, fmt, ##__VA_ARGS__)
         |                                       ~~~    ^~~~~~~~~~~
   include/kunit/test.h:534:21: note: expanded from macro 'kunit_printk'
     533 |         kunit_log(lvl, test, KUNIT_SUBTEST_INDENT "# %s: " fmt,         \
         |                                                            ~~~
     534 |                   (test)->name, ##__VA_ARGS__)
         |                                   ^~~~~~~~~~~
   include/kunit/test.h:527:21: note: expanded from macro 'kunit_log'
     527 |                 printk(lvl fmt, ##__VA_ARGS__);                         \
         |                            ~~~    ^~~~~~~~~~~
   include/linux/printk.h:457:60: note: expanded from macro 'printk'
     457 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:429:19: note: expanded from macro 'printk_index_wrap'
     429 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                         ~~~~    ^~~~~~~~~~~
   In file included from drivers/gpu/drm/xe/xe_bo.c:2420:
>> drivers/gpu/drm/xe/tests/xe_bo.c:382:6: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     381 |         kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n",
         |                                       ~~~
         |                                       %zu
     382 |                    total);
         |                    ^~~~~
   include/kunit/test.h:546:39: note: expanded from macro 'kunit_info'
     546 |         kunit_printk(KERN_INFO, test, fmt, ##__VA_ARGS__)
         |                                       ~~~    ^~~~~~~~~~~
   include/kunit/test.h:534:21: note: expanded from macro 'kunit_printk'
     533 |         kunit_log(lvl, test, KUNIT_SUBTEST_INDENT "# %s: " fmt,         \
         |                                                            ~~~
     534 |                   (test)->name, ##__VA_ARGS__)
         |                                   ^~~~~~~~~~~
   include/kunit/test.h:529:8: note: expanded from macro 'kunit_log'
     528 |                 kunit_log_append((test_or_suite)->log,  fmt,            \
         |                                                         ~~~
     529 |                                  ##__VA_ARGS__);                        \
         |                                    ^~~~~~~~~~~
   2 warnings generated.
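
Assuming the message should simply print the size_t value, the minimal
fix would be to switch the format specifier to %zu:

	kunit_info(test, "Free ram is %zu bytes. Will allocate twice of that.\n",
		   total);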


vim +382 drivers/gpu/drm/xe/tests/xe_bo.c

   362	
   363	/*
   364	 * Try to create system bos corresponding to twice the amount
   365	 * of available system memory to test shrinker functionality.
   366	 * If no swap space is available to accommodate the
   367	 * memory overcommit, mark bos purgeable.
   368	 */
   369	static int shrink_test_run_device(struct xe_device *xe)
   370	{
   371		struct kunit *test = xe_cur_kunit();
   372		LIST_HEAD(bos);
   373		struct xe_bo_link *link, *next;
   374		struct sysinfo si;
   375		size_t total, alloced;
   376		unsigned int interrupted = 0, successful = 0;
   377	
   378		si_meminfo(&si);
   379		total = si.freeram * si.mem_unit;
   380	
   381		kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n",
 > 382			   total);
   383	
   384		total <<= 1;
   385		for (alloced = 0; alloced < total ; alloced += XE_BO_SHRINK_SIZE) {
   386			struct xe_bo *bo;
   387			unsigned int mem_type;
   388	
   389			link = kzalloc(sizeof(*link), GFP_KERNEL);
   390			if (!link) {
   391				KUNIT_FAIL(test, "Unexpeced link allocation failure\n");
   392				break;
   393			}
   394	
   395			INIT_LIST_HEAD(&link->link);
   396	
   397			/* We can create bos using WC caching here. But it is slower. */
   398			bo = xe_bo_create_user(xe, NULL, NULL, XE_BO_SHRINK_SIZE,
   399					       DRM_XE_GEM_CPU_CACHING_WB,
   400					       ttm_bo_type_device,
   401					       XE_BO_FLAG_SYSTEM);
   402			if (IS_ERR(bo)) {
   403				if (bo != ERR_PTR(-ENOMEM) && bo != ERR_PTR(-ENOSPC) &&
   404				    bo != ERR_PTR(-EINTR) && bo != ERR_PTR(-ERESTARTSYS))
   405					KUNIT_FAIL(test, "Error creating bo: %pe\n", bo);
   406				kfree(link);
   407				break;
   408			}
   409			link->bo = bo;
   410			list_add_tail(&link->link, &bos);
   411			xe_bo_lock(bo, false);
   412	
   413			/*
   414			 * If we're low on swap entries, we can't shrink unless the bo
   415			 * is marked purgeable.
   416			 */
   417			if (get_nr_swap_pages() < (XE_BO_SHRINK_SIZE >> PAGE_SHIFT) * 128) {
   418				struct xe_ttm_tt *xe_tt =
   419					container_of(bo->ttm.ttm, typeof(*xe_tt), ttm);
   420				long num_pages = xe_tt->ttm.num_pages;
   421	
   422				xe_tt->purgeable = true;
   423				xe_shrinker_mod_pages(xe->mem.shrinker, -num_pages,
   424						      num_pages);
   425			}
   426	
   427			mem_type = bo->ttm.resource->mem_type;
   428			xe_bo_unlock(bo);
   429			if (mem_type != XE_PL_TT)
   430				KUNIT_FAIL(test, "Bo in incorrect memory type: %u\n",
   431					   bo->ttm.resource->mem_type);
   432			cond_resched();
   433			if (signal_pending(current))
   434				break;
   435		}
   436	
   437		/* Read back and destroy bos */
   438		list_for_each_entry_safe_reverse(link, next, &bos, link) {
   439			static struct ttm_operation_ctx ctx = {.interruptible = true};
   440			struct xe_bo *bo = link->bo;
   441			int ret;
   442	
   443			if (!signal_pending(current)) {
   444				xe_bo_lock(bo, NULL);
   445				ret = ttm_bo_validate(&bo->ttm, &tt_placement, &ctx);
   446				xe_bo_unlock(bo);
   447				if (ret && ret != -EINTR)
   448					KUNIT_FAIL(test, "Validation failed: %pe\n",
   449						   ERR_PTR(ret));
   450				else if (ret)
   451					interrupted++;
   452				else
   453					successful++;
   454			}
   455			xe_bo_put(link->bo);
   456			list_del(&link->link);
   457			kfree(link);
   458			cond_resched();
   459		}
   460		kunit_info(test, "Readbacks interrupted: %u successful: %u\n",
   461			   interrupted, successful);
   462	
   463		return 0;
   464	}
   465	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types
  2024-04-16 10:07 ` [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
@ 2024-04-17  1:15   ` Matthew Brost
  2024-04-17  6:09     ` Christian König
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2024-04-17  1:15 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Christian König, Somalapuram Amaranath, dri-devel

On Tue, Apr 16, 2024 at 12:07:22PM +0200, Thomas Hellström wrote:
> To be able to handle list unlocking while traversing the LRU
> list, we want the iterators not only to point to the next
> position of the list traversal, but to insert themselves as
> list nodes at that point to work around the fact that the
> next node might otherwise disappear from the list while
> the iterator is pointing to it.
> 
> These list nodes need to be easily distinguishable from other
> list nodes so that others traversing the list can skip
> over them.
> 
> So declare a struct ttm_lru_item, with a struct list_head member
> and a type enum. This will slightly increase the size of a
> struct ttm_resource.
> 
> Changes in previous series:
> - Update enum ttm_lru_item_type documentation.
> 

Patch itself makes sense to me. One style question (or maybe
suggestion?) below.

> Cc: Christian König <christian.koenig@amd.com>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/ttm/ttm_device.c   | 13 ++++--
>  drivers/gpu/drm/ttm/ttm_resource.c | 70 ++++++++++++++++++++++--------
>  include/drm/ttm/ttm_resource.h     | 51 +++++++++++++++++++++-
>  3 files changed, 110 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index 76027960054f..f27406e851e5 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -270,17 +270,22 @@ EXPORT_SYMBOL(ttm_device_fini);
>  static void ttm_device_clear_lru_dma_mappings(struct ttm_device *bdev,
>  					      struct list_head *list)
>  {
> -	struct ttm_resource *res;
> +	struct ttm_lru_item *lru;
>  
>  	spin_lock(&bdev->lru_lock);
> -	while ((res = list_first_entry_or_null(list, typeof(*res), lru))) {
> -		struct ttm_buffer_object *bo = res->bo;
> +	while ((lru = list_first_entry_or_null(list, typeof(*lru), link))) {
> +		struct ttm_buffer_object *bo;
> +
> +		if (!ttm_lru_item_is_res(lru))
> +			continue;
> +
> +		bo = ttm_lru_item_to_res(lru)->bo;
>  
>  		/* Take ref against racing releases once lru_lock is unlocked */
>  		if (!ttm_bo_get_unless_zero(bo))
>  			continue;
>  
> -		list_del_init(&res->lru);
> +		list_del_init(&bo->resource->lru.link);
>  		spin_unlock(&bdev->lru_lock);
>  
>  		if (bo->ttm)
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index be8d286513f9..7aa5ca5c0e33 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -69,8 +69,8 @@ void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk)
>  			dma_resv_assert_held(pos->last->bo->base.resv);
>  
>  			man = ttm_manager_type(pos->first->bo->bdev, i);
> -			list_bulk_move_tail(&man->lru[j], &pos->first->lru,
> -					    &pos->last->lru);
> +			list_bulk_move_tail(&man->lru[j], &pos->first->lru.link,
> +					    &pos->last->lru.link);
>  		}
>  	}
>  }
> @@ -83,14 +83,38 @@ ttm_lru_bulk_move_pos(struct ttm_lru_bulk_move *bulk, struct ttm_resource *res)
>  	return &bulk->pos[res->mem_type][res->bo->priority];
>  }
>  
> +/* Return the previous resource on the list (skip over non-resource list items) */
> +static struct ttm_resource *ttm_lru_prev_res(struct ttm_resource *cur)
> +{
> +	struct ttm_lru_item *lru = &cur->lru;
> +
> +	do {
> +		lru = list_prev_entry(lru, link);
> +	} while (!ttm_lru_item_is_res(lru));
> +
> +	return ttm_lru_item_to_res(lru);
> +}
> +
> +/* Return the next resource on the list (skip over non-resource list items) */
> +static struct ttm_resource *ttm_lru_next_res(struct ttm_resource *cur)
> +{
> +	struct ttm_lru_item *lru = &cur->lru;
> +
> +	do {
> +		lru = list_next_entry(lru, link);
> +	} while (!ttm_lru_item_is_res(lru));
> +
> +	return ttm_lru_item_to_res(lru);
> +}
> +
>  /* Move the resource to the tail of the bulk move range */
>  static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
>  				       struct ttm_resource *res)
>  {
>  	if (pos->last != res) {
>  		if (pos->first == res)
> -			pos->first = list_next_entry(res, lru);
> -		list_move(&res->lru, &pos->last->lru);
> +			pos->first = ttm_lru_next_res(res);
> +		list_move(&res->lru.link, &pos->last->lru.link);
>  		pos->last = res;
>  	}
>  }
> @@ -121,11 +145,11 @@ static void ttm_lru_bulk_move_del(struct ttm_lru_bulk_move *bulk,
>  		pos->first = NULL;
>  		pos->last = NULL;
>  	} else if (pos->first == res) {
> -		pos->first = list_next_entry(res, lru);
> +		pos->first = ttm_lru_next_res(res);
>  	} else if (pos->last == res) {
> -		pos->last = list_prev_entry(res, lru);
> +		pos->last = ttm_lru_prev_res(res);
>  	} else {
> -		list_move(&res->lru, &pos->last->lru);
> +		list_move(&res->lru.link, &pos->last->lru.link);
>  	}
>  }
>  
> @@ -154,7 +178,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
>  	lockdep_assert_held(&bo->bdev->lru_lock);
>  
>  	if (bo->pin_count) {
> -		list_move_tail(&res->lru, &bdev->pinned);
> +		list_move_tail(&res->lru.link, &bdev->pinned);
>  
>  	} else	if (bo->bulk_move) {
>  		struct ttm_lru_bulk_move_pos *pos =
> @@ -165,7 +189,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
>  		struct ttm_resource_manager *man;
>  
>  		man = ttm_manager_type(bdev, res->mem_type);
> -		list_move_tail(&res->lru, &man->lru[bo->priority]);
> +		list_move_tail(&res->lru.link, &man->lru[bo->priority]);
>  	}
>  }
>  
> @@ -196,9 +220,9 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
>  	man = ttm_manager_type(bo->bdev, place->mem_type);
>  	spin_lock(&bo->bdev->lru_lock);
>  	if (bo->pin_count)
> -		list_add_tail(&res->lru, &bo->bdev->pinned);
> +		list_add_tail(&res->lru.link, &bo->bdev->pinned);
>  	else
> -		list_add_tail(&res->lru, &man->lru[bo->priority]);
> +		list_add_tail(&res->lru.link, &man->lru[bo->priority]);
>  	man->usage += res->size;
>  	spin_unlock(&bo->bdev->lru_lock);
>  }
> @@ -220,7 +244,7 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
>  	struct ttm_device *bdev = man->bdev;
>  
>  	spin_lock(&bdev->lru_lock);
> -	list_del_init(&res->lru);
> +	list_del_init(&res->lru.link);
>  	man->usage -= res->size;
>  	spin_unlock(&bdev->lru_lock);
>  }
> @@ -471,14 +495,16 @@ struct ttm_resource *
>  ttm_resource_manager_first(struct ttm_resource_manager *man,
>  			   struct ttm_resource_cursor *cursor)
>  {
> -	struct ttm_resource *res;
> +	struct ttm_lru_item *lru;
>  
>  	lockdep_assert_held(&man->bdev->lru_lock);
>  
>  	for (cursor->priority = 0; cursor->priority < TTM_MAX_BO_PRIORITY;
>  	     ++cursor->priority)
> -		list_for_each_entry(res, &man->lru[cursor->priority], lru)
> -			return res;
> +		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
> +			if (ttm_lru_item_is_res(lru))
> +				return ttm_lru_item_to_res(lru);
> +		}
>  
>  	return NULL;
>  }
> @@ -497,15 +523,21 @@ ttm_resource_manager_next(struct ttm_resource_manager *man,
>  			  struct ttm_resource_cursor *cursor,
>  			  struct ttm_resource *res)
>  {
> +	struct ttm_lru_item *lru = &res->lru;
> +
>  	lockdep_assert_held(&man->bdev->lru_lock);
>  
> -	list_for_each_entry_continue(res, &man->lru[cursor->priority], lru)
> -		return res;
> +	list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
> +		if (ttm_lru_item_is_res(lru))
> +			return ttm_lru_item_to_res(lru);
> +	}
>  
>  	for (++cursor->priority; cursor->priority < TTM_MAX_BO_PRIORITY;
>  	     ++cursor->priority)
> -		list_for_each_entry(res, &man->lru[cursor->priority], lru)
> -			return res;
> +		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
> +			if (ttm_lru_item_is_res(lru))
> +				ttm_lru_item_to_res(lru);
> +		}
>  
>  	return NULL;
>  }
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index 69769355139f..4babc4ff10b0 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -49,6 +49,43 @@ struct io_mapping;
>  struct sg_table;
>  struct scatterlist;
>  
> +/**
> + * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
> + */
> +enum ttm_lru_item_type {
> +	/** @TTM_LRU_RESOURCE: The resource subclass */
> +	TTM_LRU_RESOURCE,
> +	/** @TTM_LRU_HITCH: The iterator hitch subclass */
> +	TTM_LRU_HITCH
> +};
> +
> +/**
> + * struct ttm_lru_item - The TTM lru list node base class
> + * @link: The list link
> + * @type: The subclass type
> + */
> +struct ttm_lru_item {
> +	struct list_head link;
> +	enum ttm_lru_item_type type;
> +};
> +
> +/**
> + * ttm_lru_item_init() - initialize a struct ttm_lru_item
> + * @item: The item to initialize
> + * @type: The subclass type
> + */
> +static inline void ttm_lru_item_init(struct ttm_lru_item *item,
> +				     enum ttm_lru_item_type type)
> +{
> +	item->type = type;
> +	INIT_LIST_HEAD(&item->link);
> +}
> +
> +static inline bool ttm_lru_item_is_res(const struct ttm_lru_item *item)
> +{
> +	return item->type == TTM_LRU_RESOURCE;
> +}
> +
>  struct ttm_resource_manager_func {
>  	/**
>  	 * struct ttm_resource_manager_func member alloc
> @@ -217,9 +254,21 @@ struct ttm_resource {
>  	/**
>  	 * @lru: Least recently used list, see &ttm_resource_manager.lru
>  	 */
> -	struct list_head lru;
> +	struct ttm_lru_item lru;
>  };
>  
> +/**
> + * ttm_lru_item_to_res() - Downcast a struct ttm_lru_item to a struct ttm_resource
> + * @item: The struct ttm_lru_item to downcast
> + *
> + * Return: Pointer to the embedding struct ttm_resource
> + */
> +static inline struct ttm_resource *
> +ttm_lru_item_to_res(struct ttm_lru_item *item)

Pretty much everywhere in this series we have the following coding
pattern:

if (ttm_lru_item_is_res(item))
	do something with ttm_lru_item_to_res(item);

Would it make more sense to squash these functions together into a single
ttm_lru_item_to_res() which returns NULL if the item is not TTM_LRU_RESOURCE?

The new pattern would be:

res = ttm_lru_item_to_res(item)
if (res)
	do something with res
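
For reference, the squashed helper could look roughly like this (sketch
only, same semantics as the two current helpers combined):

	static inline struct ttm_resource *
	ttm_lru_item_to_res(struct ttm_lru_item *item)
	{
		if (item->type != TTM_LRU_RESOURCE)
			return NULL;

		return container_of(item, struct ttm_resource, lru);
	}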

What do you think?

Matt 

> +{
> +	return container_of(item, struct ttm_resource, lru);
> +}
> +
>  /**
>   * struct ttm_resource_cursor
>   *
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC falure
  2024-04-16 10:07 ` [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC falure Thomas Hellström
@ 2024-04-17  1:44   ` Matthew Brost
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Brost @ 2024-04-17  1:44 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Christian König, Somalapuram Amaranath, dri-devel

On Tue, Apr 16, 2024 at 12:07:25PM +0200, Thomas Hellström wrote:
> The -ENOSPC failure from ttm_bo_swapout() meant that the lru_lock
> was dropped and simply restarting the iteration meant we'd likely
> hit the same error again on the same resource. Now that we can
> restart the iteration even if the lock was dropped, do that.
> 

It is not clear that what you describe in this commit message (-ENOSPC ==
-EBUSY + lru_lock dropped) is true, since there are no comments in the code.

It does appear to be true after examining ttm_bo_swapout() closely.
Maybe it's out of scope for the series, but would it be possible to add
some kernel doc to ttm_device_swapout() stating this?
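
Something along these lines would do, I think (a rough sketch of the
wording only, based on the comments added in this patch):

	/*
	 * ttm_bo_swapout() return values as consumed by ttm_device_swapout():
	 *   0        - swapped out successfully, lru_lock dropped
	 *   -EBUSY   - could not swap out, lru_lock still held
	 *   -ENOSPC  - could not swap out, lru_lock dropped
	 *   other <0 - error, lru_lock dropped
	 */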

Patch itself makes sense to me.

Matt

> Cc: Christian König <christian.koenig@amd.com>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/ttm/ttm_device.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index e8a6a1dab669..4a030b4bc848 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -168,15 +168,20 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  
>  			num_pages = PFN_UP(bo->base.size);
>  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> -			/* ttm_bo_swapout has dropped the lru_lock */
> -			if (!ret) {
> -				ttm_resource_cursor_fini(&cursor);
> -				return num_pages;
> -			}
> -			if (ret != -EBUSY) {
> -				ttm_resource_cursor_fini(&cursor);
> -				return ret;
> +			/* Couldn't swap out, and retained the lru_lock */
> +			if (ret == -EBUSY)
> +				continue;
> +			/* Couldn't swap out and dropped the lru_lock */
> +			if (ret == -ENOSPC) {
> +				spin_lock(&bdev->lru_lock);
> +				continue;
>  			}
> +			/*
> +			 * Dropped the lock and either succeeded or
> +			 * hit an error that forces us to break.
> +			 */
> +			ttm_resource_cursor_fini(&cursor);
> +			return ret ? ret : num_pages;
>  		}
>  	}
>  	ttm_resource_cursor_fini_locked(&cursor);
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types
  2024-04-17  1:15   ` Matthew Brost
@ 2024-04-17  6:09     ` Christian König
  2024-05-02 11:41       ` Thomas Hellström
  0 siblings, 1 reply; 21+ messages in thread
From: Christian König @ 2024-04-17  6:09 UTC (permalink / raw)
  To: Matthew Brost, Thomas Hellström
  Cc: intel-xe, Somalapuram Amaranath, dri-devel

Am 17.04.24 um 03:15 schrieb Matthew Brost:
> On Tue, Apr 16, 2024 at 12:07:22PM +0200, Thomas Hellström wrote:
>> To be able to handle list unlocking while traversing the LRU
>> list, we want the iterators not only to point to the next
>> position of the list traversal, but to insert themselves as
>> list nodes at that point to work around the fact that the
>> next node might otherwise disappear from the list while
>> the iterator is pointing to it.
>>
>> These list nodes need to be easily distinguishable from other
>> list nodes so that others traversing the list can skip
>> over them.
>>
>> So declare a struct ttm_lru_item, with a struct list_head member
>> and a type enum. This will slightly increase the size of a
>> struct ttm_resource.
>>
>> Changes in previous series:
>> - Update enum ttm_lru_item_type documentation.
>>
> Patch itself makes sense to me. One style question (or maybe
> suggestion?) below.
>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
>> Cc: <dri-devel@lists.freedesktop.org>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_device.c   | 13 ++++--
>>   drivers/gpu/drm/ttm/ttm_resource.c | 70 ++++++++++++++++++++++--------
>>   include/drm/ttm/ttm_resource.h     | 51 +++++++++++++++++++++-
>>   3 files changed, 110 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
>> index 76027960054f..f27406e851e5 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -270,17 +270,22 @@ EXPORT_SYMBOL(ttm_device_fini);
>>   static void ttm_device_clear_lru_dma_mappings(struct ttm_device *bdev,
>>   					      struct list_head *list)
>>   {
>> -	struct ttm_resource *res;
>> +	struct ttm_lru_item *lru;
>>   
>>   	spin_lock(&bdev->lru_lock);
>> -	while ((res = list_first_entry_or_null(list, typeof(*res), lru))) {
>> -		struct ttm_buffer_object *bo = res->bo;
>> +	while ((lru = list_first_entry_or_null(list, typeof(*lru), link))) {
>> +		struct ttm_buffer_object *bo;
>> +
>> +		if (!ttm_lru_item_is_res(lru))
>> +			continue;
>> +
>> +		bo = ttm_lru_item_to_res(lru)->bo;
>>   
>>   		/* Take ref against racing releases once lru_lock is unlocked */
>>   		if (!ttm_bo_get_unless_zero(bo))
>>   			continue;
>>   
>> -		list_del_init(&res->lru);
>> +		list_del_init(&bo->resource->lru.link);
>>   		spin_unlock(&bdev->lru_lock);
>>   
>>   		if (bo->ttm)
>> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
>> index be8d286513f9..7aa5ca5c0e33 100644
>> --- a/drivers/gpu/drm/ttm/ttm_resource.c
>> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
>> @@ -69,8 +69,8 @@ void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk)
>>   			dma_resv_assert_held(pos->last->bo->base.resv);
>>   
>>   			man = ttm_manager_type(pos->first->bo->bdev, i);
>> -			list_bulk_move_tail(&man->lru[j], &pos->first->lru,
>> -					    &pos->last->lru);
>> +			list_bulk_move_tail(&man->lru[j], &pos->first->lru.link,
>> +					    &pos->last->lru.link);
>>   		}
>>   	}
>>   }
>> @@ -83,14 +83,38 @@ ttm_lru_bulk_move_pos(struct ttm_lru_bulk_move *bulk, struct ttm_resource *res)
>>   	return &bulk->pos[res->mem_type][res->bo->priority];
>>   }
>>   
>> +/* Return the previous resource on the list (skip over non-resource list items) */
>> +static struct ttm_resource *ttm_lru_prev_res(struct ttm_resource *cur)
>> +{
>> +	struct ttm_lru_item *lru = &cur->lru;
>> +
>> +	do {
>> +		lru = list_prev_entry(lru, link);
>> +	} while (!ttm_lru_item_is_res(lru));
>> +
>> +	return ttm_lru_item_to_res(lru);
>> +}
>> +
>> +/* Return the next resource on the list (skip over non-resource list items) */
>> +static struct ttm_resource *ttm_lru_next_res(struct ttm_resource *cur)
>> +{
>> +	struct ttm_lru_item *lru = &cur->lru;
>> +
>> +	do {
>> +		lru = list_next_entry(lru, link);
>> +	} while (!ttm_lru_item_is_res(lru));
>> +
>> +	return ttm_lru_item_to_res(lru);
>> +}
>> +
>>   /* Move the resource to the tail of the bulk move range */
>>   static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
>>   				       struct ttm_resource *res)
>>   {
>>   	if (pos->last != res) {
>>   		if (pos->first == res)
>> -			pos->first = list_next_entry(res, lru);
>> -		list_move(&res->lru, &pos->last->lru);
>> +			pos->first = ttm_lru_next_res(res);
>> +		list_move(&res->lru.link, &pos->last->lru.link);
>>   		pos->last = res;
>>   	}
>>   }
>> @@ -121,11 +145,11 @@ static void ttm_lru_bulk_move_del(struct ttm_lru_bulk_move *bulk,
>>   		pos->first = NULL;
>>   		pos->last = NULL;
>>   	} else if (pos->first == res) {
>> -		pos->first = list_next_entry(res, lru);
>> +		pos->first = ttm_lru_next_res(res);
>>   	} else if (pos->last == res) {
>> -		pos->last = list_prev_entry(res, lru);
>> +		pos->last = ttm_lru_prev_res(res);
>>   	} else {
>> -		list_move(&res->lru, &pos->last->lru);
>> +		list_move(&res->lru.link, &pos->last->lru.link);
>>   	}
>>   }
>>   
>> @@ -154,7 +178,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
>>   	lockdep_assert_held(&bo->bdev->lru_lock);
>>   
>>   	if (bo->pin_count) {
>> -		list_move_tail(&res->lru, &bdev->pinned);
>> +		list_move_tail(&res->lru.link, &bdev->pinned);
>>   
>>   	} else	if (bo->bulk_move) {
>>   		struct ttm_lru_bulk_move_pos *pos =
>> @@ -165,7 +189,7 @@ void ttm_resource_move_to_lru_tail(struct ttm_resource *res)
>>   		struct ttm_resource_manager *man;
>>   
>>   		man = ttm_manager_type(bdev, res->mem_type);
>> -		list_move_tail(&res->lru, &man->lru[bo->priority]);
>> +		list_move_tail(&res->lru.link, &man->lru[bo->priority]);
>>   	}
>>   }
>>   
>> @@ -196,9 +220,9 @@ void ttm_resource_init(struct ttm_buffer_object *bo,
>>   	man = ttm_manager_type(bo->bdev, place->mem_type);
>>   	spin_lock(&bo->bdev->lru_lock);
>>   	if (bo->pin_count)
>> -		list_add_tail(&res->lru, &bo->bdev->pinned);
>> +		list_add_tail(&res->lru.link, &bo->bdev->pinned);
>>   	else
>> -		list_add_tail(&res->lru, &man->lru[bo->priority]);
>> +		list_add_tail(&res->lru.link, &man->lru[bo->priority]);
>>   	man->usage += res->size;
>>   	spin_unlock(&bo->bdev->lru_lock);
>>   }
>> @@ -220,7 +244,7 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
>>   	struct ttm_device *bdev = man->bdev;
>>   
>>   	spin_lock(&bdev->lru_lock);
>> -	list_del_init(&res->lru);
>> +	list_del_init(&res->lru.link);
>>   	man->usage -= res->size;
>>   	spin_unlock(&bdev->lru_lock);
>>   }
>> @@ -471,14 +495,16 @@ struct ttm_resource *
>>   ttm_resource_manager_first(struct ttm_resource_manager *man,
>>   			   struct ttm_resource_cursor *cursor)
>>   {
>> -	struct ttm_resource *res;
>> +	struct ttm_lru_item *lru;
>>   
>>   	lockdep_assert_held(&man->bdev->lru_lock);
>>   
>>   	for (cursor->priority = 0; cursor->priority < TTM_MAX_BO_PRIORITY;
>>   	     ++cursor->priority)
>> -		list_for_each_entry(res, &man->lru[cursor->priority], lru)
>> -			return res;
>> +		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
>> +			if (ttm_lru_item_is_res(lru))
>> +				return ttm_lru_item_to_res(lru);
>> +		}
>>   
>>   	return NULL;
>>   }
>> @@ -497,15 +523,21 @@ ttm_resource_manager_next(struct ttm_resource_manager *man,
>>   			  struct ttm_resource_cursor *cursor,
>>   			  struct ttm_resource *res)
>>   {
>> +	struct ttm_lru_item *lru = &res->lru;
>> +
>>   	lockdep_assert_held(&man->bdev->lru_lock);
>>   
>> -	list_for_each_entry_continue(res, &man->lru[cursor->priority], lru)
>> -		return res;
>> +	list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
>> +		if (ttm_lru_item_is_res(lru))
>> +			return ttm_lru_item_to_res(lru);
>> +	}
>>   
>>   	for (++cursor->priority; cursor->priority < TTM_MAX_BO_PRIORITY;
>>   	     ++cursor->priority)
>> -		list_for_each_entry(res, &man->lru[cursor->priority], lru)
>> -			return res;
>> +		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
>> +			if (ttm_lru_item_is_res(lru))
>> +				ttm_lru_item_to_res(lru);
>> +		}
>>   
>>   	return NULL;
>>   }
>> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
>> index 69769355139f..4babc4ff10b0 100644
>> --- a/include/drm/ttm/ttm_resource.h
>> +++ b/include/drm/ttm/ttm_resource.h
>> @@ -49,6 +49,43 @@ struct io_mapping;
>>   struct sg_table;
>>   struct scatterlist;
>>   
>> +/**
>> + * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
>> + */
>> +enum ttm_lru_item_type {
>> +	/** @TTM_LRU_RESOURCE: The resource subclass */
>> +	TTM_LRU_RESOURCE,
>> +	/** @TTM_LRU_HITCH: The iterator hitch subclass */
>> +	TTM_LRU_HITCH
>> +};
>> +
>> +/**
>> + * struct ttm_lru_item - The TTM lru list node base class
>> + * @link: The list link
>> + * @type: The subclass type
>> + */
>> +struct ttm_lru_item {
>> +	struct list_head link;
>> +	enum ttm_lru_item_type type;
>> +};
>> +
>> +/**
>> + * ttm_lru_item_init() - initialize a struct ttm_lru_item
>> + * @item: The item to initialize
>> + * @type: The subclass type
>> + */
>> +static inline void ttm_lru_item_init(struct ttm_lru_item *item,
>> +				     enum ttm_lru_item_type type)
>> +{
>> +	item->type = type;
>> +	INIT_LIST_HEAD(&item->link);
>> +}
>> +
>> +static inline bool ttm_lru_item_is_res(const struct ttm_lru_item *item)
>> +{
>> +	return item->type == TTM_LRU_RESOURCE;
>> +}
>> +
>>   struct ttm_resource_manager_func {
>>   	/**
>>   	 * struct ttm_resource_manager_func member alloc
>> @@ -217,9 +254,21 @@ struct ttm_resource {
>>   	/**
>>   	 * @lru: Least recently used list, see &ttm_resource_manager.lru
>>   	 */
>> -	struct list_head lru;
>> +	struct ttm_lru_item lru;
>>   };
>>   
>> +/**
>> + * ttm_lru_item_to_res() - Downcast a struct ttm_lru_item to a struct ttm_resource
>> + * @item: The struct ttm_lru_item to downcast
>> + *
>> + * Return: Pointer to the embedding struct ttm_resource
>> + */
>> +static inline struct ttm_resource *
>> +ttm_lru_item_to_res(struct ttm_lru_item *item)
> Pretty much everywhere in this series we have the following coding
> pattern:
>
> if (ttm_lru_item_is_res(item))
> 	do something with ttm_lru_item_to_res(item);
>
> Would it make more sense to squash these functions together with only
> ttm_lru_item_to_res which returns NULL if item is not TTM_LRU_RESOURCE?
>
> The new pattern would be:
>
> res = ttm_lru_item_to_res(item);
> if (res)
> 	do something with res
>
> What do you think?

I would even say we should put that filtering into the iterator.

Nobody except the code which inserted the anchor into the LRU is 
interested in it.

Regards,
Christian.
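
For illustration only (not code from the posted series), the squashed helper
Matt describes would look roughly like the existing downcast with the type
check folded in:

static inline struct ttm_resource *
ttm_lru_item_to_res(struct ttm_lru_item *item)
{
	/* Hypothetical merged helper: NULL means "not a resource node". */
	if (item->type != TTM_LRU_RESOURCE)
		return NULL;

	return container_of(item, struct ttm_resource, lru);
}

so a caller becomes

	struct ttm_resource *res = ttm_lru_item_to_res(lru);

	if (res)
		/* operate on res */;

Christian's point about filtering inside the iterator is what the hitch-based
ttm_resource_manager_next() later in this series ends up doing; the explicit
type checks then only remain in the few open-coded walks that drain the LRU.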

>
> Matt
>
>> +{
>> +	return container_of(item, struct ttm_resource, lru);
>> +}
>> +
>>   /**
>>    * struct ttm_resource_cursor
>>    *
>> -- 
>> 2.44.0
>>



* Re: [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos
  2024-04-16 10:07 ` [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos Thomas Hellström
  2024-04-16 21:32   ` kernel test robot
@ 2024-04-17 13:45   ` kernel test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kernel test robot @ 2024-04-17 13:45 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe
  Cc: oe-kbuild-all, Thomas Hellström, Matthew Brost,
	Somalapuram Amaranath, Christian König, dri-devel

Hi Thomas,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-xe/drm-xe-next]
[also build test WARNING on drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.9-rc4 next-20240417]
[cannot apply to drm-misc/drm-misc-next drm/drm-next drm-exynos/exynos-drm-next drm-intel/for-linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Allow-TTM-LRU-list-nodes-of-different-types/20240416-181717
base:   https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link:    https://lore.kernel.org/r/20240416100730.6666-10-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos
config: powerpc-randconfig-001-20240417 (https://download.01.org/0day-ci/archive/20240417/202404172100.qxirErE7-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240417/202404172100.qxirErE7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404172100.qxirErE7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from include/asm-generic/bug.h:22,
                    from arch/powerpc/include/asm/bug.h:116,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:6,
                    from include/linux/pagemap.h:8,
                    from include/drm/ttm/ttm_tt.h:30,
                    from drivers/gpu/drm/xe/xe_bo.h:9,
                    from drivers/gpu/drm/xe/xe_bo.c:6:
   drivers/gpu/drm/xe/tests/xe_bo.c: In function 'shrink_test_run_device':
   include/linux/kern_levels.h:5:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
       5 | #define KERN_SOH        "\001"          /* ASCII Start Of Header */
         |                         ^~~~~~
   include/linux/printk.h:429:25: note: in definition of macro 'printk_index_wrap'
     429 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
         |                         ^~~~
   include/kunit/test.h:527:17: note: in expansion of macro 'printk'
     527 |                 printk(lvl fmt, ##__VA_ARGS__);                         \
         |                 ^~~~~~
   include/kunit/test.h:533:9: note: in expansion of macro 'kunit_log'
     533 |         kunit_log(lvl, test, KUNIT_SUBTEST_INDENT "# %s: " fmt,         \
         |         ^~~~~~~~~
   include/kunit/test.h:546:9: note: in expansion of macro 'kunit_printk'
     546 |         kunit_printk(KERN_INFO, test, fmt, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~
   include/linux/kern_levels.h:14:25: note: in expansion of macro 'KERN_SOH'
      14 | #define KERN_INFO       KERN_SOH "6"    /* informational */
         |                         ^~~~~~~~
   include/kunit/test.h:546:22: note: in expansion of macro 'KERN_INFO'
     546 |         kunit_printk(KERN_INFO, test, fmt, ##__VA_ARGS__)
         |                      ^~~~~~~~~
   drivers/gpu/drm/xe/tests/xe_bo.c:381:9: note: in expansion of macro 'kunit_info'
     381 |         kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n",
         |         ^~~~~~~~~~
   In file included from drivers/gpu/drm/xe/tests/xe_bo.c:6,
                    from drivers/gpu/drm/xe/xe_bo.c:2420:
>> include/kunit/test.h:50:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
      50 | #define KUNIT_SUBTEST_INDENT            "    "
         |                                         ^~~~~~
   include/kunit/test.h:528:57: note: in definition of macro 'kunit_log'
     528 |                 kunit_log_append((test_or_suite)->log,  fmt,            \
         |                                                         ^~~
   include/kunit/test.h:533:30: note: in expansion of macro 'KUNIT_SUBTEST_INDENT'
     533 |         kunit_log(lvl, test, KUNIT_SUBTEST_INDENT "# %s: " fmt,         \
         |                              ^~~~~~~~~~~~~~~~~~~~
   include/kunit/test.h:546:9: note: in expansion of macro 'kunit_printk'
     546 |         kunit_printk(KERN_INFO, test, fmt, ##__VA_ARGS__)
         |         ^~~~~~~~~~~~
   drivers/gpu/drm/xe/tests/xe_bo.c:381:9: note: in expansion of macro 'kunit_info'
     381 |         kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n",
         |         ^~~~~~~~~~
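
The warning itself is the usual size_t vs. %lu mismatch that only shows up on
32-bit configs such as this powerpc one; the conventional fix is the %zu
length modifier. A sketch of what the affected line in the kunit test would
then look like (the variable name is assumed, not taken from the actual file):

	/* %zu matches size_t on all architectures; 'ram' is an assumed name. */
	kunit_info(test, "Free ram is %zu bytes. Will allocate twice of that.\n",
		   ram);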


vim +50 include/kunit/test.h

6d2426b2f258da David Gow        2021-06-24  43  
c3bba690a26432 Alan Maguire     2020-03-26  44  /*
c3bba690a26432 Alan Maguire     2020-03-26  45   * TAP specifies subtest stream indentation of 4 spaces, 8 spaces for a
c3bba690a26432 Alan Maguire     2020-03-26  46   * sub-subtest.  See the "Subtests" section in
c3bba690a26432 Alan Maguire     2020-03-26  47   * https://node-tap.org/tap-protocol/
c3bba690a26432 Alan Maguire     2020-03-26  48   */
b1eaa8b2a55c9d Michal Wajdeczko 2023-05-17  49  #define KUNIT_INDENT_LEN		4
c3bba690a26432 Alan Maguire     2020-03-26 @50  #define KUNIT_SUBTEST_INDENT		"    "
c3bba690a26432 Alan Maguire     2020-03-26  51  #define KUNIT_SUBSUBTEST_INDENT		"        "
c3bba690a26432 Alan Maguire     2020-03-26  52  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves
  2024-04-16 10:07 ` [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
@ 2024-04-26  1:20   ` Matthew Brost
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Brost @ 2024-04-26  1:20 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Christian König, Somalapuram Amaranath, dri-devel

On Tue, Apr 16, 2024 at 12:07:24PM +0200, Thomas Hellström wrote:
> To address the problem with hitches moving when bulk move
> sublists are lru-bumped, register the list cursors with the
> ttm_lru_bulk_move structure when traversing its list, and
> when lru-bumping the list, move the cursor hitch to the tail.
> This also means it's mandatory for drivers to call
> ttm_lru_bulk_move_init() and ttm_lru_bulk_move_fini() when
> initializing and finalizing the bulk move structure, so add
> those calls to the amdgpu- and xe driver.
> 
> Compared to v1 this is slightly more code but less fragile
> and hopefully easier to understand.
> 
> Changes in previous series:
> - Completely rework the functionality
> - Avoid a NULL pointer dereference assigning manager->mem_type
> - Remove some leftover code causing build problems
> v2:
> - For hitch bulk tail moves, store the mem_type in the cursor
>   instead of with the manager.
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  4 ++
>  drivers/gpu/drm/ttm/ttm_resource.c     | 92 +++++++++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_vm.c             |  4 ++
>  include/drm/ttm/ttm_resource.h         | 58 ++++++++++------
>  4 files changed, 137 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 4299ce386322..18bf174c8d47 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2368,6 +2368,8 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  	if (r)
>  		return r;
>  
> +	ttm_lru_bulk_move_init(&vm->lru_bulk_move);
> +
>  	vm->is_compute_context = false;
>  
>  	vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
> @@ -2431,6 +2433,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  error_free_delayed:
>  	dma_fence_put(vm->last_tlb_flush);
>  	dma_fence_put(vm->last_unlocked);
> +	ttm_lru_bulk_move_fini(&adev->mman.bdev, &vm->lru_bulk_move);
>  	amdgpu_vm_fini_entities(vm);
>  
>  	return r;
> @@ -2587,6 +2590,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  		}
>  	}
>  
> +	ttm_lru_bulk_move_fini(&adev->mman.bdev, &vm->lru_bulk_move);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 22f8ae4ff4c0..2b93727c78e5 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -32,6 +32,49 @@
>  
>  #include <drm/drm_util.h>
>  
> +/* Detach the cursor from the bulk move list*/
> +static void
> +ttm_resource_cursor_clear_bulk(struct ttm_resource_cursor *cursor)
> +{
> +	cursor->bulk = NULL;
> +	list_del_init(&cursor->bulk_link);
> +}
> +
> +/* Move the cursor to the end of the bulk move list it's in */
> +static void ttm_resource_cursor_move_bulk_tail(struct ttm_lru_bulk_move *bulk,
> +					       struct ttm_resource_cursor *cursor)
> +{
> +	struct ttm_lru_bulk_move_pos *pos;
> +
> +	if (WARN_ON_ONCE(bulk != cursor->bulk)) {
> +		list_del_init(&cursor->bulk_link);
> +		return;
> +	}
> +
> +	pos = &bulk->pos[cursor->mem_type][cursor->priority];
> +	if (pos)
> +		list_move(&cursor->hitch.link, &pos->last->lru.link);
> +	ttm_resource_cursor_clear_bulk(cursor);
> +}
> +
> +/* Move all cursors attached to a bulk move to its end */
> +static void ttm_bulk_move_adjust_cursors(struct ttm_lru_bulk_move *bulk)
> +{
> +	struct ttm_resource_cursor *cursor, *next;
> +
> +	list_for_each_entry_safe(cursor, next, &bulk->cursor_list, bulk_link)
> +		ttm_resource_cursor_move_bulk_tail(bulk, cursor);
> +}
> +
> +/* Remove a cursor from an empty bulk move list */
> +static void ttm_bulk_move_drop_cursors(struct ttm_lru_bulk_move *bulk)
> +{
> +	struct ttm_resource_cursor *cursor, *next;
> +
> +	list_for_each_entry_safe(cursor, next, &bulk->cursor_list, bulk_link)
> +		ttm_resource_cursor_clear_bulk(cursor);
> +}
> +
>  /**
>   * ttm_resource_cursor_fini_locked() - Finalize the LRU list cursor usage
>   * @cursor: The struct ttm_resource_cursor to finalize.
> @@ -44,6 +87,7 @@ void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor)
>  {
>  	lockdep_assert_held(&cursor->man->bdev->lru_lock);
>  	list_del_init(&cursor->hitch.link);
> +	ttm_resource_cursor_clear_bulk(cursor);
>  }
>  
>  /**
> @@ -72,9 +116,27 @@ void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
>  void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk)
>  {
>  	memset(bulk, 0, sizeof(*bulk));
> +	INIT_LIST_HEAD(&bulk->cursor_list);
>  }
>  EXPORT_SYMBOL(ttm_lru_bulk_move_init);
>  
> +/**
> + * ttm_lru_bulk_move_fini - finalize a bulk move structure
> + * @bdev: The struct ttm_device
> + * @bulk: the structure to finalize
> + *
> + * Sanity checks that bulk moves don't have any
> + * resources left and hence no cursors attached.

I don't really see a sanity check here.

Wouldn't the sanity check be list_empty(&bulk->cursor_list)? Also pos == 0?
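
For what it's worth, the check being asked about could be something along
these lines (a sketch only, not part of the posted patch; whether to WARN here
at all is of course the author's call):

	unsigned int i, j;

	/* All bulk sublists should be empty at finalization time... */
	for (i = 0; i < TTM_NUM_MEM_TYPES; ++i)
		for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j)
			WARN_ON(bulk->pos[i][j].first || bulk->pos[i][j].last);

	/* ...and hence no cursors should still be attached. */
	WARN_ON(!list_empty(&bulk->cursor_list));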

> + */
> +void ttm_lru_bulk_move_fini(struct ttm_device *bdev,
> +			    struct ttm_lru_bulk_move *bulk)
> +{
> +	spin_lock(&bdev->lru_lock);
> +	ttm_bulk_move_drop_cursors(bulk);
> +	spin_unlock(&bdev->lru_lock);
> +}
> +EXPORT_SYMBOL(ttm_lru_bulk_move_fini);
> +
>  /**
>   * ttm_lru_bulk_move_tail - bulk move range of resources to the LRU tail.
>   *
> @@ -87,6 +149,7 @@ void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk)
>  {
>  	unsigned i, j;
>  
> +	ttm_bulk_move_adjust_cursors(bulk);
>  	for (i = 0; i < TTM_NUM_MEM_TYPES; ++i) {
>  		for (j = 0; j < TTM_MAX_BO_PRIORITY; ++j) {
>  			struct ttm_lru_bulk_move_pos *pos = &bulk->pos[i][j];
> @@ -418,6 +481,7 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
>  	man->bdev = bdev;
>  	man->size = size;
>  	man->usage = 0;
> +	man->mem_type = TTM_NUM_MEM_TYPES;
>  
>  	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
>  		INIT_LIST_HEAD(&man->lru[i]);
> @@ -514,6 +578,29 @@ void ttm_resource_manager_debug(struct ttm_resource_manager *man,
>  }
>  EXPORT_SYMBOL(ttm_resource_manager_debug);
>  
> +static void
> +ttm_resource_cursor_check_bulk(struct ttm_resource_cursor *cursor,
> +			       struct ttm_lru_item *next_lru)
> +{
> +	struct ttm_resource *next = ttm_lru_item_to_res(next_lru);
> +	struct ttm_lru_bulk_move *bulk = NULL;
> +	struct ttm_buffer_object *bo = next->bo;
> +
> +	lockdep_assert_held(&cursor->man->bdev->lru_lock);
> +	if (bo && bo->resource == next)
> +		bulk = bo->bulk_move;
> +
> +	if (cursor->bulk != bulk) {
> +		if (bulk) {
> +			list_move_tail(&cursor->bulk_link, &bulk->cursor_list);
> +			cursor->mem_type = next->mem_type;
> +		} else {
> +			list_del_init(&cursor->bulk_link);
> +		}
> +		cursor->bulk = bulk;
> +	}
> +}
> +
>  /**
>   * ttm_resource_manager_first() - Start iterating over the resources
>   * of a resource manager
> @@ -534,6 +621,7 @@ ttm_resource_manager_first(struct ttm_resource_manager *man,
>  	cursor->priority = 0;
>  	cursor->man = man;
>  	ttm_lru_item_init(&cursor->hitch, TTM_LRU_HITCH);
> +	INIT_LIST_HEAD(&cursor->bulk_link);
>  	list_add(&cursor->hitch.link, &man->lru[cursor->priority]);
>  
>  	return ttm_resource_manager_next(cursor);
> @@ -558,6 +646,7 @@ ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
>  		lru = &cursor->hitch;
>  		list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
>  			if (ttm_lru_item_is_res(lru)) {
> +				ttm_resource_cursor_check_bulk(cursor, lru);
>  				list_move(&cursor->hitch.link, &lru->link);
>  				return ttm_lru_item_to_res(lru);
>  			}
> @@ -567,9 +656,10 @@ ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
>  			break;
>  
>  		list_move(&cursor->hitch.link, &man->lru[cursor->priority]);
> +		ttm_resource_cursor_clear_bulk(cursor);
>  	} while (true);
>  
> -	list_del_init(&cursor->hitch.link);
> +	ttm_resource_cursor_fini_locked(cursor);

Nit: The patch earlier in the series which introduces this code should
probably just use ttm_resource_cursor_fini_locked() here.

Matt

>  
>  	return NULL;
>  }
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 2dbba55e7785..45e1975eed26 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1262,6 +1262,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  
>  	INIT_WORK(&vm->destroy_work, vm_destroy_work_func);
>  
> +	ttm_lru_bulk_move_init(&vm->lru_bulk_move);
> +
>  	INIT_LIST_HEAD(&vm->preempt.exec_queues);
>  	vm->preempt.min_run_period_ms = 10;	/* FIXME: Wire up to uAPI */
>  
> @@ -1379,6 +1381,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  	mutex_destroy(&vm->snap_mutex);
>  	for_each_tile(tile, xe, id)
>  		xe_range_fence_tree_fini(&vm->rftree[id]);
> +	ttm_lru_bulk_move_fini(&xe->ttm, &vm->lru_bulk_move);
>  	kfree(vm);
>  	if (!(flags & XE_VM_FLAG_MIGRATION))
>  		xe_device_mem_access_put(xe);
> @@ -1518,6 +1521,7 @@ static void vm_destroy_work_func(struct work_struct *w)
>  		XE_WARN_ON(vm->pt_root[id]);
>  
>  	trace_xe_vm_free(vm);
> +	ttm_lru_bulk_move_fini(&xe->ttm, &vm->lru_bulk_move);
>  	kfree(vm);
>  }
>  
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index dfc01258c981..ed09b49a001e 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -211,6 +211,9 @@ struct ttm_resource_manager {
>  	 * bdev->lru_lock.
>  	 */
>  	uint64_t usage;
> +
> +	/** @mem_type: The memory type used for this manager. */
> +	unsigned int mem_type;
>  };
>  
>  /**
> @@ -269,25 +272,6 @@ ttm_lru_item_to_res(struct ttm_lru_item *item)
>  	return container_of(item, struct ttm_resource, lru);
>  }
>  
> -/**
> - * struct ttm_resource_cursor
> - * @man: The resource manager currently being iterated over
> - * @hitch: A hitch list node inserted before the next resource
> - * to iterate over.
> - * @priority: the current priority
> - *
> - * Cursor to iterate over the resources in a manager.
> - */
> -struct ttm_resource_cursor {
> -	struct ttm_resource_manager *man;
> -	struct ttm_lru_item hitch;
> -	unsigned int priority;
> -};
> -
> -void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
> -
> -void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
> -
>  /**
>   * struct ttm_lru_bulk_move_pos
>   *
> @@ -303,8 +287,9 @@ struct ttm_lru_bulk_move_pos {
>  
>  /**
>   * struct ttm_lru_bulk_move
> - *
>   * @pos: first/last lru entry for resources in the each domain/priority
> + * @cursor_list: The list of cursors currently traversing any of
> + * the sublists of @pos. Protected by the ttm device's lru_lock.
>   *
>   * Container for the current bulk move state. Should be used with
>   * ttm_lru_bulk_move_init() and ttm_bo_set_bulk_move().
> @@ -314,8 +299,39 @@ struct ttm_lru_bulk_move_pos {
>   */
>  struct ttm_lru_bulk_move {
>  	struct ttm_lru_bulk_move_pos pos[TTM_NUM_MEM_TYPES][TTM_MAX_BO_PRIORITY];
> +	struct list_head cursor_list;
> +};
> +
> +/**
> + * struct ttm_resource_cursor
> + * @man: The resource manager currently being iterated over
> + * @hitch: A hitch list node inserted before the next resource
> + * to iterate over.
> + * @bulk_link: A list link for the list of cursors traversing the
> + * bulk sublist of @bulk. Protected by the ttm device's lru_lock.
> + * @bulk: Pointer to struct ttm_lru_bulk_move whose subrange @hitch is
> + * inserted to. NULL if none. Never dereference this pointer since
> + * the struct ttm_lru_bulk_move object pointed to might have been
> + * freed. The pointer is only for comparison.
> + * @mem_type: The memory type of the LRU list being traversed.
> + * This field is valid iff @bulk != NULL.
> + * @priority: the current priority
> + *
> + * Cursor to iterate over the resources in a manager.
> + */
> +struct ttm_resource_cursor {
> +	struct ttm_resource_manager *man;
> +	struct ttm_lru_item hitch;
> +	struct list_head bulk_link;
> +	struct ttm_lru_bulk_move *bulk;
> +	unsigned int mem_type;
> +	unsigned int priority;
>  };
>  
> +void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
> +
> +void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
> +
>  /**
>   * struct ttm_kmap_iter_iomap - Specialization for a struct io_mapping +
>   * struct sg_table backed struct ttm_resource.
> @@ -404,6 +420,8 @@ ttm_resource_manager_cleanup(struct ttm_resource_manager *man)
>  
>  void ttm_lru_bulk_move_init(struct ttm_lru_bulk_move *bulk);
>  void ttm_lru_bulk_move_tail(struct ttm_lru_bulk_move *bulk);
> +void ttm_lru_bulk_move_fini(struct ttm_device *bdev,
> +			    struct ttm_lru_bulk_move *bulk);
>  
>  void ttm_resource_add_bulk_move(struct ttm_resource *res,
>  				struct ttm_buffer_object *bo);
> -- 
> 2.44.0
> 


* Re: [PATCH v2 2/9] drm/ttm: Use LRU hitches
  2024-04-16 10:07 ` [PATCH v2 2/9] drm/ttm: Use LRU hitches Thomas Hellström
@ 2024-04-26  1:23   ` Matthew Brost
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Brost @ 2024-04-26  1:23 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: intel-xe, Christian König, Somalapuram Amaranath, dri-devel

On Tue, Apr 16, 2024 at 12:07:23PM +0200, Thomas Hellström wrote:
> Have iterators insert themselves into the list they are iterating
> over using hitch list nodes. Since only the iterator owner
> can remove these list nodes from the list, it's safe to unlock
> the list and when continuing, use them as a starting point. Due to
> the way LRU bumping works in TTM, newly added items will not be
> missed, and bumped items will be iterated over a second time before
> reaching the end of the list.
> 
> The exception is list with bulk move sublists. When bumping a
> sublist, a hitch that is part of that sublist will also be moved
> and we might miss items if restarting from it. This will be
> addressed in a later patch.
> 
> Changes in previous series:
> - Updated ttm_resource_cursor_fini() documentation.
> v2:
> - Don't reorder ttm_resource_manager_first() and _next().
>   (Christian König).
> - Use list_add instead of list_move
>   (Christian König)
> 
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c       |  1 +
>  drivers/gpu/drm/ttm/ttm_device.c   |  9 ++-
>  drivers/gpu/drm/ttm/ttm_resource.c | 94 ++++++++++++++++++++----------
>  include/drm/ttm/ttm_resource.h     | 16 +++--
>  4 files changed, 82 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 6396dece0db1..43eda720657f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -621,6 +621,7 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>  		if (locked)
>  			dma_resv_unlock(res->bo->base.resv);
>  	}
> +	ttm_resource_cursor_fini_locked(&cursor);
>  
>  	if (!bo) {
>  		if (busy_bo && !ttm_bo_get_unless_zero(busy_bo))
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index f27406e851e5..e8a6a1dab669 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -169,12 +169,17 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>  			num_pages = PFN_UP(bo->base.size);
>  			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>  			/* ttm_bo_swapout has dropped the lru_lock */
> -			if (!ret)
> +			if (!ret) {
> +				ttm_resource_cursor_fini(&cursor);
>  				return num_pages;
> -			if (ret != -EBUSY)
> +			}
> +			if (ret != -EBUSY) {
> +				ttm_resource_cursor_fini(&cursor);
>  				return ret;
> +			}
>  		}
>  	}
> +	ttm_resource_cursor_fini_locked(&cursor);
>  	spin_unlock(&bdev->lru_lock);
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 7aa5ca5c0e33..22f8ae4ff4c0 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -32,6 +32,37 @@
>  
>  #include <drm/drm_util.h>
>  
> +/**
> + * ttm_resource_cursor_fini_locked() - Finalize the LRU list cursor usage
> + * @cursor: The struct ttm_resource_cursor to finalize.
> + *
> + * The function pulls the LRU list cursor off any lists it was previusly
> + * attached to. Needs to be called with the LRU lock held. The function
> + * can be called multiple times after eachother.
> + */
> +void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor)
> +{
> +	lockdep_assert_held(&cursor->man->bdev->lru_lock);
> +	list_del_init(&cursor->hitch.link);
> +}
> +
> +/**
> + * ttm_resource_cursor_fini() - Finalize the LRU list cursor usage
> + * @cursor: The struct ttm_resource_cursor to finalize.
> + *
> + * The function pulls the LRU list cursor off any lists it was previusly
> + * attached to. Needs to be called without the LRU list lock held. The
> + * function can be called multiple times after eachother.
> + */
> +void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor)
> +{
> +	spinlock_t *lru_lock = &cursor->man->bdev->lru_lock;
> +
> +	spin_lock(lru_lock);
> +	ttm_resource_cursor_fini_locked(cursor);
> +	spin_unlock(lru_lock);
> +}
> +
>  /**
>   * ttm_lru_bulk_move_init - initialize a bulk move structure
>   * @bulk: the structure to init
> @@ -484,61 +515,62 @@ void ttm_resource_manager_debug(struct ttm_resource_manager *man,
>  EXPORT_SYMBOL(ttm_resource_manager_debug);
>  
>  /**
> - * ttm_resource_manager_first
> - *
> + * ttm_resource_manager_first() - Start iterating over the resources
> + * of a resource manager
>   * @man: resource manager to iterate over
>   * @cursor: cursor to record the position
>   *
> - * Returns the first resource from the resource manager.
> + * Initializes the cursor and starts iterating. When done iterating,
> + * the caller must explicitly call ttm_resource_cursor_fini().
> + *
> + * Return: The first resource from the resource manager.
>   */
>  struct ttm_resource *
>  ttm_resource_manager_first(struct ttm_resource_manager *man,
>  			   struct ttm_resource_cursor *cursor)
>  {
> -	struct ttm_lru_item *lru;
> -
>  	lockdep_assert_held(&man->bdev->lru_lock);
>  
> -	for (cursor->priority = 0; cursor->priority < TTM_MAX_BO_PRIORITY;
> -	     ++cursor->priority)
> -		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
> -			if (ttm_lru_item_is_res(lru))
> -				return ttm_lru_item_to_res(lru);
> -		}
> +	cursor->priority = 0;
> +	cursor->man = man;
> +	ttm_lru_item_init(&cursor->hitch, TTM_LRU_HITCH);
> +	list_add(&cursor->hitch.link, &man->lru[cursor->priority]);
>  
> -	return NULL;
> +	return ttm_resource_manager_next(cursor);
>  }
>  
>  /**
> - * ttm_resource_manager_next
> - *
> - * @man: resource manager to iterate over
> + * ttm_resource_manager_next() - Continue iterating over the resource manager
> + * resources
>   * @cursor: cursor to record the position
> - * @res: the current resource pointer
>   *
> - * Returns the next resource from the resource manager.
> + * Return: The next resource from the resource manager.
>   */
>  struct ttm_resource *
> -ttm_resource_manager_next(struct ttm_resource_manager *man,
> -			  struct ttm_resource_cursor *cursor,
> -			  struct ttm_resource *res)
> +ttm_resource_manager_next(struct ttm_resource_cursor *cursor)
>  {
> -	struct ttm_lru_item *lru = &res->lru;
> +	struct ttm_resource_manager *man = cursor->man;
> +	struct ttm_lru_item *lru;
>  
>  	lockdep_assert_held(&man->bdev->lru_lock);
>  
> -	list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
> -		if (ttm_lru_item_is_res(lru))
> -			return ttm_lru_item_to_res(lru);
> -	}
> -
> -	for (++cursor->priority; cursor->priority < TTM_MAX_BO_PRIORITY;
> -	     ++cursor->priority)
> -		list_for_each_entry(lru, &man->lru[cursor->priority], link) {
> -			if (ttm_lru_item_is_res(lru))
> -				ttm_lru_item_to_res(lru);
> +	do {
> +		lru = &cursor->hitch;
> +		list_for_each_entry_continue(lru, &man->lru[cursor->priority], link) {
> +			if (ttm_lru_item_is_res(lru)) {
> +				list_move(&cursor->hitch.link, &lru->link);
> +				return ttm_lru_item_to_res(lru);
> +			}
>  		}
>  
> +		if (++cursor->priority >= TTM_MAX_BO_PRIORITY)
> +			break;
> +
> +		list_move(&cursor->hitch.link, &man->lru[cursor->priority]);
> +	} while (true);
> +
> +	list_del_init(&cursor->hitch.link);

As mentioned in patch #3, probably use ttm_resource_cursor_fini_locked here.

I have an open comment on patch #1 related to this patch too, but this does look sane.

Matt
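
For context, the reason the hitch / _fini() pairing matters is the usage
pattern the commit message describes: the caller may drop the LRU lock in the
middle of the walk and resume from the hitch. Roughly (a sketch that leaves
out the BO reference counting and reservation locking a real caller such as
swapout needs; bdev and man are the usual ttm_device and resource manager
pointers):

	struct ttm_resource_cursor cursor;
	struct ttm_resource *res;

	spin_lock(&bdev->lru_lock);
	ttm_resource_manager_for_each_res(man, &cursor, res) {
		/* The hitch stays on the list, marking the resume point. */
		spin_unlock(&bdev->lru_lock);
		/* ... operate on the object backing @res ... */
		spin_lock(&bdev->lru_lock);
	}
	ttm_resource_cursor_fini_locked(&cursor);
	spin_unlock(&bdev->lru_lock);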

> +
>  	return NULL;
>  }
>  
> diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
> index 4babc4ff10b0..dfc01258c981 100644
> --- a/include/drm/ttm/ttm_resource.h
> +++ b/include/drm/ttm/ttm_resource.h
> @@ -271,15 +271,23 @@ ttm_lru_item_to_res(struct ttm_lru_item *item)
>  
>  /**
>   * struct ttm_resource_cursor
> - *
> + * @man: The resource manager currently being iterated over
> + * @hitch: A hitch list node inserted before the next resource
> + * to iterate over.
>   * @priority: the current priority
>   *
>   * Cursor to iterate over the resources in a manager.
>   */
>  struct ttm_resource_cursor {
> +	struct ttm_resource_manager *man;
> +	struct ttm_lru_item hitch;
>  	unsigned int priority;
>  };
>  
> +void ttm_resource_cursor_fini_locked(struct ttm_resource_cursor *cursor);
> +
> +void ttm_resource_cursor_fini(struct ttm_resource_cursor *cursor);
> +
>  /**
>   * struct ttm_lru_bulk_move_pos
>   *
> @@ -438,9 +446,7 @@ struct ttm_resource *
>  ttm_resource_manager_first(struct ttm_resource_manager *man,
>  			   struct ttm_resource_cursor *cursor);
>  struct ttm_resource *
> -ttm_resource_manager_next(struct ttm_resource_manager *man,
> -			  struct ttm_resource_cursor *cursor,
> -			  struct ttm_resource *res);
> +ttm_resource_manager_next(struct ttm_resource_cursor *cursor);
>  
>  /**
>   * ttm_resource_manager_for_each_res - iterate over all resources
> @@ -452,7 +458,7 @@ ttm_resource_manager_next(struct ttm_resource_manager *man,
>   */
>  #define ttm_resource_manager_for_each_res(man, cursor, res)		\
>  	for (res = ttm_resource_manager_first(man, cursor); res;	\
> -	     res = ttm_resource_manager_next(man, cursor, res))
> +	     res = ttm_resource_manager_next(cursor))
>  
>  struct ttm_kmap_iter *
>  ttm_kmap_iter_iomap_init(struct ttm_kmap_iter_iomap *iter_io,
> -- 
> 2.44.0
> 


* Re: [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types
  2024-04-17  6:09     ` Christian König
@ 2024-05-02 11:41       ` Thomas Hellström
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellström @ 2024-05-02 11:41 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: intel-xe, Somalapuram Amaranath, dri-devel

Hi, Matt, Christian,

On Wed, 2024-04-17 at 08:09 +0200, Christian König wrote:
> Am 17.04.24 um 03:15 schrieb Matthew Brost:
> > On Tue, Apr 16, 2024 at 12:07:22PM +0200, Thomas Hellström wrote:
> > > To be able to handle list unlocking while traversing the LRU
> > > list, we want the iterators not only to point to the next
> > > position of the list traversal, but to insert themselves as
> > > list nodes at that point to work around the fact that the
> > > next node might otherwise disappear from the list while
> > > the iterator is pointing to it.
> > > 
> > > These list nodes need to be easily distinguishable from other
> > > list nodes so that others traversing the list can skip
> > > over them.
> > > 
> > > So declare a struct ttm_lru_item, with a struct list_head member
> > > and a type enum. This will slightly increase the size of a
> > > struct ttm_resource.
> > > 
> > > Changes in previous series:
> > > - Update enum ttm_lru_item_type documentation.
> > > 
> > Patch itself makes sense to me. One style question (or maybe
> > suggestion?) below.
> > 
> > > Cc: Christian König <christian.koenig@amd.com>
> > > Cc: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Signed-off-by: Thomas Hellström
> > > <thomas.hellstrom@linux.intel.com>
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_device.c   | 13 ++++--
> > >   drivers/gpu/drm/ttm/ttm_resource.c | 70 ++++++++++++++++++++++-
> > > -------
> > >   include/drm/ttm/ttm_resource.h     | 51 +++++++++++++++++++++-
> > >   3 files changed, 110 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_device.c
> > > b/drivers/gpu/drm/ttm/ttm_device.c
> > > index 76027960054f..f27406e851e5 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_device.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_device.c
> > > @@ -270,17 +270,22 @@ EXPORT_SYMBOL(ttm_device_fini);
> > >   static void ttm_device_clear_lru_dma_mappings(struct ttm_device
> > > *bdev,
> > >   					      struct list_head
> > > *list)
> > >   {
> > > -	struct ttm_resource *res;
> > > +	struct ttm_lru_item *lru;
> > >   
> > >   	spin_lock(&bdev->lru_lock);
> > > -	while ((res = list_first_entry_or_null(list,
> > > typeof(*res), lru))) {
> > > -		struct ttm_buffer_object *bo = res->bo;
> > > +	while ((lru = list_first_entry_or_null(list,
> > > typeof(*lru), link))) {
> > > +		struct ttm_buffer_object *bo;
> > > +
> > > +		if (!ttm_lru_item_is_res(lru))
> > > +			continue;
> > > +
> > > +		bo = ttm_lru_item_to_res(lru)->bo;
> > >   
> > >   		/* Take ref against racing releases once
> > > lru_lock is unlocked */
> > >   		if (!ttm_bo_get_unless_zero(bo))
> > >   			continue;
> > >   
> > > -		list_del_init(&res->lru);
> > > +		list_del_init(&bo->resource->lru.link);
> > >   		spin_unlock(&bdev->lru_lock);
> > >   
> > >   		if (bo->ttm)
> > > diff --git a/drivers/gpu/drm/ttm/ttm_resource.c
> > > b/drivers/gpu/drm/ttm/ttm_resource.c
> > > index be8d286513f9..7aa5ca5c0e33 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_resource.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> > > @@ -69,8 +69,8 @@ void ttm_lru_bulk_move_tail(struct
> > > ttm_lru_bulk_move *bulk)
> > >   			dma_resv_assert_held(pos->last->bo-
> > > >base.resv);
> > >   
> > >   			man = ttm_manager_type(pos->first->bo-
> > > >bdev, i);
> > > -			list_bulk_move_tail(&man->lru[j], &pos-
> > > >first->lru,
> > > -					    &pos->last->lru);
> > > +			list_bulk_move_tail(&man->lru[j], &pos-
> > > >first->lru.link,
> > > +					    &pos->last-
> > > >lru.link);
> > >   		}
> > >   	}
> > >   }
> > > @@ -83,14 +83,38 @@ ttm_lru_bulk_move_pos(struct
> > > ttm_lru_bulk_move *bulk, struct ttm_resource *res)
> > >   	return &bulk->pos[res->mem_type][res->bo->priority];
> > >   }
> > >   
> > > +/* Return the previous resource on the list (skip over non-
> > > resource list items) */
> > > +static struct ttm_resource *ttm_lru_prev_res(struct ttm_resource
> > > *cur)
> > > +{
> > > +	struct ttm_lru_item *lru = &cur->lru;
> > > +
> > > +	do {
> > > +		lru = list_prev_entry(lru, link);
> > > +	} while (!ttm_lru_item_is_res(lru));
> > > +
> > > +	return ttm_lru_item_to_res(lru);
> > > +}
> > > +
> > > +/* Return the next resource on the list (skip over non-resource
> > > list items) */
> > > +static struct ttm_resource *ttm_lru_next_res(struct ttm_resource
> > > *cur)
> > > +{
> > > +	struct ttm_lru_item *lru = &cur->lru;
> > > +
> > > +	do {
> > > +		lru = list_next_entry(lru, link);
> > > +	} while (!ttm_lru_item_is_res(lru));
> > > +
> > > +	return ttm_lru_item_to_res(lru);
> > > +}
> > > +
> > >   /* Move the resource to the tail of the bulk move range */
> > >   static void ttm_lru_bulk_move_pos_tail(struct
> > > ttm_lru_bulk_move_pos *pos,
> > >   				       struct ttm_resource *res)
> > >   {
> > >   	if (pos->last != res) {
> > >   		if (pos->first == res)
> > > -			pos->first = list_next_entry(res, lru);
> > > -		list_move(&res->lru, &pos->last->lru);
> > > +			pos->first = ttm_lru_next_res(res);
> > > +		list_move(&res->lru.link, &pos->last->lru.link);
> > >   		pos->last = res;
> > >   	}
> > >   }
> > > @@ -121,11 +145,11 @@ static void ttm_lru_bulk_move_del(struct
> > > ttm_lru_bulk_move *bulk,
> > >   		pos->first = NULL;
> > >   		pos->last = NULL;
> > >   	} else if (pos->first == res) {
> > > -		pos->first = list_next_entry(res, lru);
> > > +		pos->first = ttm_lru_next_res(res);
> > >   	} else if (pos->last == res) {
> > > -		pos->last = list_prev_entry(res, lru);
> > > +		pos->last = ttm_lru_prev_res(res);
> > >   	} else {
> > > -		list_move(&res->lru, &pos->last->lru);
> > > +		list_move(&res->lru.link, &pos->last->lru.link);
> > >   	}
> > >   }
> > >   
> > > @@ -154,7 +178,7 @@ void ttm_resource_move_to_lru_tail(struct
> > > ttm_resource *res)
> > >   	lockdep_assert_held(&bo->bdev->lru_lock);
> > >   
> > >   	if (bo->pin_count) {
> > > -		list_move_tail(&res->lru, &bdev->pinned);
> > > +		list_move_tail(&res->lru.link, &bdev->pinned);
> > >   
> > >   	} else	if (bo->bulk_move) {
> > >   		struct ttm_lru_bulk_move_pos *pos =
> > > @@ -165,7 +189,7 @@ void ttm_resource_move_to_lru_tail(struct
> > > ttm_resource *res)
> > >   		struct ttm_resource_manager *man;
> > >   
> > >   		man = ttm_manager_type(bdev, res->mem_type);
> > > -		list_move_tail(&res->lru, &man->lru[bo-
> > > >priority]);
> > > +		list_move_tail(&res->lru.link, &man->lru[bo-
> > > >priority]);
> > >   	}
> > >   }
> > >   
> > > @@ -196,9 +220,9 @@ void ttm_resource_init(struct
> > > ttm_buffer_object *bo,
> > >   	man = ttm_manager_type(bo->bdev, place->mem_type);
> > >   	spin_lock(&bo->bdev->lru_lock);
> > >   	if (bo->pin_count)
> > > -		list_add_tail(&res->lru, &bo->bdev->pinned);
> > > +		list_add_tail(&res->lru.link, &bo->bdev-
> > > >pinned);
> > >   	else
> > > -		list_add_tail(&res->lru, &man->lru[bo-
> > > >priority]);
> > > +		list_add_tail(&res->lru.link, &man->lru[bo-
> > > >priority]);
> > >   	man->usage += res->size;
> > >   	spin_unlock(&bo->bdev->lru_lock);
> > >   }
> > > @@ -220,7 +244,7 @@ void ttm_resource_fini(struct
> > > ttm_resource_manager *man,
> > >   	struct ttm_device *bdev = man->bdev;
> > >   
> > >   	spin_lock(&bdev->lru_lock);
> > > -	list_del_init(&res->lru);
> > > +	list_del_init(&res->lru.link);
> > >   	man->usage -= res->size;
> > >   	spin_unlock(&bdev->lru_lock);
> > >   }
> > > @@ -471,14 +495,16 @@ struct ttm_resource *
> > >   ttm_resource_manager_first(struct ttm_resource_manager *man,
> > >   			   struct ttm_resource_cursor *cursor)
> > >   {
> > > -	struct ttm_resource *res;
> > > +	struct ttm_lru_item *lru;
> > >   
> > >   	lockdep_assert_held(&man->bdev->lru_lock);
> > >   
> > >   	for (cursor->priority = 0; cursor->priority <
> > > TTM_MAX_BO_PRIORITY;
> > >   	     ++cursor->priority)
> > > -		list_for_each_entry(res, &man->lru[cursor-
> > > >priority], lru)
> > > -			return res;
> > > +		list_for_each_entry(lru, &man->lru[cursor-
> > > >priority], link) {
> > > +			if (ttm_lru_item_is_res(lru))
> > > +				return ttm_lru_item_to_res(lru);
> > > +		}
> > >   
> > >   	return NULL;
> > >   }
> > > @@ -497,15 +523,21 @@ ttm_resource_manager_next(struct
> > > ttm_resource_manager *man,
> > >   			  struct ttm_resource_cursor *cursor,
> > >   			  struct ttm_resource *res)
> > >   {
> > > +	struct ttm_lru_item *lru = &res->lru;
> > > +
> > >   	lockdep_assert_held(&man->bdev->lru_lock);
> > >   
> > > -	list_for_each_entry_continue(res, &man->lru[cursor-
> > > >priority], lru)
> > > -		return res;
> > > +	list_for_each_entry_continue(lru, &man->lru[cursor-
> > > >priority], link) {
> > > +		if (ttm_lru_item_is_res(lru))
> > > +			return ttm_lru_item_to_res(lru);
> > > +	}
> > >   
> > >   	for (++cursor->priority; cursor->priority <
> > > TTM_MAX_BO_PRIORITY;
> > >   	     ++cursor->priority)
> > > -		list_for_each_entry(res, &man->lru[cursor-
> > > >priority], lru)
> > > -			return res;
> > > +		list_for_each_entry(lru, &man->lru[cursor-
> > > >priority], link) {
> > > +			if (ttm_lru_item_is_res(lru))
> > > +				ttm_lru_item_to_res(lru);
> > > +		}
> > >   
> > >   	return NULL;
> > >   }
> > > diff --git a/include/drm/ttm/ttm_resource.h
> > > b/include/drm/ttm/ttm_resource.h
> > > index 69769355139f..4babc4ff10b0 100644
> > > --- a/include/drm/ttm/ttm_resource.h
> > > +++ b/include/drm/ttm/ttm_resource.h
> > > @@ -49,6 +49,43 @@ struct io_mapping;
> > >   struct sg_table;
> > >   struct scatterlist;
> > >   
> > > +/**
> > > + * enum ttm_lru_item_type - enumerate ttm_lru_item subclasses
> > > + */
> > > +enum ttm_lru_item_type {
> > > +	/** @TTM_LRU_RESOURCE: The resource subclass */
> > > +	TTM_LRU_RESOURCE,
> > > +	/** @TTM_LRU_HITCH: The iterator hitch subclass */
> > > +	TTM_LRU_HITCH
> > > +};
> > > +
> > > +/**
> > > + * struct ttm_lru_item - The TTM lru list node base class
> > > + * @link: The list link
> > > + * @type: The subclass type
> > > + */
> > > +struct ttm_lru_item {
> > > +	struct list_head link;
> > > +	enum ttm_lru_item_type type;
> > > +};
> > > +
> > > +/**
> > > + * ttm_lru_item_init() - initialize a struct ttm_lru_item
> > > + * @item: The item to initialize
> > > + * @type: The subclass type
> > > + */
> > > +static inline void ttm_lru_item_init(struct ttm_lru_item *item,
> > > +				     enum ttm_lru_item_type
> > > type)
> > > +{
> > > +	item->type = type;
> > > +	INIT_LIST_HEAD(&item->link);
> > > +}
> > > +
> > > +static inline bool ttm_lru_item_is_res(const struct ttm_lru_item
> > > *item)
> > > +{
> > > +	return item->type == TTM_LRU_RESOURCE;
> > > +}
> > > +
> > >   struct ttm_resource_manager_func {
> > >   	/**
> > >   	 * struct ttm_resource_manager_func member alloc
> > > @@ -217,9 +254,21 @@ struct ttm_resource {
> > >   	/**
> > >   	 * @lru: Least recently used list, see
> > > &ttm_resource_manager.lru
> > >   	 */
> > > -	struct list_head lru;
> > > +	struct ttm_lru_item lru;
> > >   };
> > >   
> > > +/**
> > > + * ttm_lru_item_to_res() - Downcast a struct ttm_lru_item to a
> > > struct ttm_resource
> > > + * @item: The struct ttm_lru_item to downcast
> > > + *
> > > + * Return: Pointer to the embedding struct ttm_resource
> > > + */
> > > +static inline struct ttm_resource *
> > > +ttm_lru_item_to_res(struct ttm_lru_item *item)
> > Pretty much everywhere in this series we have the following coding
> > pattern:
> > 
> > if (ttm_lru_item_is_res(item))
> > 	do something with ttm_lru_item_to_res(item);
> > 
> > Would it make more sense to squash these functions together with
> > only
> > ttm_lru_item_to_res which returns NULL if item is not
> > TTM_LRU_RESOURCE?
> > 
> > The new pattern would be:
> > 
> > res = ttm_lru_item_to_res(item);
> > if (res)
> > 	do something with res
> > 
> > What do you think?

After looking into this, in the typical case this would actually amount
to

struct ttm_resource *res = ttm_lru_item_to_res(item);

if (res)
	do_something_with_res();

So that's four lines instead of two, plus an extra variable, and I don't
see the immediate benefit. I can change it if you guys insist, though.

> 
> I would even say we should put that filtering into the iterator.
> 
> Nobody except the code which inserted the anchor into the LRU is 
> interested in it.

That's true, and it's already part of the _next() iterator function.
However, in those places where the iteration is open-coded it's still
present. Those functions are typically intended to drain the LRU list.

I could probably introduce a

	struct ttm_resource *ttm_lru_first_res_or_null(struct list_head *);

function to handle those cases, though.

/Thomas.
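
A rough sketch of what such a helper could look like, built only from the
primitives this patch introduces (illustration only, not code from the posted
series):

static inline struct ttm_resource *
ttm_lru_first_res_or_null(struct list_head *head)
{
	struct ttm_lru_item *lru;

	/* Skip any hitch nodes and return the first real resource, if any. */
	list_for_each_entry(lru, head, link) {
		if (ttm_lru_item_is_res(lru))
			return ttm_lru_item_to_res(lru);
	}

	return NULL;
}

The open-coded drain loops would then never see a hitch at all, which also
addresses Christian's point about keeping the filtering out of the callers.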




> 
> Regards,
> Christian.
> 
> > 
> > Matt
> > 
> > > +{
> > > +	return container_of(item, struct ttm_resource, lru);
> > > +}
> > > +
> > >   /**
> > >    * struct ttm_resource_cursor
> > >    *
> > > -- 
> > > 2.44.0
> > > 
> 



end of thread, other threads:[~2024-05-02 11:41 UTC | newest]

Thread overview: 21+ messages
2024-04-16 10:07 [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 1/9] drm/ttm: Allow TTM LRU list nodes of different types Thomas Hellström
2024-04-17  1:15   ` Matthew Brost
2024-04-17  6:09     ` Christian König
2024-05-02 11:41       ` Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 2/9] drm/ttm: Use LRU hitches Thomas Hellström
2024-04-26  1:23   ` Matthew Brost
2024-04-16 10:07 ` [PATCH v2 3/9] drm/ttm, drm/amdgpu, drm/xe: Consider hitch moves within bulk sublist moves Thomas Hellström
2024-04-26  1:20   ` Matthew Brost
2024-04-16 10:07 ` [PATCH v2 4/9] drm/ttm: Allow continued swapout after -ENOSPC falure Thomas Hellström
2024-04-17  1:44   ` Matthew Brost
2024-04-16 10:07 ` [PATCH v2 5/9] drm/ttm: Add a virtual base class for graphics memory backup Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 6/9] drm/ttm/pool: Provide a helper to shrink pages Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 7/9] drm/ttm: Use fault-injection to test error paths Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 8/9] drm/xe, drm/ttm: Provide a generic LRU walker helper Thomas Hellström
2024-04-16 10:07 ` [PATCH v2 9/9] drm/xe: Add a shrinker for xe bos Thomas Hellström
2024-04-16 21:32   ` kernel test robot
2024-04-17 13:45   ` kernel test robot
2024-04-16 11:55 ` [PATCH v2 0/9] TTM shrinker helpers and xe buffer object shrinker Christian König
2024-04-16 13:08   ` Thomas Hellström
2024-04-16 13:24     ` Christian König
