* [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
@ 2024-04-24 16:56 Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
                   ` (20 more replies)
  0 siblings, 21 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Hi everyone,

recently I've been looking into remedies for apps (in particular, newer
games) that experience significant performance loss when they start to
hit VRAM limits, especially on older or lower-end cards that struggle
to fit both desktop apps and all the game data into VRAM at once.

The root of the problem lies in the fact that from userspace's POV,
buffer eviction is very opaque: Userspace applications/drivers cannot
tell how oversubscribed VRAM is, nor do they have fine-grained control
over which buffers get evicted.  At the same time, with GPU APIs becoming
increasingly lower-level and GPU-driven, only the application itself
can know which buffers are used within a particular submission, and
how important each buffer is. For this, GPU APIs include interfaces
to query oversubscription and specify memory priorities: In Vulkan,
oversubscription can be queried through the VK_EXT_memory_budget
extension. Different buffers can also be assigned priorities via the
VK_EXT_pageable_device_local_memory extension. Modern games, especially
D3D12 games via vkd3d-proton, rely on oversubscription being reported and
priorities being respected in order to perform their memory management.
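
For reference, this is roughly what the query looks like on the Vulkan
side (illustrative snippet, not part of this series):

	VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
		.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
	};
	VkPhysicalDeviceMemoryProperties2 props = {
		.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
		.pNext = &budget,
	};

	vkGetPhysicalDeviceMemoryProperties2(physical_device, &props);
	/* budget.heapBudget[i] is how much of heap i the app may use,
	 * budget.heapUsage[i] is how much it currently uses. */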

However, relaying this information to the kernel via the current KMD uAPIs
is not possible. On AMDGPU for example, all work submissions include a
"bo list" that contains any buffer object that is accessed during the
course of the submission. If VRAM is oversubscribed and a buffer in the
list was evicted to system memory, that buffer is moved back to VRAM
(potentially evicting other unused buffers).

Since the usermode driver doesn't know what buffers are used by the
application, its only choice is to submit a bo list that contains every
buffer the application has allocated. In case of VRAM oversubscription,
it is highly likely that some of the application's buffers were evicted,
which almost guarantees that some buffers will get moved around. Since
the bo list is only known at submit time, this also means the buffers
will get moved right before submitting application work, which is the
worst possible time to move buffers from a latency perspective. Another
consequence of the large bo list is that nearly all memory from other
applications will be evicted, too. When different applications (e.g. game
and compositor) submit work one after the other, this causes a ping-pong
effect where each app's submission evicts the other app's memory,
resulting in a large amount of unnecessary moves.

This overly aggressive eviction behavior led to RADV adopting a change
that effectively allows all VRAM allocations to reside in system memory
[1].  This worked around the ping-ponging/excessive buffer moving problem,
but also meant that any memory evicted to system memory would forever
stay there, regardless of how VRAM is used.

My proposal aims at providing a middle ground between these extremes.
The goals I want to meet are:
- Userspace is accurately informed about VRAM oversubscription/how much
  VRAM has been evicted
- Buffer eviction respects priorities set by userspace
- Wasteful ping-ponging is avoided to the extent possible

I have been testing out some prototypes, and came up with this rough
sketch of an API:

- For each ttm_resource_manager, the amount of evicted memory is tracked
  (similarly to how "usage" tracks the memory usage). When memory is
  evicted via ttm_bo_evict, the size of the evicted memory is added;
  when memory is un-evicted (see below), its size is subtracted. The
  amount of evicted memory for e.g. VRAM can be queried by userspace
  via an ioctl (see the sketch after this list).

- Each ttm_resource_manager maintains a list of evicted buffer objects.

- ttm_mem_unevict walks the list of evicted bos for a given
  ttm_resource_manager and tries moving evicted resources back. When a
  buffer is freed, this function is called to immediately restore some
  evicted memory.

- Each ttm_buffer_object independently tracks the mem_type it wants
  to reside in.

- ttm_bo_try_unevict is added as a helper function which attempts to
  move the buffer to its preferred mem_type. If no space is available
  there, it fails with -ENOSPC/-ENOMEM.

- Similar to how ttm_bo_evict works, each driver can implement
  uneviction_valuable/unevict_flags callbacks to control buffer
  un-eviction.
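
As an example of the query mentioned in the first point, userspace
could read the amount of evicted VRAM roughly like this (hypothetical
sketch; the actual uAPI is only added in patch 17, and the query name
is assumed here):

	uint64_t evicted = 0;
	struct drm_amdgpu_info request = {
		.return_pointer = (uintptr_t)&evicted,
		.return_size = sizeof(evicted),
		.query = AMDGPU_INFO_EVICTED_VRAM, /* assumed name */
	};

	drmCommandWrite(fd, DRM_AMDGPU_INFO, &request, sizeof(request));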

This is what patches 1-10 accomplish (together with an amdgpu
implementation utilizing the new API).

Userspace priorities could then be implemented as follows:

- TTM already manages priorities for each buffer object. These priorities
  can be updated by userspace via a GEM_OP ioctl to inform the kernel
  which buffers should be evicted before others. If an ioctl increases
  the priority of a buffer, ttm_bo_try_unevict is called on that buffer
  to try and move it back, potentially evicting buffers with a lower
  priority (see the sketch after this list).

- Buffers should never be evicted by other buffers with equal/lower
  priority, but if there is a buffer with lower priority occupying VRAM,
  it should be evicted in favor of the higher-priority one. This prevents
  ping-ponging between buffers that try evicting each other and is
  trivially implementable with an early-exit in ttm_mem_evict_first.
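
Userspace usage of the priority update could then look roughly like
this (hypothetical sketch; the actual op is only introduced in patch
16, and the op name is assumed here):

	struct drm_amdgpu_gem_op op = {
		.handle = bo_handle,
		.op = AMDGPU_GEM_OP_SET_PRIORITY, /* assumed name */
		.value = priority,
	};

	drmCommandWriteRead(fd, DRM_AMDGPU_GEM_OP, &op, sizeof(op));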

This is covered in patches 11-15, with the new features exposed to
userspace in patches 16-18.

I also have a RADV branch utilizing this API at [2], which I use for
testing.

This implementation is still very much WIP, although the D3D12 games I
tested already seemed to benefit from it. Nevertheless, there are still
quite a few TODOs and unresolved questions/problems.

Some kernel drivers (e.g. i915) already use TTM priorities for
kernel-internal purposes. Of course, some of the highest priorities
should stay reserved for these purposes (with userspace being able to
use the lower priorities).

Another problem with priorities is the possibility of apps starving other
apps by occupying all of VRAM with high-priority allocations. A possible
solution could be to restrict the highest priority/priorities to
important apps like compositors.

Tying into this problem, only apps that are actively cooperating
to reduce memory pressure can benefit from the current memory priority
implementation. Eventually the priority system could also be utilized
to benefit all applications, for example with the desktop environment
boosting the priority of the currently-focused app/its cgroup (to
provide the best QoS to the apps the user is actively using). A full
implementation of this is probably out-of-scope for this initial proposal,
but it's probably a good idea to consider this as a possible future use
of the priority API.

I'm primarily looking to integrate this into amdgpu to solve the
issues I've seen there, but I'm also interested in feedback from
other drivers. Is this something you'd be interested in? Do you
have any objections/comments/questions about my proposed design?

Thanks,
Friedrich

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
[2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling

Friedrich Vock (18):
  drm/ttm: Add tracking for evicted memory
  drm/ttm: Add per-BO eviction tracking
  drm/ttm: Implement BO eviction tracking
  drm/ttm: Add driver funcs for uneviction control
  drm/ttm: Add option to evict no BOs in operation
  drm/ttm: Add public buffer eviction/uneviction functions
  drm/amdgpu: Add TTM uneviction control functions
  drm/amdgpu: Don't try moving BOs to preferred domain before submit
  drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
  drm/amdgpu: Don't add GTT to initial domains after failing to allocate
    VRAM
  drm/ttm: Bump BO priority count
  drm/ttm: Do not evict BOs with higher priority
  drm/ttm: Implement ttm_bo_update_priority
  drm/ttm: Consider BOs placed in non-favorite locations evicted
  drm/amdgpu: Set a default priority for user/kernel BOs
  drm/amdgpu: Implement SET_PRIORITY GEM op
  drm/amdgpu: Implement EVICTED_VRAM query
  drm/amdgpu: Bump minor version

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  25 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  26 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  50 ++++
 drivers/gpu/drm/ttm/ttm_bo.c               | 253 ++++++++++++++++++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c          |   3 +
 drivers/gpu/drm/ttm/ttm_device.c           |   1 +
 drivers/gpu/drm/ttm/ttm_resource.c         |  19 +-
 include/drm/ttm/ttm_bo.h                   |  22 ++
 include/drm/ttm/ttm_device.h               |  28 +++
 include/drm/ttm/ttm_resource.h             |  11 +-
 include/uapi/drm/amdgpu_drm.h              |   3 +
 17 files changed, 430 insertions(+), 218 deletions(-)

--
2.44.0


* [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

These utilities will be used to keep track of which buffers have been
evicted from any particular place, in order to decide when to try
undoing the eviction.
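
Drivers can then report the evicted amount to userspace, e.g. (sketch
of the intended use; the corresponding amdgpu query comes later in the
series):

	struct ttm_resource_manager *man =
		ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
	uint64_t evicted = ttm_resource_manager_evicted_bytes(man);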

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_device.c   |  1 +
 drivers/gpu/drm/ttm/ttm_resource.c | 14 ++++++++++++++
 include/drm/ttm/ttm_device.h       |  5 +++++
 include/drm/ttm/ttm_resource.h     |  9 +++++++++
 4 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index f5187b384ae9a..969d627ba06c0 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -219,6 +219,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func

 	bdev->vma_manager = vma_manager;
 	spin_lock_init(&bdev->lru_lock);
+	spin_lock_init(&bdev->unevict_lock);
 	INIT_LIST_HEAD(&bdev->pinned);
 	bdev->dev_mapping = mapping;
 	mutex_lock(&ttm_global_mutex);
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 46ff9c75bb124..1d6755a1153b1 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -25,6 +25,7 @@
 #include <linux/iosys-map.h>
 #include <linux/io-mapping.h>
 #include <linux/scatterlist.h>
+#include <linux/list.h>

 #include <drm/ttm/ttm_bo.h>
 #include <drm/ttm/ttm_placement.h>
@@ -392,9 +393,11 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 	man->bdev = bdev;
 	man->size = size;
 	man->usage = 0;
+	man->evicted_bytes = 0;

 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i)
 		INIT_LIST_HEAD(&man->lru[i]);
+	INIT_LIST_HEAD(&man->evicted);
 	man->move = NULL;
 }
 EXPORT_SYMBOL(ttm_resource_manager_init);
@@ -470,6 +473,17 @@ uint64_t ttm_resource_manager_usage(struct ttm_resource_manager *man)
 }
 EXPORT_SYMBOL(ttm_resource_manager_usage);

+uint64_t ttm_resource_manager_evicted_bytes(struct ttm_resource_manager *man)
+{
+	uint64_t evicted;
+
+	spin_lock(&man->bdev->unevict_lock);
+	evicted = man->evicted_bytes;
+	spin_unlock(&man->bdev->unevict_lock);
+	return evicted;
+}
+EXPORT_SYMBOL(ttm_resource_manager_evicted_bytes);
+
 /**
  * ttm_resource_manager_debug
  *
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index c22f30535c848..baa264efe483d 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -251,6 +251,11 @@ struct ttm_device {
 	 */
 	spinlock_t lru_lock;

+	/**
+	 * @unevict_lock: Protection for per-manager uneviction tracking
+	 */
+	spinlock_t unevict_lock;
+
 	/**
 	 * @pinned: Buffer objects which are pinned and so not on any LRU list.
 	 */
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 78a226eba953c..7d1ce059c8805 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -145,6 +145,7 @@ struct ttm_resource_manager_func {
  * @move_lock: lock for move fence
  * @move: The fence of the last pipelined move operation.
  * @lru: The lru list for this memory type.
+ * @evicted: List of bos evicted from this memory type
  *
  * This structure is used to identify and manage memory types for a device.
  */
@@ -163,6 +164,7 @@ struct ttm_resource_manager {
 	 * Protected by @move_lock.
 	 */
 	struct dma_fence *move;
+	struct list_head evicted;

 	/*
 	 * Protected by the bdev->lru_lock.
@@ -174,6 +176,12 @@ struct ttm_resource_manager {
 	 * bdev->lru_lock.
 	 */
 	uint64_t usage;
+
+	/**
+	 * @evicted_bytes: How many bytes are evicted from this manager,
+	 * protected by bdev->unevict_lock
+	 */
+	uint64_t evicted_bytes;
 };

 /**
@@ -382,6 +390,7 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 				   struct ttm_resource_manager *man);

 uint64_t ttm_resource_manager_usage(struct ttm_resource_manager *man);
+uint64_t ttm_resource_manager_evicted_bytes(struct ttm_resource_manager *man);
 void ttm_resource_manager_debug(struct ttm_resource_manager *man,
 				struct drm_printer *p);

--
2.44.0


* [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-25  6:18   ` Christian König
  2024-04-24 16:56 ` [RFC PATCH 03/18] drm/ttm: Implement BO " Friedrich Vock
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Make each buffer object aware of whether it has been evicted or not.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c |  1 +
 include/drm/ttm/ttm_bo.h     | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b2..3968b17453569 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo,
 	bo->pin_count = 0;
 	bo->sg = sg;
 	bo->bulk_move = NULL;
+	bo->evicted_type = TTM_NUM_MEM_TYPES;
 	if (resv)
 		bo->base.resv = resv;
 	else
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 0223a41a64b24..8a1a29c6fbc50 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -121,6 +121,17 @@ struct ttm_buffer_object {
 	unsigned priority;
 	unsigned pin_count;

+	/**
+	 * @evicted_type: Memory type this BO was evicted from, if any.
+	 * TTM_NUM_MEM_TYPES if this BO was not evicted.
+	 */
+	int evicted_type;
+	/**
+	 * @evicted: Entry in the evicted list for the resource manager
+	 * this BO was evicted from.
+	 */
+	struct list_head evicted;
+
 	/**
 	 * @delayed_delete: Work item used when we can't delete the BO
 	 * immediately
--
2.44.0


* [RFC PATCH 03/18] drm/ttm: Implement BO eviction tracking
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control Friedrich Vock
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

For each buffer object, remember evictions and try undoing them if
memory pressure gets lower again.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c      | 28 +++++++++++++++++++++++++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c |  3 +++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3968b17453569..9a0efbf79316c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -178,6 +178,12 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 {
 	if (bo->bdev->funcs->delete_mem_notify)
 		bo->bdev->funcs->delete_mem_notify(bo);
+	if (bo->evicted_type != TTM_NUM_MEM_TYPES) {
+		spin_lock(&bo->bdev->unevict_lock);
+		list_del_init(&bo->evicted);
+		ttm_manager_type(bo->bdev, bo->evicted_type)->evicted_bytes -= bo->base.size;
+		spin_unlock(&bo->bdev->unevict_lock);
+	}

 	ttm_bo_tt_destroy(bo);
 	ttm_resource_free(bo, &bo->resource);
@@ -429,7 +435,9 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
 static int ttm_bo_evict(struct ttm_buffer_object *bo,
 			struct ttm_operation_ctx *ctx)
 {
+	int evicted_type = bo->resource->mem_type;
 	struct ttm_device *bdev = bo->bdev;
+	struct ttm_resource_manager *man;
 	struct ttm_resource *evict_mem;
 	struct ttm_placement placement;
 	struct ttm_place hop;
@@ -438,6 +446,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	memset(&hop, 0, sizeof(hop));

 	dma_resv_assert_held(bo->base.resv);
+	man = ttm_manager_type(bdev, evicted_type);

 	placement.num_placement = 0;
 	placement.num_busy_placement = 0;
@@ -477,6 +486,14 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 		ttm_resource_free(bo, &evict_mem);
 		if (ret != -ERESTARTSYS && ret != -EINTR)
 			pr_err("Buffer eviction failed\n");
+	} else if (bo->evicted_type == TTM_NUM_MEM_TYPES &&
+		   bo->bdev->funcs->uneviction_valuable &&
+		   bo->bdev->funcs->uneviction_valuable(bo)) {
+		bo->evicted_type = evicted_type;
+		spin_lock(&bo->bdev->unevict_lock);
+		list_add_tail(&bo->evicted, &man->evicted);
+		man->evicted_bytes += bo->base.size;
+		spin_unlock(&bo->bdev->unevict_lock);
 	}
 out:
 	return ret;
@@ -845,6 +862,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 			      struct ttm_placement *placement,
 			      struct ttm_operation_ctx *ctx)
 {
+	struct ttm_resource_manager *man;
 	struct ttm_resource *mem;
 	struct ttm_place hop;
 	int ret;
@@ -873,8 +891,16 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 		goto bounce;
 	}
 out:
-	if (ret)
+	if (ret) {
 		ttm_resource_free(bo, &mem);
+	} else if (bo->evicted_type != TTM_NUM_MEM_TYPES) {
+		man = ttm_manager_type(bo->bdev, bo->evicted_type);
+		spin_lock(&bo->bdev->unevict_lock);
+		list_del_init(&bo->evicted);
+		man->evicted_bytes -= bo->base.size;
+		spin_unlock(&bo->bdev->unevict_lock);
+		bo->evicted_type = TTM_NUM_MEM_TYPES;
+	}
 	return ret;
 }

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index fd9fd3d15101c..119291c5ed85e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -262,6 +262,9 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 		fbo->base.bulk_move = NULL;
 	}

+	INIT_LIST_HEAD(&fbo->base.evicted);
+	fbo->base.evicted_type = TTM_NUM_MEM_TYPES;
+
 	ret = dma_resv_reserve_fences(&fbo->base.base._resv, 1);
 	if (ret) {
 		kfree(fbo);
--
2.44.0


* [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (2 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 03/18] drm/ttm: Implement BO " Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Provides fine-grained control for drivers over which buffers should be
considered when attempting to undo evictions.
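
A driver implementation can be as simple as this (illustrative sketch;
the real amdgpu implementation follows in patch 7):

	static bool mydrv_uneviction_valuable(struct ttm_buffer_object *bo)
	{
		/* Only bother restoring buffers evicted from VRAM. */
		return bo->evicted_type == TTM_PL_VRAM;
	}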

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 include/drm/ttm/ttm_device.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index baa264efe483d..283795d674189 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -133,6 +133,29 @@ struct ttm_device_funcs {
 	void (*evict_flags)(struct ttm_buffer_object *bo,
 			    struct ttm_placement *placement);

+	/**
+	 * struct ttm_bo_driver member uneviction_valuable
+	 *
+	 * @bo: the buffer object to be unevicted
+	 *
+	 * Check with the driver if it is valuable to unevict a BO,
+	 * that is to move it back to its placement before it was
+	 * evicted.
+	 */
+	bool (*uneviction_valuable)(struct ttm_buffer_object *bo);
+	/**
+	 * struct ttm_bo_driver member unevict_flags:
+	 *
+	 * @bo: the buffer object that can be unevicted
+	 * @dest: The placement for the unevicted buffer
+	 *
+	 * This should not cause multihop evictions, and the core will warn
+	 * if one is proposed.
+	 */
+
+	void (*unevict_flags)(struct ttm_buffer_object *bo,
+			      struct ttm_placement *dest);
+
 	/**
 	 * struct ttm_bo_driver member move:
 	 *
--
2.44.0


* [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (3 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-25  6:20   ` Christian König
  2024-04-24 16:56 ` [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions Friedrich Vock
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

When undoing evictions because of decreased memory pressure, it makes no
sense to try evicting other buffers.
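
The flag is meant to be used on the uneviction path roughly like this
(sketch; the actual user is added in patch 6):

	struct ttm_operation_ctx ctx = {
		.interruptible = false,
		.no_evict = true, /* fail with -ENOSPC instead of evicting */
	};

	ret = ttm_bo_validate(bo, &placement, &ctx);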

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
 include/drm/ttm/ttm_bo.h     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 9a0efbf79316c..3b89fabc2f00a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -764,6 +764,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 			break;
 		if (unlikely(ret != -ENOSPC))
 			return ret;
+		if (ctx->no_evict)
+			return -ENOSPC;
 		ret = ttm_mem_evict_first(bdev, man, place, ctx,
 					  ticket);
 		if (unlikely(ret != 0))
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8a1a29c6fbc50..a8f21092403d6 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -192,6 +192,7 @@ struct ttm_operation_ctx {
 	bool gfp_retry_mayfail;
 	bool allow_res_evict;
 	bool force_alloc;
+	bool no_evict;
 	struct dma_resv *resv;
 	uint64_t bytes_moved;
 };
@@ -358,6 +359,7 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map,
 	return map->virtual;
 }

+
 int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
 		    struct ttm_operation_ctx *ctx);
 int ttm_bo_validate(struct ttm_buffer_object *bo,
--
2.44.0


* [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (4 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions Friedrich Vock
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

For now, they are only used internally inside TTM, but this will change
with the introduction of dynamic buffer priorities.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 168 ++++++++++++++++++++++++++++++++++-
 include/drm/ttm/ttm_bo.h     |   6 ++
 2 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3b89fabc2f00a..3047c763eb4eb 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -166,6 +166,111 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 	return ret;
 }

+/*
+ * Fetches the next BO from the manager's list of evicted BOs.
+ * bdev->unevict_lock should be held when calling this function.
+ */
+static struct ttm_buffer_object *ttm_next_evicted_bo(struct ttm_device *bdev,
+						     struct ttm_resource_manager *man,
+						     struct ttm_buffer_object *cursor)
+{
+	struct ttm_buffer_object *bo = NULL;
+
+	if (cursor)
+		cursor = list_next_entry(cursor, evicted);
+	else
+		cursor = list_first_entry(&man->evicted, struct ttm_buffer_object, evicted);
+
+	if (!list_entry_is_head(cursor, &man->evicted, evicted))
+		bo = ttm_bo_get_unless_zero(cursor);
+	return bo;
+}
+
+void ttm_mem_unevict_evicted(struct ttm_device *bdev,
+			     struct ttm_resource_manager *man,
+			     bool interruptible)
+{
+	struct ttm_buffer_object *evicted_bo = NULL, *next_evicted_bo = NULL;
+	struct ttm_operation_ctx ctx;
+	int ret;
+
+	memset(&ctx, 0, sizeof(ctx));
+	ctx.interruptible = interruptible;
+	ctx.no_evict = true;
+
+	spin_lock(&bdev->unevict_lock);
+	evicted_bo = ttm_next_evicted_bo(bdev, man, NULL);
+	spin_unlock(&bdev->unevict_lock);
+
+	while (evicted_bo) {
+		if (interruptible)
+			ret = dma_resv_lock_interruptible(
+					evicted_bo->base.resv, NULL);
+		else
+			ret = dma_resv_lock(evicted_bo->base.resv,
+					    NULL);
+		if (ret) {
+			ttm_bo_put(evicted_bo);
+			break;
+		}
+
+		/* If we raced with another thread (and lost), the
+		 * other thread already removed the buffer from the
+		 * list. In that case, we need to start over because
+		 * our current cursor got removed.
+		 */
+		if (evicted_bo->evicted_type == TTM_NUM_MEM_TYPES)
+			ret = 0;
+		else
+			ret = ttm_bo_try_unevict(evicted_bo, &ctx);
+
+		next_evicted_bo = ret ? evicted_bo : NULL;
+
+		spin_lock(&bdev->unevict_lock);
+		next_evicted_bo = ttm_next_evicted_bo(bdev, man,
+						      next_evicted_bo);
+		spin_unlock(&bdev->unevict_lock);
+
+		dma_resv_unlock(evicted_bo->base.resv);
+		ttm_bo_put(evicted_bo);
+
+		evicted_bo = next_evicted_bo;
+	}
+}
+EXPORT_SYMBOL(ttm_mem_unevict_evicted);
+
+struct ttm_mem_unevict_work {
+	struct work_struct work;
+	struct ttm_device *bdev;
+	struct ttm_resource_manager *man;
+};
+
+static void ttm_mem_unevict_work(struct work_struct *work)
+{
+	struct ttm_mem_unevict_work *unevict_work;
+
+	unevict_work = container_of(work, typeof(*unevict_work), work);
+	ttm_mem_unevict_evicted(unevict_work->bdev, unevict_work->man,
+				false);
+	kfree(unevict_work);
+}
+
+static void ttm_mem_queue_unevict(struct ttm_device *bdev,
+				  struct ttm_resource_manager *man)
+{
+	struct ttm_mem_unevict_work *work;
+
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+
+	if (!work)
+		return;
+
+	INIT_WORK(&work->work, ttm_mem_unevict_work);
+	work->bdev = bdev;
+	work->man = man;
+	queue_work_node(bdev->pool.nid, bdev->wq, &work->work);
+}
+
 /*
  * Call bo::reserved.
  * Will release GPU memory type usage on destruction.
@@ -176,6 +281,12 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,

 static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 {
+	struct ttm_resource_manager *man = NULL;
+	struct ttm_device *bdev = bo->bdev;
+
+	if (bo->resource)
+		man = ttm_manager_type(bo->bdev, bo->resource->mem_type);
+
 	if (bo->bdev->funcs->delete_mem_notify)
 		bo->bdev->funcs->delete_mem_notify(bo);
 	if (bo->evicted_type != TTM_NUM_MEM_TYPES) {
@@ -187,6 +298,9 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)

 	ttm_bo_tt_destroy(bo);
 	ttm_resource_free(bo, &bo->resource);
+
+	if (man)
+		ttm_mem_queue_unevict(bdev, man);
 }

 static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo)
@@ -432,8 +546,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
 	return 0;
 }

-static int ttm_bo_evict(struct ttm_buffer_object *bo,
-			struct ttm_operation_ctx *ctx)
+int ttm_bo_evict(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
 {
 	int evicted_type = bo->resource->mem_type;
 	struct ttm_device *bdev = bo->bdev;
@@ -499,6 +612,57 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	return ret;
 }

+int ttm_bo_try_unevict(struct ttm_buffer_object *bo,
+		       struct ttm_operation_ctx *ctx)
+{
+	struct ttm_resource_manager *man;
+	struct ttm_device *bdev = bo->bdev;
+	struct ttm_resource *unevict_mem;
+	struct ttm_placement placement;
+	struct ttm_place hop;
+	int ret = 0;
+
+	dma_resv_assert_held(bo->base.resv);
+
+	man = ttm_manager_type(bdev, bo->evicted_type);
+
+	if (bo->deleted)
+		goto out;
+
+	placement.num_placement = 0;
+	placement.num_busy_placement = 0;
+	bdev->funcs->unevict_flags(bo, &placement);
+
+	if (!placement.num_placement && !placement.num_busy_placement)
+		return -ENOSPC;
+
+	ret = ttm_bo_mem_space(bo, &placement, &unevict_mem, ctx);
+	if (ret)
+		return ret;
+
+	do {
+		ret = ttm_bo_handle_move_mem(bo, unevict_mem, true, ctx, &hop);
+		if (ret != -EMULTIHOP)
+			break;
+
+		ret = ttm_bo_bounce_temp_buffer(bo, &unevict_mem, ctx, &hop);
+	} while (!ret);
+
+	if (ret)
+		ttm_resource_free(bo, &unevict_mem);
+
+out:
+	if (!ret) {
+		spin_lock(&bdev->unevict_lock);
+		list_del_init(&bo->evicted);
+		man->evicted_bytes -= bo->base.size;
+		spin_unlock(&bdev->unevict_lock);
+		bo->evicted_type = TTM_NUM_MEM_TYPES;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(ttm_bo_try_unevict);
+
 /**
  * ttm_bo_eviction_valuable
  *
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index a8f21092403d6..8f4e6366c0417 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -370,6 +370,9 @@ void ttm_bo_set_bulk_move(struct ttm_buffer_object *bo,
 			  struct ttm_lru_bulk_move *bulk);
 bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 			      const struct ttm_place *place);
+int ttm_bo_evict(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx);
+int ttm_bo_try_unevict(struct ttm_buffer_object *bo,
+		       struct ttm_operation_ctx *ctx);
 int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo,
 			 enum ttm_bo_type type, struct ttm_placement *placement,
 			 uint32_t alignment, struct ttm_operation_ctx *ctx,
@@ -395,6 +398,9 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 			const struct ttm_place *place,
 			struct ttm_operation_ctx *ctx,
 			struct ww_acquire_ctx *ticket);
+void ttm_mem_unevict_evicted(struct ttm_device *bdev,
+			     struct ttm_resource_manager *man,
+			     bool interruptible);
 vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 			     struct vm_fault *vmf);
 vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
--
2.44.0


* [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (5 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Try unevicting only VRAM/GTT BOs.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 50 +++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 64f5001a7dc5d..98e8a40408804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -166,6 +166,31 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 	*placement = abo->placement;
 }

+/**
+ * amdgpu_unevict_flags - Compute placement flags
+ *
+ * @bo: The buffer object to unevict
+ * @dest: Destination for unevicted BO
+ *
+ * Fill in placement data for restoring evicted BOs
+ */
+static void amdgpu_unevict_flags(struct ttm_buffer_object *bo,
+				 struct ttm_placement *dest)
+{
+	struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
+
+	WARN_ON(bo->evicted_type == AMDGPU_PL_GDS ||
+		bo->evicted_type == AMDGPU_PL_GWS ||
+		bo->evicted_type == AMDGPU_PL_OA ||
+		bo->evicted_type == AMDGPU_PL_DOORBELL);
+	WARN_ON(bo->evicted_type == TTM_NUM_MEM_TYPES);
+
+	amdgpu_bo_placement_from_domain(abo, abo->preferred_domains);
+	*dest = abo->placement;
+	dest->num_placement = 1;
+	dest->num_busy_placement = 1;
+}
+
 /**
  * amdgpu_ttm_map_buffer - Map memory into the GART windows
  * @bo: buffer object to map
@@ -1424,6 +1449,29 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	return ttm_bo_eviction_valuable(bo, place);
 }

+/*
+ * amdgpu_ttm_bo_uneviction_valuable - Check to see if we can unevict a
+ * buffer object.
+ *
+ * Return true if uneviction is sensible. Called by ttm_bo_evict to
+ * decide whether to consider the buffer object for uneviction later.
+ */
+static bool amdgpu_ttm_bo_uneviction_valuable(struct ttm_buffer_object *bo)
+{
+	struct amdgpu_bo *abo;
+
+	if (!amdgpu_bo_is_amdgpu_bo(bo))
+		return false;
+
+	abo = ttm_to_amdgpu_bo(bo);
+
+	if (bo->type != ttm_bo_type_device)
+		return false;
+
+	return (abo->preferred_domains &
+		(AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) != 0;
+}
+
 static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
 				      void *buf, size_t size, bool write)
 {
@@ -1581,6 +1629,8 @@ static struct ttm_device_funcs amdgpu_bo_driver = {
 	.ttm_tt_destroy = &amdgpu_ttm_backend_destroy,
 	.eviction_valuable = amdgpu_ttm_bo_eviction_valuable,
 	.evict_flags = &amdgpu_evict_flags,
+	.uneviction_valuable = &amdgpu_ttm_bo_uneviction_valuable,
+	.unevict_flags = &amdgpu_unevict_flags,
 	.move = &amdgpu_bo_move,
 	.delete_mem_notify = &amdgpu_bo_delete_mem_notify,
 	.release_notify = &amdgpu_bo_release_notify,
--
2.44.0


* [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (6 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-25  6:36   ` Christian König
  2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

TTM now takes care of moving buffers to the best possible domain.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +--------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   7 -
 4 files changed, 3 insertions(+), 201 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cac0ca64367b3..3004adc6fa679 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1404,8 +1404,6 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev);
 bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev);
 bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);

-void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
-				  u64 num_vis_bytes);
 int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
 void amdgpu_device_program_register_sequence(struct amdgpu_device *adev,
 					     const u32 *registers,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index e9168677ef0a6..92a0cffc1adc3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -638,196 +638,19 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
 	return 0;
 }

-/* Convert microseconds to bytes. */
-static u64 us_to_bytes(struct amdgpu_device *adev, s64 us)
-{
-	if (us <= 0 || !adev->mm_stats.log2_max_MBps)
-		return 0;
-
-	/* Since accum_us is incremented by a million per second, just
-	 * multiply it by the number of MB/s to get the number of bytes.
-	 */
-	return us << adev->mm_stats.log2_max_MBps;
-}
-
-static s64 bytes_to_us(struct amdgpu_device *adev, u64 bytes)
-{
-	if (!adev->mm_stats.log2_max_MBps)
-		return 0;
-
-	return bytes >> adev->mm_stats.log2_max_MBps;
-}
-
-/* Returns how many bytes TTM can move right now. If no bytes can be moved,
- * it returns 0. If it returns non-zero, it's OK to move at least one buffer,
- * which means it can go over the threshold once. If that happens, the driver
- * will be in debt and no other buffer migrations can be done until that debt
- * is repaid.
- *
- * This approach allows moving a buffer of any size (it's important to allow
- * that).
- *
- * The currency is simply time in microseconds and it increases as the clock
- * ticks. The accumulated microseconds (us) are converted to bytes and
- * returned.
- */
-static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
-					      u64 *max_bytes,
-					      u64 *max_vis_bytes)
-{
-	s64 time_us, increment_us;
-	u64 free_vram, total_vram, used_vram;
-	/* Allow a maximum of 200 accumulated ms. This is basically per-IB
-	 * throttling.
-	 *
-	 * It means that in order to get full max MBps, at least 5 IBs per
-	 * second must be submitted and not more than 200ms apart from each
-	 * other.
-	 */
-	const s64 us_upper_bound = 200000;
-
-	if (!adev->mm_stats.log2_max_MBps) {
-		*max_bytes = 0;
-		*max_vis_bytes = 0;
-		return;
-	}
-
-	total_vram = adev->gmc.real_vram_size - atomic64_read(&adev->vram_pin_size);
-	used_vram = ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);
-	free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;
-
-	spin_lock(&adev->mm_stats.lock);
-
-	/* Increase the amount of accumulated us. */
-	time_us = ktime_to_us(ktime_get());
-	increment_us = time_us - adev->mm_stats.last_update_us;
-	adev->mm_stats.last_update_us = time_us;
-	adev->mm_stats.accum_us = min(adev->mm_stats.accum_us + increment_us,
-				      us_upper_bound);
-
-	/* This prevents the short period of low performance when the VRAM
-	 * usage is low and the driver is in debt or doesn't have enough
-	 * accumulated us to fill VRAM quickly.
-	 *
-	 * The situation can occur in these cases:
-	 * - a lot of VRAM is freed by userspace
-	 * - the presence of a big buffer causes a lot of evictions
-	 *   (solution: split buffers into smaller ones)
-	 *
-	 * If 128 MB or 1/8th of VRAM is free, start filling it now by setting
-	 * accum_us to a positive number.
-	 */
-	if (free_vram >= 128 * 1024 * 1024 || free_vram >= total_vram / 8) {
-		s64 min_us;
-
-		/* Be more aggressive on dGPUs. Try to fill a portion of free
-		 * VRAM now.
-		 */
-		if (!(adev->flags & AMD_IS_APU))
-			min_us = bytes_to_us(adev, free_vram / 4);
-		else
-			min_us = 0; /* Reset accum_us on APUs. */
-
-		adev->mm_stats.accum_us = max(min_us, adev->mm_stats.accum_us);
-	}
-
-	/* This is set to 0 if the driver is in debt to disallow (optional)
-	 * buffer moves.
-	 */
-	*max_bytes = us_to_bytes(adev, adev->mm_stats.accum_us);
-
-	/* Do the same for visible VRAM if half of it is free */
-	if (!amdgpu_gmc_vram_full_visible(&adev->gmc)) {
-		u64 total_vis_vram = adev->gmc.visible_vram_size;
-		u64 used_vis_vram =
-		  amdgpu_vram_mgr_vis_usage(&adev->mman.vram_mgr);
-
-		if (used_vis_vram < total_vis_vram) {
-			u64 free_vis_vram = total_vis_vram - used_vis_vram;
-
-			adev->mm_stats.accum_us_vis = min(adev->mm_stats.accum_us_vis +
-							  increment_us, us_upper_bound);
-
-			if (free_vis_vram >= total_vis_vram / 2)
-				adev->mm_stats.accum_us_vis =
-					max(bytes_to_us(adev, free_vis_vram / 2),
-					    adev->mm_stats.accum_us_vis);
-		}
-
-		*max_vis_bytes = us_to_bytes(adev, adev->mm_stats.accum_us_vis);
-	} else {
-		*max_vis_bytes = 0;
-	}
-
-	spin_unlock(&adev->mm_stats.lock);
-}
-
-/* Report how many bytes have really been moved for the last command
- * submission. This can result in a debt that can stop buffer migrations
- * temporarily.
- */
-void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
-				  u64 num_vis_bytes)
-{
-	spin_lock(&adev->mm_stats.lock);
-	adev->mm_stats.accum_us -= bytes_to_us(adev, num_bytes);
-	adev->mm_stats.accum_us_vis -= bytes_to_us(adev, num_vis_bytes);
-	spin_unlock(&adev->mm_stats.lock);
-}
-
 static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-	struct amdgpu_cs_parser *p = param;
 	struct ttm_operation_ctx ctx = {
 		.interruptible = true,
 		.no_wait_gpu = false,
 		.resv = bo->tbo.base.resv
 	};
-	uint32_t domain;
-	int r;

 	if (bo->tbo.pin_count)
 		return 0;

-	/* Don't move this buffer if we have depleted our allowance
-	 * to move it. Don't move anything if the threshold is zero.
-	 */
-	if (p->bytes_moved < p->bytes_moved_threshold &&
-	    (!bo->tbo.base.dma_buf ||
-	    list_empty(&bo->tbo.base.dma_buf->attachments))) {
-		if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
-			/* And don't move a CPU_ACCESS_REQUIRED BO to limited
-			 * visible VRAM if we've depleted our allowance to do
-			 * that.
-			 */
-			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
-				domain = bo->preferred_domains;
-			else
-				domain = bo->allowed_domains;
-		} else {
-			domain = bo->preferred_domains;
-		}
-	} else {
-		domain = bo->allowed_domains;
-	}
-
-retry:
-	amdgpu_bo_placement_from_domain(bo, domain);
-	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-
-	p->bytes_moved += ctx.bytes_moved;
-	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-		p->bytes_moved_vis += ctx.bytes_moved;
-
-	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
-		domain = bo->allowed_domains;
-		goto retry;
-	}
-
-	return r;
+	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
+	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
 }

 static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
@@ -947,13 +770,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		e->user_pages = NULL;
 	}

-	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
-					  &p->bytes_moved_vis_threshold);
-	p->bytes_moved = 0;
-	p->bytes_moved_vis = 0;
-
 	r = amdgpu_vm_validate(p->adev, &fpriv->vm, NULL,
-			       amdgpu_cs_bo_validate, p);
+			       amdgpu_cs_bo_validate, NULL);
 	if (r) {
 		DRM_ERROR("amdgpu_vm_validate() failed.\n");
 		goto out_free_user_pages;
@@ -973,9 +791,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 		p->gang_leader->uf_addr += amdgpu_bo_gpu_offset(p->uf_bo);
 	}

-	amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
-				     p->bytes_moved_vis);
-
 	for (i = 0; i < p->gang_size; ++i)
 		amdgpu_job_set_resources(p->jobs[i], p->bo_list->gds_obj,
 					 p->bo_list->gws_obj,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
index 39c33ad100cb7..e3d04ac4764be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
@@ -67,10 +67,6 @@ struct amdgpu_cs_parser {
 	struct amdgpu_bo_list		*bo_list;
 	struct amdgpu_mn		*mn;
 	struct dma_fence		*fence;
-	uint64_t			bytes_moved_threshold;
-	uint64_t			bytes_moved_vis_threshold;
-	uint64_t			bytes_moved;
-	uint64_t			bytes_moved_vis;

 	/* user fence */
 	struct amdgpu_bo		*uf_bo;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 331b9ed8062c7..5834a95d680d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -621,13 +621,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	if (unlikely(r != 0))
 		return r;

-	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-		amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved,
-					     ctx.bytes_moved);
-	else
-		amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved, 0);
-
 	if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
 	    bo->tbo.resource->mem_type == TTM_PL_VRAM) {
 		struct dma_fence *fence;
--
2.44.0


* [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (7 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
@ 2024-04-24 16:56 ` Friedrich Vock
  2024-04-25  6:24   ` Christian König
  2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:56 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

We will never try evicting things from VRAM for these resources anyway.
This affects TTM buffer uneviction logic, which would otherwise try to
move these buffers into VRAM (clashing with VRAM-only allocations).

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5834a95d680d9..85c10d8086188 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -127,6 +127,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
 	struct ttm_placement *placement = &abo->placement;
 	struct ttm_place *places = abo->placements;
+	bool skip_vram_busy = false;
 	u64 flags = abo->flags;
 	u32 c = 0;

@@ -156,6 +157,13 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
 		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
 			places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
 		c++;
+
+		/*
+		 * If GTT is preferred by the buffer as well, don't try VRAM when it's
+		 * busy.
+		 */
+		if ((domain & abo->preferred_domains) & AMDGPU_GEM_DOMAIN_GTT)
+			skip_vram_busy = true;
 	}

 	if (domain & AMDGPU_GEM_DOMAIN_DOORBELL) {
@@ -223,6 +231,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)

 	placement->num_busy_placement = c;
 	placement->busy_placement = places;
+
+	if (skip_vram_busy) {
+		--placement->num_busy_placement;
+		++placement->busy_placement;
+	}
 }

 /**
--
2.44.0


* [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (8 preceding siblings ...)
  2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-25  6:25   ` Christian König
  2024-04-24 16:57 ` [RFC PATCH 11/18] drm/ttm: Bump BO priority count Friedrich Vock
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Retrying with GTT added it to the "preferred domains" of this buffer
object, which also prevented any later attempt at moving the buffer
back to VRAM once space became available. If VRAM is full, GTT will
already be chosen as a fallback anyway.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 6bbab141eaaeb..aea3770d3ea2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 			goto retry;
 		}

-		if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
-			initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
-			goto retry;
-		}
 		DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
 				size, initial_domain, args->in.alignment, r);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 85c10d8086188..9978b85ed6f40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 			  AMDGPU_GEM_DOMAIN_GDS))
 		amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
 	else
-		amdgpu_bo_placement_from_domain(bo, bp->domain);
+		amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
 	if (bp->type == ttm_bo_type_kernel)
 		bo->tbo.priority = 2;
 	else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
--
2.44.0


* [RFC PATCH 11/18] drm/ttm: Bump BO priority count
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (9 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

For userspace-adjustable priorities, it is nice to have a bit more
granularity.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 include/drm/ttm/ttm_resource.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 7d1ce059c8805..241643447488a 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -35,7 +35,7 @@
 #include <drm/ttm/ttm_caching.h>
 #include <drm/ttm/ttm_kmap_iter.h>

-#define TTM_MAX_BO_PRIORITY	4U
+#define TTM_MAX_BO_PRIORITY	8U
 #define TTM_NUM_MEM_TYPES 8

 struct ttm_device;
--
2.44.0


* [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (10 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 11/18] drm/ttm: Bump BO priority count Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-25  6:26   ` Christian König
  2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

This makes buffer eviction significantly more stable by avoiding
ping-ponging caused by low-priority buffers evicting high-priority
buffers and vice versa.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c       | 9 +++++++--
 drivers/gpu/drm/ttm/ttm_resource.c | 5 +++--
 include/drm/ttm/ttm_bo.h           | 1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3047c763eb4eb..eae54cd4a7ce9 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -776,6 +776,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
 int ttm_mem_evict_first(struct ttm_device *bdev,
 			struct ttm_resource_manager *man,
 			const struct ttm_place *place,
+			unsigned int max_priority,
 			struct ttm_operation_ctx *ctx,
 			struct ww_acquire_ctx *ticket)
 {
@@ -788,6 +789,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 	spin_lock(&bdev->lru_lock);
 	ttm_resource_manager_for_each_res(man, &cursor, res) {
 		bool busy;
+		if (res->bo->priority > max_priority)
+			break;

 		if (!ttm_bo_evict_swapout_allowable(res->bo, ctx, place,
 						    &locked, &busy)) {
@@ -930,8 +933,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
 			return ret;
 		if (ctx->no_evict)
 			return -ENOSPC;
-		ret = ttm_mem_evict_first(bdev, man, place, ctx,
-					  ticket);
+		if (!bo->priority)
+			return -ENOSPC;
+		ret = ttm_mem_evict_first(bdev, man, place, bo->priority - 1,
+					  ctx, ticket);
 		if (unlikely(ret != 0))
 			return ret;
 	} while (1);
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 1d6755a1153b1..63d4371adb519 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -431,8 +431,9 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
 		while (!list_empty(&man->lru[i])) {
 			spin_unlock(&bdev->lru_lock);
-			ret = ttm_mem_evict_first(bdev, man, NULL, &ctx,
-						  NULL);
+			ret = ttm_mem_evict_first(bdev, man, NULL,
+						  TTM_MAX_BO_PRIORITY,
+						  &ctx, NULL);
 			if (ret)
 				return ret;
 			spin_lock(&bdev->lru_lock);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8f4e6366c0417..91299a3b6fcfa 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -396,6 +396,7 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
 int ttm_mem_evict_first(struct ttm_device *bdev,
 			struct ttm_resource_manager *man,
 			const struct ttm_place *place,
+			unsigned int max_priority,
 			struct ttm_operation_ctx *ctx,
 			struct ww_acquire_ctx *ticket);
 void ttm_mem_unevict_evicted(struct ttm_device *bdev,
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (11 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-25  6:29   ` Christian König
  2024-04-24 16:57 ` [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted Friedrich Vock
                   ` (7 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Used to dynamically adjust priorities of buffers at runtime, to react to
changes in memory pressure/usage patterns.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
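The intended call pattern, condensed (sketch only; the actual user is
the GEM_OP ioctl added later in this series, and the BO must be
reserved by the caller):

	/* Raise the BO's priority, then immediately try moving it back
	 * to its preferred placement if it is currently evicted.
	 */
	ttm_bo_update_priority(bo, new_prio);
	if (bo->evicted_type != TTM_NUM_MEM_TYPES)
		ttm_bo_try_unevict(bo, &ctx);
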
 drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++++++
 include/drm/ttm/ttm_bo.h     |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index eae54cd4a7ce9..6ac939c58a6b8 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -112,6 +112,23 @@ void ttm_bo_set_bulk_move(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_set_bulk_move);

+void ttm_bo_update_priority(struct ttm_buffer_object *bo, unsigned int new_prio)
+{
+	struct ttm_resource_manager *man;
+
+	if (!bo->resource)
+		return;
+
+	man = ttm_manager_type(bo->bdev, bo->resource->mem_type);
+
+	spin_lock(&bo->bdev->lru_lock);
+	ttm_resource_del_bulk_move(bo->resource, bo);
+	bo->priority = new_prio;
+	ttm_resource_add_bulk_move(bo->resource, bo);
+	spin_unlock(&bo->bdev->lru_lock);
+}
+EXPORT_SYMBOL(ttm_bo_update_priority);
+
 static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				  struct ttm_resource *mem, bool evict,
 				  struct ttm_operation_ctx *ctx,
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 91299a3b6fcfa..51040bc443ea0 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -359,6 +359,8 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map,
 	return map->virtual;
 }

+void ttm_bo_update_priority(struct ttm_buffer_object *bo,
+			    unsigned int new_prio);

 int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
 		    struct ttm_operation_ctx *ctx);
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (12 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-24 16:57 ` [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs Friedrich Vock
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

If a buffer didn't get its favorite placement (busy_placement[0])
because that placement was full, we should try moving it into the
favorite placement once there is space.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 6ac939c58a6b8..af8209f3bc894 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1111,7 +1111,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 		    struct ttm_placement *placement,
 		    struct ttm_operation_ctx *ctx)
 {
-	int ret;
+	struct ttm_resource_manager *man;
+	int favorite_mem_type;
+	int ret, i;

 	dma_resv_assert_held(bo->base.resv);

@@ -1133,6 +1135,30 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 	if (ret)
 		return ret;

+	if (bo->resource) {
+		/*
+		 * Also mark the buffer as evicted if we ended up in a
+		 * non-favorite busy placement, so the buffer gets
+		 * moved into the favorite spot if possible.
+		 */
+		for (i = 1; i < placement->num_busy_placement; ++i) {
+			if (bo->resource->mem_type !=
+			    placement->busy_placement[i].mem_type)
+				continue;
+
+			favorite_mem_type =
+				placement->busy_placement[0].mem_type;
+			man = ttm_manager_type(bo->bdev,
+					       favorite_mem_type);
+
+			spin_lock(&bo->bdev->unevict_lock);
+			list_add_tail(&bo->evicted, &man->evicted);
+			man->evicted_bytes += bo->base.size;
+			spin_unlock(&bo->bdev->unevict_lock);
+			bo->evicted_type = favorite_mem_type;
+		}
+	}
+
 	/*
 	 * We might need to add a TTM.
 	 */
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (13 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Reserve the highest priority for the kernel, and choose a balanced value
as userspace default. Userspace is intended to be able to modify these
later to mark buffers as important/unimportant.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
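The resulting priority ladder, with TTM_MAX_BO_PRIORITY bumped to 8
earlier in this series (for illustration only):

	/*
	 * 7  AMDGPU_BO_PRIORITY_KERNEL   kernel BOs, evicted last
	 * 6  AMDGPU_BO_PRIORITY_MAX_USER highest priority userspace can set
	 * 4  default for userspace BOs (bp.priority)
	 * 0  lowest; DISCARDABLE BOs stay here and are evicted first
	 */
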
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 4 ++++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index aea3770d3ea2e..5ca13e2e50f50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -114,6 +114,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 	bp.type = type;
 	bp.resv = resv;
 	bp.preferred_domain = initial_domain;
+	bp.priority = 4;
 	bp.flags = flags;
 	bp.domain = initial_domain;
 	bp.bo_ptr_size = sizeof(struct amdgpu_bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 9978b85ed6f40..0e9ea11a873ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -621,9 +621,9 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	else
 		amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
 	if (bp->type == ttm_bo_type_kernel)
-		bo->tbo.priority = 2;
+		bo->tbo.priority = AMDGPU_BO_PRIORITY_KERNEL;
 	else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
-		bo->tbo.priority = 1;
+		bo->tbo.priority = bp->priority;

 	if (!bp->destroy)
 		bp->destroy = &amdgpu_bo_destroy;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 0f277bc6a2e32..36513da0ec767 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -42,6 +42,9 @@
 /* BO flag to indicate a KFD userptr BO */
 #define AMDGPU_AMDKFD_CREATE_USERPTR_BO	(1ULL << 63)

+#define AMDGPU_BO_PRIORITY_KERNEL    (TTM_MAX_BO_PRIORITY - 1)
+#define AMDGPU_BO_PRIORITY_MAX_USER  (TTM_MAX_BO_PRIORITY - 2)
+
 #define to_amdgpu_bo_user(abo) container_of((abo), struct amdgpu_bo_user, bo)
 #define to_amdgpu_bo_vm(abo) container_of((abo), struct amdgpu_bo_vm, bo)

@@ -52,6 +55,7 @@ struct amdgpu_bo_param {
 	u32				domain;
 	u32				preferred_domain;
 	u64				flags;
+	unsigned int                    priority;
 	enum ttm_bo_type		type;
 	bool				no_wait_gpu;
 	struct dma_resv			*resv;
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (14 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-25  6:32   ` Christian König
  2024-04-24 16:57 ` [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query Friedrich Vock
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Used by userspace to adjust buffer priorities in response to changes in
application demand and memory pressure.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
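From userspace, the new op would be exercised roughly like this
(illustrative sketch using libdrm; error handling omitted):

	struct drm_amdgpu_gem_op op = {
		.handle = bo_handle,	/* GEM handle of the buffer */
		.op = AMDGPU_GEM_OP_SET_PRIORITY,
		.value = 6,	/* clamped to AMDGPU_BO_PRIORITY_MAX_USER */
	};

	drmCommandWriteRead(fd, DRM_AMDGPU_GEM_OP, &op, sizeof(op));
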
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
 include/uapi/drm/amdgpu_drm.h           |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5ca13e2e50f50..6107810a9c205 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
 {
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	struct drm_amdgpu_gem_op *args = data;
+	struct ttm_resource_manager *man;
 	struct drm_gem_object *gobj;
 	struct amdgpu_vm_bo_base *base;
+	struct ttm_operation_ctx ctx;
 	struct amdgpu_bo *robj;
 	int r;

@@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
 	if (unlikely(r))
 		goto out;

+	memset(&ctx, 0, sizeof(ctx));
+	ctx.interruptible = true;
+
 	switch (args->op) {
 	case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
 		struct drm_amdgpu_gem_create_in info;
@@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,

 		amdgpu_bo_unreserve(robj);
 		break;
+	case AMDGPU_GEM_OP_SET_PRIORITY:
+		if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
+			args->value = AMDGPU_BO_PRIORITY_MAX_USER;
+		ttm_bo_update_priority(&robj->tbo, args->value);
+		if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
+			ttm_bo_try_unevict(&robj->tbo, &ctx);
+			amdgpu_bo_unreserve(robj);
+		} else {
+			amdgpu_bo_unreserve(robj);
+			man = ttm_manager_type(robj->tbo.bdev,
+				robj->tbo.resource->mem_type);
+			ttm_mem_unevict_evicted(robj->tbo.bdev, man,
+						true);
+		}
+		break;
 	default:
 		amdgpu_bo_unreserve(robj);
 		r = -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index bdbe6b262a78d..53552dd489b9b 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {

 #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
 #define AMDGPU_GEM_OP_SET_PLACEMENT		1
+#define AMDGPU_GEM_OP_SET_PRIORITY              2

 /* Sets or returns a value associated with a buffer. */
 struct drm_amdgpu_gem_op {
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (15 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-24 16:57 ` [RFC PATCH 18/18] drm/amdgpu: Bump minor version Friedrich Vock
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Used by userspace to gauge the severity of memory overcommit and make
prioritization decisions based on it.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
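From userspace, the query would look roughly like this (illustrative
sketch using libdrm; error handling omitted):

	__u64 evicted = 0;
	struct drm_amdgpu_info request = {
		.return_pointer = (__u64)(uintptr_t)&evicted,
		.return_size = sizeof(evicted),
		.query = AMDGPU_INFO_EVICTED_VRAM,
	};

	drmCommandWrite(fd, DRM_AMDGPU_INFO, &request, sizeof(request));
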
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
 include/uapi/drm/amdgpu_drm.h           | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 924baf58e3226..8cba30144bac6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1246,6 +1246,9 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		return copy_to_user(out, &gpuvm_fault,
 				    min((size_t)size, sizeof(gpuvm_fault))) ? -EFAULT : 0;
 	}
+	case AMDGPU_INFO_EVICTED_VRAM:
+		ui64 = ttm_resource_manager_evicted_bytes(&adev->mman.vram_mgr.manager);
+		return copy_to_user(out, &ui64, min(size, 8u)) ? -EFAULT : 0;
 	default:
 		DRM_DEBUG_KMS("Invalid request %d\n", info->query);
 		return -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 53552dd489b9b..5d04719386686 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -920,6 +920,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
 #define AMDGPU_INFO_MAX_IBS			0x22
 /* query last page fault info */
 #define AMDGPU_INFO_GPUVM_FAULT			0x23
+/* query size of evicted vram allocations */
+#define AMDGPU_INFO_EVICTED_VRAM                0x24

 #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT	0
 #define AMDGPU_INFO_MMR_SE_INDEX_MASK	0xff
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [RFC PATCH 18/18] drm/amdgpu: Bump minor version
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (16 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query Friedrich Vock
@ 2024-04-24 16:57 ` Friedrich Vock
  2024-04-25  6:54 ` [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Christian König
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-24 16:57 UTC (permalink / raw)
  To: dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Indicates support for EVICTED_VRAM queries and
AMDGPU_GEM_OP_SET_PRIORITY.

Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ea14f1c8f4304..4f8b62dbba17f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -116,9 +116,10 @@
  * - 3.55.0 - Add AMDGPU_INFO_GPUVM_FAULT query
  * - 3.56.0 - Update IB start address and size alignment for decode and encode
  * - 3.57.0 - Compute tunneling on GFX10+
+ * - 3.58.0 - Per-BO priorities and evicted memory size queries
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	57
+#define KMS_DRIVER_MINOR	58
 #define KMS_DRIVER_PATCHLEVEL	0

 /*
--
2.44.0


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking
  2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
@ 2024-04-25  6:18   ` Christian König
  2024-04-25 19:02     ` Matthew Brost
  0 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2024-04-25  6:18 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> Make each buffer object aware of whether it has been evicted or not.

That reverts some changes we made a couple of years ago.

In general the idea is that eviction isn't something we need to reverse 
in TTM.

Rather the driver gives the desired placement.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c |  1 +
>   include/drm/ttm/ttm_bo.h     | 11 +++++++++++
>   2 files changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index edf10618fe2b2..3968b17453569 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo,
>   	bo->pin_count = 0;
>   	bo->sg = sg;
>   	bo->bulk_move = NULL;
> +	bo->evicted_type = TTM_NUM_MEM_TYPES;
>   	if (resv)
>   		bo->base.resv = resv;
>   	else
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 0223a41a64b24..8a1a29c6fbc50 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -121,6 +121,17 @@ struct ttm_buffer_object {
>   	unsigned priority;
>   	unsigned pin_count;
>
> +	/**
> +	 * @evicted_type: Memory type this BO was evicted from, if any.
> +	 * TTM_NUM_MEM_TYPES if this BO was not evicted.
> +	 */
> +	int evicted_type;
> +	/**
> +	 * @evicted: Entry in the evicted list for the resource manager
> +	 * this BO was evicted from.
> +	 */
> +	struct list_head evicted;
> +
>   	/**
>   	 * @delayed_delete: Work item used when we can't delete the BO
>   	 * immediately
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation
  2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
@ 2024-04-25  6:20   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:20 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> When undoing evictions because of decreased memory pressure, it makes no
> sense to try evicting other buffers.

That duplicates some functionality.

If a driver doesn't want eviction to happen it just needs to mark the 
desired placements as non-evictable with the TTM_PL_FLAG_DESIRED flag.
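
For example (with the semantics described above, TTM never evicts
other BOs to satisfy this placement):

	struct ttm_place vram_place = {
		.mem_type = TTM_PL_VRAM,
		.flags = TTM_PL_FLAG_DESIRED,
	};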

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 2 ++
>   include/drm/ttm/ttm_bo.h     | 2 ++
>   2 files changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 9a0efbf79316c..3b89fabc2f00a 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -764,6 +764,8 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>   			break;
>   		if (unlikely(ret != -ENOSPC))
>   			return ret;
> +		if (ctx->no_evict)
> +			return -ENOSPC;
>   		ret = ttm_mem_evict_first(bdev, man, place, ctx,
>   					  ticket);
>   		if (unlikely(ret != 0))
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 8a1a29c6fbc50..a8f21092403d6 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -192,6 +192,7 @@ struct ttm_operation_ctx {
>   	bool gfp_retry_mayfail;
>   	bool allow_res_evict;
>   	bool force_alloc;
> +	bool no_evict;
>   	struct dma_resv *resv;
>   	uint64_t bytes_moved;
>   };
> @@ -358,6 +359,7 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map,
>   	return map->virtual;
>   }
>
> +
>   int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
>   		    struct ttm_operation_ctx *ctx);
>   int ttm_bo_validate(struct ttm_buffer_object *bo,
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
  2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
@ 2024-04-25  6:24   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:24 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> We will never try evicting things from VRAM for these resources anyway.
> This affects TTM buffer uneviction logic, which would otherwise try to
> move these buffers into VRAM (clashing with VRAM-only allocations).

You are working on outdated code. That change was already done by me as 
well.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 5834a95d680d9..85c10d8086188 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -127,6 +127,7 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>   	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
>   	struct ttm_placement *placement = &abo->placement;
>   	struct ttm_place *places = abo->placements;
> +	bool skip_vram_busy = false;
>   	u64 flags = abo->flags;
>   	u32 c = 0;
>
> @@ -156,6 +157,13 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>   		if (flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS)
>   			places[c].flags |= TTM_PL_FLAG_CONTIGUOUS;
>   		c++;
> +
> +		/*
> +		 * If GTT is preferred by the buffer as well, don't try VRAM when it's
> +		 * busy.
> +		 */
> +		if ((domain & abo->preferred_domains) & AMDGPU_GEM_DOMAIN_GTT)
> +			skip_vram_busy = true;
>   	}
>
>   	if (domain & AMDGPU_GEM_DOMAIN_DOORBELL) {
> @@ -223,6 +231,11 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
>
>   	placement->num_busy_placement = c;
>   	placement->busy_placement = places;
> +
> +	if (skip_vram_busy) {
> +		--placement->num_busy_placement;
> +		++placement->busy_placement;
> +	}
>   }
>
>   /**
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
  2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
@ 2024-04-25  6:25   ` Christian König
  2024-04-25  7:39     ` Friedrich Vock
  0 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2024-04-25  6:25 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:57 schrieb Friedrich Vock:
> This adds GTT to the "preferred domains" of this buffer object, which
> will also prevent any attempts at moving the buffer back to VRAM if
> there is space. If VRAM is full, GTT will already be chosen as a
> fallback.

Big NAK to that one, this is mandatory for correct operation.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>   2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 6bbab141eaaeb..aea3770d3ea2e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>   			goto retry;
>   		}
>
> -		if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
> -			initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
> -			goto retry;
> -		}
>   		DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>   				size, initial_domain, args->in.alignment, r);
>   	}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 85c10d8086188..9978b85ed6f40 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>   			  AMDGPU_GEM_DOMAIN_GDS))
>   		amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>   	else
> -		amdgpu_bo_placement_from_domain(bo, bp->domain);
> +		amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
>   	if (bp->type == ttm_bo_type_kernel)
>   		bo->tbo.priority = 2;
>   	else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority
  2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
@ 2024-04-25  6:26   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:26 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:57 schrieb Friedrich Vock:
> This makes buffer eviction significantly more stable by avoiding
> ping-ponging caused by low-priority buffers evicting high-priority
> buffers and vice versa.

And creates a denial of service for the whole system by fork() bombing.

This is another very big NAK.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c       | 9 +++++++--
>   drivers/gpu/drm/ttm/ttm_resource.c | 5 +++--
>   include/drm/ttm/ttm_bo.h           | 1 +
>   3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 3047c763eb4eb..eae54cd4a7ce9 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -776,6 +776,7 @@ static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>   int ttm_mem_evict_first(struct ttm_device *bdev,
>   			struct ttm_resource_manager *man,
>   			const struct ttm_place *place,
> +			unsigned int max_priority,
>   			struct ttm_operation_ctx *ctx,
>   			struct ww_acquire_ctx *ticket)
>   {
> @@ -788,6 +789,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>   	spin_lock(&bdev->lru_lock);
>   	ttm_resource_manager_for_each_res(man, &cursor, res) {
>   		bool busy;
> +		if (res->bo->priority > max_priority)
> +			break;
>
>   		if (!ttm_bo_evict_swapout_allowable(res->bo, ctx, place,
>   						    &locked, &busy)) {
> @@ -930,8 +933,10 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
>   			return ret;
>   		if (ctx->no_evict)
>   			return -ENOSPC;
> -		ret = ttm_mem_evict_first(bdev, man, place, ctx,
> -					  ticket);
> +		if (!bo->priority)
> +			return -ENOSPC;
> +		ret = ttm_mem_evict_first(bdev, man, place, bo->priority - 1,
> +					  ctx, ticket);
>   		if (unlikely(ret != 0))
>   			return ret;
>   	} while (1);
> diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
> index 1d6755a1153b1..63d4371adb519 100644
> --- a/drivers/gpu/drm/ttm/ttm_resource.c
> +++ b/drivers/gpu/drm/ttm/ttm_resource.c
> @@ -431,8 +431,9 @@ int ttm_resource_manager_evict_all(struct ttm_device *bdev,
>   	for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>   		while (!list_empty(&man->lru[i])) {
>   			spin_unlock(&bdev->lru_lock);
> -			ret = ttm_mem_evict_first(bdev, man, NULL, &ctx,
> -						  NULL);
> +			ret = ttm_mem_evict_first(bdev, man, NULL,
> +						  TTM_MAX_BO_PRIORITY,
> +						  &ctx, NULL);
>   			if (ret)
>   				return ret;
>   			spin_lock(&bdev->lru_lock);
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 8f4e6366c0417..91299a3b6fcfa 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -396,6 +396,7 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>   int ttm_mem_evict_first(struct ttm_device *bdev,
>   			struct ttm_resource_manager *man,
>   			const struct ttm_place *place,
> +			unsigned int max_priority,
>   			struct ttm_operation_ctx *ctx,
>   			struct ww_acquire_ctx *ticket);
>   void ttm_mem_unevict_evicted(struct ttm_device *bdev,
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority
  2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
@ 2024-04-25  6:29   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:29 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:57 schrieb Friedrich Vock:
> Used to dynamically adjust priorities of buffers at runtime, to react to
> changes in memory pressure/usage patterns.

And another big NAK. TTM priorities are meant to be static based on in 
kernel decisions which are not exposed to userspace.

In other words we can group BOs based on kernel, user, SVM etc... but 
never on something userspace can influence.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 17 +++++++++++++++++
>   include/drm/ttm/ttm_bo.h     |  2 ++
>   2 files changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index eae54cd4a7ce9..6ac939c58a6b8 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -112,6 +112,23 @@ void ttm_bo_set_bulk_move(struct ttm_buffer_object *bo,
>   }
>   EXPORT_SYMBOL(ttm_bo_set_bulk_move);
>
> +void ttm_bo_update_priority(struct ttm_buffer_object *bo, unsigned int new_prio)
> +{
> +	struct ttm_resource_manager *man;
> +
> +	if (!bo->resource)
> +		return;
> +
> +	man = ttm_manager_type(bo->bdev, bo->resource->mem_type);
> +
> +	spin_lock(&bo->bdev->lru_lock);
> +	ttm_resource_del_bulk_move(bo->resource, bo);
> +	bo->priority = new_prio;
> +	ttm_resource_add_bulk_move(bo->resource, bo);
> +	spin_unlock(&bo->bdev->lru_lock);
> +}
> +EXPORT_SYMBOL(ttm_bo_update_priority);
> +
>   static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
>   				  struct ttm_resource *mem, bool evict,
>   				  struct ttm_operation_ctx *ctx,
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 91299a3b6fcfa..51040bc443ea0 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -359,6 +359,8 @@ static inline void *ttm_kmap_obj_virtual(struct ttm_bo_kmap_obj *map,
>   	return map->virtual;
>   }
>
> +void ttm_bo_update_priority(struct ttm_buffer_object *bo,
> +			    unsigned int new_prio);
>
>   int ttm_bo_wait_ctx(struct ttm_buffer_object *bo,
>   		    struct ttm_operation_ctx *ctx);
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
@ 2024-04-25  6:32   ` Christian König
  2024-04-25  6:46     ` Friedrich Vock
  0 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2024-04-25  6:32 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:57 schrieb Friedrich Vock:
> Used by userspace to adjust buffer priorities in response to changes in
> application demand and memory pressure.

Yeah, that was discussed over and over again. One big design criteria is 
that we can't have global priorities from userspace!

The background here is that this can trivially be abused.

What we can do is to have per process priorities, but that needs to be 
in the VM subsystem.

That's also the reason why I personally think that the handling 
shouldn't be inside TTM at all.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>   include/uapi/drm/amdgpu_drm.h           |  1 +
>   2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 5ca13e2e50f50..6107810a9c205 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>   {
>   	struct amdgpu_device *adev = drm_to_adev(dev);
>   	struct drm_amdgpu_gem_op *args = data;
> +	struct ttm_resource_manager *man;
>   	struct drm_gem_object *gobj;
>   	struct amdgpu_vm_bo_base *base;
> +	struct ttm_operation_ctx ctx;
>   	struct amdgpu_bo *robj;
>   	int r;
>
> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>   	if (unlikely(r))
>   		goto out;
>
> +	memset(&ctx, 0, sizeof(ctx));
> +	ctx.interruptible = true;
> +
>   	switch (args->op) {
>   	case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>   		struct drm_amdgpu_gem_create_in info;
> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev, void *data,
>
>   		amdgpu_bo_unreserve(robj);
>   		break;
> +	case AMDGPU_GEM_OP_SET_PRIORITY:
> +		if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
> +			args->value = AMDGPU_BO_PRIORITY_MAX_USER;
> +		ttm_bo_update_priority(&robj->tbo, args->value);
> +		if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
> +			ttm_bo_try_unevict(&robj->tbo, &ctx);
> +			amdgpu_bo_unreserve(robj);
> +		} else {
> +			amdgpu_bo_unreserve(robj);
> +			man = ttm_manager_type(robj->tbo.bdev,
> +				robj->tbo.resource->mem_type);
> +			ttm_mem_unevict_evicted(robj->tbo.bdev, man,
> +						true);
> +		}
> +		break;
>   	default:
>   		amdgpu_bo_unreserve(robj);
>   		r = -EINVAL;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index bdbe6b262a78d..53552dd489b9b 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>
>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
>   #define AMDGPU_GEM_OP_SET_PLACEMENT		1
> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>
>   /* Sets or returns a value associated with a buffer. */
>   struct drm_amdgpu_gem_op {
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit
  2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
@ 2024-04-25  6:36   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:36 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> TTM now takes care of moving buffers to the best possible domain.

Yeah, I've been planning to do this for a while as well. The problem is 
really that we need to keep the functionality.

For example TTM currently doesn't have a concept of a userspace client.
So it can't track the bytes already evicted for each client.

This needs to be added as infrastructure first; then we can start
moving more functionality into TTM.

Regards,
Christian.

>
> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +--------------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |   7 -
>   4 files changed, 3 insertions(+), 201 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index cac0ca64367b3..3004adc6fa679 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1404,8 +1404,6 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev);
>   bool amdgpu_device_seamless_boot_supported(struct amdgpu_device *adev);
>   bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
>
> -void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
> -				  u64 num_vis_bytes);
>   int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
>   void amdgpu_device_program_register_sequence(struct amdgpu_device *adev,
>   					     const u32 *registers,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index e9168677ef0a6..92a0cffc1adc3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -638,196 +638,19 @@ static int amdgpu_cs_pass2(struct amdgpu_cs_parser *p)
>   	return 0;
>   }
>
> -/* Convert microseconds to bytes. */
> -static u64 us_to_bytes(struct amdgpu_device *adev, s64 us)
> -{
> -	if (us <= 0 || !adev->mm_stats.log2_max_MBps)
> -		return 0;
> -
> -	/* Since accum_us is incremented by a million per second, just
> -	 * multiply it by the number of MB/s to get the number of bytes.
> -	 */
> -	return us << adev->mm_stats.log2_max_MBps;
> -}
> -
> -static s64 bytes_to_us(struct amdgpu_device *adev, u64 bytes)
> -{
> -	if (!adev->mm_stats.log2_max_MBps)
> -		return 0;
> -
> -	return bytes >> adev->mm_stats.log2_max_MBps;
> -}
> -
> -/* Returns how many bytes TTM can move right now. If no bytes can be moved,
> - * it returns 0. If it returns non-zero, it's OK to move at least one buffer,
> - * which means it can go over the threshold once. If that happens, the driver
> - * will be in debt and no other buffer migrations can be done until that debt
> - * is repaid.
> - *
> - * This approach allows moving a buffer of any size (it's important to allow
> - * that).
> - *
> - * The currency is simply time in microseconds and it increases as the clock
> - * ticks. The accumulated microseconds (us) are converted to bytes and
> - * returned.
> - */
> -static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
> -					      u64 *max_bytes,
> -					      u64 *max_vis_bytes)
> -{
> -	s64 time_us, increment_us;
> -	u64 free_vram, total_vram, used_vram;
> -	/* Allow a maximum of 200 accumulated ms. This is basically per-IB
> -	 * throttling.
> -	 *
> -	 * It means that in order to get full max MBps, at least 5 IBs per
> -	 * second must be submitted and not more than 200ms apart from each
> -	 * other.
> -	 */
> -	const s64 us_upper_bound = 200000;
> -
> -	if (!adev->mm_stats.log2_max_MBps) {
> -		*max_bytes = 0;
> -		*max_vis_bytes = 0;
> -		return;
> -	}
> -
> -	total_vram = adev->gmc.real_vram_size - atomic64_read(&adev->vram_pin_size);
> -	used_vram = ttm_resource_manager_usage(&adev->mman.vram_mgr.manager);
> -	free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram;
> -
> -	spin_lock(&adev->mm_stats.lock);
> -
> -	/* Increase the amount of accumulated us. */
> -	time_us = ktime_to_us(ktime_get());
> -	increment_us = time_us - adev->mm_stats.last_update_us;
> -	adev->mm_stats.last_update_us = time_us;
> -	adev->mm_stats.accum_us = min(adev->mm_stats.accum_us + increment_us,
> -				      us_upper_bound);
> -
> -	/* This prevents the short period of low performance when the VRAM
> -	 * usage is low and the driver is in debt or doesn't have enough
> -	 * accumulated us to fill VRAM quickly.
> -	 *
> -	 * The situation can occur in these cases:
> -	 * - a lot of VRAM is freed by userspace
> -	 * - the presence of a big buffer causes a lot of evictions
> -	 *   (solution: split buffers into smaller ones)
> -	 *
> -	 * If 128 MB or 1/8th of VRAM is free, start filling it now by setting
> -	 * accum_us to a positive number.
> -	 */
> -	if (free_vram >= 128 * 1024 * 1024 || free_vram >= total_vram / 8) {
> -		s64 min_us;
> -
> -		/* Be more aggressive on dGPUs. Try to fill a portion of free
> -		 * VRAM now.
> -		 */
> -		if (!(adev->flags & AMD_IS_APU))
> -			min_us = bytes_to_us(adev, free_vram / 4);
> -		else
> -			min_us = 0; /* Reset accum_us on APUs. */
> -
> -		adev->mm_stats.accum_us = max(min_us, adev->mm_stats.accum_us);
> -	}
> -
> -	/* This is set to 0 if the driver is in debt to disallow (optional)
> -	 * buffer moves.
> -	 */
> -	*max_bytes = us_to_bytes(adev, adev->mm_stats.accum_us);
> -
> -	/* Do the same for visible VRAM if half of it is free */
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc)) {
> -		u64 total_vis_vram = adev->gmc.visible_vram_size;
> -		u64 used_vis_vram =
> -		  amdgpu_vram_mgr_vis_usage(&adev->mman.vram_mgr);
> -
> -		if (used_vis_vram < total_vis_vram) {
> -			u64 free_vis_vram = total_vis_vram - used_vis_vram;
> -
> -			adev->mm_stats.accum_us_vis = min(adev->mm_stats.accum_us_vis +
> -							  increment_us, us_upper_bound);
> -
> -			if (free_vis_vram >= total_vis_vram / 2)
> -				adev->mm_stats.accum_us_vis =
> -					max(bytes_to_us(adev, free_vis_vram / 2),
> -					    adev->mm_stats.accum_us_vis);
> -		}
> -
> -		*max_vis_bytes = us_to_bytes(adev, adev->mm_stats.accum_us_vis);
> -	} else {
> -		*max_vis_bytes = 0;
> -	}
> -
> -	spin_unlock(&adev->mm_stats.lock);
> -}
> -
> -/* Report how many bytes have really been moved for the last command
> - * submission. This can result in a debt that can stop buffer migrations
> - * temporarily.
> - */
> -void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
> -				  u64 num_vis_bytes)
> -{
> -	spin_lock(&adev->mm_stats.lock);
> -	adev->mm_stats.accum_us -= bytes_to_us(adev, num_bytes);
> -	adev->mm_stats.accum_us_vis -= bytes_to_us(adev, num_vis_bytes);
> -	spin_unlock(&adev->mm_stats.lock);
> -}
> -
>   static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
>   {
> -	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> -	struct amdgpu_cs_parser *p = param;
>   	struct ttm_operation_ctx ctx = {
>   		.interruptible = true,
>   		.no_wait_gpu = false,
>   		.resv = bo->tbo.base.resv
>   	};
> -	uint32_t domain;
> -	int r;
>
>   	if (bo->tbo.pin_count)
>   		return 0;
>
> -	/* Don't move this buffer if we have depleted our allowance
> -	 * to move it. Don't move anything if the threshold is zero.
> -	 */
> -	if (p->bytes_moved < p->bytes_moved_threshold &&
> -	    (!bo->tbo.base.dma_buf ||
> -	    list_empty(&bo->tbo.base.dma_buf->attachments))) {
> -		if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -		    (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
> -			/* And don't move a CPU_ACCESS_REQUIRED BO to limited
> -			 * visible VRAM if we've depleted our allowance to do
> -			 * that.
> -			 */
> -			if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
> -				domain = bo->preferred_domains;
> -			else
> -				domain = bo->allowed_domains;
> -		} else {
> -			domain = bo->preferred_domains;
> -		}
> -	} else {
> -		domain = bo->allowed_domains;
> -	}
> -
> -retry:
> -	amdgpu_bo_placement_from_domain(bo, domain);
> -	r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> -
> -	p->bytes_moved += ctx.bytes_moved;
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> -		p->bytes_moved_vis += ctx.bytes_moved;
> -
> -	if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
> -		domain = bo->allowed_domains;
> -		goto retry;
> -	}
> -
> -	return r;
> +	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
> +	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>   }
>
>   static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> @@ -947,13 +770,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>   		e->user_pages = NULL;
>   	}
>
> -	amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
> -					  &p->bytes_moved_vis_threshold);
> -	p->bytes_moved = 0;
> -	p->bytes_moved_vis = 0;
> -
>   	r = amdgpu_vm_validate(p->adev, &fpriv->vm, NULL,
> -			       amdgpu_cs_bo_validate, p);
> +			       amdgpu_cs_bo_validate, NULL);
>   	if (r) {
>   		DRM_ERROR("amdgpu_vm_validate() failed.\n");
>   		goto out_free_user_pages;
> @@ -973,9 +791,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
>   		p->gang_leader->uf_addr += amdgpu_bo_gpu_offset(p->uf_bo);
>   	}
>
> -	amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved,
> -				     p->bytes_moved_vis);
> -
>   	for (i = 0; i < p->gang_size; ++i)
>   		amdgpu_job_set_resources(p->jobs[i], p->bo_list->gds_obj,
>   					 p->bo_list->gws_obj,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
> index 39c33ad100cb7..e3d04ac4764be 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h
> @@ -67,10 +67,6 @@ struct amdgpu_cs_parser {
>   	struct amdgpu_bo_list		*bo_list;
>   	struct amdgpu_mn		*mn;
>   	struct dma_fence		*fence;
> -	uint64_t			bytes_moved_threshold;
> -	uint64_t			bytes_moved_vis_threshold;
> -	uint64_t			bytes_moved;
> -	uint64_t			bytes_moved_vis;
>
>   	/* user fence */
>   	struct amdgpu_bo		*uf_bo;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 331b9ed8062c7..5834a95d680d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -621,13 +621,6 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>   	if (unlikely(r != 0))
>   		return r;
>
> -	if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> -	    amdgpu_res_cpu_visible(adev, bo->tbo.resource))
> -		amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved,
> -					     ctx.bytes_moved);
> -	else
> -		amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved, 0);
> -
>   	if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
>   	    bo->tbo.resource->mem_type == TTM_PL_VRAM) {
>   		struct dma_fence *fence;
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-25  6:32   ` Christian König
@ 2024-04-25  6:46     ` Friedrich Vock
  2024-04-25  6:58       ` Christian König
  0 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-25  6:46 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

On 25.04.24 08:32, Christian König wrote:
> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>> Used by userspace to adjust buffer priorities in response to changes in
>> application demand and memory pressure.
>
> Yeah, that was discussed over and over again. One big design criteria
> is that we can't have global priorities from userspace!
>
> The background here is that this can trivially be abused.
>
I see your point when apps are allowed to prioritize themselves above
other apps, and I agree that should probably be disallowed at least for
unprivileged apps.

Disallowing this is a pretty trivial change though, and I don't really
see the abuse potential in being able to downgrade your own priority?

Regards,
Friedrich

> What we can do is to have per process priorities, but that needs to be
> in the VM subsystem.
>
> That's also the reason why I personally think that the handling
> shouldn't be inside TTM at all.
>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>   include/uapi/drm/amdgpu_drm.h           |  1 +
>>   2 files changed, 21 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 5ca13e2e50f50..6107810a9c205 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>> void *data,
>>   {
>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>       struct drm_amdgpu_gem_op *args = data;
>> +    struct ttm_resource_manager *man;
>>       struct drm_gem_object *gobj;
>>       struct amdgpu_vm_bo_base *base;
>> +    struct ttm_operation_ctx ctx;
>>       struct amdgpu_bo *robj;
>>       int r;
>>
>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>> void *data,
>>       if (unlikely(r))
>>           goto out;
>>
>> +    memset(&ctx, 0, sizeof(ctx));
>> +    ctx.interruptible = true;
>> +
>>       switch (args->op) {
>>       case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>           struct drm_amdgpu_gem_create_in info;
>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>> void *data,
>>
>>           amdgpu_bo_unreserve(robj);
>>           break;
>> +    case AMDGPU_GEM_OP_SET_PRIORITY:
>> +        if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>> +            args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>> +        ttm_bo_update_priority(&robj->tbo, args->value);
>> +        if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>> +            ttm_bo_try_unevict(&robj->tbo, &ctx);
>> +            amdgpu_bo_unreserve(robj);
>> +        } else {
>> +            amdgpu_bo_unreserve(robj);
>> +            man = ttm_manager_type(robj->tbo.bdev,
>> +                robj->tbo.resource->mem_type);
>> +            ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>> +                        true);
>> +        }
>> +        break;
>>       default:
>>           amdgpu_bo_unreserve(robj);
>>           r = -EINVAL;
>> diff --git a/include/uapi/drm/amdgpu_drm.h
>> b/include/uapi/drm/amdgpu_drm.h
>> index bdbe6b262a78d..53552dd489b9b 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>
>>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
>>   #define AMDGPU_GEM_OP_SET_PLACEMENT        1
>> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>>
>>   /* Sets or returns a value associated with a buffer. */
>>   struct drm_amdgpu_gem_op {
>> --
>> 2.44.0
>>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (17 preceding siblings ...)
  2024-04-24 16:57 ` [RFC PATCH 18/18] drm/amdgpu: Bump minor version Friedrich Vock
@ 2024-04-25  6:54 ` Christian König
  2024-04-25 13:22 ` Marek Olšák
  2024-05-02 14:23 ` Maarten Lankhorst
  20 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  6:54 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

In general: Yes please :)

But you are exercising a lot of ideas we have already thrown overboard
over the years.

The general idea Marek and I have been working on for a while now is 
rather to make TTM aware of userspace "clients".

In other words we should start with having a TTM structure in the fpriv 
of the drivers and then track there how much VRAM was evicted for each 
client.

This should then be balanced so that each client gets its equal share
of VRAM and we pretty much end up with a static situation which only 
changes when applications become inactive/active (based on their GPU 
activity).
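
Very roughly, something like this (all names are placeholders, nothing
of this exists yet):

	/* Per-client state, embedded in the driver's fpriv. */
	struct ttm_client {
		struct list_head lru;		/* clients ordered by GPU activity */
		uint64_t evicted_bytes;		/* VRAM evicted for this client */
		uint64_t fair_share_bytes;	/* current share of VRAM */
	};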

I will mail you some of the stuff we already came up with later on.

Regards,
Christian.

Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> Hi everyone,
>
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
>
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted.  At the same time, with GPU APIs becoming
> increasingly lower-level and GPU-driven, only the application itself
> can know which buffers are used within a particular submission, and
> how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
> priorities being respected in order to perform their memory management.
>
> However, relaying this information to the kernel via the current KMD uAPIs
> is not possible. On AMDGPU for example, all work submissions include a
> "bo list" that contains any buffer object that is accessed during the
> course of the submission. If VRAM is oversubscribed and a buffer in the
> list was evicted to system memory, that buffer is moved back to VRAM
> (potentially evicting other unused buffers).
>
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were evicted,
> which almost guarantees that some buffers will get moved around. Since
> the bo list is only known at submit time, this also means the buffers
> will get moved right before submitting application work, which is the
> worst possible time to move buffers from a latency perspective. Another
> consequence of the large bo list is that nearly all memory from other
> applications will be evicted, too. When different applications (e.g. game
> and compositor) submit work one after the other, this causes a ping-pong
> effect where each app's submission evicts the other app's memory,
> resulting in a large amount of unnecessary moves.
>
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM applications to reside in system memory
> [1].  This worked around the ping-ponging/excessive buffer moving problem,
> but also meant that any memory evicted to system memory would forever
> stay there, regardless of how VRAM is used.
>
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
>    VRAM has been evicted
> - Buffer eviction respects priorities set by userspace
> - Wasteful ping-ponging is avoided to the extent possible
>
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
>
> - For each ttm_resource_manager, the amount of evicted memory is tracked
>    (similarly to how "usage" tracks the memory usage). When memory is
>    evicted via ttm_bo_evict, the size of the evicted memory is added, when
>    memory is un-evicted (see below), its size is subtracted. The amount of
>    evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
>
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
>
> - ttm_mem_unevict walks the list of evicted bos for a given
>    ttm_resource_manager and tries moving evicted resources back. When a
>    buffer is freed, this function is called to immediately restore some
>    evicted memory.
>
> - Each ttm_buffer_object independently tracks the mem_type it wants
>    to reside in.
>
> - ttm_bo_try_unevict is added as a helper function which attempts to
>    move the buffer to its preferred mem_type. If no space is available
>    there, it fails with -ENOSPC/-ENOMEM.
>
> - Similar to how ttm_bo_evict works, each driver can implement
>    uneviction_valuable/unevict_flags callbacks to control buffer
>    un-eviction.
>
> This is what patches 1-10 accomplish (together with an amdgpu
> implementation utilizing the new API).
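
A rough C sketch of the API surface described above (field and function
names follow this cover letter and the patches quoted later in the
thread; the exact signatures, and the name of the final bool parameter,
are assumptions):

#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_device.h>
#include <drm/ttm/ttm_resource.h>

/*
 * Conceptually added to struct ttm_resource_manager:
 *   - an "evicted" byte counter, maintained like "usage"
 *   - a list of BOs evicted from this manager
 *
 * Conceptually added to struct ttm_buffer_object (see patch 02):
 *   int evicted_type;          mem_type this BO was evicted from,
 *                              TTM_NUM_MEM_TYPES if not evicted
 *   struct list_head evicted;  entry in the manager's evicted list
 */

/* Try to move one evicted BO back to its preferred mem_type;
 * fails with -ENOSPC/-ENOMEM if no space is available there. */
int ttm_bo_try_unevict(struct ttm_buffer_object *bo,
		       struct ttm_operation_ctx *ctx);

/* Walk the manager's evicted list and try to move resources back,
 * e.g. right after a buffer has been freed. The bool parameter name
 * is an assumption; the patches later in this thread pass "true". */
void ttm_mem_unevict_evicted(struct ttm_device *bdev,
			     struct ttm_resource_manager *man,
			     bool interruptible);
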
>
> Userspace priorities could then be implemented as follows:
>
> - TTM already manages priorities for each buffer object. These priorities
>    can be updated by userspace via a GEM_OP ioctl to inform the kernel
>    which buffers should be evicted before others. If an ioctl increases
>    the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
>    try and move it back (potentially evicting buffers with a lower
>    priority)
>
> - Buffers should never be evicted by other buffers with equal/lower
>    priority, but if there is a buffer with lower priority occupying VRAM,
>    it should be evicted in favor of the higher-priority one. This prevents
>    ping-ponging between buffers that try evicting each other and is
>    trivially implementable with an early-exit in ttm_mem_evict_first.
>
> This is covered in patches 11-15, with the new features exposed to
> userspace in patches 16-18.
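
The early-exit mentioned above could look roughly like the following.
This is a simplified sketch of the eviction walk, not the actual patch;
the real ttm_mem_evict_first() also deals with locking, busy placements
and per-resource checks:

/* Pick a victim to make room for a buffer of priority incoming_prio.
 * TTM's LRU lists are ordered by priority, lowest first. */
static int evict_first_sketch(struct ttm_resource_manager *man,
			      unsigned int incoming_prio)
{
	struct ttm_resource *res;
	unsigned int prio;

	for (prio = 0; prio < TTM_MAX_BO_PRIORITY; ++prio) {
		/* Early exit: never evict buffers of equal or higher
		 * priority. This is what prevents two buffers from
		 * ping-ponging by evicting each other. */
		if (prio >= incoming_prio)
			return -ENOSPC;

		list_for_each_entry(res, &man->lru[prio], lru) {
			/* ...evict res->bo and return 0 on success... */
		}
	}

	return -ENOSPC;
}
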
>
> I also have a RADV branch utilizing this API at [2], which I use for
> testing.
>
> This implementation is still very much WIP, although the D3D12 games I
> tested already seemed to benefit from it. Nevertheless, there are still
> quite a few TODOs and unresolved questions/problems.
>
> Some kernel drivers (e.g. i915) already use TTM priorities for
> kernel-internal purposes. Of course, some of the highest priorities
> should stay reserved for these purposes (with userspace being able to
> use the lower priorities).
>
> Another problem with priorities is the possibility of apps starving other
> apps by occupying all of VRAM with high-priority allocations. A possible
> solution could include restricting the highest priority/priorities
> to important apps like compositors.
>
> Tying into this problem, only apps that are actively cooperating
> to reduce memory pressure can benefit from the current memory priority
> implementation. Eventually the priority system could also be utilized
> to benefit all applications, for example with the desktop environment
> boosting the priority of the currently-focused app/its cgroup (to
> provide the best QoS to the apps the user is actively using). A full
> implementation of this is probably out-of-scope for this initial proposal,
> but it's probably a good idea to consider this as a possible future use
> of the priority API.
>
> I'm primarily looking to integrate this into amdgpu to solve the
> issues I've seen there, but I'm also interested in feedback from
> other drivers. Is this something you'd be interested in? Do you
> have any objections/comments/questions about my proposed design?
>
> Thanks,
> Friedrich
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
> [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling
>
> Friedrich Vock (18):
>    drm/ttm: Add tracking for evicted memory
>    drm/ttm: Add per-BO eviction tracking
>    drm/ttm: Implement BO eviction tracking
>    drm/ttm: Add driver funcs for uneviction control
>    drm/ttm: Add option to evict no BOs in operation
>    drm/ttm: Add public buffer eviction/uneviction functions
>    drm/amdgpu: Add TTM uneviction control functions
>    drm/amdgpu: Don't try moving BOs to preferred domain before submit
>    drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
>    drm/amdgpu: Don't add GTT to initial domains after failing to allocate
>      VRAM
>    drm/ttm: Bump BO priority count
>    drm/ttm: Do not evict BOs with higher priority
>    drm/ttm: Implement ttm_bo_update_priority
>    drm/ttm: Consider BOs placed in non-favorite locations evicted
>    drm/amdgpu: Set a default priority for user/kernel BOs
>    drm/amdgpu: Implement SET_PRIORITY GEM op
>    drm/amdgpu: Implement EVICTED_VRAM query
>    drm/amdgpu: Bump minor version
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +---------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  25 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  26 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   4 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  50 ++++
>   drivers/gpu/drm/ttm/ttm_bo.c               | 253 ++++++++++++++++++++-
>   drivers/gpu/drm/ttm/ttm_bo_util.c          |   3 +
>   drivers/gpu/drm/ttm/ttm_device.c           |   1 +
>   drivers/gpu/drm/ttm/ttm_resource.c         |  19 +-
>   include/drm/ttm/ttm_bo.h                   |  22 ++
>   include/drm/ttm/ttm_device.h               |  28 +++
>   include/drm/ttm/ttm_resource.h             |  11 +-
>   include/uapi/drm/amdgpu_drm.h              |   3 +
>   17 files changed, 430 insertions(+), 218 deletions(-)
>
> --
> 2.44.0
>



* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-25  6:46     ` Friedrich Vock
@ 2024-04-25  6:58       ` Christian König
  2024-04-25  7:06         ` Friedrich Vock
  0 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2024-04-25  6:58 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 25.04.24 um 08:46 schrieb Friedrich Vock:
> On 25.04.24 08:32, Christian König wrote:
>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>> Used by userspace to adjust buffer priorities in response to changes in
>>> application demand and memory pressure.
>>
>> Yeah, that was discussed over and over again. One big design criteria
>> is that we can't have global priorities from userspace!
>>
>> The background here is that this can trivially be abused.
>>
> I see your point when apps are allowed to prioritize themselves above
> other apps, and I agree that should probably be disallowed at least for
> unprivileged apps.
>
> Disallowing this is a pretty trivial change though, and I don't really
> see the abuse potential in being able to downgrade your own priority?

Yeah, I know what you mean and I'm also leaning towards that
argument. But another good point is that it doesn't actually help.

For example when you have desktop apps fighting with a game, you 
probably don't want to use static priorities, but rather evict the apps 
which are inactive and keep the apps which are active in the background.

In other words the priority just tells you which stuff from each app to 
evict first, but not which app to globally throw out.

Regards,
Christian.

>
> Regards,
> Friedrich
>
>> What we can do is to have per process priorities, but that needs to be
>> in the VM subsystem.
>>
>> That's also the reason why I personally think that the handling
>> shouldn't be inside TTM at all.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>   include/uapi/drm/amdgpu_drm.h           |  1 +
>>>   2 files changed, 21 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 5ca13e2e50f50..6107810a9c205 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>> void *data,
>>>   {
>>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>>       struct drm_amdgpu_gem_op *args = data;
>>> +    struct ttm_resource_manager *man;
>>>       struct drm_gem_object *gobj;
>>>       struct amdgpu_vm_bo_base *base;
>>> +    struct ttm_operation_ctx ctx;
>>>       struct amdgpu_bo *robj;
>>>       int r;
>>>
>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>> void *data,
>>>       if (unlikely(r))
>>>           goto out;
>>>
>>> +    memset(&ctx, 0, sizeof(ctx));
>>> +    ctx.interruptible = true;
>>> +
>>>       switch (args->op) {
>>>       case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>           struct drm_amdgpu_gem_create_in info;
>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>> void *data,
>>>
>>>           amdgpu_bo_unreserve(robj);
>>>           break;
>>> +    case AMDGPU_GEM_OP_SET_PRIORITY:
>>> +        if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>> +            args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>> +        ttm_bo_update_priority(&robj->tbo, args->value);
>>> +        if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>> +            ttm_bo_try_unevict(&robj->tbo, &ctx);
>>> +            amdgpu_bo_unreserve(robj);
>>> +        } else {
>>> +            amdgpu_bo_unreserve(robj);
>>> +            man = ttm_manager_type(robj->tbo.bdev,
>>> +                robj->tbo.resource->mem_type);
>>> +            ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>> +                        true);
>>> +        }
>>> +        break;
>>>       default:
>>>           amdgpu_bo_unreserve(robj);
>>>           r = -EINVAL;
>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>> b/include/uapi/drm/amdgpu_drm.h
>>> index bdbe6b262a78d..53552dd489b9b 100644
>>> --- a/include/uapi/drm/amdgpu_drm.h
>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>
>>>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
>>>   #define AMDGPU_GEM_OP_SET_PLACEMENT        1
>>> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>>>
>>>   /* Sets or returns a value associated with a buffer. */
>>>   struct drm_amdgpu_gem_op {
>>> -- 
>>> 2.44.0
>>>
>>



* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-25  6:58       ` Christian König
@ 2024-04-25  7:06         ` Friedrich Vock
  2024-04-25  7:15           ` Christian König
  0 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-25  7:06 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

On 25.04.24 08:58, Christian König wrote:
> Am 25.04.24 um 08:46 schrieb Friedrich Vock:
>> On 25.04.24 08:32, Christian König wrote:
>>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>>> Used by userspace to adjust buffer priorities in response to
>>>> changes in
>>>> application demand and memory pressure.
>>>
>>> Yeah, that was discussed over and over again. One big design criteria
>>> is that we can't have global priorities from userspace!
>>>
>>> The background here is that this can trivially be abused.
>>>
>> I see your point when apps are allowed to prioritize themselves above
>> other apps, and I agree that should probably be disallowed at least for
>> unprivileged apps.
>>
>> Disallowing this is a pretty trivial change though, and I don't really
>> see the abuse potential in being able to downgrade your own priority?
>
> Yeah, I know what you mean and I'm also leaning towards that
> argument. But another good point is that it doesn't actually help.
>
> For example when you have desktop apps fighting with a game, you
> probably don't want to use static priorities, but rather evict the
> apps which are inactive and keep the apps which are active in the
> background.
>
Sadly things are not as simple as "evict everything from app 1, keep
everything from app 2 active". The simplest failure case of this is
games that already oversubscribe VRAM on their own. Keeping the whole
app inside VRAM is literally impossible there, and it helps a lot to
know which buffers the app is most happy with evicting.
> In other words the priority just tells you which stuff from each app
> to evict first, but not which app to globally throw out.
>
Yeah, but a per-buffer priority system could do both of these.

Regards,
Friedrich

> Regards,
> Christian.
>
>>
>> Regards,
>> Friedrich
>>
>>> What we can do is to have per process priorities, but that needs to be
>>> in the VM subsystem.
>>>
>>> That's also the reason why I personally think that the handling
>>> shouldn't be inside TTM at all.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>>   include/uapi/drm/amdgpu_drm.h           |  1 +
>>>>   2 files changed, 21 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> index 5ca13e2e50f50..6107810a9c205 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>   {
>>>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>>>       struct drm_amdgpu_gem_op *args = data;
>>>> +    struct ttm_resource_manager *man;
>>>>       struct drm_gem_object *gobj;
>>>>       struct amdgpu_vm_bo_base *base;
>>>> +    struct ttm_operation_ctx ctx;
>>>>       struct amdgpu_bo *robj;
>>>>       int r;
>>>>
>>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>       if (unlikely(r))
>>>>           goto out;
>>>>
>>>> +    memset(&ctx, 0, sizeof(ctx));
>>>> +    ctx.interruptible = true;
>>>> +
>>>>       switch (args->op) {
>>>>       case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>>           struct drm_amdgpu_gem_create_in info;
>>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>
>>>>           amdgpu_bo_unreserve(robj);
>>>>           break;
>>>> +    case AMDGPU_GEM_OP_SET_PRIORITY:
>>>> +        if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>>> +            args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>>> +        ttm_bo_update_priority(&robj->tbo, args->value);
>>>> +        if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>>> +            ttm_bo_try_unevict(&robj->tbo, &ctx);
>>>> +            amdgpu_bo_unreserve(robj);
>>>> +        } else {
>>>> +            amdgpu_bo_unreserve(robj);
>>>> +            man = ttm_manager_type(robj->tbo.bdev,
>>>> +                robj->tbo.resource->mem_type);
>>>> +            ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>>> +                        true);
>>>> +        }
>>>> +        break;
>>>>       default:
>>>>           amdgpu_bo_unreserve(robj);
>>>>           r = -EINVAL;
>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>> b/include/uapi/drm/amdgpu_drm.h
>>>> index bdbe6b262a78d..53552dd489b9b 100644
>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>>
>>>>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
>>>>   #define AMDGPU_GEM_OP_SET_PLACEMENT        1
>>>> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>>>>
>>>>   /* Sets or returns a value associated with a buffer. */
>>>>   struct drm_amdgpu_gem_op {
>>>> --
>>>> 2.44.0
>>>>
>>>
>


* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-25  7:06         ` Friedrich Vock
@ 2024-04-25  7:15           ` Christian König
  2024-04-25  7:39             ` Friedrich Vock
  0 siblings, 1 reply; 40+ messages in thread
From: Christian König @ 2024-04-25  7:15 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 25.04.24 um 09:06 schrieb Friedrich Vock:
> On 25.04.24 08:58, Christian König wrote:
>> Am 25.04.24 um 08:46 schrieb Friedrich Vock:
>>> On 25.04.24 08:32, Christian König wrote:
>>>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>>>> Used by userspace to adjust buffer priorities in response to
>>>>> changes in
>>>>> application demand and memory pressure.
>>>>
>>>> Yeah, that was discussed over and over again. One big design criteria
>>>> is that we can't have global priorities from userspace!
>>>>
>>>> The background here is that this can trivially be abused.
>>>>
>>> I see your point when apps are allowed to prioritize themselves above
>>> other apps, and I agree that should probably be disallowed at least for
>>> unprivileged apps.
>>>
>>> Disallowing this is a pretty trivial change though, and I don't really
>>> see the abuse potential in being able to downgrade your own priority?
>>
>> Yeah, I know what you mean and I'm also leaning towards that
>> argument. But another good point is that it doesn't actually help.
>>
>> For example when you have desktop apps fighting with a game, you
>> probably don't want to use static priorities, but rather evict the
>> apps which are inactive and keep the apps which are active in the
>> background.
>>
> Sadly things are not as simple as "evict everything from app 1, keep
> everything from app 2 active". The simplest failure case of this is
> games that already oversubscribe VRAM on their own. Keeping the whole
> app inside VRAM is literally impossible there, and it helps a lot to
> know which buffers the app is most happy with evicting.
>> In other words the priority just tells you which stuff from each app
>> to evict first, but not which app to globally throw out.
>>
> Yeah, but a per-buffer priority system could do both of these.

Yeah, but we already have that. See amdgpu_bo_list_entry_cmp() and 
amdgpu_bo_list_create().

This is the per-application priority which can be used by userspace to
give priority to each BO in a submission (or application-wide).

The problem is rather that amdgpu/TTM never really made good use of that 
information.
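
For reference, a minimal userspace sketch of that existing interface,
using the BO-list uAPI from include/uapi/drm/amdgpu_drm.h (error
handling omitted):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>

/* Create a BO list that carries a per-BO priority; these entries are
 * what amdgpu_bo_list_create() consumes on the kernel side. */
static int create_prioritized_bo_list(int fd, uint32_t bo_handle,
				      uint32_t priority,
				      uint32_t *list_handle)
{
	struct drm_amdgpu_bo_list_entry entry = {
		.bo_handle = bo_handle,
		.bo_priority = priority, /* relative importance in this list */
	};
	union drm_amdgpu_bo_list args;

	memset(&args, 0, sizeof(args));
	args.in.operation = AMDGPU_BO_LIST_OP_CREATE;
	args.in.bo_number = 1;
	args.in.bo_info_size = sizeof(entry);
	args.in.bo_info_ptr = (uintptr_t)&entry;

	if (ioctl(fd, DRM_IOCTL_AMDGPU_BO_LIST, &args))
		return -1;

	*list_handle = args.out.list_handle;
	return 0;
}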

Regards,
Christian.

>
> Regards,
> Friedrich
>
>> Regards,
>> Christian.
>>
>>>
>>> Regards,
>>> Friedrich
>>>
>>>> What we can do is to have per process priorities, but that needs to be
>>>> in the VM subsystem.
>>>>
>>>> That's also the reason why I personally think that the handling
>>>> shouldn't be inside TTM at all.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>>>   include/uapi/drm/amdgpu_drm.h           |  1 +
>>>>>   2 files changed, 21 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> index 5ca13e2e50f50..6107810a9c205 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>> void *data,
>>>>>   {
>>>>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>>>>       struct drm_amdgpu_gem_op *args = data;
>>>>> +    struct ttm_resource_manager *man;
>>>>>       struct drm_gem_object *gobj;
>>>>>       struct amdgpu_vm_bo_base *base;
>>>>> +    struct ttm_operation_ctx ctx;
>>>>>       struct amdgpu_bo *robj;
>>>>>       int r;
>>>>>
>>>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>> void *data,
>>>>>       if (unlikely(r))
>>>>>           goto out;
>>>>>
>>>>> +    memset(&ctx, 0, sizeof(ctx));
>>>>> +    ctx.interruptible = true;
>>>>> +
>>>>>       switch (args->op) {
>>>>>       case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>>>           struct drm_amdgpu_gem_create_in info;
>>>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>> void *data,
>>>>>
>>>>>           amdgpu_bo_unreserve(robj);
>>>>>           break;
>>>>> +    case AMDGPU_GEM_OP_SET_PRIORITY:
>>>>> +        if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>>>> +            args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>>>> +        ttm_bo_update_priority(&robj->tbo, args->value);
>>>>> +        if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>>>> +            ttm_bo_try_unevict(&robj->tbo, &ctx);
>>>>> +            amdgpu_bo_unreserve(robj);
>>>>> +        } else {
>>>>> +            amdgpu_bo_unreserve(robj);
>>>>> +            man = ttm_manager_type(robj->tbo.bdev,
>>>>> +                robj->tbo.resource->mem_type);
>>>>> +            ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>>>> +                        true);
>>>>> +        }
>>>>> +        break;
>>>>>       default:
>>>>>           amdgpu_bo_unreserve(robj);
>>>>>           r = -EINVAL;
>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>> index bdbe6b262a78d..53552dd489b9b 100644
>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>>>
>>>>>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
>>>>>   #define AMDGPU_GEM_OP_SET_PLACEMENT        1
>>>>> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>>>>>
>>>>>   /* Sets or returns a value associated with a buffer. */
>>>>>   struct drm_amdgpu_gem_op {
>>>>> -- 
>>>>> 2.44.0
>>>>>
>>>>
>>



* Re: [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op
  2024-04-25  7:15           ` Christian König
@ 2024-04-25  7:39             ` Friedrich Vock
  0 siblings, 0 replies; 40+ messages in thread
From: Friedrich Vock @ 2024-04-25  7:39 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

On 25.04.24 09:15, Christian König wrote:
> Am 25.04.24 um 09:06 schrieb Friedrich Vock:
>> On 25.04.24 08:58, Christian König wrote:
>>> Am 25.04.24 um 08:46 schrieb Friedrich Vock:
>>>> On 25.04.24 08:32, Christian König wrote:
>>>>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>>>>> Used by userspace to adjust buffer priorities in response to
>>>>>> changes in
>>>>>> application demand and memory pressure.
>>>>>
>>>>> Yeah, that was discussed over and over again. One big design criteria
>>>>> is that we can't have global priorities from userspace!
>>>>>
>>>>> The background here is that this can trivially be abused.
>>>>>
>>>> I see your point when apps are allowed to prioritize themselves above
>>>> other apps, and I agree that should probably be disallowed at least
>>>> for
>>>> unprivileged apps.
>>>>
>>>> Disallowing this is a pretty trivial change though, and I don't really
>>>> see the abuse potential in being able to downgrade your own priority?
>>>
>>> Yeah, I know what you mean and I'm also leaning towards that
>>> argument. But another good point is that it doesn't actually help.
>>>
>>> For example when you have desktop apps fighting with a game, you
>>> probably don't want to use static priorities, but rather evict the
>>> apps which are inactive and keep the apps which are active in the
>>> background.
>>>
>> Sadly things are not as simple as "evict everything from app 1, keep
>> everything from app 2 active". The simplest failure case of this is
>> games that already oversubscribe VRAM on their own. Keeping the whole
>> app inside VRAM is literally impossible there, and it helps a lot to
>> know which buffers the app is most happy with evicting.
>>> In other words the priority just tells you which stuff from each app
>>> to evict first, but not which app to globally throw out.
>>>
>> Yeah, but a per-buffer priority system could do both of these.
>
> Yeah, but we already have that. See amdgpu_bo_list_entry_cmp() and
> amdgpu_bo_list_create().
>
> This is the per application priority which can be used by userspace to
> give priority to each BO in a submission (or application wide).
>
> The problem is rather that amdgpu/TTM never really made good use of
> that information.
>
I think it's nigh impossible to make good use of priority information if
you wrap it in the BO list, which is only known at submit time. For
example, you don't know when priorities change unless you duplicate all
the tracking work (that the application has to do too!) in the kernel.
You also have no way of knowing that a priority changed until right when
the app wants to submit work using that BO, and starting to move BOs
around at that point is bad for submission latency. That's why I didn't
go forward with tracking priorities on a BO-list basis.

Also, the priorities being local to a submission is actually not that
great when talking about lowering priorities. Consider a case where an
app's working set fits into VRAM completely, but combined with the
working set of other apps running in parallel, VRAM is oversubscribed.
The app recognizes this and asks the kernel to evict one of its
rarely-used buffers by setting the priority to the lowest possible, to
make space for the other applications.
Without global priorities, the kernel can't honor that request, even
though it would solve the oversubscription with minimal performance
impact. Even with per-app priorities, the kernel isn't likely to evict
buffers from the requesting application unless all the other
applications have a higher priority.
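
As a concrete illustration of such a request, lowering a buffer's
priority with the op proposed in this patch could look like this from
userspace (hypothetical sketch against the RFC uAPI; the
AMDGPU_GEM_OP_SET_PRIORITY op and its priority semantics exist only in
this series):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>

/* Mark a rarely-used BO as the first candidate for eviction. */
static int demote_bo(int fd, uint32_t bo_handle)
{
	struct drm_amdgpu_gem_op op;

	memset(&op, 0, sizeof(op));
	op.handle = bo_handle;
	op.op = AMDGPU_GEM_OP_SET_PRIORITY;
	op.value = 0; /* lowest user priority: evict this BO first */

	return ioctl(fd, DRM_IOCTL_AMDGPU_GEM_OP, &op);
}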

Regards,
Friedrich

> Regards,
> Christian.
>
>>
>> Regards,
>> Friedrich
>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> Friedrich
>>>>
>>>>> What we can do is to have per process priorities, but that needs
>>>>> to be
>>>>> in the VM subsystem.
>>>>>
>>>>> That's also the reason why I personally think that the handling
>>>>> shouldn't be inside TTM at all.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
>>>>>>   include/uapi/drm/amdgpu_drm.h           |  1 +
>>>>>>   2 files changed, 21 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> index 5ca13e2e50f50..6107810a9c205 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> @@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>>> void *data,
>>>>>>   {
>>>>>>       struct amdgpu_device *adev = drm_to_adev(dev);
>>>>>>       struct drm_amdgpu_gem_op *args = data;
>>>>>> +    struct ttm_resource_manager *man;
>>>>>>       struct drm_gem_object *gobj;
>>>>>>       struct amdgpu_vm_bo_base *base;
>>>>>> +    struct ttm_operation_ctx ctx;
>>>>>>       struct amdgpu_bo *robj;
>>>>>>       int r;
>>>>>>
>>>>>> @@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>>> void *data,
>>>>>>       if (unlikely(r))
>>>>>>           goto out;
>>>>>>
>>>>>> +    memset(&ctx, 0, sizeof(ctx));
>>>>>> +    ctx.interruptible = true;
>>>>>> +
>>>>>>       switch (args->op) {
>>>>>>       case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
>>>>>>           struct drm_amdgpu_gem_create_in info;
>>>>>> @@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
>>>>>> void *data,
>>>>>>
>>>>>>           amdgpu_bo_unreserve(robj);
>>>>>>           break;
>>>>>> +    case AMDGPU_GEM_OP_SET_PRIORITY:
>>>>>> +        if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
>>>>>> +            args->value = AMDGPU_BO_PRIORITY_MAX_USER;
>>>>>> +        ttm_bo_update_priority(&robj->tbo, args->value);
>>>>>> +        if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
>>>>>> +            ttm_bo_try_unevict(&robj->tbo, &ctx);
>>>>>> +            amdgpu_bo_unreserve(robj);
>>>>>> +        } else {
>>>>>> +            amdgpu_bo_unreserve(robj);
>>>>>> +            man = ttm_manager_type(robj->tbo.bdev,
>>>>>> +                robj->tbo.resource->mem_type);
>>>>>> +            ttm_mem_unevict_evicted(robj->tbo.bdev, man,
>>>>>> +                        true);
>>>>>> +        }
>>>>>> +        break;
>>>>>>       default:
>>>>>>           amdgpu_bo_unreserve(robj);
>>>>>>           r = -EINVAL;
>>>>>> diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>>> b/include/uapi/drm/amdgpu_drm.h
>>>>>> index bdbe6b262a78d..53552dd489b9b 100644
>>>>>> --- a/include/uapi/drm/amdgpu_drm.h
>>>>>> +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>> @@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
>>>>>>
>>>>>>   #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO    0
>>>>>>   #define AMDGPU_GEM_OP_SET_PLACEMENT        1
>>>>>> +#define AMDGPU_GEM_OP_SET_PRIORITY              2
>>>>>>
>>>>>>   /* Sets or returns a value associated with a buffer. */
>>>>>>   struct drm_amdgpu_gem_op {
>>>>>> --
>>>>>> 2.44.0
>>>>>>
>>>>>
>>>
>


* Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
  2024-04-25  6:25   ` Christian König
@ 2024-04-25  7:39     ` Friedrich Vock
  2024-04-25  7:54       ` Christian König
  0 siblings, 1 reply; 40+ messages in thread
From: Friedrich Vock @ 2024-04-25  7:39 UTC (permalink / raw)
  To: Christian König, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

On 25.04.24 08:25, Christian König wrote:
> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>> This adds GTT to the "preferred domains" of this buffer object, which
>> will also prevent any attempts at moving the buffer back to VRAM if
>> there is space. If VRAM is full, GTT will already be chosen as a
>> fallback.
>
> Big NAK to that one, this is mandatory for correct operation.
>
Hm, how is correctness affected here? We still fall back to GTT if
allocating in VRAM doesn't work; I don't see a difference except that
now we'll actually try moving it back into VRAM again.

Regards,
Friedrich

> Regards,
> Christian.
>
>>
>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>>   2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 6bbab141eaaeb..aea3770d3ea2e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device
>> *dev, void *data,
>>               goto retry;
>>           }
>>
>> -        if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>> -            initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>> -            goto retry;
>> -        }
>>           DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu,
>> %d)\n",
>>                   size, initial_domain, args->in.alignment, r);
>>       }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 85c10d8086188..9978b85ed6f40 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>                 AMDGPU_GEM_DOMAIN_GDS))
>>           amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>>       else
>> -        amdgpu_bo_placement_from_domain(bo, bp->domain);
>> +        amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
>>       if (bp->type == ttm_bo_type_kernel)
>>           bo->tbo.priority = 2;
>>       else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
>> --
>> 2.44.0
>>
>


* Re: [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM
  2024-04-25  7:39     ` Friedrich Vock
@ 2024-04-25  7:54       ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25  7:54 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Alex Deucher

Am 25.04.24 um 09:39 schrieb Friedrich Vock:
> On 25.04.24 08:25, Christian König wrote:
>> Am 24.04.24 um 18:57 schrieb Friedrich Vock:
>>> This adds GTT to the "preferred domains" of this buffer object, which
>>> will also prevent any attempts at moving the buffer back to VRAM if
>>> there is space. If VRAM is full, GTT will already be chosen as a
>>> fallback.
>>
>> Big NAK to that one, this is mandatory for correct operation.
>>
> Hm, how is correctness affected here? We still fall back to GTT if
> allocating in VRAM doesn't work; I don't see a difference except that
> now we'll actually try moving it back into VRAM again.

Well, this is the fallback. Only during CS do we try to allocate from
GTT if allocating in VRAM doesn't work.

When you remove this here, any failed allocation from VRAM would be
fatal.

It could be that the handling is buggy and that updating the initial
domain also adds GTT to the preferred domains, but that should then be
fixed.

Regards,
Christian.

>
> Regards,
> Friedrich
>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ----
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
>>>   2 files changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 6bbab141eaaeb..aea3770d3ea2e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -378,10 +378,6 @@ int amdgpu_gem_create_ioctl(struct drm_device
>>> *dev, void *data,
>>>               goto retry;
>>>           }
>>>
>>> -        if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>>> -            initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>>> -            goto retry;
>>> -        }
>>>           DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu,
>>> %d)\n",
>>>                   size, initial_domain, args->in.alignment, r);
>>>       }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 85c10d8086188..9978b85ed6f40 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -619,7 +619,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>                 AMDGPU_GEM_DOMAIN_GDS))
>>>           amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>>>       else
>>> -        amdgpu_bo_placement_from_domain(bo, bp->domain);
>>> +        amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
>>>       if (bp->type == ttm_bo_type_kernel)
>>>           bo->tbo.priority = 2;
>>>       else if (!(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE))
>>> -- 
>>> 2.44.0
>>>
>>



* Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (18 preceding siblings ...)
  2024-04-25  6:54 ` [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Christian König
@ 2024-04-25 13:22 ` Marek Olšák
  2024-04-25 13:33   ` Christian König
  2024-05-02 14:23 ` Maarten Lankhorst
  20 siblings, 1 reply; 40+ messages in thread
From: Marek Olšák @ 2024-04-25 13:22 UTC (permalink / raw)
  To: Friedrich Vock
  Cc: dri-devel, amd-gfx, Pierre-Loup Griffais, Tvrtko Ursulin,
	Bas Nieuwenhuizen, Joshua Ashton, Christian König,
	Alex Deucher

The most extreme ping-ponging is mitigated by throttling buffer moves
in the kernel, but it only works without VM_ALWAYS_VALID and you can
set BO priorities in the BO list. A better approach that works with
VM_ALWAYS_VALID would be nice.

Marek

On Wed, Apr 24, 2024 at 1:12 PM Friedrich Vock <friedrich.vock@gmx.de> wrote:
>
> Hi everyone,
>
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
>
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted.  At the same time, with GPU APIs becoming
> increasingly lower-level and GPU-driven, only the application itself
> can know which buffers are used within a particular submission, and
> how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
> priorities being respected in order to perform their memory management.
>
> However, relaying this information to the kernel via the current KMD uAPIs
> is not possible. On AMDGPU for example, all work submissions include a
> "bo list" that contains any buffer object that is accessed during the
> course of the submission. If VRAM is oversubscribed and a buffer in the
> list was evicted to system memory, that buffer is moved back to VRAM
> (potentially evicting other unused buffers).
>
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were evicted,
> which almost guarantees that some buffers will get moved around. Since
> the bo list is only known at submit time, this also means the buffers
> will get moved right before submitting application work, which is the
> worst possible time to move buffers from a latency perspective. Another
> consequence of the large bo list is that nearly all memory from other
> applications will be evicted, too. When different applications (e.g. game
> and compositor) submit work one after the other, this causes a ping-pong
> effect where each app's submission evicts the other app's memory,
> resulting in a large amount of unnecessary moves.
>
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM applications to reside in system memory
> [1].  This worked around the ping-ponging/excessive buffer moving problem,
> but also meant that any memory evicted to system memory would forever
> stay there, regardless of how VRAM is used.
>
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
>   VRAM has been evicted
> - Buffer eviction respects priorities set by userspace
> - Wasteful ping-ponging is avoided to the extent possible
>
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
>
> - For each ttm_resource_manager, the amount of evicted memory is tracked
>   (similarly to how "usage" tracks the memory usage). When memory is
>   evicted via ttm_bo_evict, the size of the evicted memory is added, when
>   memory is un-evicted (see below), its size is subtracted. The amount of
>   evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
>
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
>
> - ttm_mem_unevict walks the list of evicted bos for a given
>   ttm_resource_manager and tries moving evicted resources back. When a
>   buffer is freed, this function is called to immediately restore some
>   evicted memory.
>
> - Each ttm_buffer_object independently tracks the mem_type it wants
>   to reside in.
>
> - ttm_bo_try_unevict is added as a helper function which attempts to
>   move the buffer to its preferred mem_type. If no space is available
>   there, it fails with -ENOSPC/-ENOMEM.
>
> - Similar to how ttm_bo_evict works, each driver can implement
>   uneviction_valuable/unevict_flags callbacks to control buffer
>   un-eviction.
>
> This is what patches 1-10 accomplish (together with an amdgpu
> implementation utilizing the new API).
>
> Userspace priorities could then be implemented as follows:
>
> - TTM already manages priorities for each buffer object. These priorities
>   can be updated by userspace via a GEM_OP ioctl to inform the kernel
>   which buffers should be evicted before others. If an ioctl increases
>   the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
>   try and move it back (potentially evicting buffers with a lower
>   priority)
>
> - Buffers should never be evicted by other buffers with equal/lower
>   priority, but if there is a buffer with lower priority occupying VRAM,
>   it should be evicted in favor of the higher-priority one. This prevents
>   ping-ponging between buffers that try evicting each other and is
>   trivially implementable with an early-exit in ttm_mem_evict_first.
>
> This is covered in patches 11-15, with the new features exposed to
> userspace in patches 16-18.
>
> I also have a RADV branch utilizing this API at [2], which I use for
> testing.
>
> This implementation is still very much WIP, although the D3D12 games I
> tested already seemed to benefit from it. Nevertheless, there are still
> quite a few TODOs and unresolved questions/problems.
>
> Some kernel drivers (e.g. i915) already use TTM priorities for
> kernel-internal purposes. Of course, some of the highest priorities
> should stay reserved for these purposes (with userspace being able to
> use the lower priorities).
>
> Another problem with priorities is the possibility of apps starving other
> apps by occupying all of VRAM with high-priority allocations. A possible
> solution could include restricting the highest priority/priorities
> to important apps like compositors.
>
> Tying into this problem, only apps that are actively cooperating
> to reduce memory pressure can benefit from the current memory priority
> implementation. Eventually the priority system could also be utilized
> to benefit all applications, for example with the desktop environment
> boosting the priority of the currently-focused app/its cgroup (to
> provide the best QoS to the apps the user is actively using). A full
> implementation of this is probably out-of-scope for this initial proposal,
> but it's probably a good idea to consider this as a possible future use
> of the priority API.
>
> I'm primarily looking to integrate this into amdgpu to solve the
> issues I've seen there, but I'm also interested in feedback from
> other drivers. Is this something you'd be interested in? Do you
> have any objections/comments/questions about my proposed design?
>
> Thanks,
> Friedrich
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
> [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling
>
> Friedrich Vock (18):
>   drm/ttm: Add tracking for evicted memory
>   drm/ttm: Add per-BO eviction tracking
>   drm/ttm: Implement BO eviction tracking
>   drm/ttm: Add driver funcs for uneviction control
>   drm/ttm: Add option to evict no BOs in operation
>   drm/ttm: Add public buffer eviction/uneviction functions
>   drm/amdgpu: Add TTM uneviction control functions
>   drm/amdgpu: Don't try moving BOs to preferred domain before submit
>   drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
>   drm/amdgpu: Don't add GTT to initial domains after failing to allocate
>     VRAM
>   drm/ttm: Bump BO priority count
>   drm/ttm: Do not evict BOs with higher priority
>   drm/ttm: Implement ttm_bo_update_priority
>   drm/ttm: Consider BOs placed in non-favorite locations evicted
>   drm/amdgpu: Set a default priority for user/kernel BOs
>   drm/amdgpu: Implement SET_PRIORITY GEM op
>   drm/amdgpu: Implement EVICTED_VRAM query
>   drm/amdgpu: Bump minor version
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +---------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  25 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  26 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  50 ++++
>  drivers/gpu/drm/ttm/ttm_bo.c               | 253 ++++++++++++++++++++-
>  drivers/gpu/drm/ttm/ttm_bo_util.c          |   3 +
>  drivers/gpu/drm/ttm/ttm_device.c           |   1 +
>  drivers/gpu/drm/ttm/ttm_resource.c         |  19 +-
>  include/drm/ttm/ttm_bo.h                   |  22 ++
>  include/drm/ttm/ttm_device.h               |  28 +++
>  include/drm/ttm/ttm_resource.h             |  11 +-
>  include/uapi/drm/amdgpu_drm.h              |   3 +
>  17 files changed, 430 insertions(+), 218 deletions(-)
>
> --
> 2.44.0
>


* Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
  2024-04-25 13:22 ` Marek Olšák
@ 2024-04-25 13:33   ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-25 13:33 UTC (permalink / raw)
  To: Marek Olšák, Friedrich Vock
  Cc: dri-devel, amd-gfx, Pierre-Loup Griffais, Tvrtko Ursulin,
	Bas Nieuwenhuizen, Joshua Ashton, Alex Deucher

Yeah, and this patch set here is removing that functionality.

Which is a major concern from my side as well.

Instead of removing it, my long-term plan was to move this into TTM (the
recent flags rework is going in that direction), so that both amdgpu and
radeon can use the same code again *and* we can also apply it to
VM_ALWAYS_VALID BOs.

Christian.

Am 25.04.24 um 15:22 schrieb Marek Olšák:
> The most extreme ping-ponging is mitigated by throttling buffer moves
> in the kernel, but it only works without VM_ALWAYS_VALID and you can
> set BO priorities in the BO list. A better approach that works with
> VM_ALWAYS_VALID would be nice.
>
> Marek
>
> On Wed, Apr 24, 2024 at 1:12 PM Friedrich Vock <friedrich.vock@gmx.de> wrote:
>> Hi everyone,
>>
>> recently I've been looking into remedies for apps (in particular, newer
>> games) that experience significant performance loss when they start to
>> hit VRAM limits, especially on older or lower-end cards that struggle
>> to fit both desktop apps and all the game data into VRAM at once.
>>
>> The root of the problem lies in the fact that from userspace's POV,
>> buffer eviction is very opaque: Userspace applications/drivers cannot
>> tell how oversubscribed VRAM is, nor do they have fine-grained control
>> over which buffers get evicted.  At the same time, with GPU APIs becoming
>> increasingly lower-level and GPU-driven, only the application itself
>> can know which buffers are used within a particular submission, and
>> how important each buffer is. For this, GPU APIs include interfaces
>> to query oversubscription and specify memory priorities: In Vulkan,
>> oversubscription can be queried through the VK_EXT_memory_budget
>> extension. Different buffers can also be assigned priorities via the
>> VK_EXT_pageable_device_local_memory extension. Modern games, especially
>> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
>> priorities being respected in order to perform their memory management.
>>
>> However, relaying this information to the kernel via the current KMD uAPIs
>> is not possible. On AMDGPU for example, all work submissions include a
>> "bo list" that contains any buffer object that is accessed during the
>> course of the submission. If VRAM is oversubscribed and a buffer in the
>> list was evicted to system memory, that buffer is moved back to VRAM
>> (potentially evicting other unused buffers).
>>
>> Since the usermode driver doesn't know what buffers are used by the
>> application, its only choice is to submit a bo list that contains every
>> buffer the application has allocated. In case of VRAM oversubscription,
>> it is highly likely that some of the application's buffers were evicted,
>> which almost guarantees that some buffers will get moved around. Since
>> the bo list is only known at submit time, this also means the buffers
>> will get moved right before submitting application work, which is the
>> worst possible time to move buffers from a latency perspective. Another
>> consequence of the large bo list is that nearly all memory from other
>> applications will be evicted, too. When different applications (e.g. game
>> and compositor) submit work one after the other, this causes a ping-pong
>> effect where each app's submission evicts the other app's memory,
>> resulting in a large amount of unnecessary moves.
>>
>> This overly aggressive eviction behavior led to RADV adopting a change
>> that effectively allows all VRAM applications to reside in system memory
>> [1].  This worked around the ping-ponging/excessive buffer moving problem,
>> but also meant that any memory evicted to system memory would forever
>> stay there, regardless of how VRAM is used.
>>
>> My proposal aims at providing a middle ground between these extremes.
>> The goals I want to meet are:
>> - Userspace is accurately informed about VRAM oversubscription/how much
>>    VRAM has been evicted
>> - Buffer eviction respects priorities set by userspace
>> - Wasteful ping-ponging is avoided to the extent possible
>>
>> I have been testing out some prototypes, and came up with this rough
>> sketch of an API:
>>
>> - For each ttm_resource_manager, the amount of evicted memory is tracked
>>    (similarly to how "usage" tracks the memory usage). When memory is
>>    evicted via ttm_bo_evict, the size of the evicted memory is added, when
>>    memory is un-evicted (see below), its size is subtracted. The amount of
>>    evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
>>
>> - Each ttm_resource_manager maintains a list of evicted buffer objects.
>>
>> - ttm_mem_unevict walks the list of evicted bos for a given
>>    ttm_resource_manager and tries moving evicted resources back. When a
>>    buffer is freed, this function is called to immediately restore some
>>    evicted memory.
>>
>> - Each ttm_buffer_object independently tracks the mem_type it wants
>>    to reside in.
>>
>> - ttm_bo_try_unevict is added as a helper function which attempts to
>>    move the buffer to its preferred mem_type. If no space is available
>>    there, it fails with -ENOSPC/-ENOMEM.
>>
>> - Similar to how ttm_bo_evict works, each driver can implement
>>    uneviction_valuable/unevict_flags callbacks to control buffer
>>    un-eviction.
>>
>> This is what patches 1-10 accomplish (together with an amdgpu
>> implementation utilizing the new API).
>>
>> Userspace priorities could then be implemented as follows:
>>
>> - TTM already manages priorities for each buffer object. These priorities
>>    can be updated by userspace via a GEM_OP ioctl to inform the kernel
>>    which buffers should be evicted before others. If an ioctl increases
>>    the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
>>    try and move it back (potentially evicting buffers with a lower
>>    priority)
>>
>> - Buffers should never be evicted by other buffers with equal/lower
>>    priority, but if there is a buffer with lower priority occupying VRAM,
>>    it should be evicted in favor of the higher-priority one. This prevents
>>    ping-ponging between buffers that try evicting each other and is
>>    trivially implementable with an early-exit in ttm_mem_evict_first.
>>
>> This is covered in patches 11-15, with the new features exposed to
>> userspace in patches 16-18.
>>
>> I also have a RADV branch utilizing this API at [2], which I use for
>> testing.
>>
>> This implementation is still very much WIP, although the D3D12 games I
>> tested already seemed to benefit from it. Nevertheless, there are still
>> quite a few TODOs and unresolved questions/problems.
>>
>> Some kernel drivers (e.g. i915) already use TTM priorities for
>> kernel-internal purposes. Of course, some of the highest priorities
>> should stay reserved for these purposes (with userspace being able to
>> use the lower priorities).
>>
>> Another problem with priorities is the possibility of apps starving other
>> apps by occupying all of VRAM with high-priority allocations. A possible
>> solution could include restricting the highest priority/priorities
>> to important apps like compositors.
>>
>> Tying into this problem, only apps that are actively cooperating
>> to reduce memory pressure can benefit from the current memory priority
>> implementation. Eventually the priority system could also be utilized
>> to benefit all applications, for example with the desktop environment
>> boosting the priority of the currently-focused app/its cgroup (to
>> provide the best QoS to the apps the user is actively using). A full
>> implementation of this is probably out-of-scope for this initial proposal,
>> but it's probably a good idea to consider this as a possible future use
>> of the priority API.
>>
>> I'm primarily looking to integrate this into amdgpu to solve the
>> issues I've seen there, but I'm also interested in feedback from
>> other drivers. Is this something you'd be interested in? Do you
>> have any objections/comments/questions about my proposed design?
>>
>> Thanks,
>> Friedrich
>>
>> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
>> [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling
>>
>> Friedrich Vock (18):
>>    drm/ttm: Add tracking for evicted memory
>>    drm/ttm: Add per-BO eviction tracking
>>    drm/ttm: Implement BO eviction tracking
>>    drm/ttm: Add driver funcs for uneviction control
>>    drm/ttm: Add option to evict no BOs in operation
>>    drm/ttm: Add public buffer eviction/uneviction functions
>>    drm/amdgpu: Add TTM uneviction control functions
>>    drm/amdgpu: Don't try moving BOs to preferred domain before submit
>>    drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
>>    drm/amdgpu: Don't add GTT to initial domains after failing to allocate
>>      VRAM
>>    drm/ttm: Bump BO priority count
>>    drm/ttm: Do not evict BOs with higher priority
>>    drm/ttm: Implement ttm_bo_update_priority
>>    drm/ttm: Consider BOs placed in non-favorite locations evicted
>>    drm/amdgpu: Set a default priority for user/kernel BOs
>>    drm/amdgpu: Implement SET_PRIORITY GEM op
>>    drm/amdgpu: Implement EVICTED_VRAM query
>>    drm/amdgpu: Bump minor version
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +---------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  25 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   3 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  26 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   4 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  50 ++++
>>   drivers/gpu/drm/ttm/ttm_bo.c               | 253 ++++++++++++++++++++-
>>   drivers/gpu/drm/ttm/ttm_bo_util.c          |   3 +
>>   drivers/gpu/drm/ttm/ttm_device.c           |   1 +
>>   drivers/gpu/drm/ttm/ttm_resource.c         |  19 +-
>>   include/drm/ttm/ttm_bo.h                   |  22 ++
>>   include/drm/ttm/ttm_device.h               |  28 +++
>>   include/drm/ttm/ttm_resource.h             |  11 +-
>>   include/uapi/drm/amdgpu_drm.h              |   3 +
>>   17 files changed, 430 insertions(+), 218 deletions(-)
>>
>> --
>> 2.44.0
>>



* Re: [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking
  2024-04-25  6:18   ` Christian König
@ 2024-04-25 19:02     ` Matthew Brost
  2024-04-26  6:27       ` Christian König
  0 siblings, 1 reply; 40+ messages in thread
From: Matthew Brost @ 2024-04-25 19:02 UTC (permalink / raw)
  To: Christian König
  Cc: Friedrich Vock, dri-devel, amd-gfx, Pierre-Loup Griffais,
	Tvrtko Ursulin, Bas Nieuwenhuizen, Joshua Ashton, Alex Deucher

On Thu, Apr 25, 2024 at 08:18:38AM +0200, Christian König wrote:
> Am 24.04.24 um 18:56 schrieb Friedrich Vock:
> > Make each buffer object aware of whether it has been evicted or not.
> 
> That reverts some changes we made a couple of years ago.
> 
> In general the idea is that eviction isn't something we need to reverse in
> TTM.
> 
> Rather the driver gives the desired placement.
> 
> Regards,
> Christian.
> 

We have added a concept similar to this in drm_gpuvm [1]. GPUVM
maintains a list of evicted BOs, and when the GPUVM is locked for
submission it has a validate vfunc which is called on each evicted BO.
If the driver is using TTM, this is where the driver would call TTM BO
validate, which unevicts the BO. Well, at least this is what we do in
Xe [2].

The uneviction is a per-VM operation, not a global one. With this, a
global eviction list does not seem correct (admittedly I'm not through
the entire series yet).

Matt

[1] https://elixir.bootlin.com/linux/v6.8.7/source/drivers/gpu/drm/drm_gpuvm.c#L86
[2] https://elixir.bootlin.com/linux/v6.8.7/source/drivers/gpu/drm/xe/xe_vm.c#L464
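
For illustration, a minimal sketch of this pattern for a TTM-based
driver (the foo_* names are hypothetical; drm_gpuvm_ops.vm_bo_validate
and ttm_bo_validate() are the v6.8 interfaces from [1] and [2]):

#include <drm/drm_gpuvm.h>
#include <drm/ttm/ttm_bo.h>

/* Stand-in for however the driver computes a BO's preferred placement. */
static struct ttm_placement *foo_preferred_placement(struct ttm_buffer_object *bo);

static int foo_gpuvm_validate(struct drm_gpuvm_bo *vm_bo,
			      struct drm_exec *exec)
{
	/* TTM-based GEM drivers embed the GEM object in the TTM BO. */
	struct ttm_buffer_object *bo =
		container_of(vm_bo->obj, struct ttm_buffer_object, base);
	struct ttm_operation_ctx ctx = {
		.interruptible = true,
		.no_wait_gpu = false,
	};

	/* Re-validating against the preferred placement moves the BO
	 * back in, i.e. "unevicts" it. */
	return ttm_bo_validate(bo, foo_preferred_placement(bo), &ctx);
}

static const struct drm_gpuvm_ops foo_gpuvm_ops = {
	.vm_bo_validate = foo_gpuvm_validate,
	/* ... other callbacks ... */
};

drm_gpuvm_validate() then walks the VM's evicted list and calls the
vm_bo_validate callback for each BO on it before submission.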

> > 
> > Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c |  1 +
> >   include/drm/ttm/ttm_bo.h     | 11 +++++++++++
> >   2 files changed, 12 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > index edf10618fe2b2..3968b17453569 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo,
> >   	bo->pin_count = 0;
> >   	bo->sg = sg;
> >   	bo->bulk_move = NULL;
> > +	bo->evicted_type = TTM_NUM_MEM_TYPES;
> >   	if (resv)
> >   		bo->base.resv = resv;
> >   	else
> > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > index 0223a41a64b24..8a1a29c6fbc50 100644
> > --- a/include/drm/ttm/ttm_bo.h
> > +++ b/include/drm/ttm/ttm_bo.h
> > @@ -121,6 +121,17 @@ struct ttm_buffer_object {
> >   	unsigned priority;
> >   	unsigned pin_count;
> > 
> > +	/**
> > +	 * @evicted_type: Memory type this BO was evicted from, if any.
> > +	 * TTM_NUM_MEM_TYPES if this BO was not evicted.
> > +	 */
> > +	int evicted_type;
> > +	/**
> > +	 * @evicted: Entry in the evicted list for the resource manager
> > +	 * this BO was evicted from.
> > +	 */
> > +	struct list_head evicted;
> > +
> >   	/**
> >   	 * @delayed_delete: Work item used when we can't delete the BO
> >   	 * immediately
> > --
> > 2.44.0
> > 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking
  2024-04-25 19:02     ` Matthew Brost
@ 2024-04-26  6:27       ` Christian König
  0 siblings, 0 replies; 40+ messages in thread
From: Christian König @ 2024-04-26  6:27 UTC (permalink / raw)
  To: Matthew Brost, Christian König
  Cc: Friedrich Vock, dri-devel, amd-gfx, Pierre-Loup Griffais,
	Tvrtko Ursulin, Bas Nieuwenhuizen, Joshua Ashton, Alex Deucher

On 25.04.24 at 21:02, Matthew Brost wrote:
> On Thu, Apr 25, 2024 at 08:18:38AM +0200, Christian König wrote:
>> On 24.04.24 at 18:56, Friedrich Vock wrote:
>>> Make each buffer object aware of whether it has been evicted or not.
>> That reverts some changes we made a couple of years ago.
>>
>> In general the idea is that eviction isn't something we need to reverse in
>> TTM.
>>
>> Rather, the driver gives the desired placement.
>>
>> Regards,
>> Christian.
>>
> We have added a concept similar to this in drm_gpuvm [1]. GPUVM
> maintains a list of evicted BOs, and when the GPUVM is locked for
> submission it has a validate vfunc which is called on each BO. If the
> driver is using TTM, this is where the driver would call TTM BO
> validate, which unevicts the BO. Well, at least this is what we do in
> Xe [2].
>
> The uneviction is a per-VM operation, not a global one. With this, a
> global eviction list does not seem correct (admittedly I'm not through
> the entire series).

Yeah, that's exactly what I meant when I wrote that this is controlled 
by the "driver" :)

The state machine in AMDGPU's VM code is pretty much the same.
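
For reference, a rough sketch of that amdgpu flow (v6.8 names; the
callback body is a simplified stand-in for what amdgpu_cs_bo_validate()
does):

/* amdgpu_vm_validate_pt_bos() walks the VM's evicted list and calls
 * the given callback for each BO on it, e.g.:
 * amdgpu_vm_validate_pt_bos(adev, vm, foo_validate, NULL); */
static int foo_validate(void *param, struct amdgpu_bo *bo)
{
	struct ttm_operation_ctx ctx = { .interruptible = true };

	/* Request the BO's allowed domains again ... */
	amdgpu_bo_placement_from_domain(bo, bo->allowed_domains);
	/* ... and let TTM move it back if it was evicted. */
	return ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
}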

Regards,
Christian.

>
> Matt
>
> [1] https://elixir.bootlin.com/linux/v6.8.7/source/drivers/gpu/drm/drm_gpuvm.c#L86
> [2] https://elixir.bootlin.com/linux/v6.8.7/source/drivers/gpu/drm/xe/xe_vm.c#L464
>
>>> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c |  1 +
>>>    include/drm/ttm/ttm_bo.h     | 11 +++++++++++
>>>    2 files changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index edf10618fe2b2..3968b17453569 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -980,6 +980,7 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, struct ttm_buffer_object *bo,
>>>    	bo->pin_count = 0;
>>>    	bo->sg = sg;
>>>    	bo->bulk_move = NULL;
>>> +	bo->evicted_type = TTM_NUM_MEM_TYPES;
>>>    	if (resv)
>>>    		bo->base.resv = resv;
>>>    	else
>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>> index 0223a41a64b24..8a1a29c6fbc50 100644
>>> --- a/include/drm/ttm/ttm_bo.h
>>> +++ b/include/drm/ttm/ttm_bo.h
>>> @@ -121,6 +121,17 @@ struct ttm_buffer_object {
>>>    	unsigned priority;
>>>    	unsigned pin_count;
>>>
>>> +	/**
>>> +	 * @evicted_type: Memory type this BO was evicted from, if any.
>>> +	 * TTM_NUM_MEM_TYPES if this BO was not evicted.
>>> +	 */
>>> +	int evicted_type;
>>> +	/**
>>> +	 * @evicted: Entry in the evicted list for the resource manager
>>> +	 * this BO was evicted from.
>>> +	 */
>>> +	struct list_head evicted;
>>> +
>>>    	/**
>>>    	 * @delayed_delete: Work item used when we can't delete the BO
>>>    	 * immediately
>>> --
>>> 2.44.0
>>>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
  2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
                   ` (19 preceding siblings ...)
  2024-04-25 13:22 ` Marek Olšák
@ 2024-05-02 14:23 ` Maarten Lankhorst
  20 siblings, 0 replies; 40+ messages in thread
From: Maarten Lankhorst @ 2024-05-02 14:23 UTC (permalink / raw)
  To: Friedrich Vock, dri-devel, amd-gfx
  Cc: Pierre-Loup Griffais, Tvrtko Ursulin, Bas Nieuwenhuizen,
	Joshua Ashton, Christian König, Alex Deucher

Hey,

On 2024-04-24 at 18:56, Friedrich Vock wrote:
> Hi everyone,
> 
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
> 
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted.  At the same time, with GPU APIs becoming
> increasingly lower-level and GPU-driven, only the application itself
> can know which buffers are used within a particular submission, and
> how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
> priorities being respected in order to perform their memory management.
> 
> However, relaying this information to the kernel via the current KMD uAPIs
> is not possible. On AMDGPU for example, all work submissions include a
> "bo list" that contains any buffer object that is accessed during the
> course of the submission. If VRAM is oversubscribed and a buffer in the
> list was evicted to system memory, that buffer is moved back to VRAM
> (potentially evicting other unused buffers).
> 
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were evicted,
> which almost guarantees that some buffers will get moved around. Since
> the bo list is only known at submit time, this also means the buffers
> will get moved right before submitting application work, which is the
> worst possible time to move buffers from a latency perspective. Another
> consequence of the large bo list is that nearly all memory from other
> applications will be evicted, too. When different applications (e.g. game
> and compositor) submit work one after the other, this causes a ping-pong
> effect where each app's submission evicts the other app's memory,
> resulting in a large amount of unnecessary moves.
> 
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM applications to reside in system memory
> [1].  This worked around the ping-ponging/excessive buffer moving problem,
> but also meant that any memory evicted to system memory would forever
> stay there, regardless of how VRAM is used.
> 
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
>    VRAM has been evicted
> - Buffer eviction respects priorities set by userspace
> - Wasteful ping-ponging is avoided to the extent possible
> 
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
> 
> - For each ttm_resource_manager, the amount of evicted memory is tracked
>    (similarly to how "usage" tracks the memory usage). When memory is
>    evicted via ttm_bo_evict, the size of the evicted memory is added, when
>    memory is un-evicted (see below), its size is subtracted. The amount of
>    evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
> 
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
> 
> - ttm_mem_unevict walks the list of evicted bos for a given
>    ttm_resource_manager and tries moving evicted resources back. When a
>    buffer is freed, this function is called to immediately restore some
>    evicted memory.
> 
> - Each ttm_buffer_object independently tracks the mem_type it wants
>    to reside in.
> 
> - ttm_bo_try_unevict is added as a helper function which attempts to
>    move the buffer to its preferred mem_type. If no space is available
>    there, it fails with -ENOSPC/-ENOMEM.
> 
> - Similar to how ttm_bo_evict works, each driver can implement
>    uneviction_valuable/unevict_flags callbacks to control buffer
>    un-eviction.
> 
> This is what patches 1-10 accomplish (together with an amdgpu
> implementation utilizing the new API).
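
To make that concrete, the driver-facing hooks could look roughly like
this (a sketch modeled on the existing eviction_valuable/evict_flags
pair; the exact signatures in the series may differ):

/* Proposed additions to struct ttm_device_funcs: */

/* Return true if moving @bo back to its preferred placement is
 * worthwhile for this driver. */
bool (*uneviction_valuable)(struct ttm_buffer_object *bo);

/* Fill @placement with the placement to use when unevicting @bo. */
void (*unevict_flags)(struct ttm_buffer_object *bo,
		      struct ttm_placement *placement);

/* Proposed helper: attempt to move @bo back to its preferred mem_type,
 * failing with -ENOSPC/-ENOMEM if no space is available there. */
int ttm_bo_try_unevict(struct ttm_buffer_object *bo,
		       struct ttm_operation_ctx *ctx);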
> 
> Userspace priorities could then be implemented as follows:
> 
> - TTM already manages priorities for each buffer object. These priorities
>    can be updated by userspace via a GEM_OP ioctl to inform the kernel
>    which buffers should be evicted before others. If an ioctl increases
>    the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
>    try and move it back (potentially evicting buffers with a lower
>    priority)
> 
> - Buffers should never be evicted by other buffers with equal/lower
>    priority, but if there is a buffer with lower priority occupying VRAM,
>    it should be evicted in favor of the higher-priority one. This prevents
>    ping-ponging between buffers that try evicting each other and is
>    trivially implementable with an early-exit in ttm_mem_evict_first.
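
Illustratively, that early exit boils down to a check like this during
the LRU walk (simplified; the real loop in ttm_mem_evict_first has more
state to consider):

/* May @bo be evicted to make room for @evicting_bo? Never evict
 * equal- or higher-priority BOs; that is what invites ping-ponging. */
static bool foo_bo_evictable_for(struct ttm_buffer_object *bo,
				 struct ttm_buffer_object *evicting_bo)
{
	return bo->priority < evicting_bo->priority;
}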
> 
> This is covered in patches 11-15, with the new features exposed to
> userspace in patches 16-18.
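
From userspace, bumping a buffer's priority would then look roughly
like this (a sketch: struct drm_amdgpu_gem_op is the existing uAPI
struct, AMDGPU_GEM_OP_SET_PRIORITY is the op this series proposes, and
fd/bo_handle/priority are assumed to be set up by the caller):

#include <amdgpu_drm.h>
#include <xf86drm.h>

struct drm_amdgpu_gem_op op = {
	.handle = bo_handle,                  /* GEM handle of the buffer */
	.op     = AMDGPU_GEM_OP_SET_PRIORITY, /* added by patch 16 */
	.value  = priority,                   /* higher = evicted later */
};

/* On success, the kernel may immediately try to unevict the BO,
 * potentially evicting lower-priority buffers. */
int ret = drmCommandWriteRead(fd, DRM_AMDGPU_GEM_OP, &op, sizeof(op));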
> 
> I also have a RADV branch utilizing this API at [2], which I use for
> testing.
> 
> This implementation is still very much WIP, although the D3D12 games I
> tested already seemed to benefit from it. Nevertheless, there are still
> quite a few TODOs and unresolved questions/problems.
> 
> Some kernel drivers (e.g. i915) already use TTM priorities for
> kernel-internal purposes. Of course, some of the highest priorities
> should stay reserved for these purposes (with userspace being able to
> use the lower priorities).
> 
> Another problem with priorities is the possibility of apps starving other
> apps by occupying all of VRAM with high-priority allocations. A possible
> solution could be to restrict the highest priority/priorities
> to important apps like compositors.
> 
> Tying into this problem, only apps that are actively cooperating
> to reduce memory pressure can benefit from the current memory priority
> implementation. Eventually the priority system could also be utilized
> to benefit all applications, for example with the desktop environment
> boosting the priority of the currently-focused app/its cgroup (to
> provide the best QoS to the apps the user is actively using). A full
> implementation of this is probably out-of-scope for this initial proposal,
> but it's probably a good idea to consider this as a possible future use
> of the priority API.
> 
> I'm primarily looking to integrate this into amdgpu to solve the
> issues I've seen there, but I'm also interested in feedback from
> other drivers. Is this something you'd be interested in? Do you
> have any objections/comments/questions about my proposed design?
> 
> Thanks,
> Friedrich
> 
For Xe, I've been looking at using cgroups. A small prototype is available at

https://cgit.freedesktop.org/~mlankhorst/linux/log/?h=dumpcg

To stimulate discussion, I've added amdgpu support as well.
This should make it possible to isolate the compositor allocations
from the target program.

This support is still incomplete and covers VRAM only, and I need help
from userspace and consensus from other drivers on how to move forward.

I'm thinking of making 3 cgroup limits (a rough sketch of the
accounting follows below):
1. Physical memory: every allocated buffer counts towards this limit,
regardless of where it resides.
2. Mappable memory: all buffers allocated in sysmem or VRAM count
towards this limit.
3. VRAM: only buffers currently residing in VRAM count here.

This ensures that VRAM can always be evicted to sysmem, by having a
mappable memory quota and a sysmem reservation.
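
A minimal sketch of how those three charges could be represented (field
names are hypothetical, not taken from the prototype branch):

#include <linux/atomic.h>
#include <linux/types.h>

/* Hypothetical per-cgroup accounting for the three limits. */
struct foo_gpucg_pool {
	atomic64_t physical; /* every allocated buffer, wherever it resides */
	atomic64_t mappable; /* buffers placed in sysmem or VRAM */
	atomic64_t vram;     /* buffers currently placed in VRAM */
};

/* On a VRAM -> sysmem eviction only the VRAM charge is dropped;
 * physical and mappable stay charged to the original owner. */
static void foo_gpucg_account_evict(struct foo_gpucg_pool *pool, u64 size)
{
	atomic64_sub(size, &pool->vram);
}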

The main trouble is that when evicting, you want to charge the changes
in accounting to the original process, but that should be solvable.

I've been looking for someone else who needs this use case in a
different context, so let me know what you think of the idea.

This can be generalized to all uses of the GPU, but the
compositor-vs-game thrashing is a good example of why it is useful to have.

I should still have my cgroup testcase somewhere; this is only a rebase
of my previous proposal, but I think it fits the use case.

Cheers,
Maarten

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2024-05-02 14:23 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
2024-04-25  6:18   ` Christian König
2024-04-25 19:02     ` Matthew Brost
2024-04-26  6:27       ` Christian König
2024-04-24 16:56 ` [RFC PATCH 03/18] drm/ttm: Implement BO " Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
2024-04-25  6:20   ` Christian König
2024-04-24 16:56 ` [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
2024-04-25  6:36   ` Christian König
2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
2024-04-25  6:24   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
2024-04-25  6:25   ` Christian König
2024-04-25  7:39     ` Friedrich Vock
2024-04-25  7:54       ` Christian König
2024-04-24 16:57 ` [RFC PATCH 11/18] drm/ttm: Bump BO priority count Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
2024-04-25  6:26   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
2024-04-25  6:29   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
2024-04-25  6:32   ` Christian König
2024-04-25  6:46     ` Friedrich Vock
2024-04-25  6:58       ` Christian König
2024-04-25  7:06         ` Friedrich Vock
2024-04-25  7:15           ` Christian König
2024-04-25  7:39             ` Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 18/18] drm/amdgpu: Bump minor version Friedrich Vock
2024-04-25  6:54 ` [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Christian König
2024-04-25 13:22 ` Marek Olšák
2024-04-25 13:33   ` Christian König
2024-05-02 14:23 ` Maarten Lankhorst

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).