[PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking

dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking
@ 2023-03-07 14:46 Thomas Hellström
  2023-03-07 14:46 ` [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

I collected the, from my POW, uncontroversial patches from V1 of the TTM
shrinker series, some corrected after the initial patch submission, one
patch added from the Xe RFC ("drm/ttm: Don't print error message if
eviction was interrupted"). It would be nice to have these reviewed and
merged while reworking the rest.

v2:
- Simplify __ttm_pool_free().
- Fix the TTM_TT_FLAG bit numbers.
- Keep all allocation orders for TTM pages at or below PMD order

Thomas Hellström (7):
  drm/ttm: Fix a NULL pointer dereference
  drm/ttm/pool: Fix ttm_pool_alloc error path
  drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  drm/ttm: Unexport ttm_global_swapout()
  drm/ttm: Don't print error message if eviction was interrupted
  drm/ttm: Reduce the number of used allocation orders for TTM pages
  drm/ttm: Make the call to ttm_tt_populate() interruptible when
    faulting

 drivers/gpu/drm/ttm/ttm_bo.c     |  3 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c  | 13 ++++-
 drivers/gpu/drm/ttm/ttm_device.c |  3 +-
 drivers/gpu/drm/ttm/ttm_pool.c   | 95 ++++++++++++++++++--------------
 include/drm/ttm/ttm_tt.h         | 10 ++--
 5 files changed, 72 insertions(+), 52 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-07 16:55   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, Arunpravin Paneer Selvam, Philip Yang,
	Daniel Vetter, Felix Kuehling, Tvrtko Ursulin, Qiang Yu,
	Huang Rui, Matthew Auld, Anshuman Gupta, Alex Deucher, intel-gfx,
	Christian König, Nirmoy Das

The LRU mechanism may look up a resource in the process of being removed
from an object. The locking rules here are a bit unclear but it looks
currently like res->bo assignment is protected by the LRU lock, whereas
bo->resource is protected by the object lock, while *clearing* of
bo->resource is also protected by the LRU lock. This means that if
we check that bo->resource points to the LRU resource under the LRU
lock we should be safe.
So perform that check before deciding to swap out a bo. That avoids
dereferencing a NULL bo->resource in ttm_bo_swapout().

Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Qiang Yu <qiang.yu@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index c7a1862f322a..ae2f19dc9f81 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 			struct ttm_buffer_object *bo = res->bo;
 			uint32_t num_pages;
 
-			if (!bo)
+			if (!bo || bo->resource != res)
 				continue;
 
 			num_pages = PFN_UP(bo->base.size);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
  2023-03-07 14:46 ` [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-08  8:48   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Huang Rui, Matthew Auld,
	Dave Airlie, Christian König

When hitting an error, the error path forgot to unmap dma mappings and
could call set_pages_wb() on already uncached pages.

Fix this by introducing a common __ttm_pool_free() function that
does the right thing.

v2:
- Simplify __ttm_pool_free() (Christian König)

Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 68 +++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index aa116a7bbae3..0b6e20613d19 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -367,6 +367,30 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 	return 0;
 }
 
+static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
+			    enum ttm_caching caching,
+			    pgoff_t start_page, pgoff_t end_page)
+{
+	struct page **pages = tt->pages;
+	unsigned int order;
+	pgoff_t i, nr;
+
+	for (i = start_page; i < end_page; i += nr, pages += nr) {
+		struct ttm_pool_type *pt = NULL;
+
+		order = ttm_pool_page_order(pool, *pages);
+		nr = (1UL << order);
+		if (tt->dma_address)
+			ttm_pool_unmap(pool, tt->dma_address[i], nr);
+
+		pt = ttm_pool_select_type(pool, caching, order);
+		if (pt)
+			ttm_pool_type_give(pt, *pages);
+		else
+			ttm_pool_free_page(pool, caching, order, *pages);
+	}
+}
+
 /**
  * ttm_pool_alloc - Fill a ttm_tt object
  *
@@ -382,12 +406,14 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		   struct ttm_operation_ctx *ctx)
 {
-	unsigned long num_pages = tt->num_pages;
+	pgoff_t num_pages = tt->num_pages;
 	dma_addr_t *dma_addr = tt->dma_address;
 	struct page **caching = tt->pages;
 	struct page **pages = tt->pages;
+	enum ttm_caching page_caching;
 	gfp_t gfp_flags = GFP_USER;
-	unsigned int i, order;
+	pgoff_t caching_divide;
+	unsigned int order;
 	struct page *p;
 	int r;
 
@@ -410,6 +436,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		struct ttm_pool_type *pt;
 
+		page_caching = tt->caching;
 		pt = ttm_pool_select_type(pool, tt->caching, order);
 		p = pt ? ttm_pool_type_take(pt) : NULL;
 		if (p) {
@@ -418,6 +445,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 			if (r)
 				goto error_free_page;
 
+			caching = pages;
 			do {
 				r = ttm_pool_page_allocated(pool, order, p,
 							    &dma_addr,
@@ -426,14 +454,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				if (r)
 					goto error_free_page;
 
+				caching = pages;
 				if (num_pages < (1 << order))
 					break;
 
 				p = ttm_pool_type_take(pt);
 			} while (p);
-			caching = pages;
 		}
 
+		page_caching = ttm_cached;
 		while (num_pages >= (1 << order) &&
 		       (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
 
@@ -442,6 +471,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 							   tt->caching);
 				if (r)
 					goto error_free_page;
+				caching = pages;
 			}
 			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
 						    &num_pages, &pages);
@@ -468,15 +498,13 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	return 0;
 
 error_free_page:
-	ttm_pool_free_page(pool, tt->caching, order, p);
+	ttm_pool_free_page(pool, page_caching, order, p);
 
 error_free_all:
 	num_pages = tt->num_pages - num_pages;
-	for (i = 0; i < num_pages; ) {
-		order = ttm_pool_page_order(pool, tt->pages[i]);
-		ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
-		i += 1 << order;
-	}
+	caching_divide = caching - tt->pages;
+	__ttm_pool_free(pool, tt, tt->caching, 0, caching_divide);
+	__ttm_pool_free(pool, tt, ttm_cached, caching_divide, num_pages);
 
 	return r;
 }
@@ -492,27 +520,7 @@ EXPORT_SYMBOL(ttm_pool_alloc);
  */
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
-	unsigned int i;
-
-	for (i = 0; i < tt->num_pages; ) {
-		struct page *p = tt->pages[i];
-		unsigned int order, num_pages;
-		struct ttm_pool_type *pt;
-
-		order = ttm_pool_page_order(pool, p);
-		num_pages = 1ULL << order;
-		if (tt->dma_address)
-			ttm_pool_unmap(pool, tt->dma_address[i], num_pages);
-
-		pt = ttm_pool_select_type(pool, tt->caching, order);
-		if (pt)
-			ttm_pool_type_give(pt, tt->pages[i]);
-		else
-			ttm_pool_free_page(pool, tt->caching, order,
-					   tt->pages[i]);
-
-		i += num_pages;
-	}
+	__ttm_pool_free(pool, tt, tt->caching, 0, tt->num_pages);
 
 	while (atomic_long_read(&allocated_pages) > page_pool_size)
 		ttm_pool_shrink();
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
  2023-03-07 14:46 ` [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
  2023-03-07 14:46 ` [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-08  8:49   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

New code is recommended to use the BIT macro instead of the explicit
shifts. Change the older defines so that we can keep the style consistent
with upcoming changes.

v2:
- Also change the value of the _PRIV_POPULATED bit (Christian König)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 include/drm/ttm/ttm_tt.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index b7d3f3843f1e..977ca195a536 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -83,12 +83,12 @@ struct ttm_tt {
 	 * set by TTM after ttm_tt_populate() has successfully returned, and is
 	 * then unset when TTM calls ttm_tt_unpopulate().
 	 */
-#define TTM_TT_FLAG_SWAPPED		(1 << 0)
-#define TTM_TT_FLAG_ZERO_ALLOC		(1 << 1)
-#define TTM_TT_FLAG_EXTERNAL		(1 << 2)
-#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	(1 << 3)
+#define TTM_TT_FLAG_SWAPPED		BIT(0)
+#define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
+#define TTM_TT_FLAG_EXTERNAL		BIT(2)
+#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
 
-#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
+#define TTM_TT_FLAG_PRIV_POPULATED	BIT(4)
 	uint32_t page_flags;
 	/** @num_pages: Number of pages in the page array. */
 	uint32_t num_pages;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout()
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
                   ` (2 preceding siblings ...)
  2023-03-07 14:46 ` [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-08  8:49   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted Thomas Hellström
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

Unexport ttm_global_swapout() since it is not used outside of TTM.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index ae2f19dc9f81..64a59f46f6c3 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -137,7 +137,6 @@ int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
 	mutex_unlock(&ttm_global_mutex);
 	return ret;
 }
-EXPORT_SYMBOL(ttm_global_swapout);
 
 int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 		       gfp_t gfp_flags)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
                   ` (3 preceding siblings ...)
  2023-03-07 14:46 ` [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-08  8:50   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
  2023-03-07 14:46 ` [PATCH v2 7/7] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström
  6 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

Avoid printing an error message if eviction was interrupted by,
for example, the user pressing CTRL-C. That may happen if eviction
is waiting for something, like for example a free batch-buffer.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 882c2fa346f3..459f1b4440da 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -464,7 +464,8 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 	if (ret == -EMULTIHOP) {
 		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
 		if (ret) {
-			pr_err("Buffer eviction failed\n");
+			if (ret != -ERESTARTSYS && ret != -EINTR)
+				pr_err("Buffer eviction failed\n");
 			ttm_resource_free(bo, &evict_mem);
 			goto out;
 		}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
                   ` (4 preceding siblings ...)
  2023-03-07 14:46 ` [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  2023-03-07 19:42   ` [Intel-gfx] " kernel test robot
  2023-03-08  9:15   ` Christian König
  2023-03-07 14:46 ` [PATCH v2 7/7] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström
  6 siblings, 2 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.
Reduce the page max order to the system PMD size, as we can then be nicer
to the system and avoid splitting gigantic pages.

Looking forward to when we might be able to swap out PMD size folios
without splitting, this will also be a benefit.

v2:
- Include all orders up to the PMD size (Christian König)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 0b6e20613d19..939845d853af 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -47,6 +47,9 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
@@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
+	for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
 	     num_pages;
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		struct ttm_pool_type *pt;
@@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
 	if (use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
 	}
@@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
 	if (pool->use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
 
@@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
 	unsigned int i;
 
 	seq_puts(m, "\t ");
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " ---%2u---", i);
 	seq_puts(m, "\n");
 }
@@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
 	seq_puts(m, "\n");
 }
@@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 {
 	unsigned int i;
 
+	BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
+
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
 		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_fini(&global_write_combined[i]);
 		ttm_pool_type_fini(&global_uncached[i]);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 7/7] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting
  2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
                   ` (5 preceding siblings ...)
  2023-03-07 14:46 ` [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
@ 2023-03-07 14:46 ` Thomas Hellström
  6 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 14:46 UTC (permalink / raw)
  To: dri-devel
  Cc: Thomas Hellström, intel-gfx, Christian Koenig, Matthew Auld

When swapping in, or under memory pressure ttm_tt_populate() may sleep
for a substantiable amount of time. Allow interrupts during the sleep.
This will also allow us to inject -EINTR errors during swapin in upcoming
patches.

Also avoid returning VM_FAULT_OOM, since that will confuse the core
mm, making it print out a confused message and retrying the fault.
Return VM_FAULT_SIGBUS also under OOM conditions.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index ca7744b852f5..4bca6b54520a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -218,14 +218,21 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 	prot = ttm_io_prot(bo, bo->resource, prot);
 	if (!bo->resource->bus.is_iomem) {
 		struct ttm_operation_ctx ctx = {
-			.interruptible = false,
+			.interruptible = true,
 			.no_wait_gpu = false,
 			.force_alloc = true
 		};
 
 		ttm = bo->ttm;
-		if (ttm_tt_populate(bdev, bo->ttm, &ctx))
-			return VM_FAULT_OOM;
+		err = ttm_tt_populate(bdev, bo->ttm, &ctx);
+		if (err) {
+			if (err == -EINTR || err == -ERESTARTSYS ||
+			    err == -EAGAIN)
+				return VM_FAULT_NOPAGE;
+
+			pr_debug("TTM fault hit %pe.\n", ERR_PTR(err));
+			return VM_FAULT_SIGBUS;
+		}
 	} else {
 		/* Iomem should not be marked encrypted */
 		prot = pgprot_decrypted(prot);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference
  2023-03-07 14:46 ` [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
@ 2023-03-07 16:55   ` Christian König
  2023-03-07 17:46     ` Thomas Hellström
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2023-03-07 16:55 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Philip Yang, Tvrtko Ursulin, Daniel Vetter, Felix Kuehling,
	Arunpravin Paneer Selvam, Qiang Yu, Huang Rui, Matthew Auld,
	Anshuman Gupta, Alex Deucher, intel-gfx, Nirmoy Das

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> The LRU mechanism may look up a resource in the process of being removed
> from an object. The locking rules here are a bit unclear but it looks
> currently like res->bo assignment is protected by the LRU lock, whereas
> bo->resource is protected by the object lock, while *clearing* of
> bo->resource is also protected by the LRU lock. This means that if
> we check that bo->resource points to the LRU resource under the LRU
> lock we should be safe.
> So perform that check before deciding to swap out a bo. That avoids
> dereferencing a NULL bo->resource in ttm_bo_swapout().

Please make sure that this is pushed to drm-misc-fixes ASAP.

I've getting complains for this from different sides.

Thanks,
Christian.

>
> Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Philip Yang <Philip.Yang@amd.com>
> Cc: Qiang Yu <qiang.yu@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Nirmoy Das <nirmoy.das@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: Anshuman Gupta <anshuman.gupta@intel.com>
> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index c7a1862f322a..ae2f19dc9f81 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>   			struct ttm_buffer_object *bo = res->bo;
>   			uint32_t num_pages;
>   
> -			if (!bo)
> +			if (!bo || bo->resource != res)
>   				continue;
>   
>   			num_pages = PFN_UP(bo->base.size);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference
  2023-03-07 16:55   ` Christian König
@ 2023-03-07 17:46     ` Thomas Hellström
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-07 17:46 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Philip Yang, Tvrtko Ursulin, Daniel Vetter, Felix Kuehling,
	Arunpravin Paneer Selvam, Qiang Yu, Huang Rui, Matthew Auld,
	Anshuman Gupta, Alex Deucher, intel-gfx, Nirmoy Das


On 3/7/23 17:55, Christian König wrote:
> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>> The LRU mechanism may look up a resource in the process of being removed
>> from an object. The locking rules here are a bit unclear but it looks
>> currently like res->bo assignment is protected by the LRU lock, whereas
>> bo->resource is protected by the object lock, while *clearing* of
>> bo->resource is also protected by the LRU lock. This means that if
>> we check that bo->resource points to the LRU resource under the LRU
>> lock we should be safe.
>> So perform that check before deciding to swap out a bo. That avoids
>> dereferencing a NULL bo->resource in ttm_bo_swapout().
>
> Please make sure that this is pushed to drm-misc-fixes ASAP.
>
> I've getting complains for this from different sides.
>
> Thanks,
> Christian.

Done.

/Thomas


>
>>
>> Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Huang Rui <ray.huang@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
>> Cc: Philip Yang <Philip.Yang@amd.com>
>> Cc: Qiang Yu <qiang.yu@amd.com>
>> Cc: Matthew Auld <matthew.auld@intel.com>
>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
>> Cc: Anshuman Gupta <anshuman.gupta@intel.com>
>> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_device.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
>> b/drivers/gpu/drm/ttm/ttm_device.c
>> index c7a1862f322a..ae2f19dc9f81 100644
>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>> @@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, 
>> struct ttm_operation_ctx *ctx,
>>               struct ttm_buffer_object *bo = res->bo;
>>               uint32_t num_pages;
>>   -            if (!bo)
>> +            if (!bo || bo->resource != res)
>>                   continue;
>>                 num_pages = PFN_UP(bo->base.size);
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-03-07 14:46 ` [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
@ 2023-03-07 19:42   ` kernel test robot
  2023-03-08  9:15   ` Christian König
  1 sibling, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-03-07 19:42 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Thomas Hellström, intel-gfx, Matthew Auld, Christian Koenig,
	oe-kbuild-all

Hi Thomas,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on drm-intel/for-linux-next]
[cannot apply to drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-ttm-Fix-a-NULL-pointer-dereference/20230307-224931
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20230307144621.10748-7-thomas.hellstrom%40linux.intel.com
patch subject: [Intel-gfx] [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
config: powerpc-randconfig-r006-20230306 (https://download.01.org/0day-ci/archive/20230308/202303080352.azyeWwwt-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/0eee47dba298051fc49965d56cb17dd113ff0236
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Thomas-Hellstr-m/drm-ttm-Fix-a-NULL-pointer-dereference/20230307-224931
        git checkout 0eee47dba298051fc49965d56cb17dd113ff0236
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/gpu/drm/ttm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303080352.azyeWwwt-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In function 'ttm_pool_type_init',
       inlined from 'ttm_pool_init' at drivers/gpu/drm/ttm/ttm_pool.c:557:5:
>> drivers/gpu/drm/ttm/ttm_pool.c:264:18: warning: iteration 9 invokes undefined behavior [-Waggressive-loop-optimizations]
     264 |         pt->pool = pool;
         |         ~~~~~~~~~^~~~~~
   drivers/gpu/drm/ttm/ttm_pool.c: In function 'ttm_pool_init':
   drivers/gpu/drm/ttm/ttm_pool.c:556:39: note: within this loop
     556 |                         for (j = 0; j < TTM_DIM_ORDER; ++j)
         |                                       ^
   In file included from <command-line>:
   drivers/gpu/drm/ttm/ttm_pool.c: In function 'ttm_pool_mgr_init':
>> include/linux/compiler_types.h:358:45: error: call to '__compiletime_assert_283' declared with attribute error: BUILD_BUG_ON failed: TTM_DIM_ORDER > MAX_ORDER
     358 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:339:25: note: in definition of macro '__compiletime_assert'
     339 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:358:9: note: in expansion of macro '_compiletime_assert'
     358 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
         |         ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/ttm/ttm_pool.c:744:9: note: in expansion of macro 'BUILD_BUG_ON'
     744 |         BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
         |         ^~~~~~~~~~~~


vim +/__compiletime_assert_283 +358 include/linux/compiler_types.h

eb5c2d4b45e3d2 Will Deacon 2020-07-21  344  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  345  #define _compiletime_assert(condition, msg, prefix, suffix) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21  346  	__compiletime_assert(condition, msg, prefix, suffix)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  347  
eb5c2d4b45e3d2 Will Deacon 2020-07-21  348  /**
eb5c2d4b45e3d2 Will Deacon 2020-07-21  349   * compiletime_assert - break build and emit msg if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  350   * @condition: a compile-time constant condition to check
eb5c2d4b45e3d2 Will Deacon 2020-07-21  351   * @msg:       a message to emit if condition is false
eb5c2d4b45e3d2 Will Deacon 2020-07-21  352   *
eb5c2d4b45e3d2 Will Deacon 2020-07-21  353   * In tradition of POSIX assert, this macro will break the build if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  354   * supplied condition is *false*, emitting the supplied error message if the
eb5c2d4b45e3d2 Will Deacon 2020-07-21  355   * compiler has support to do so.
eb5c2d4b45e3d2 Will Deacon 2020-07-21  356   */
eb5c2d4b45e3d2 Will Deacon 2020-07-21  357  #define compiletime_assert(condition, msg) \
eb5c2d4b45e3d2 Will Deacon 2020-07-21 @358  	_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
eb5c2d4b45e3d2 Will Deacon 2020-07-21  359  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-03-07 14:46 ` [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
@ 2023-03-08  8:48   ` Christian König
  2023-03-08  8:58     ` Thomas Hellström
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2023-03-08  8:48 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Dave Airlie, intel-gfx, Huang Rui, Matthew Auld

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> When hitting an error, the error path forgot to unmap dma mappings and
> could call set_pages_wb() on already uncached pages.
>
> Fix this by introducing a common __ttm_pool_free() function that
> does the right thing.
>
> v2:
> - Simplify __ttm_pool_free() (Christian König)
>
> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 68 +++++++++++++++++++---------------
>   1 file changed, 38 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index aa116a7bbae3..0b6e20613d19 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -367,6 +367,30 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
>   	return 0;
>   }
>   
> +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,

Maybe name that ttm_pool_free_range() and add a comment why we need it. 
Something like "/* Cleanup all pages in the tt between start_page till 
end_page */".

Apart from that looks good to me.

Regards,
Christian.

> +			    enum ttm_caching caching,
> +			    pgoff_t start_page, pgoff_t end_page)
> +{
> +	struct page **pages = tt->pages;
> +	unsigned int order;
> +	pgoff_t i, nr;
> +
> +	for (i = start_page; i < end_page; i += nr, pages += nr) {
> +		struct ttm_pool_type *pt = NULL;
> +
> +		order = ttm_pool_page_order(pool, *pages);
> +		nr = (1UL << order);
> +		if (tt->dma_address)
> +			ttm_pool_unmap(pool, tt->dma_address[i], nr);
> +
> +		pt = ttm_pool_select_type(pool, caching, order);
> +		if (pt)
> +			ttm_pool_type_give(pt, *pages);
> +		else
> +			ttm_pool_free_page(pool, caching, order, *pages);
> +	}
> +}
> +
>   /**
>    * ttm_pool_alloc - Fill a ttm_tt object
>    *
> @@ -382,12 +406,14 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
>   int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   		   struct ttm_operation_ctx *ctx)
>   {
> -	unsigned long num_pages = tt->num_pages;
> +	pgoff_t num_pages = tt->num_pages;
>   	dma_addr_t *dma_addr = tt->dma_address;
>   	struct page **caching = tt->pages;
>   	struct page **pages = tt->pages;
> +	enum ttm_caching page_caching;
>   	gfp_t gfp_flags = GFP_USER;
> -	unsigned int i, order;
> +	pgoff_t caching_divide;
> +	unsigned int order;
>   	struct page *p;
>   	int r;
>   
> @@ -410,6 +436,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>   		struct ttm_pool_type *pt;
>   
> +		page_caching = tt->caching;
>   		pt = ttm_pool_select_type(pool, tt->caching, order);
>   		p = pt ? ttm_pool_type_take(pt) : NULL;
>   		if (p) {
> @@ -418,6 +445,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   			if (r)
>   				goto error_free_page;
>   
> +			caching = pages;
>   			do {
>   				r = ttm_pool_page_allocated(pool, order, p,
>   							    &dma_addr,
> @@ -426,14 +454,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   				if (r)
>   					goto error_free_page;
>   
> +				caching = pages;
>   				if (num_pages < (1 << order))
>   					break;
>   
>   				p = ttm_pool_type_take(pt);
>   			} while (p);
> -			caching = pages;
>   		}
>   
> +		page_caching = ttm_cached;
>   		while (num_pages >= (1 << order) &&
>   		       (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
>   
> @@ -442,6 +471,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   							   tt->caching);
>   				if (r)
>   					goto error_free_page;
> +				caching = pages;
>   			}
>   			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
>   						    &num_pages, &pages);
> @@ -468,15 +498,13 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	return 0;
>   
>   error_free_page:
> -	ttm_pool_free_page(pool, tt->caching, order, p);
> +	ttm_pool_free_page(pool, page_caching, order, p);
>   
>   error_free_all:
>   	num_pages = tt->num_pages - num_pages;
> -	for (i = 0; i < num_pages; ) {
> -		order = ttm_pool_page_order(pool, tt->pages[i]);
> -		ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
> -		i += 1 << order;
> -	}
> +	caching_divide = caching - tt->pages;
> +	__ttm_pool_free(pool, tt, tt->caching, 0, caching_divide);
> +	__ttm_pool_free(pool, tt, ttm_cached, caching_divide, num_pages);
>   
>   	return r;
>   }
> @@ -492,27 +520,7 @@ EXPORT_SYMBOL(ttm_pool_alloc);
>    */
>   void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>   {
> -	unsigned int i;
> -
> -	for (i = 0; i < tt->num_pages; ) {
> -		struct page *p = tt->pages[i];
> -		unsigned int order, num_pages;
> -		struct ttm_pool_type *pt;
> -
> -		order = ttm_pool_page_order(pool, p);
> -		num_pages = 1ULL << order;
> -		if (tt->dma_address)
> -			ttm_pool_unmap(pool, tt->dma_address[i], num_pages);
> -
> -		pt = ttm_pool_select_type(pool, tt->caching, order);
> -		if (pt)
> -			ttm_pool_type_give(pt, tt->pages[i]);
> -		else
> -			ttm_pool_free_page(pool, tt->caching, order,
> -					   tt->pages[i]);
> -
> -		i += num_pages;
> -	}
> +	__ttm_pool_free(pool, tt, tt->caching, 0, tt->num_pages);
>   
>   	while (atomic_long_read(&allocated_pages) > page_pool_size)
>   		ttm_pool_shrink();


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-03-07 14:46 ` [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
@ 2023-03-08  8:49   ` Christian König
  2023-03-09  7:06     ` Thomas Hellström
  0 siblings, 1 reply; 20+ messages in thread
From: Christian König @ 2023-03-08  8:49 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, Matthew Auld

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> New code is recommended to use the BIT macro instead of the explicit
> shifts. Change the older defines so that we can keep the style consistent
> with upcoming changes.
>
> v2:
> - Also change the value of the _PRIV_POPULATED bit (Christian König)
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   include/drm/ttm/ttm_tt.h | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index b7d3f3843f1e..977ca195a536 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -83,12 +83,12 @@ struct ttm_tt {
>   	 * set by TTM after ttm_tt_populate() has successfully returned, and is
>   	 * then unset when TTM calls ttm_tt_unpopulate().
>   	 */
> -#define TTM_TT_FLAG_SWAPPED		(1 << 0)
> -#define TTM_TT_FLAG_ZERO_ALLOC		(1 << 1)
> -#define TTM_TT_FLAG_EXTERNAL		(1 << 2)
> -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	(1 << 3)
> +#define TTM_TT_FLAG_SWAPPED		BIT(0)
> +#define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
> +#define TTM_TT_FLAG_EXTERNAL		BIT(2)
> +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
>   
> -#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
> +#define TTM_TT_FLAG_PRIV_POPULATED	BIT(4)
>   	uint32_t page_flags;
>   	/** @num_pages: Number of pages in the page array. */
>   	uint32_t num_pages;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout()
  2023-03-07 14:46 ` [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
@ 2023-03-08  8:49   ` Christian König
  0 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2023-03-08  8:49 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, Matthew Auld

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> Unexport ttm_global_swapout() since it is not used outside of TTM.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_device.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index ae2f19dc9f81..64a59f46f6c3 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -137,7 +137,6 @@ int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
>   	mutex_unlock(&ttm_global_mutex);
>   	return ret;
>   }
> -EXPORT_SYMBOL(ttm_global_swapout);
>   
>   int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>   		       gfp_t gfp_flags)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted
  2023-03-07 14:46 ` [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted Thomas Hellström
@ 2023-03-08  8:50   ` Christian König
  0 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2023-03-08  8:50 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, Matthew Auld

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> Avoid printing an error message if eviction was interrupted by,
> for example, the user pressing CTRL-C. That may happen if eviction
> is waiting for something, like for example a free batch-buffer.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 882c2fa346f3..459f1b4440da 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -464,7 +464,8 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>   	if (ret == -EMULTIHOP) {
>   		ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop);
>   		if (ret) {
> -			pr_err("Buffer eviction failed\n");
> +			if (ret != -ERESTARTSYS && ret != -EINTR)
> +				pr_err("Buffer eviction failed\n");
>   			ttm_resource_free(bo, &evict_mem);
>   			goto out;
>   		}


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-03-08  8:48   ` Christian König
@ 2023-03-08  8:58     ` Thomas Hellström
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-08  8:58 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Dave Airlie, intel-gfx, Huang Rui, Matthew Auld


On 3/8/23 09:48, Christian König wrote:
> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>> When hitting an error, the error path forgot to unmap dma mappings and
>> could call set_pages_wb() on already uncached pages.
>>
>> Fix this by introducing a common __ttm_pool_free() function that
>> does the right thing.
>>
>> v2:
>> - Simplify __ttm_pool_free() (Christian König)
>>
>> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Dave Airlie <airlied@redhat.com>
>> Cc: Christian Koenig <christian.koenig@amd.com>
>> Cc: Huang Rui <ray.huang@amd.com>
>> Cc: dri-devel@lists.freedesktop.org
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_pool.c | 68 +++++++++++++++++++---------------
>>   1 file changed, 38 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c 
>> b/drivers/gpu/drm/ttm/ttm_pool.c
>> index aa116a7bbae3..0b6e20613d19 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -367,6 +367,30 @@ static int ttm_pool_page_allocated(struct 
>> ttm_pool *pool, unsigned int order,
>>       return 0;
>>   }
>>   +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
>
> Maybe name that ttm_pool_free_range() and add a comment why we need 
> it. Something like "/* Cleanup all pages in the tt between start_page 
> till end_page */".

Sure, will do.

/Thomas

>
> Apart from that looks good to me.
>
> Regards,
> Christian.
>
>> +                enum ttm_caching caching,
>> +                pgoff_t start_page, pgoff_t end_page)
>> +{
>> +    struct page **pages = tt->pages;
>> +    unsigned int order;
>> +    pgoff_t i, nr;
>> +
>> +    for (i = start_page; i < end_page; i += nr, pages += nr) {
>> +        struct ttm_pool_type *pt = NULL;
>> +
>> +        order = ttm_pool_page_order(pool, *pages);
>> +        nr = (1UL << order);
>> +        if (tt->dma_address)
>> +            ttm_pool_unmap(pool, tt->dma_address[i], nr);
>> +
>> +        pt = ttm_pool_select_type(pool, caching, order);
>> +        if (pt)
>> +            ttm_pool_type_give(pt, *pages);
>> +        else
>> +            ttm_pool_free_page(pool, caching, order, *pages);
>> +    }
>> +}
>> +
>>   /**
>>    * ttm_pool_alloc - Fill a ttm_tt object
>>    *
>> @@ -382,12 +406,14 @@ static int ttm_pool_page_allocated(struct 
>> ttm_pool *pool, unsigned int order,
>>   int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>>              struct ttm_operation_ctx *ctx)
>>   {
>> -    unsigned long num_pages = tt->num_pages;
>> +    pgoff_t num_pages = tt->num_pages;
>>       dma_addr_t *dma_addr = tt->dma_address;
>>       struct page **caching = tt->pages;
>>       struct page **pages = tt->pages;
>> +    enum ttm_caching page_caching;
>>       gfp_t gfp_flags = GFP_USER;
>> -    unsigned int i, order;
>> +    pgoff_t caching_divide;
>> +    unsigned int order;
>>       struct page *p;
>>       int r;
>>   @@ -410,6 +436,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, 
>> struct ttm_tt *tt,
>>            order = min_t(unsigned int, order, __fls(num_pages))) {
>>           struct ttm_pool_type *pt;
>>   +        page_caching = tt->caching;
>>           pt = ttm_pool_select_type(pool, tt->caching, order);
>>           p = pt ? ttm_pool_type_take(pt) : NULL;
>>           if (p) {
>> @@ -418,6 +445,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct 
>> ttm_tt *tt,
>>               if (r)
>>                   goto error_free_page;
>>   +            caching = pages;
>>               do {
>>                   r = ttm_pool_page_allocated(pool, order, p,
>>                                   &dma_addr,
>> @@ -426,14 +454,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, 
>> struct ttm_tt *tt,
>>                   if (r)
>>                       goto error_free_page;
>>   +                caching = pages;
>>                   if (num_pages < (1 << order))
>>                       break;
>>                     p = ttm_pool_type_take(pt);
>>               } while (p);
>> -            caching = pages;
>>           }
>>   +        page_caching = ttm_cached;
>>           while (num_pages >= (1 << order) &&
>>                  (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
>>   @@ -442,6 +471,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, 
>> struct ttm_tt *tt,
>>                                  tt->caching);
>>                   if (r)
>>                       goto error_free_page;
>> +                caching = pages;
>>               }
>>               r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
>>                               &num_pages, &pages);
>> @@ -468,15 +498,13 @@ int ttm_pool_alloc(struct ttm_pool *pool, 
>> struct ttm_tt *tt,
>>       return 0;
>>     error_free_page:
>> -    ttm_pool_free_page(pool, tt->caching, order, p);
>> +    ttm_pool_free_page(pool, page_caching, order, p);
>>     error_free_all:
>>       num_pages = tt->num_pages - num_pages;
>> -    for (i = 0; i < num_pages; ) {
>> -        order = ttm_pool_page_order(pool, tt->pages[i]);
>> -        ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
>> -        i += 1 << order;
>> -    }
>> +    caching_divide = caching - tt->pages;
>> +    __ttm_pool_free(pool, tt, tt->caching, 0, caching_divide);
>> +    __ttm_pool_free(pool, tt, ttm_cached, caching_divide, num_pages);
>>         return r;
>>   }
>> @@ -492,27 +520,7 @@ EXPORT_SYMBOL(ttm_pool_alloc);
>>    */
>>   void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>>   {
>> -    unsigned int i;
>> -
>> -    for (i = 0; i < tt->num_pages; ) {
>> -        struct page *p = tt->pages[i];
>> -        unsigned int order, num_pages;
>> -        struct ttm_pool_type *pt;
>> -
>> -        order = ttm_pool_page_order(pool, p);
>> -        num_pages = 1ULL << order;
>> -        if (tt->dma_address)
>> -            ttm_pool_unmap(pool, tt->dma_address[i], num_pages);
>> -
>> -        pt = ttm_pool_select_type(pool, tt->caching, order);
>> -        if (pt)
>> -            ttm_pool_type_give(pt, tt->pages[i]);
>> -        else
>> -            ttm_pool_free_page(pool, tt->caching, order,
>> -                       tt->pages[i]);
>> -
>> -        i += num_pages;
>> -    }
>> +    __ttm_pool_free(pool, tt, tt->caching, 0, tt->num_pages);
>>         while (atomic_long_read(&allocated_pages) > page_pool_size)
>>           ttm_pool_shrink();
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-03-07 14:46 ` [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
  2023-03-07 19:42   ` [Intel-gfx] " kernel test robot
@ 2023-03-08  9:15   ` Christian König
  2023-03-08  9:22     ` Thomas Hellström
  1 sibling, 1 reply; 20+ messages in thread
From: Christian König @ 2023-03-08  9:15 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, Matthew Auld

Am 07.03.23 um 15:46 schrieb Thomas Hellström:
> When swapping out, we will split multi-order pages both in order to
> move them to the swap-cache and to be able to return memory to the
> swap cache as soon as possible on a page-by-page basis.
> Reduce the page max order to the system PMD size, as we can then be nicer
> to the system and avoid splitting gigantic pages.

Mhm, we actually have a todo to start supporting giant pages at some time.

Using the folio directly just saves tons of overhead when you don't need 
to allocate 2MiG page array any more for each 1GiB you allocate.

But that probably needs tons of work anyway, so feel free to add my rb 
for now.

Regards,
Christian.

>
> Looking forward to when we might be able to swap out PMD size folios
> without splitting, this will also be a benefit.
>
> v2:
> - Include all orders up to the PMD size (Christian König)
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
>   1 file changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 0b6e20613d19..939845d853af 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -47,6 +47,9 @@
>   
>   #include "ttm_module.h"
>   
> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> +
>   /**
>    * struct ttm_pool_dma - Helper object for coherent DMA mappings
>    *
> @@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
>   
>   static atomic_long_t allocated_pages;
>   
> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>   
> -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>   
>   static spinlock_t shrinker_lock;
>   static struct list_head shrinker_list;
> @@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	else
>   		gfp_flags |= GFP_HIGHUSER;
>   
> -	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
> +	for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
>   	     num_pages;
>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>   		struct ttm_pool_type *pt;
> @@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>   
>   	if (use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_init(&pool->caching[i].orders[j],
>   						   pool, i, j);
>   	}
> @@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>   
>   	if (pool->use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>   	}
>   
> @@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
>   	unsigned int i;
>   
>   	seq_puts(m, "\t ");
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " ---%2u---", i);
>   	seq_puts(m, "\n");
>   }
> @@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>   	seq_puts(m, "\n");
>   }
> @@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>   {
>   	unsigned int i;
>   
> +	BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
> +
>   	if (!page_pool_size)
>   		page_pool_size = num_pages;
>   
>   	spin_lock_init(&shrinker_lock);
>   	INIT_LIST_HEAD(&shrinker_list);
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_init(&global_write_combined[i], NULL,
>   				   ttm_write_combined, i);
>   		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
> @@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_fini(&global_write_combined[i]);
>   		ttm_pool_type_fini(&global_uncached[i]);
>   


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-03-08  9:15   ` Christian König
@ 2023-03-08  9:22     ` Thomas Hellström
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Hellström @ 2023-03-08  9:22 UTC (permalink / raw)
  To: Christian König, dri-devel; +Cc: intel-gfx, Matthew Auld


On 3/8/23 10:15, Christian König wrote:
> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>> When swapping out, we will split multi-order pages both in order to
>> move them to the swap-cache and to be able to return memory to the
>> swap cache as soon as possible on a page-by-page basis.
>> Reduce the page max order to the system PMD size, as we can then be 
>> nicer
>> to the system and avoid splitting gigantic pages.
>
> Mhm, we actually have a todo to start supporting giant pages at some 
> time.
>
> Using the folio directly just saves tons of overhead when you don't 
> need to allocate 2MiG page array any more for each 1GiB you allocate.
>
> But that probably needs tons of work anyway, so feel free to add my rb 
> for now.

Thanks, I need to fix this anyway for powerpc where it seems PMD_ORDER > 
MAX_ORDER :/

It might be we'd want to replace the ttm page arrays with scatter-gather 
tables at some point?
I think at least vmwgfx, i915 and xe would benefit from that...

/Thomas

>
> Regards,
> Christian.
>
>>
>> Looking forward to when we might be able to swap out PMD size folios
>> without splitting, this will also be a benefit.
>>
>> v2:
>> - Include all orders up to the PMD size (Christian König)
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_pool.c | 27 ++++++++++++++++-----------
>>   1 file changed, 16 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c 
>> b/drivers/gpu/drm/ttm/ttm_pool.c
>> index 0b6e20613d19..939845d853af 100644
>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>> @@ -47,6 +47,9 @@
>>     #include "ttm_module.h"
>>   +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
>> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
>> +
>>   /**
>>    * struct ttm_pool_dma - Helper object for coherent DMA mappings
>>    *
>> @@ -65,11 +68,11 @@ module_param(page_pool_size, ulong, 0644);
>>     static atomic_long_t allocated_pages;
>>   -static struct ttm_pool_type global_write_combined[MAX_ORDER];
>> -static struct ttm_pool_type global_uncached[MAX_ORDER];
>> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
>> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>>   -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
>> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>> +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
>> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>>     static spinlock_t shrinker_lock;
>>   static struct list_head shrinker_list;
>> @@ -431,7 +434,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct 
>> ttm_tt *tt,
>>       else
>>           gfp_flags |= GFP_HIGHUSER;
>>   -    for (order = min_t(unsigned int, MAX_ORDER - 1, 
>> __fls(num_pages));
>> +    for (order = min_t(unsigned int, TTM_MAX_ORDER, __fls(num_pages));
>>            num_pages;
>>            order = min_t(unsigned int, order, __fls(num_pages))) {
>>           struct ttm_pool_type *pt;
>> @@ -550,7 +553,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct 
>> device *dev,
>>         if (use_dma_alloc) {
>>           for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -            for (j = 0; j < MAX_ORDER; ++j)
>> +            for (j = 0; j < TTM_DIM_ORDER; ++j)
>> ttm_pool_type_init(&pool->caching[i].orders[j],
>>                              pool, i, j);
>>       }
>> @@ -570,7 +573,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>         if (pool->use_dma_alloc) {
>>           for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>> -            for (j = 0; j < MAX_ORDER; ++j)
>> +            for (j = 0; j < TTM_DIM_ORDER; ++j)
>> ttm_pool_type_fini(&pool->caching[i].orders[j]);
>>       }
>>   @@ -624,7 +627,7 @@ static void ttm_pool_debugfs_header(struct 
>> seq_file *m)
>>       unsigned int i;
>>         seq_puts(m, "\t ");
>> -    for (i = 0; i < MAX_ORDER; ++i)
>> +    for (i = 0; i < TTM_DIM_ORDER; ++i)
>>           seq_printf(m, " ---%2u---", i);
>>       seq_puts(m, "\n");
>>   }
>> @@ -635,7 +638,7 @@ static void ttm_pool_debugfs_orders(struct 
>> ttm_pool_type *pt,
>>   {
>>       unsigned int i;
>>   -    for (i = 0; i < MAX_ORDER; ++i)
>> +    for (i = 0; i < TTM_DIM_ORDER; ++i)
>>           seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>>       seq_puts(m, "\n");
>>   }
>> @@ -738,13 +741,15 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>>   {
>>       unsigned int i;
>>   +    BUILD_BUG_ON(TTM_DIM_ORDER > MAX_ORDER);
>> +
>>       if (!page_pool_size)
>>           page_pool_size = num_pages;
>>         spin_lock_init(&shrinker_lock);
>>       INIT_LIST_HEAD(&shrinker_list);
>>   -    for (i = 0; i < MAX_ORDER; ++i) {
>> +    for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>           ttm_pool_type_init(&global_write_combined[i], NULL,
>>                      ttm_write_combined, i);
>>           ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, 
>> i);
>> @@ -777,7 +782,7 @@ void ttm_pool_mgr_fini(void)
>>   {
>>       unsigned int i;
>>   -    for (i = 0; i < MAX_ORDER; ++i) {
>> +    for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>           ttm_pool_type_fini(&global_write_combined[i]);
>>           ttm_pool_type_fini(&global_uncached[i]);
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-03-08  8:49   ` Christian König
@ 2023-03-09  7:06     ` Thomas Hellström
  2023-03-09  8:06       ` Christian König
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Hellström @ 2023-03-09  7:06 UTC (permalink / raw)
  To: Christian König, dri-devel; +Cc: intel-gfx, Matthew Auld

Hi, Christian,

Thanks for reviewing these.

Ack to merge reviewed patches through drm-misc-next?

Thanks,

Thomas


On 3/8/23 09:49, Christian König wrote:
> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>> New code is recommended to use the BIT macro instead of the explicit
>> shifts. Change the older defines so that we can keep the style 
>> consistent
>> with upcoming changes.
>>
>> v2:
>> - Also change the value of the _PRIV_POPULATED bit (Christian König)
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
>> ---
>>   include/drm/ttm/ttm_tt.h | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
>> index b7d3f3843f1e..977ca195a536 100644
>> --- a/include/drm/ttm/ttm_tt.h
>> +++ b/include/drm/ttm/ttm_tt.h
>> @@ -83,12 +83,12 @@ struct ttm_tt {
>>        * set by TTM after ttm_tt_populate() has successfully 
>> returned, and is
>>        * then unset when TTM calls ttm_tt_unpopulate().
>>        */
>> -#define TTM_TT_FLAG_SWAPPED        (1 << 0)
>> -#define TTM_TT_FLAG_ZERO_ALLOC        (1 << 1)
>> -#define TTM_TT_FLAG_EXTERNAL        (1 << 2)
>> -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE    (1 << 3)
>> +#define TTM_TT_FLAG_SWAPPED        BIT(0)
>> +#define TTM_TT_FLAG_ZERO_ALLOC        BIT(1)
>> +#define TTM_TT_FLAG_EXTERNAL        BIT(2)
>> +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE    BIT(3)
>>   -#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
>> +#define TTM_TT_FLAG_PRIV_POPULATED    BIT(4)
>>       uint32_t page_flags;
>>       /** @num_pages: Number of pages in the page array. */
>>       uint32_t num_pages;
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-03-09  7:06     ` Thomas Hellström
@ 2023-03-09  8:06       ` Christian König
  0 siblings, 0 replies; 20+ messages in thread
From: Christian König @ 2023-03-09  8:06 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, Matthew Auld

Am 09.03.23 um 08:06 schrieb Thomas Hellström:
> Hi, Christian,
>
> Thanks for reviewing these.
>
> Ack to merge reviewed patches through drm-misc-next?

Sure.

Christian.

>
> Thanks,
>
> Thomas
>
>
> On 3/8/23 09:49, Christian König wrote:
>> Am 07.03.23 um 15:46 schrieb Thomas Hellström:
>>> New code is recommended to use the BIT macro instead of the explicit
>>> shifts. Change the older defines so that we can keep the style 
>>> consistent
>>> with upcoming changes.
>>>
>>> v2:
>>> - Also change the value of the _PRIV_POPULATED bit (Christian König)
>>>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>>> ---
>>>   include/drm/ttm/ttm_tt.h | 10 +++++-----
>>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
>>> index b7d3f3843f1e..977ca195a536 100644
>>> --- a/include/drm/ttm/ttm_tt.h
>>> +++ b/include/drm/ttm/ttm_tt.h
>>> @@ -83,12 +83,12 @@ struct ttm_tt {
>>>        * set by TTM after ttm_tt_populate() has successfully 
>>> returned, and is
>>>        * then unset when TTM calls ttm_tt_unpopulate().
>>>        */
>>> -#define TTM_TT_FLAG_SWAPPED        (1 << 0)
>>> -#define TTM_TT_FLAG_ZERO_ALLOC        (1 << 1)
>>> -#define TTM_TT_FLAG_EXTERNAL        (1 << 2)
>>> -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE    (1 << 3)
>>> +#define TTM_TT_FLAG_SWAPPED        BIT(0)
>>> +#define TTM_TT_FLAG_ZERO_ALLOC        BIT(1)
>>> +#define TTM_TT_FLAG_EXTERNAL        BIT(2)
>>> +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE    BIT(3)
>>>   -#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
>>> +#define TTM_TT_FLAG_PRIV_POPULATED    BIT(4)
>>>       uint32_t page_flags;
>>>       /** @num_pages: Number of pages in the page array. */
>>>       uint32_t num_pages;
>>


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-03-09  8:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-07 14:46 [PATCH v2 0/7] drm/ttm: Small fixes / cleanups in prep for shrinking Thomas Hellström
2023-03-07 14:46 ` [PATCH v2 1/7] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
2023-03-07 16:55   ` Christian König
2023-03-07 17:46     ` Thomas Hellström
2023-03-07 14:46 ` [PATCH v2 2/7] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
2023-03-08  8:48   ` Christian König
2023-03-08  8:58     ` Thomas Hellström
2023-03-07 14:46 ` [PATCH v2 3/7] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
2023-03-08  8:49   ` Christian König
2023-03-09  7:06     ` Thomas Hellström
2023-03-09  8:06       ` Christian König
2023-03-07 14:46 ` [PATCH v2 4/7] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
2023-03-08  8:49   ` Christian König
2023-03-07 14:46 ` [PATCH v2 5/7] drm/ttm: Don't print error message if eviction was interrupted Thomas Hellström
2023-03-08  8:50   ` Christian König
2023-03-07 14:46 ` [PATCH v2 6/7] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
2023-03-07 19:42   ` [Intel-gfx] " kernel test robot
2023-03-08  9:15   ` Christian König
2023-03-08  9:22     ` Thomas Hellström
2023-03-07 14:46 ` [PATCH v2 7/7] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).