* [RFC PATCH 00/16] Add a TTM shrinker
@ 2023-02-15 16:13 Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Dave Hansen, Matthew Wilcox (Oracle),
	linux-mm, linux-graphics-maintainer, Peter Xu, Johannes Weiner,
	Dave Airlie, Andrew Morton, Christian Koenig, Matthew Auld

This series introduces a TTM shrinker.

Currently the TTM subsystem allows a certain watermark fraction of
system memory to be pinned by GPUs. Any allocation beyond that causes
TTM to copy memory to shmem objects for possible later swapout so that
the watermark is respected. That happens unnecessarily also on systems
where swapping is not available, but it still works reasonably well in
many cases.

However, there is no way for the system to swap out all graphics
memory, even in situations where graphics processes are suspended.

So add a TTM shrinker capable of moving graphics memory pages to the
swap cache for later laundering and freeing and, in case no swap is
available, of freeing graphics memory that is kept around only for
caching purposes.

For devices where the shrinker is active, the watermark fraction is
disabled. For devices that don't (yet) support shrinking, or that use
dma_alloc'ed memory which we can't insert into the swap cache, the
watermark mechanism is kept.

Each driver needs to implement a callback to enable the shrinker for
its devices. It is enabled for i915 as a proof of concept, and will
also be used by the new Intel xe driver if accepted.
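
For illustration, below is a rough sketch (not taken from the series) of
how a driver might wire up the bo_shrink() callback that patch 4 adds to
struct ttm_device_funcs. The my_drv_*() functions are hypothetical
driver-side names; the actual shrinking helpers are introduced in patch 14.

#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_device.h>

static long my_drv_bo_shrink(struct ttm_buffer_object *bo,
			     struct ttm_operation_ctx *ctx)
{
	long ret;

	/* Drop any GPU bindings before the pages are taken away. */
	ret = my_drv_unbind(bo, ctx);
	if (ret)
		return ret;

	/*
	 * Purge or swap out the ttm_tt pages, honouring ctx->interruptible
	 * and ctx->no_wait_gpu, and return the number of pages released.
	 */
	return my_drv_shrink_tt(bo->bdev, bo->ttm, ctx);
}

static struct ttm_device_funcs my_drv_ttm_funcs = {
	/* ... other callbacks ... */
	.bo_shrink = my_drv_bo_shrink,
};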

The parts of the series most in need of consideration and feedback are:

*) The mm part, inserting pages into the swap-cache. Is it acceptable and,
   if so, correct? It *might* be possible to do without this part,
   but then we'd have to be able to call read_mapping_page() and
   trylock_page() on non-isolated shmem pages from reclaim context,
   and be able to recover from failures.

*) The TTM driver callback for shrinking

*) The additional TTM functions to mark buffer objects as not needed, but
   good to have around for caching purposes (a rough sketch follows after
   this list).

*) Swapin currently doesn't lose content on error and is also interruptible
   or at least killable. This complicates the helpers. Should we
   drop this, just discard content on error, and wait for swapin
   uninterruptibly? The TTM pool code could indeed do without the
   additional complication...

*) Is there a better way to do shrink throttling to avoid filling the
   swap-cache completely?

*) Is it good enough for real-world workloads?
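
As a rough illustration of the purgeable-marking item above: the real
set / clear API is introduced in patch 11 and may well look different,
but conceptually a driver-side madvise-style path could toggle the
TTM_TT_FLAG_DONTNEED flag added in patch 4, which ttm_tt_purgeable()
then reports to the shrinker. The function below is hypothetical and
the locking shown is an assumption.

static void my_drv_set_purgeable(struct ttm_buffer_object *bo,
				 bool purgeable)
{
	/* Assumed: called with the bo reservation held. */
	dma_resv_assert_held(bo->base.resv);

	if (purgeable)
		bo->ttm->page_flags |= TTM_TT_FLAG_DONTNEED;
	else
		bo->ttm->page_flags &= ~TTM_TT_FLAG_DONTNEED;
}

The shrinker then services TTM_SHRINK_PURGE requests only for objects
for which ttm_tt_purgeable() returns true.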

The series has been tested using the i915 driver with a 4GiB
VRAM DG1 on a system with 14GiB system memory and 16GiB SSD swap, and using
an old igt-gpu-tools version, 8c0bb07b7b4d, of gem_lmem_swapping,
which overcommits system memory quite extensively.

Patch walkthrough:

Initial bugfixes, could be decoupled from the series.
drm/ttm: Fix a NULL pointer dereference.
drm/ttm/pool: Fix ttm_pool_alloc error path.

Cleanups and restructuring:
drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
drm/ttm, drm/vmwgfx: Update the TTM swapout interface
drm/ttm: Unexport ttm_global_swapout()

Adding shrinker without enabling it:
drm/ttm: Don't use watermark accounting on shrinkable pools
drm/ttm: Reduce the number of used allocation orders for TTM pages
drm/ttm: Add a shrinker and shrinker accounting
drm/ttm: Introduce shrink throttling
drm/ttm: Remove pinned bos from shrinkable accounting
drm/ttm: Add a simple api to set / clear purgeable ttm_tt content

Adding the core mm part to insert and read-back pages from the swap-cache:
mm: Add interfaces to back up and recover folio contents using swap.

TTM helpers for shrinking:
drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting.
drm/ttm: Provide helpers for shrinking.
drm/ttm: Use fault-injection to test error paths.

Enable i915:
drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool

Any feedback greatly appreciated.
Thomas

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: <linux-graphics-maintainer@vmware.com>
Cc: <linux-mm@kvack.org>
Cc: <intel-gfx@lists.freedesktop.org>


Thomas Hellström (16):
  drm/ttm: Fix a NULL pointer dereference
  drm/ttm/pool: Fix ttm_pool_alloc error path
  drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  drm/ttm: Unexport ttm_global_swapout()
  drm/ttm: Don't use watermark accounting on shrinkable pools
  drm/ttm: Reduce the number of used allocation orders for TTM pages
  drm/ttm: Add a shrinker and shrinker accounting
  drm/ttm: Introduce shrink throttling.
  drm/ttm: Remove pinned bos from shrinkable accounting
  drm/ttm: Add a simple api to set / clear purgeable ttm_tt content
  mm: Add interfaces to back up and recover folio contents using swap
  drm/ttm: Make the call to ttm_tt_populate() interruptible when
    faulting
  drm/ttm: Provide helpers for shrinking
  drm/ttm: Use fault-injection to test error paths
  drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem
    pool

 drivers/gpu/drm/Kconfig                       |  11 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   6 -
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 273 ++-------
 drivers/gpu/drm/i915/i915_gem.c               |   3 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  |  45 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c               |  19 +-
 drivers/gpu/drm/ttm/ttm_device.c              |  85 ++-
 drivers/gpu/drm/ttm/ttm_pool.c                | 522 ++++++++++++++++--
 drivers/gpu/drm/ttm/ttm_tt.c                  | 336 +++++++++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |   3 +-
 include/drm/ttm/ttm_bo.h                      |   4 +-
 include/drm/ttm/ttm_device.h                  |  36 +-
 include/drm/ttm/ttm_pool.h                    |  19 +
 include/drm/ttm/ttm_tt.h                      |  57 +-
 include/linux/swap.h                          |  10 +
 mm/Kconfig                                    |  18 +
 mm/Makefile                                   |   2 +
 mm/swap_backup_folio.c                        | 178 ++++++
 mm/swap_backup_folio_test.c                   | 111 ++++
 21 files changed, 1361 insertions(+), 388 deletions(-)
 create mode 100644 mm/swap_backup_folio.c
 create mode 100644 mm/swap_backup_folio_test.c

-- 
2.34.1



* [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 17:25   ` Christian König
  2023-02-15 16:13 ` [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Philip Yang, NeilBrown, Daniel Vetter, Peter Xu,
	linux-mm, Dave Hansen, Huang Rui, David Hildenbrand,
	Matthew Wilcox (Oracle),
	linux-graphics-maintainer, Matthew Auld, Ramalingam C,
	Dave Airlie, Thomas Hellström, Arunpravin Paneer Selvam,
	Anshuman Gupta, intel-gfx, Qiang Yu, Tvrtko Ursulin,
	Felix Kuehling, Johannes Weiner, Alex Deucher, Andrew Morton,
	Christian König, Nirmoy Das

The LRU mechanism may look up a resource in the process of being removed
from an object. The locking rules here are a bit unclear, but it currently
looks like res->bo assignment is protected by the LRU lock, whereas
bo->resource is protected by the object lock, while *clearing* of
bo->resource is also protected by the LRU lock. This means that if
we check that bo->resource points to the LRU resource under the LRU
lock, we should be safe.
So perform that check before deciding to swap out a bo. That avoids
dereferencing a NULL bo->resource in ttm_bo_swapout().

Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Qiang Yu <qiang.yu@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index c7a1862f322a..ae2f19dc9f81 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 			struct ttm_buffer_object *bo = res->bo;
 			uint32_t num_pages;
 
-			if (!bo)
+			if (!bo || bo->resource != res)
 				continue;
 
 			num_pages = PFN_UP(bo->base.size);
-- 
2.34.1



* [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 17:31   ` Christian König
  2023-02-15 16:13 ` [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, Huang Rui, linux-graphics-maintainer,
	Peter Xu, Johannes Weiner, Madhav Chauhan, Dave Airlie,
	Andrew Morton, Christian König, Matthew Auld

On error, the error path forgot to unmap DMA mappings and could call
set_pages_wb() on already uncached pages.

Fix this by introducing a common __ttm_pool_free() function that
does the right thing.

Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Cc: Christian König <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Madhav Chauhan <madhav.chauhan@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index aa116a7bbae3..1cc7591a9542 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 	return 0;
 }
 
+static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
+			    struct page **caching_divide,
+			    enum ttm_caching initial_caching,
+			    enum ttm_caching subseq_caching,
+			    pgoff_t num_pages)
+{
+	enum ttm_caching caching = subseq_caching;
+	struct page **pages = tt->pages;
+	unsigned int order;
+	pgoff_t i, nr;
+
+	if (pool && caching_divide)
+		caching = initial_caching;
+
+	for (i = 0; i < num_pages; i += nr, pages += nr) {
+		struct ttm_pool_type *pt = NULL;
+
+		if (unlikely(caching_divide == pages))
+			caching = subseq_caching;
+
+		order = ttm_pool_page_order(pool, *pages);
+		nr = (1UL << order);
+		if (tt->dma_address)
+			ttm_pool_unmap(pool, tt->dma_address[i], nr);
+
+		pt = ttm_pool_select_type(pool, caching, order);
+		if (pt)
+			ttm_pool_type_give(pt, *pages);
+		else
+			ttm_pool_free_page(pool, caching, order, *pages);
+	}
+}
+
 /**
  * ttm_pool_alloc - Fill a ttm_tt object
  *
@@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	dma_addr_t *dma_addr = tt->dma_address;
 	struct page **caching = tt->pages;
 	struct page **pages = tt->pages;
+	enum ttm_caching page_caching;
 	gfp_t gfp_flags = GFP_USER;
-	unsigned int i, order;
+	unsigned int order;
 	struct page *p;
 	int r;
 
@@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	     order = min_t(unsigned int, order, __fls(num_pages))) {
 		struct ttm_pool_type *pt;
 
+		page_caching = tt->caching;
 		pt = ttm_pool_select_type(pool, tt->caching, order);
 		p = pt ? ttm_pool_type_take(pt) : NULL;
 		if (p) {
@@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 			if (r)
 				goto error_free_page;
 
+			caching = pages;
 			do {
 				r = ttm_pool_page_allocated(pool, order, p,
 							    &dma_addr,
@@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				if (r)
 					goto error_free_page;
 
+				caching = pages;
 				if (num_pages < (1 << order))
 					break;
 
 				p = ttm_pool_type_take(pt);
 			} while (p);
-			caching = pages;
 		}
 
+		page_caching = ttm_cached;
 		while (num_pages >= (1 << order) &&
 		       (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
 
@@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 							   tt->caching);
 				if (r)
 					goto error_free_page;
+				caching = pages;
 			}
 			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
 						    &num_pages, &pages);
@@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	return 0;
 
 error_free_page:
-	ttm_pool_free_page(pool, tt->caching, order, p);
+	ttm_pool_free_page(pool, page_caching, order, p);
 
 error_free_all:
 	num_pages = tt->num_pages - num_pages;
-	for (i = 0; i < num_pages; ) {
-		order = ttm_pool_page_order(pool, tt->pages[i]);
-		ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
-		i += 1 << order;
-	}
+	__ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached,
+			num_pages);
 
 	return r;
 }
@@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc);
  */
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
-	unsigned int i;
-
-	for (i = 0; i < tt->num_pages; ) {
-		struct page *p = tt->pages[i];
-		unsigned int order, num_pages;
-		struct ttm_pool_type *pt;
-
-		order = ttm_pool_page_order(pool, p);
-		num_pages = 1ULL << order;
-		if (tt->dma_address)
-			ttm_pool_unmap(pool, tt->dma_address[i], num_pages);
-
-		pt = ttm_pool_select_type(pool, tt->caching, order);
-		if (pt)
-			ttm_pool_type_give(pt, tt->pages[i]);
-		else
-			ttm_pool_free_page(pool, tt->caching, order,
-					   tt->pages[i]);
-
-		i += num_pages;
-	}
+	__ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching,
+			tt->num_pages);
 
 	while (atomic_long_read(&allocated_pages) > page_pool_size)
 		ttm_pool_shrink();
-- 
2.34.1



* [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 17:33   ` Christian König
  2023-02-15 16:13 ` [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface Thomas Hellström
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

New code is recommended to use the BIT macro instead of the explicit
shifts. Change the older defines so that we can keep the style consistent
with upcoming changes.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 include/drm/ttm/ttm_tt.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index b7d3f3843f1e..cc54be1912e1 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -83,12 +83,12 @@ struct ttm_tt {
 	 * set by TTM after ttm_tt_populate() has successfully returned, and is
 	 * then unset when TTM calls ttm_tt_unpopulate().
 	 */
-#define TTM_TT_FLAG_SWAPPED		(1 << 0)
-#define TTM_TT_FLAG_ZERO_ALLOC		(1 << 1)
-#define TTM_TT_FLAG_EXTERNAL		(1 << 2)
-#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	(1 << 3)
+#define TTM_TT_FLAG_SWAPPED		BIT(0)
+#define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
+#define TTM_TT_FLAG_EXTERNAL		BIT(2)
+#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
 
-#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
+#define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
 	uint32_t page_flags;
 	/** @num_pages: Number of pages in the page array. */
 	uint32_t num_pages;
-- 
2.34.1



* [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (2 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 17:39   ` Christian König
  2023-02-15 16:13 ` [RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Update the TTM swapout interfaces for better compatibility with a shrinker.
- Replace number-of-pages int return with a long to better match the
  kernel's shrinker interface.
- The gfp_flags parameter to ttm_xx_swapout() currently only takes the
  GFP_KERNEL value and shouldn't really be needed since the shrinker we
  hook up in upcoming patches sets an allocation context to match reclaim.
- Introduce a shrink reason enumeration and a driver callback to shrink
  buffer objects.
  The TTM_SHRINK_WATERMARK reason will still be handled using the
  existing shmem copy, and will be used by pool types that don't lend
  themselves well to shrinking (the dma_alloc pool) and when drivers
  explicitly request swapout.
  The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a
  shrinker and are to be handled by a new driver callback, bo_shrink().
  Helpers for the new driver callback are provided in upcoming patches.

Cc: linux-graphics-maintainer@vmware.com
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c        | 38 ++++++++++++++++----
 drivers/gpu/drm/ttm/ttm_device.c    | 55 +++++++++++++++++++++--------
 drivers/gpu/drm/ttm/ttm_tt.c        | 23 ++++++------
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
 include/drm/ttm/ttm_bo.h            |  4 +--
 include/drm/ttm/ttm_device.h        | 36 +++++++++++++++++--
 include/drm/ttm/ttm_tt.h            | 17 +++++++--
 7 files changed, 136 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 882c2fa346f3..e5c0970564c0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
 }
 EXPORT_SYMBOL(ttm_bo_wait_ctx);
 
-int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
-		   gfp_t gfp_flags)
+/**
+ * ttm_bo_swapout() - Swap out or purge a buffer object
+ * @bo: The buffer object.
+ * @ctx: The ttm operation context.
+ * @reason: The swapout reason.
+ *
+ * Try to swap out or purge the contents of a system memory backed buffer
+ * object. The function needs to be called with the device's LRU lock held.
+ *
+ * Return: -EBUSY if the bo lock could not be grabbed or the object was
+ * otherwise busy. Otherwise the number of pages swapped out or negative
+ * error code on error. Iff the function didn't return -EBUSY, the
+ * LRU lock was dropped, and LRU traversal needs to restart.
+ */
+long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
+		    enum ttm_shrink_reason reason)
 {
 	struct ttm_place place;
 	bool locked;
 	long ret;
 
+	lockdep_assert_held(&bo->bdev->lru_lock);
+
 	/*
 	 * While the bo may already reside in SYSTEM placement, set
 	 * SYSTEM as new placement to cover also the move further below.
@@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	}
 
 	if (bo->deleted) {
+		long num_pages = bo->ttm->num_pages;
+
 		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
 		ttm_bo_put(bo);
+		if (!ret)
+			return num_pages;
 		return ret == -EBUSY ? -ENOSPC : ret;
 	}
 
@@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
 	 * Swap out. Buffer will be swapped in again as soon as
 	 * anyone tries to access a ttm page.
 	 */
-	if (bo->bdev->funcs->swap_notify)
-		bo->bdev->funcs->swap_notify(bo);
+	if (bo->bdev->funcs->bo_shrink && reason != TTM_SHRINK_WATERMARK) {
+		ret = bo->bdev->funcs->bo_shrink(bo, ctx);
+	} else {
+		if (bo->bdev->funcs->swap_notify)
+			bo->bdev->funcs->swap_notify(bo);
+		ret = ttm_tt_swapout(bo->bdev, bo->ttm);
+		if (!ret)
+			ret = bo->ttm->num_pages;
+	}
 
-	if (ttm_tt_is_populated(bo->ttm))
-		ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
 out:
-
 	/*
 	 * Unreserve without putting on LRU to avoid swapping out an
 	 * already swapped buffer.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index ae2f19dc9f81..7eadea07027f 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -116,19 +116,28 @@ static int ttm_global_init(void)
 	return ret;
 }
 
-/*
- * A buffer object shrink method that tries to swap out the first
- * buffer object on the global::swap_lru list.
+/**
+ * ttm_global_swapout() - Select and swap out a system-memory-backed bo.
+ * @ctx: The operation context.
+ * @reason: The reason for swapout.
+ *
+ * Select, based on round-robin a TTM device and traverse the LRUs of
+ * that specific device until a suitable bo backed by system memory is found
+ * and swapped-out or purged.
+ *
+ * Return: Positive value or zero indicating the size in pages of the
+ * bo swapped out. Negative error code on error.
  */
-int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
+long ttm_global_swapout(struct ttm_operation_ctx *ctx,
+			enum ttm_shrink_reason reason)
 {
 	struct ttm_global *glob = &ttm_glob;
 	struct ttm_device *bdev;
-	int ret = 0;
+	long ret = 0;
 
 	mutex_lock(&ttm_global_mutex);
 	list_for_each_entry(bdev, &glob->device_list, device_list) {
-		ret = ttm_device_swapout(bdev, ctx, gfp_flags);
+		ret = ttm_device_swapout(bdev, ctx, reason);
 		if (ret > 0) {
 			list_move_tail(&bdev->device_list, &glob->device_list);
 			break;
@@ -139,14 +148,29 @@ int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
 }
 EXPORT_SYMBOL(ttm_global_swapout);
 
-int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
-		       gfp_t gfp_flags)
+/**
+ * ttm_device_swapout() - Select and swap out a system-memory-backed bo.
+ * @bdev: The device whose bos are considered for swapout.
+ * @ctx: The operation context.
+ * @reason: The reason for swapout.
+ *
+ * Traverse the LRUs of a specific device until a suitable bo backed by
+ * system memory is found and swapped-out or purged.
+ *
+ * Return: Positive value or zero indicating the size in pages of the
+ * bo swapped out. Negative error code on error.
+ */
+long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
+			enum ttm_shrink_reason reason)
 {
 	struct ttm_resource_cursor cursor;
 	struct ttm_resource_manager *man;
 	struct ttm_resource *res;
 	unsigned i;
-	int ret;
+	long ret;
+
+	if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink)
+		return 0;
 
 	spin_lock(&bdev->lru_lock);
 	for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
@@ -156,16 +180,19 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 
 		ttm_resource_manager_for_each_res(man, &cursor, res) {
 			struct ttm_buffer_object *bo = res->bo;
-			uint32_t num_pages;
+			struct ttm_tt *tt;
 
 			if (!bo || bo->resource != res)
 				continue;
 
-			num_pages = PFN_UP(bo->base.size);
-			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
+			tt = bo->ttm;
+			if (!tt || (reason == TTM_SHRINK_PURGE &&
+				    !ttm_tt_purgeable(tt)))
+				continue;
+			ret = ttm_bo_swapout(bo, ctx, reason);
 			/* ttm_bo_swapout has dropped the lru_lock */
-			if (!ret)
-				return num_pages;
+			if (ret >= 0)
+				return ret;
 			if (ret != -EBUSY)
 				return ret;
 		}
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index ab725d9d14a6..a68c14de0161 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -239,22 +239,21 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
 
 /**
  * ttm_tt_swapout - swap out tt object
- *
  * @bdev: TTM device structure.
  * @ttm: The struct ttm_tt.
- * @gfp_flags: Flags to use for memory allocation.
  *
- * Swapout a TT object to a shmem_file, return number of pages swapped out or
- * negative error code.
+ * Swapout a TT object to a shmem_file.
+ *
+ * Return: number of pages swapped out or negative error code on error.
  */
-int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
-		   gfp_t gfp_flags)
+int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	loff_t size = (loff_t)ttm->num_pages << PAGE_SHIFT;
 	struct address_space *swap_space;
 	struct file *swap_storage;
 	struct page *from_page;
 	struct page *to_page;
+	gfp_t gfp_flags;
 	int i, ret;
 
 	swap_storage = shmem_file_setup("ttm swap", size, 0);
@@ -264,7 +263,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
 	}
 
 	swap_space = swap_storage->f_mapping;
-	gfp_flags &= mapping_gfp_mask(swap_space);
+	gfp_flags = GFP_KERNEL & mapping_gfp_mask(swap_space);
 
 	for (i = 0; i < ttm->num_pages; ++i) {
 		from_page = ttm->pages[i];
@@ -315,12 +314,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
 	       atomic_long_read(&ttm_dma32_pages_allocated) >
 	       ttm_dma32_pages_limit) {
+		long r = ttm_global_swapout(ctx, TTM_SHRINK_WATERMARK);
 
-		ret = ttm_global_swapout(ctx, GFP_KERNEL);
-		if (ret == 0)
+		if (!r)
 			break;
-		if (ret < 0)
+		if (r < 0) {
+			ret = r;
 			goto error;
+		}
 	}
 
 	if (bdev->funcs->ttm_tt_populate)
@@ -379,7 +380,7 @@ static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
 {
 	struct ttm_operation_ctx ctx = { false, false };
 
-	seq_printf(m, "%d\n", ttm_global_swapout(&ctx, GFP_KERNEL));
+	seq_printf(m, "%ld\n", ttm_global_swapout(&ctx, TTM_SHRINK_SWAP));
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 2588615a2a38..292c5199d2cc 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1514,7 +1514,8 @@ static int vmw_pm_freeze(struct device *kdev)
 	vmw_execbuf_release_pinned_bo(dev_priv);
 	vmw_resource_evict_all(dev_priv);
 	vmw_release_device_early(dev_priv);
-	while (ttm_device_swapout(&dev_priv->bdev, &ctx, GFP_KERNEL) > 0);
+	while (ttm_device_swapout(&dev_priv->bdev, &ctx, TTM_SHRINK_WATERMARK) > 0)
+		;
 	vmw_fifo_resource_dec(dev_priv);
 	if (atomic_read(&dev_priv->num_fifo_resources) != 0) {
 		DRM_ERROR("Can't hibernate while 3D resources are active.\n");
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8b113c384236..6b45e0b639e0 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -375,8 +375,8 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map);
 int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map);
 void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map);
 int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo);
-int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
-		   gfp_t gfp_flags);
+long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
+		    enum ttm_shrink_reason reason);
 void ttm_bo_pin(struct ttm_buffer_object *bo);
 void ttm_bo_unpin(struct ttm_buffer_object *bo);
 int ttm_mem_evict_first(struct ttm_device *bdev,
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 4f3e81eac6f3..6bd2abf712ab 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -35,6 +35,21 @@ struct ttm_placement;
 struct ttm_buffer_object;
 struct ttm_operation_ctx;
 
+/**
+ * enum ttm_shrink_reason - Reason for shrinking system memory
+ * @TTM_SHRINK_WATERMARK - A watermark limit was reached. Not from reclaim.
+ * @TTM_SHRINK_PURGE - A request for shrinking only purged objects.
+ * @TTM_SHRINK_SWAP - A request for shrinking any object.
+ *
+ * This enum is intended for the buffer object- and shrink method selection
+ * algorithms. It's not intended to leak to or be used by TTM drivers.
+ */
+enum ttm_shrink_reason {
+	TTM_SHRINK_WATERMARK,
+	TTM_SHRINK_PURGE,
+	TTM_SHRINK_SWAP,
+};
+
 /**
  * struct ttm_global - Buffer object driver global data.
  */
@@ -207,6 +222,19 @@ struct ttm_device_funcs {
 	 * adding fences that may force a delayed delete
 	 */
 	void (*release_notify)(struct ttm_buffer_object *bo);
+
+	/**
+	 * Shrink the bo's system pages, either by swapping or by purging.
+	 * @bo: Bo the system pages of which are to be shrunken.
+	 * @ctx: Operation ctx. In particular the driver callback should
+	 *       adhere to the no_wait_gpu and interruptible fields.
+	 *
+	 * This is also notifying the driver that the bo is about to be
+	 * shrunken and the driver should take care to unbind any GPU bindings
+	 * and to note that the content is purged if @bo->ttm is purgeable.
+	 */
+	long (*bo_shrink)(struct ttm_buffer_object *bo,
+			  struct ttm_operation_ctx *ctx);
 };
 
 /**
@@ -268,9 +296,11 @@ struct ttm_device {
 	struct workqueue_struct *wq;
 };
 
-int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
-int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
-		       gfp_t gfp_flags);
+long ttm_global_swapout(struct ttm_operation_ctx *ctx,
+			enum ttm_shrink_reason reason);
+
+long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
+			enum ttm_shrink_reason reason);
 
 static inline struct ttm_resource_manager *
 ttm_manager_type(struct ttm_device *bdev, int mem_type)
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index cc54be1912e1..627168eba8f6 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -87,6 +87,7 @@ struct ttm_tt {
 #define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
 #define TTM_TT_FLAG_EXTERNAL		BIT(2)
 #define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
+#define TTM_TT_FLAG_DONTNEED		BIT(4)
 
 #define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
 	uint32_t page_flags;
@@ -180,8 +181,8 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm);
  * Swap in a previously swap out ttm_tt.
  */
 int ttm_tt_swapin(struct ttm_tt *ttm);
-int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
-		   gfp_t gfp_flags);
+
+int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm);
 
 /**
  * ttm_tt_populate - allocate pages for a ttm
@@ -223,6 +224,18 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages);
 struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
 					    struct ttm_tt *tt);
 
+/**
+ * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is purgeable
+ * @tt: The struct ttm_tt to consider.
+ *
+ * Return: Whether the contents is purgeable in the sense that the owner
+ * doesn't mind losing it as long as it gets notified.
+ */
+static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
+{
+	return tt->page_flags & TTM_TT_FLAG_DONTNEED;
+}
+
 #if IS_ENABLED(CONFIG_AGP)
 #include <linux/agp_backend.h>
 
-- 
2.34.1



* [RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout()
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (3 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools Thomas Hellström
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Unexport ttm_global_swapout() since it is not used outside of TTM.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 7eadea07027f..a3cac42bb456 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -146,7 +146,6 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
 	mutex_unlock(&ttm_global_mutex);
 	return ret;
 }
-EXPORT_SYMBOL(ttm_global_swapout);
 
 /**
  * ttm_device_swapout() - Select and swap out a system-memory-backed bo.
-- 
2.34.1



* [RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (4 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Clarify the meaning of the ttm_tt pages_limit watermarks as the max
number of pages not accessible by shrinkers, and update accordingly so that
memory allocated by TTM devices that support shrinking is not
accounted against those limits. In particular this means that devices
using the dma_alloc pool will still be using the watermark method.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c |  3 ++-
 drivers/gpu/drm/ttm/ttm_tt.c     | 43 +++++++++++++++++++-------------
 include/drm/ttm/ttm_pool.h       | 15 +++++++++++
 3 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index a3cac42bb456..e0a2be3ed13d 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -168,7 +168,8 @@ long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
 	unsigned i;
 	long ret;
 
-	if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink)
+	if (reason != TTM_SHRINK_WATERMARK &&
+	    (!bdev->funcs->bo_shrink || !ttm_pool_can_shrink(&bdev->pool)))
 		return 0;
 
 	spin_lock(&bdev->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index a68c14de0161..771e5f3c2fee 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -54,6 +54,21 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, ulong, 0644);
 static atomic_long_t ttm_pages_allocated;
 static atomic_long_t ttm_dma32_pages_allocated;
 
+static bool ttm_tt_shrinkable(const struct ttm_device *bdev,
+			      const struct ttm_tt *tt)
+{
+	return !!bdev->funcs->bo_shrink &&
+		ttm_pool_can_shrink(&bdev->pool) &&
+		!(tt->page_flags & TTM_TT_FLAG_EXTERNAL);
+}
+
+static void ttm_tt_mod_allocated(bool dma32, long value)
+{
+	atomic_long_add(value, &ttm_pages_allocated);
+	if (dma32)
+		atomic_long_add(value, &ttm_dma32_pages_allocated);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
@@ -304,12 +319,9 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	if (ttm_tt_is_populated(ttm))
 		return 0;
 
-	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-		atomic_long_add(ttm->num_pages, &ttm_pages_allocated);
-		if (bdev->pool.use_dma32)
-			atomic_long_add(ttm->num_pages,
-					&ttm_dma32_pages_allocated);
-	}
+	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+	    !ttm_tt_shrinkable(bdev, ttm))
+		ttm_tt_mod_allocated(bdev->pool.use_dma32, ttm->num_pages);
 
 	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
 	       atomic_long_read(&ttm_dma32_pages_allocated) >
@@ -343,12 +355,10 @@ int ttm_tt_populate(struct ttm_device *bdev,
 	return 0;
 
 error:
-	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-		if (bdev->pool.use_dma32)
-			atomic_long_sub(ttm->num_pages,
-					&ttm_dma32_pages_allocated);
-	}
+	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+	    !ttm_tt_shrinkable(bdev, ttm))
+		ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages);
+
 	return ret;
 }
 EXPORT_SYMBOL(ttm_tt_populate);
@@ -363,12 +373,9 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 	else
 		ttm_pool_free(&bdev->pool, ttm);
 
-	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-		atomic_long_sub(ttm->num_pages, &ttm_pages_allocated);
-		if (bdev->pool.use_dma32)
-			atomic_long_sub(ttm->num_pages,
-					&ttm_dma32_pages_allocated);
-	}
+	if (!(ttm->page_flags & TTM_TT_FLAG_EXTERNAL) &&
+	    !ttm_tt_shrinkable(bdev, ttm))
+		ttm_tt_mod_allocated(bdev->pool.use_dma32, -(long)ttm->num_pages);
 
 	ttm->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED;
 }
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index ef09b23d29e3..c1200552892e 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -89,4 +89,19 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m);
 int ttm_pool_mgr_init(unsigned long num_pages);
 void ttm_pool_mgr_fini(void);
 
+/**
+ * ttm_pool_can_shrink - Whether page allocations from this pool are shrinkable
+ * @pool: The pool.
+ *
+ * Return: true if shrinkable, false if not.
+ */
+static inline bool ttm_pool_can_shrink(const struct ttm_pool *pool)
+{
+	/*
+	 * The dma_alloc pool pages can't be inserted into the
+	 * swap cache. Nor can they be split.
+	 */
+	return !pool->use_dma_alloc;
+}
+
 #endif
-- 
2.34.1



* [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (5 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 17:42   ` Christian König
  2023-02-15 16:13 ` [RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting Thomas Hellström
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

When swapping out, we will split multi-order pages both in order to
move them to the swap-cache and to be able to return memory to the
swap cache as soon as possible on a page-by-page basis.
By reducing the page max order to the system PMD size, we can be nicer
to the system and avoid splitting gigantic pages. On top of this we also
include the 64K page size in the page sizes tried, since that appears to
be a common size for GPU applications.

Looking forward to when we might be able to swap out PMD size folios
without splitting, this will also be a benefit.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 1cc7591a9542..8787fb6a218b 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -31,6 +31,8 @@
  * cause they are rather slow compared to alloc_pages+map.
  */
 
+#define pr_fmt(fmt) "[TTM POOL] " fmt
+
 #include <linux/module.h>
 #include <linux/dma-mapping.h>
 #include <linux/debugfs.h>
@@ -47,6 +49,18 @@
 
 #include "ttm_module.h"
 
+#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
+#define TTM_64K_ORDER (16 - PAGE_SHIFT)
+#if (TTM_MAX_ORDER < TTM_64K_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER TTM_64K_ORDER
+#endif
+#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
+#undef TTM_MAX_ORDER
+#define TTM_MAX_ORDER (MAX_ORDER - 1)
+#endif
+#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
+
 /**
  * struct ttm_pool_dma - Helper object for coherent DMA mappings
  *
@@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
-static struct ttm_pool_type global_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_uncached[MAX_ORDER];
+static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
 
-static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
-static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
+static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
+static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
 
 static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
+static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
 	}
 }
 
+static unsigned int ttm_pool_select_order(unsigned int order, pgoff_t num_pages)
+{
+	unsigned int *cur_order = ttm_pool_orders;
+
+	order = min_t(unsigned int, __fls(num_pages), order);
+	while (order < *cur_order)
+		++cur_order;
+
+	return *cur_order;
+}
+
 /**
  * ttm_pool_alloc - Fill a ttm_tt object
  *
@@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	else
 		gfp_flags |= GFP_HIGHUSER;
 
-	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
-	     num_pages;
-	     order = min_t(unsigned int, order, __fls(num_pages))) {
+	order = ttm_pool_select_order(ttm_pool_orders[0], num_pages);
+	for (; num_pages; order = ttm_pool_select_order(order, num_pages)) {
 		struct ttm_pool_type *pt;
 
 		page_caching = tt->caching;
@@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
 
 	if (use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_init(&pool->caching[i].orders[j],
 						   pool, i, j);
 	}
@@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
 
 	if (pool->use_dma_alloc) {
 		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
-			for (j = 0; j < MAX_ORDER; ++j)
+			for (j = 0; j < TTM_DIM_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
 
@@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
 	unsigned int i;
 
 	seq_puts(m, "\t ");
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " ---%2u---", i);
 	seq_puts(m, "\n");
 }
@@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i)
+	for (i = 0; i < TTM_DIM_ORDER; ++i)
 		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
 	seq_puts(m, "\n");
 }
@@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
+	if (TTM_64K_ORDER < TTM_MAX_ORDER)
+		ttm_pool_orders[1] = TTM_64K_ORDER;
+
+	pr_debug("Used orders are %u %u %u\n", ttm_pool_orders[0],
+		 ttm_pool_orders[1], ttm_pool_orders[2]);
+
 	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_init(&global_write_combined[i], NULL,
 				   ttm_write_combined, i);
 		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
@@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
 {
 	unsigned int i;
 
-	for (i = 0; i < MAX_ORDER; ++i) {
+	for (i = 0; i < TTM_DIM_ORDER; ++i) {
 		ttm_pool_type_fini(&global_write_combined[i]);
 		ttm_pool_type_fini(&global_uncached[i]);
 
-- 
2.34.1



* [RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (6 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 09/16] drm/ttm: Introduce shrink throttling Thomas Hellström
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Register a TTM system memory-backed object shrinker and add
accounting for shrinkable and purgeable pages. For the shrinker to work,
the driver needs to register the bo_shrink callback which is responsible
for unbinding from GPU and the dma layer if needed. Helpers for that
callback to actually perform shrinking will be introduced in upcoming
patches.

Note that we can't lock the ttm_global_mutex from within the shrinker
scan() function as that might cause a deadlock issue. To fix that, add and
use a mutex which is used for global device list manipulation only and
make sure it isn't held when registering the shrinker.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_device.c |  26 ++++---
 drivers/gpu/drm/ttm/ttm_tt.c     | 112 +++++++++++++++++++++++++++++--
 include/drm/ttm/ttm_tt.h         |   2 +
 3 files changed, 125 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index e0a2be3ed13d..ce98752d2d32 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -36,10 +36,10 @@
 
 #include "ttm_module.h"
 
-/*
- * ttm_global_mutex - protecting the global state
- */
+/* ttm_global_mutex - protects the global state init and fini. */
 static DEFINE_MUTEX(ttm_global_mutex);
+/* ttm_global_list_mutex - protects the device list. */
+static DEFINE_MUTEX(ttm_global_list_mutex);
 static unsigned ttm_glob_use_count;
 struct ttm_global ttm_glob;
 EXPORT_SYMBOL(ttm_glob);
@@ -54,6 +54,7 @@ static void ttm_global_release(void)
 	if (--ttm_glob_use_count > 0)
 		goto out;
 
+	ttm_tt_mgr_fini();
 	ttm_pool_mgr_fini();
 	debugfs_remove(ttm_debugfs_root);
 
@@ -102,7 +103,10 @@ static int ttm_global_init(void)
 		goto out;
 	}
 
+	mutex_lock(&ttm_global_list_mutex);
 	INIT_LIST_HEAD(&glob->device_list);
+	mutex_unlock(&ttm_global_list_mutex);
+
 	atomic_set(&glob->bo_count, 0);
 
 	debugfs_create_atomic_t("buffer_objects", 0444, ttm_debugfs_root,
@@ -135,7 +139,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
 	struct ttm_device *bdev;
 	long ret = 0;
 
-	mutex_lock(&ttm_global_mutex);
+	mutex_lock(&ttm_global_list_mutex);
 	list_for_each_entry(bdev, &glob->device_list, device_list) {
 		ret = ttm_device_swapout(bdev, ctx, reason);
 		if (ret > 0) {
@@ -143,7 +147,7 @@ long ttm_global_swapout(struct ttm_operation_ctx *ctx,
 			break;
 		}
 	}
-	mutex_unlock(&ttm_global_mutex);
+	mutex_unlock(&ttm_global_list_mutex);
 	return ret;
 }
 
@@ -247,9 +251,9 @@ int ttm_device_init(struct ttm_device *bdev, struct ttm_device_funcs *funcs,
 	spin_lock_init(&bdev->lru_lock);
 	INIT_LIST_HEAD(&bdev->pinned);
 	bdev->dev_mapping = mapping;
-	mutex_lock(&ttm_global_mutex);
+	mutex_lock(&ttm_global_list_mutex);
 	list_add_tail(&bdev->device_list, &glob->device_list);
-	mutex_unlock(&ttm_global_mutex);
+	mutex_unlock(&ttm_global_list_mutex);
 
 	return 0;
 }
@@ -260,14 +264,14 @@ void ttm_device_fini(struct ttm_device *bdev)
 	struct ttm_resource_manager *man;
 	unsigned i;
 
+	mutex_lock(&ttm_global_list_mutex);
+	list_del(&bdev->device_list);
+	mutex_unlock(&ttm_global_list_mutex);
+
 	man = ttm_manager_type(bdev, TTM_PL_SYSTEM);
 	ttm_resource_manager_set_used(man, false);
 	ttm_set_driver_manager(bdev, TTM_PL_SYSTEM, NULL);
 
-	mutex_lock(&ttm_global_mutex);
-	list_del(&bdev->device_list);
-	mutex_unlock(&ttm_global_mutex);
-
 	drain_workqueue(bdev->wq);
 	destroy_workqueue(bdev->wq);
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 771e5f3c2fee..5a57117c21ec 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -37,6 +37,7 @@
 #include <linux/module.h>
 #include <drm/drm_cache.h>
 #include <drm/ttm/ttm_bo.h>
+#include <drm/ttm/ttm_pool.h>
 #include <drm/ttm/ttm_tt.h>
 
 #include "ttm_module.h"
@@ -54,6 +55,11 @@ module_param_named(dma32_pages_limit, ttm_dma32_pages_limit, ulong, 0644);
 static atomic_long_t ttm_pages_allocated;
 static atomic_long_t ttm_dma32_pages_allocated;
 
+static long shrinkable_pages;
+static long purgeable_pages;
+static DEFINE_RWLOCK(shrinkable_lock);
+static struct shrinker mm_shrinker;
+
 static bool ttm_tt_shrinkable(const struct ttm_device *bdev,
 			      const struct ttm_tt *tt)
 {
@@ -69,6 +75,14 @@ static void ttm_tt_mod_allocated(bool dma32, long value)
 		atomic_long_add(value, &ttm_dma32_pages_allocated);
 }
 
+static void ttm_tt_mod_shrinkable_pages(long shrinkable, long purgeable)
+{
+	write_lock(&shrinkable_lock);
+	shrinkable_pages += shrinkable;
+	purgeable_pages += purgeable;
+	write_unlock(&shrinkable_lock);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
@@ -352,6 +366,9 @@ int ttm_tt_populate(struct ttm_device *bdev,
 		}
 	}
 
+	if (ttm_tt_shrinkable(bdev, ttm))
+		ttm_tt_mod_shrinkable_pages(ttm->num_pages, 0);
+
 	return 0;
 
 error:
@@ -368,6 +385,13 @@ void ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 	if (!ttm_tt_is_populated(ttm))
 		return;
 
+	if (ttm_tt_shrinkable(bdev, ttm)) {
+		if (ttm_tt_purgeable(ttm))
+			ttm_tt_mod_shrinkable_pages(0, -(long)ttm->num_pages);
+		else
+			ttm_tt_mod_shrinkable_pages(-(long)ttm->num_pages, 0);
+	}
+
 	if (bdev->funcs->ttm_tt_unpopulate)
 		bdev->funcs->ttm_tt_unpopulate(bdev, ttm);
 	else
@@ -394,11 +418,86 @@ DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
 
 #endif
 
+static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
+					   struct shrink_control *sc)
+{
+	unsigned long num_pages;
 
-/*
- * ttm_tt_mgr_init - register with the MM shrinker
- *
- * Register with the MM shrinker for swapping out BOs.
+	num_pages = get_nr_swap_pages();
+	read_lock(&shrinkable_lock);
+	num_pages = min_t(unsigned long, num_pages, shrinkable_pages);
+	num_pages += purgeable_pages;
+	read_unlock(&shrinkable_lock);
+
+	return num_pages ? num_pages : SHRINK_EMPTY;
+}
+
+static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
+					  struct shrink_control *sc)
+{
+	bool is_kswapd = current_is_kswapd();
+	struct ttm_operation_ctx ctx = {
+		.interruptible = false,
+		.no_wait_gpu = !is_kswapd,
+	};
+	unsigned long nr_to_scan, freed = 0;
+	long ret;
+
+	sc->nr_scanned = 0;
+	nr_to_scan = sc->nr_to_scan;
+
+	while (freed < nr_to_scan) {
+		ret = ttm_global_swapout(&ctx, TTM_SHRINK_PURGE);
+		if (ret <= 0)
+			break;
+
+		freed += ret;
+	}
+
+	sc->nr_scanned = freed;
+	if (freed < nr_to_scan)
+		nr_to_scan -= freed;
+	else
+		nr_to_scan = 0;
+	if (!nr_to_scan)
+		return freed ? freed : SHRINK_STOP;
+
+	while (freed < nr_to_scan) {
+		ret = ttm_global_swapout(&ctx, TTM_SHRINK_SWAP);
+		if (ret <= 0)
+			break;
+
+		freed += ret;
+	}
+
+	sc->nr_scanned = freed;
+
+	return freed ? freed : SHRINK_STOP;
+}
+
+/**
+ * ttm_tt_mgr_fini() - Check shrinkable accounting consistency and remove
+ * the shrinker.
+ */
+void ttm_tt_mgr_fini(void)
+{
+	if (WARN_ON_ONCE(atomic_long_read(&ttm_pages_allocated) ||
+			 atomic_long_read(&ttm_dma32_pages_allocated) ||
+			 shrinkable_pages || purgeable_pages)) {
+		pr_warn("Inconsistent ttm_tt accounting:\n");
+		pr_warn("pages %ld dma32 %ld shrinkable %ld purgeable %ld\n",
+			atomic_long_read(&ttm_pages_allocated),
+			atomic_long_read(&ttm_dma32_pages_allocated),
+			shrinkable_pages, purgeable_pages);
+	}
+
+	unregister_shrinker(&mm_shrinker);
+}
+
+/**
+ * ttm_tt_mgr_init() - Provide watermark limits and register the shrinker.
+ * @num_pages: Number of pages TTM is allowed to pin.
+ * @num_dma32_pages: Number of dma32 pages TTM is allowed to pin.
  */
 void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages)
 {
@@ -412,6 +511,11 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages)
 
 	if (!ttm_dma32_pages_limit)
 		ttm_dma32_pages_limit = num_dma32_pages;
+
+	mm_shrinker.count_objects = ttm_tt_shrinker_count;
+	mm_shrinker.scan_objects = ttm_tt_shrinker_scan;
+	mm_shrinker.seeks = DEFAULT_SEEKS;
+	(void)register_shrinker(&mm_shrinker, "ttm-objects");
 }
 
 static void ttm_kmap_iter_tt_map_local(struct ttm_kmap_iter *iter,
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 627168eba8f6..3f99787e2b93 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -221,6 +221,8 @@ static inline void ttm_tt_mark_for_clear(struct ttm_tt *ttm)
 
 void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages);
 
+void ttm_tt_mgr_fini(void);
+
 struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
 					    struct ttm_tt *tt);
 
-- 
2.34.1



* [RFC PATCH 09/16] drm/ttm: Introduce shrink throttling.
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (7 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 16:13 ` [RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting Thomas Hellström
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Since pages are not immediately freed by the TTM shrinker but rather
inserted into the swap cache, the system will keep on calling the
shrinker, rapidly filling the swap cache, which has a negative impact
on system performance.

When shrinking, throttle on the number of pages present in the swap
cache.
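
For reference, with 4 KiB pages the two thresholds introduced below work
out to (names taken from this patch, the arithmetic is illustration only):

  TTM_SWAP_MIN_SWAP_PAGES      = SZ_128M >> PAGE_SHIFT =  32768 pages (128 MiB)
  TTM_SWAP_MAX_SWAPCACHE_PAGES = SZ_1G   >> PAGE_SHIFT = 262144 pages (1 GiB)

so scanning stops swapping out once less than 128 MiB of swap space remains
free, or once the swap cache already holds more than 1 GiB worth of pages
awaiting laundering.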

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 40 ++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5a57117c21ec..848adf2a623e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -432,6 +432,42 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
 	return num_pages ? num_pages : SHRINK_EMPTY;
 }
 
+#define TTM_SWAP_MIN_SWAP_PAGES (SZ_128M >> PAGE_SHIFT)
+#define TTM_SWAP_MAX_SWAPCACHE_PAGES (SZ_1G >> PAGE_SHIFT)
+static unsigned long ttm_tt_shrinker_throttle(unsigned long pages)
+{
+	unsigned long
+		tmp = get_nr_swap_pages();
+
+	/*
+	 * Draining the available swap space too far will trigger
+	 * systemd-oomd, even if the swap cache holds a huge number of
+	 * dirty pages that could be laundered and freed. Hence, leave
+	 * some swap-space headroom.
+	 */
+	if (tmp > TTM_SWAP_MIN_SWAP_PAGES)
+		tmp -= TTM_SWAP_MIN_SWAP_PAGES;
+	else
+		tmp = 0;
+
+	pages = min(tmp, pages);
+
+	/*
+	 * Our shrinker doesn't immediately free pages unless they belong
+	 * to purgeable objects. Rather, they are inserted into the swap-cache.
+	 * But the system doesn't really get this and continues to call our
+	 * shrinker thinking it's still out of memory, when it could just
+	 * launder pages in the swap cache and free them. So throttle on the
+	 * number of pages in the swap cache.
+	 */
+
+	tmp = total_swapcache_pages();
+	if (tmp > TTM_SWAP_MAX_SWAPCACHE_PAGES)
+		pages = 0;
+
+	return pages;
+}
+
 static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
 					  struct shrink_control *sc)
 {
@@ -459,6 +495,10 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
 		nr_to_scan -= freed;
 	else
 		nr_to_scan = 0;
+
+	if (nr_to_scan)
+		nr_to_scan = ttm_tt_shrinker_throttle(nr_to_scan);
+
 	if (!nr_to_scan)
 		return freed ? freed : SHRINK_STOP;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (8 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 09/16] drm/ttm: Introduce shrink throttling Thomas Hellström
@ 2023-02-15 16:13 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content Thomas Hellström
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:13 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Pinned bos aren't shrinkable and need to be removed from the shrinkable
accounting. Do that, and in the process constify the tt argument to
ttm_tt_is_populated().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |  7 +++++++
 drivers/gpu/drm/ttm/ttm_tt.c | 22 ++++++++++++++++++++++
 include/drm/ttm/ttm_tt.h     |  6 +++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index e5c0970564c0..e59e2a4605d0 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -650,6 +650,10 @@ void ttm_bo_pin(struct ttm_buffer_object *bo)
 {
 	dma_resv_assert_held(bo->base.resv);
 	WARN_ON_ONCE(!kref_read(&bo->kref));
+
+	if (!bo->pin_count && bo->ttm)
+		ttm_tt_set_pinned(bo->bdev, bo->ttm);
+
 	spin_lock(&bo->bdev->lru_lock);
 	if (bo->resource)
 		ttm_resource_del_bulk_move(bo->resource, bo);
@@ -671,6 +675,9 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo)
 	if (WARN_ON_ONCE(!bo->pin_count))
 		return;
 
+	if (bo->pin_count == 1 && bo->ttm)
+		ttm_tt_set_unpinned(bo->bdev, bo->ttm);
+
 	spin_lock(&bo->bdev->lru_lock);
 	--bo->pin_count;
 	if (bo->resource)
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 848adf2a623e..a39c617c7a8e 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -83,6 +83,28 @@ static void ttm_tt_mod_shrinkable_pages(long shrinkable, long purgeable)
 	write_unlock(&shrinkable_lock);
 }
 
+/**
+ * ttm_tt_set_pinned() - Modify the shrinkable accounting when pinning a bo.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt used by the pinned bo.
+ */
+void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt)
+{
+	if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt))
+		ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, 0);
+}
+
+/**
+ * ttm_tt_set_unpinned() - Modify the shrinkable accounting when unpinning a bo.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt used by the no longer pinned bo.
+ */
+void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt)
+{
+	if (ttm_tt_shrinkable(bdev, tt) && ttm_tt_is_populated(tt))
+		ttm_tt_mod_shrinkable_pages(tt->num_pages, 0);
+}
+
 /*
  * Allocates a ttm structure for the given BO.
  */
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 3f99787e2b93..69467671c2dd 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -118,7 +118,7 @@ struct ttm_kmap_iter_tt {
 	pgprot_t prot;
 };
 
-static inline bool ttm_tt_is_populated(struct ttm_tt *tt)
+static inline bool ttm_tt_is_populated(const struct ttm_tt *tt)
 {
 	return tt->page_flags & TTM_TT_FLAG_PRIV_POPULATED;
 }
@@ -238,6 +238,10 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
 	return tt->page_flags & TTM_TT_FLAG_DONTNEED;
 }
 
+void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
+
+void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
+
 #if IS_ENABLED(CONFIG_AGP)
 #include <linux/agp_backend.h>
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (9 preceding siblings ...)
  2023-02-15 16:13 ` [RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap Thomas Hellström
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

In the absence of free swap space, a shrinker could still efficiently
free memory the content of which is no longer needed, and graphics
drivers typically have an interface to mark buffer object content as
no longer needed.

Add a possibility to propagate this to TTM, so that the shrinker
accounting and shrinker actions can be updated accordingly.

Moving forward, we will probably want this interface on the bo level and
have bo move support for it, but for now we strictly only need it for
the shrinker. Another option would be to have the drivers do the
purgeable vs shrinkable accounting.

This still leaves it to the driver to assign a proper LRU priority to
purgeable buffer objects, so that the shrinker finds those objects early
during LRU traversal.
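
As an illustration only, a driver could propagate a userspace madvise-style
hint roughly as in the sketch below. Everything named foo_* is hypothetical;
only ttm_tt_set_dontneed() / ttm_tt_set_willneed() are added by this patch:

/* Hypothetical driver helper, not part of this series. */
static int foo_bo_set_purgeable(struct ttm_buffer_object *bo, bool purgeable)
{
	int ret = 0;

	dma_resv_lock(bo->base.resv, NULL);
	if (bo->ttm) {
		ret = purgeable ?
			ttm_tt_set_dontneed(bo->bdev, bo->ttm) :
			ttm_tt_set_willneed(bo->bdev, bo->ttm);
		/* The driver would also adjust the bo LRU priority here. */
	}
	dma_resv_unlock(bo->base.resv);

	/* -EALREADY on willneed means the old content is already gone. */
	return (!purgeable && ret == -EALREADY) ? -ENODATA : 0;
}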

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 59 ++++++++++++++++++++++++++++++++++++
 include/drm/ttm/ttm_tt.h     |  3 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index a39c617c7a8e..c63be8f5ed2a 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -105,6 +105,65 @@ void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt)
 		ttm_tt_mod_shrinkable_pages(tt->num_pages, 0);
 }
 
+/**
+ * ttm_tt_set_dontneed() - Mark ttm_tt content as not needed.
+ * @bdev: The ttm device.
+ * @tt: The struct ttm_tt.
+ *
+ * Mark the ttm_tt content as not needed for the shrinker accounting.
+ * This also means that the content will not be backed up on shrinking,
+ * but rather freed immediately.
+ *
+ * Return: 0 if successful, -EALREADY if content was never present or
+ * already backed up and was purged by this call.
+ */
+int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt)
+{
+	if (ttm_tt_is_populated(tt)) {
+		if (!ttm_tt_purgeable(tt)) {
+			tt->page_flags |= TTM_TT_FLAG_DONTNEED;
+			if (ttm_tt_shrinkable(bdev, tt))
+				ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages,
+							    tt->num_pages);
+		}
+		return 0;
+	}
+
+	if (tt->swap_storage)
+		fput(tt->swap_storage);
+	tt->swap_storage = NULL;
+
+	return -EALREADY;
+}
+EXPORT_SYMBOL(ttm_tt_set_dontneed);
+
+/**
+ * ttm_tt_set_willneed() - Mark ttm_tt content as needed.
+ * @bdev: The ttm device.
+ * @tt: The struct ttm_tt.
+ *
+ * Mark the ttm_tt content as needed and update the shrinker accounting
+ * accordingly.
+ *
+ * Return: 0 if successful, -EALREADY if content was never present or
+ * was already purged.
+ */
+int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt)
+{
+	if (ttm_tt_is_populated(tt)) {
+		if (ttm_tt_purgeable(tt)) {
+			tt->page_flags &= ~TTM_TT_FLAG_DONTNEED;
+			if (ttm_tt_shrinkable(bdev, tt))
+				ttm_tt_mod_shrinkable_pages(tt->num_pages,
+							    -(long)tt->num_pages);
+		}
+		return 0;
+	}
+
+	return -EALREADY;
+}
+EXPORT_SYMBOL(ttm_tt_set_willneed);
+
 /*
  * Allocates a ttm structure for the given BO.
  */
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 69467671c2dd..abb17527f76c 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -241,6 +241,9 @@ static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
 void ttm_tt_set_pinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
 
 void ttm_tt_set_unpinned(const struct ttm_device *bdev, const struct ttm_tt *tt);
+int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt);
+
+int ttm_tt_set_willneed(const struct ttm_device *bdev, struct ttm_tt *tt);
 
 #if IS_ENABLED(CONFIG_AGP)
 #include <linux/agp_backend.h>
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (10 preceding siblings ...)
  2023-02-15 16:14 ` [RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Dave Hansen, Matthew Wilcox (Oracle),
	linux-mm, linux-graphics-maintainer, Peter Xu, Johannes Weiner,
	Dave Airlie, Andrew Morton, Christian Koenig, Matthew Auld

GPU drivers have traditionally used shmem to back up GPU buffer contents
for swap on physical memory shortage. Some integrated GPU drivers use
shmem files as the backing storage for their GPU buffers; other drivers,
in particular drivers that need a Write-Combining caching strategy on
system pages (but also drivers for discrete gpus in general), need to copy
to shmem on anticipated memory shortage.

The latter strategy does not lend itself very well to shrinker usage,
since shmem memory needs to be allocated and pagecache pages need to be
trylocked from reclaim context, and both operations are prone to failure.
That makes the approach fragile at best.

Add interfaces for GPU drivers to directly insert pages into the
swap-cache, thereby bypassing shmem and avoiding the shmem page
allocation and locking at shrink time completely, as well as the
content copy.
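
As a minimal usage sketch, assuming an isolated, dirty folio with a single
reference and a freshly allocated destination page, the intended
back-up/recover cycle looks roughly like this (the helper name is made up;
only swap_backup_folio(), swap_copy_folio() and swap_drop_folio() are added
by this patch):

/* Hypothetical helper showing the intended back-up/recover cycle. */
static int foo_backup_and_recover(struct folio *folio, struct page *new_page)
{
	swp_entry_t entry;
	int err;

	/* Hand the isolated, dirty folio over to the swap cache. */
	entry = swap_backup_folio(folio, false, GFP_HIGHUSER_MOVABLE,
				  GFP_KERNEL | __GFP_NOWARN);
	if (!entry.val)
		return -ENOMEM;	/* Out of swap space or memory; keep the folio. */
	folio_put(folio);	/* The swap cache now holds the content. */

	/* Later: copy the content back; this can be repeated. */
	err = swap_copy_folio(entry, new_page, 0, false);

	/* Finally, release the swap entry and any cached folio. */
	swap_drop_folio(entry);
	return err;
}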

Also add a kunit test for experimenting with the interface functionality;
currently it seems PMD-size folios don't work properly. This needs
further investigation to determine whether it is a viable approach.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: NeilBrown <neilb@suse.de>
Cc: linux-mm@kvack.org

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 include/linux/swap.h        |  10 ++
 mm/Kconfig                  |  18 ++++
 mm/Makefile                 |   2 +
 mm/swap_backup_folio.c      | 178 ++++++++++++++++++++++++++++++++++++
 mm/swap_backup_folio_test.c | 111 ++++++++++++++++++++++
 5 files changed, 319 insertions(+)
 create mode 100644 mm/swap_backup_folio.c
 create mode 100644 mm/swap_backup_folio_test.c

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0ceed49516ad..fc38c72fe9ab 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -706,5 +706,15 @@ static inline bool mem_cgroup_swap_full(struct folio *folio)
 }
 #endif
 
+#ifdef CONFIG_SWAP_BACKUP_FOLIO
+swp_entry_t swap_backup_folio(struct folio *folio, bool writeback,
+			      gfp_t folio_gfp, gfp_t alloc_gfp);
+
+int swap_copy_folio(swp_entry_t swap, struct page *page, unsigned long index,
+		    bool killable);
+
+void swap_drop_folio(swp_entry_t swap);
+#endif
+
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..b9e0a40e9e1a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,10 @@ config ZSMALLOC_STAT
 	  information to userspace via debugfs.
 	  If unsure, say N.
 
+config SWAP_BACKUP_FOLIO
+       bool
+       default n
+
 menu "SLAB allocator options"
 
 choice
@@ -1183,6 +1187,20 @@ config LRU_GEN_STATS
 	  This option has a per-memcg and per-node memory overhead.
 # }
 
+config SWAP_BACKUP_FOLIO_KUNIT_TEST
+       tristate "KUnit tests for swap_backup_folio() functionality" if !KUNIT_ALL_TESTS
+       depends on SWAP && KUNIT && SWAP_BACKUP_FOLIO
+       help
+	 This builds unit tests for the swap_backup_folio_functionality().
+	 This option is not useful for distributions or general kernels,
+	 but only for kernel developers working on MM swap functionality.
+
+	 For more information on KUnit and unit tests in general,
+	 please refer to the KUnit documentation in
+	 Documentation/dev-tools/kunit/.
+
+	 If in doubt, say "N".
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e29..91cb9c73e16e 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -138,3 +138,5 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
+obj-$(CONFIG_SWAP_BACKUP_FOLIO) += swap_backup_folio.o
+obj-$(CONFIG_SWAP_BACKUP_FOLIO_KUNIT_TEST) += swap_backup_folio_test.o
diff --git a/mm/swap_backup_folio.c b/mm/swap_backup_folio.c
new file mode 100644
index 000000000000..f77ca478e625
--- /dev/null
+++ b/mm/swap_backup_folio.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+#include <linux/mm_inline.h>
+#include "swap.h"
+
+/**
+ * swap_backup_folio() - Insert an isolated folio into the swap-cache.
+ * @folio: The folio to insert.
+ * @writeback: Whether to perform immediate writeback.
+ * @folio_gfp: The gfp value used when the folio was allocated. Used for
+ *             cgroup charging only.
+ * @alloc_gfp: The gfp value used for swap cache radix tree memory allocations.
+ *
+ * Insert a folio into the swap cache and get a swp_entry_t back as a reference.
+ * If the swap cache folio should be subject of immediate writeback to
+ * a swap device, @writeback should be set to true.
+ * After a call to swap_backup_folio() the caller can
+ * drop its folio reference and use swap_copy_folio() to get the folio
+ * content back, or swap_drop_folio() to drop it completely.
+ * Currently only PAGE_SIZE folios work, or if CONFIG_THP_SWAP is
+ * enabled, HPAGE_PMD_NR*PAGE_SIZE may work as well, although that
+ * needs further testing.
+ *
+ * Return: A swp_entry_t. If its .val field is zero, an error occurred.
+ */
+swp_entry_t swap_backup_folio(struct folio *folio, bool writeback,
+			      gfp_t folio_gfp, gfp_t alloc_gfp)
+{
+	swp_entry_t swap = {};
+
+	if (VM_WARN_ON_ONCE_FOLIO(folio_nr_pages(folio) != 1 &&
+				  !(IS_ENABLED(CONFIG_THP_SWAP) &&
+				    folio_nr_pages(folio) == HPAGE_PMD_NR),
+				  folio))
+		return swap;
+
+	if (VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio) != 1 ||
+				  folio_test_lru(folio) ||
+				  folio_test_locked(folio), folio))
+		return swap;
+
+	/*
+	 * Typically called from reclaim so use folio_trylock. If the folio
+	 * is isolated with refcount == 1, then this trylock should always
+	 * succeed.
+	 */
+	if (!folio_trylock(folio))
+		return swap;
+
+	__folio_mark_uptodate(folio);
+	__folio_set_swapbacked(folio);
+
+	mem_cgroup_charge(folio, NULL, folio_gfp);
+
+	swap = folio_alloc_swap(folio);
+	if (!swap.val)
+		goto out;
+
+	if (add_to_swap_cache(folio, swap, alloc_gfp, NULL) == 0) {
+		int ret = -EINVAL;
+
+		swap_shmem_alloc(swap);
+		folio_add_lru(folio);
+		lru_add_drain();
+
+		/* Stolen from pageout(). */
+		if (writeback && folio_clear_dirty_for_io(folio)) {
+			struct writeback_control wbc = {
+				.sync_mode = WB_SYNC_NONE,
+				.nr_to_write = SWAP_CLUSTER_MAX,
+				.range_start = 0,
+				.range_end = LLONG_MAX,
+				.for_reclaim = 1,
+			};
+
+			folio_set_reclaim(folio);
+			ret = swap_writepage(folio_page(folio, 0), &wbc);
+			if (!folio_test_writeback(folio))
+				folio_clear_reclaim(folio);
+		}
+
+		if (ret)
+			folio_unlock(folio);
+		return swap;
+	}
+
+	put_swap_folio(folio, swap);
+out:
+	folio_clear_swapbacked(folio);
+	folio_mark_dirty(folio);
+	folio_unlock(folio);
+	mem_cgroup_uncharge(folio);
+
+	return swap;
+}
+EXPORT_SYMBOL(swap_backup_folio);
+
+/**
+ * swap_copy_folio() - Copy folio content that was previously backed up
+ * @swap: The swp_entry_t returned from swap_backup_folio().
+ * @to_page: The page to copy to.
+ * @index: The index to the source page in the folio represented by @swap.
+ * @killable: Whether to perform sleeping operations killable.
+ *
+ * Copies content that was previously backed up using swap_backup_folio(),
+ * to the destination page to_page. The swp_entry_t @swap is not freed, and
+ * copying can thus be done multiple times using @swap.
+ *
+ * Return: Zero on success, negative error code on error. In particular,
+ * -EINTR may be returned if a fatal signal is pending during wait for
+ * page-lock or wait for writeback and @killable is set to true.
+ */
+int swap_copy_folio(swp_entry_t swap, struct page *to_page,
+		    unsigned long index, bool killable)
+{
+	struct folio *folio = swap_cache_get_folio(swap, NULL, 0);
+	int ret = 0;
+
+	if (!folio) {
+		struct vm_fault vmf = {};
+		struct page *page;
+
+		page = swap_cluster_readahead(swap, GFP_HIGHUSER_MOVABLE, &vmf);
+		if (page)
+			folio = page_folio(page);
+	}
+
+	if (!folio)
+		return -ENOMEM;
+
+	if (killable) {
+		ret = __folio_lock_killable(folio);
+		if (ret)
+			goto out_err;
+	} else {
+		folio_lock(folio);
+	}
+
+	VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio) ||
+			      folio_swap_entry(folio).val != swap.val ||
+			      !folio_test_uptodate(folio), folio);
+
+	if (killable) {
+		ret = folio_wait_writeback_killable(folio);
+		if (ret)
+			goto out_err;
+	} else {
+		folio_wait_writeback(folio);
+	}
+
+	arch_swap_restore(swap, folio);
+	folio_unlock(folio);
+
+	copy_highpage(to_page, folio_page(folio, index));
+out_err:
+	folio_put(folio);
+	return ret;
+}
+EXPORT_SYMBOL(swap_copy_folio);
+
+/**
+ * swap_drop_folio() - Drop a swap entry and its associated swap cache
+ * folio, if any.
+ * @swap: The swap entry.
+ *
+ * Releases resources associated with a swap entry returned from
+ * swap_backup_folio().
+ */
+void swap_drop_folio(swp_entry_t swap)
+{
+	free_swap_and_cache(swap);
+}
+EXPORT_SYMBOL(swap_drop_folio);
diff --git a/mm/swap_backup_folio_test.c b/mm/swap_backup_folio_test.c
new file mode 100644
index 000000000000..34cde56d2a57
--- /dev/null
+++ b/mm/swap_backup_folio_test.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: MIT or GPL-2.0
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include <kunit/test.h>
+#include <linux/delay.h>
+#include <linux/swap.h>
+#include <linux/sysinfo.h>
+
+struct gpu_swapped_page {
+	struct list_head link;
+	swp_entry_t swap;
+};
+
+static void swap_backup_test(struct kunit *test)
+{
+	gfp_t gfp = GFP_HIGHUSER_MOVABLE | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+	struct gpu_swapped_page *gsp, *next;
+	struct folio *folio;
+	LIST_HEAD(list);
+	long i = 0L;
+	long num_folios;
+	unsigned long avail_ram;
+
+	avail_ram = si_mem_available() << PAGE_SHIFT;
+	kunit_info(test, "Available RAM is %lu MiB.\n", avail_ram / SZ_1M);
+	num_folios = get_nr_swap_pages();
+	num_folios = min_t(long, num_folios, avail_ram >> PAGE_SHIFT);
+
+	kunit_info(test, "Trying %ld swap pages\n", num_folios);
+
+	do {
+		/*
+		 * Expect folio_alloc() (out-of-physical-memory) or
+		 * swap_backup_folio() (out-of-swap-space) to fail before
+		 * this kzalloc().
+		 */
+		gsp = kzalloc(sizeof(*gsp), GFP_KERNEL);
+		if (!gsp) {
+			KUNIT_FAIL(test, "alloc gsp failed.\n");
+			break;
+		}
+
+		folio = vma_alloc_folio(gfp, 0, NULL, 0, false);
+		if (!folio) {
+			kunit_info(test, "folio_alloc failed.\n");
+			kfree(gsp);
+			break;
+		}
+
+		folio_mark_dirty(folio);
+
+		/* Use true instead of false here to trigger immediate writeback. */
+		gsp->swap = swap_backup_folio(folio, false, gfp,
+					      GFP_KERNEL | __GFP_HIGH |
+					      __GFP_NOWARN);
+		if (gsp->swap.val == 0) {
+			kunit_info(test, "swap_backup_folio() failed.\n");
+			folio_put(folio);
+			kfree(gsp);
+			break;
+		}
+
+		list_add_tail(&gsp->link, &list);
+		folio_put(folio);
+		cond_resched();
+		if (i % 1000 == 0)
+			kunit_info(test, "Backed up %ld\n", i);
+	} while (i++ < num_folios);
+
+	i = 0;
+	list_for_each_entry_safe(gsp, next, &list, link) {
+		int ret;
+
+		folio = folio_alloc(GFP_HIGHUSER, 0);
+		if (!folio) {
+			KUNIT_FAIL(test, "Allocation of readback folio failed.\n");
+		} else {
+			ret = swap_copy_folio(gsp->swap, folio_page(folio, 0),
+					      0, false);
+			if (ret)
+				KUNIT_FAIL(test, "swap_copy_folio() failed.\n");
+			folio_put(folio);
+		}
+		swap_drop_folio(gsp->swap);
+		list_del(&gsp->link);
+		kfree(gsp);
+		i++;
+		cond_resched();
+		if (i % 1000 == 0)
+			kunit_info(test, "Recovered %ld\n", i);
+	}
+
+	kunit_info(test, "Recover_total: %ld\n", i);
+}
+
+static struct kunit_case swap_backup_tests[] = {
+	KUNIT_CASE(swap_backup_test),
+	{}
+};
+
+static struct kunit_suite swap_backup_test_suite = {
+	.name = "swap_backup_folio",
+	.test_cases = swap_backup_tests,
+};
+
+kunit_test_suite(swap_backup_test_suite);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("Dual MIT/GPL");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (11 preceding siblings ...)
  2023-02-15 16:14 ` [RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking Thomas Hellström
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

When swapping in, or under memory pressure, ttm_tt_populate() may sleep
for a substantial amount of time. Allow interrupts during the sleep.
This will also allow us to inject -EINTR errors during swapin in upcoming
patches.

Also avoid returning VM_FAULT_OOM, since that will confuse the core
mm, making it print out a confusing message and retry the fault.
Return VM_FAULT_SIGBUS also under OOM conditions.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 3ecda6db24b8..80f106bfe385 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -218,14 +218,21 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 	prot = ttm_io_prot(bo, bo->resource, prot);
 	if (!bo->resource->bus.is_iomem) {
 		struct ttm_operation_ctx ctx = {
-			.interruptible = false,
+			.interruptible = true,
 			.no_wait_gpu = false,
 			.force_alloc = true
 		};
 
 		ttm = bo->ttm;
-		if (ttm_tt_populate(bdev, bo->ttm, &ctx))
-			return VM_FAULT_OOM;
+		err = ttm_tt_populate(bdev, bo->ttm, &ctx);
+		if (err) {
+			if (err == -EINTR || err == -ERESTARTSYS ||
+			    err == -EAGAIN)
+				return VM_FAULT_NOPAGE;
+
+			pr_debug("TTM fault hit %pe.\n", ERR_PTR(err));
+			return VM_FAULT_SIGBUS;
+		}
 	} else {
 		/* Iomem should not be marked encrypted */
 		prot = pgprot_decrypted(prot);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (12 preceding siblings ...)
  2023-02-15 16:14 ` [RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool Thomas Hellström
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Provide a helper to be used by the driver bo_shrink() callback to either
insert the pages of a struct ttm_tt into the swap-cache or to purge them
if the struct ttm_tt is purgeable. For pages with write-combined or
uncached linear kernel map, that linear kernel map is first changed to
cached.

Release pages with as little intermediate memory allocation as
possible; however, some memory might be allocated during swapout for the
swap space radix tree.

Due to swapout or swapin errors, allow partially swapped-out struct
ttm_tts, but mark them as swapped out to stop them from being
swapped out a second time. More details in the ttm_pool.c DOC section.
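
As a rough sketch of how a driver's bo_shrink() method might use the new
helper (the callback signature is assumed from the i915 patch later in the
series, and foo_bo_unbind() is a stand-in for whatever unbinding the driver
needs to do):

static long foo_bo_shrink(struct ttm_buffer_object *bo,
			  struct ttm_operation_ctx *ctx)
{
	long ret;

	/* Make sure the GPU no longer references the pages. */
	ret = foo_bo_unbind(bo, ctx);
	if (ret)
		return ret;

	/* Purge or swap out the page vector and update the accounting. */
	return ttm_tt_shrink(bo->bdev, bo->ttm);
}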

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/Kconfig        |   1 +
 drivers/gpu/drm/ttm/ttm_pool.c | 403 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/ttm/ttm_tt.c   |  34 +++
 include/drm/ttm/ttm_pool.h     |   4 +
 include/drm/ttm/ttm_tt.h       |  10 +
 5 files changed, 437 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index dc0f94f02a82..1efd33411a92 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -196,6 +196,7 @@ source "drivers/gpu/drm/display/Kconfig"
 config DRM_TTM
 	tristate
 	depends on DRM && MMU
+	select SWAP_BACKUP_FOLIO
 	help
 	  GPU memory management subsystem for devices with multiple
 	  GPU memory types. Will be enabled automatically if a device driver
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 8787fb6a218b..319998b4a325 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -38,6 +38,7 @@
 #include <linux/debugfs.h>
 #include <linux/highmem.h>
 #include <linux/sched/mm.h>
+#include <linux/swap.h>
 
 #ifdef CONFIG_X86
 #include <asm/set_memory.h>
@@ -72,6 +73,32 @@ struct ttm_pool_dma {
 	unsigned long vaddr;
 };
 
+/**
+ * struct ttm_pool_tt_restore - State representing restore from swap.
+ * @alloced_pages: Total number of already allocated pages for the ttm_tt.
+ * @restored_pages: Number of (sub) pages restored from swap for this
+ *		     chunk of 1 << @order pages.
+ * @first_page: The ttm page ptr corresponding to @old_pages[0].
+ * @caching_divide: Page pointer where subsequent pages are cached.
+ * @old_pages: Backup copy of page pointers that were replaced by the new
+ *	       page allocation.
+ * @pool: The pool used for page allocation while restoring.
+ * @order: The order of the last page allocated while restoring.
+ *
+ * Recovery from swap space might fail when we've recovered less than the
+ * full ttm_tt. In order not to lose any data (yet), keep information
+ * around that allows us to restart a failed ttm swap-space recovery.
+ */
+struct ttm_pool_tt_restore {
+	pgoff_t alloced_pages;
+	pgoff_t restored_pages;
+	struct page **first_page;
+	struct page **caching_divide;
+	struct page *old_pages[1 << TTM_MAX_ORDER];
+	struct ttm_pool *pool;
+	unsigned int order;
+};
+
 static unsigned long page_pool_size;
 
 MODULE_PARM_DESC(page_pool_size, "Number of pages in the WC/UC/DMA pool");
@@ -91,6 +118,23 @@ static struct shrinker mm_shrinker;
 
 static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
 
+static struct page *ttm_pool_swap_to_page_ptr(swp_entry_t swap)
+{
+	return (struct page *)(swap.val << 1 | 1);
+}
+
+static swp_entry_t ttm_pool_page_ptr_to_swap(const struct page *p)
+{
+	swp_entry_t swap = {.val = ((unsigned long)p) >> 1};
+
+	return swap;
+}
+
+static bool ttm_pool_page_ptr_is_swap(const struct page *p)
+{
+	return ((unsigned long)p) & 1;
+}
+
 /* Allocate pages of size 1 << order with the given gfp_flags */
 static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 					unsigned int order)
@@ -361,11 +405,99 @@ static unsigned int ttm_pool_page_order(struct ttm_pool *pool, struct page *p)
 	return p->private;
 }
 
+/*
+ * To be able to insert single pages into the swap cache directly,
+ * we need to split multi-order page allocations and make them look
+ * like single page-allocations.
+ */
+static void ttm_pool_split_for_swap(struct ttm_pool *pool, struct page *p)
+{
+	unsigned int order = ttm_pool_page_order(pool, p);
+	pgoff_t nr;
+
+	if (!order)
+		return;
+
+	split_page(p, order);
+	nr = 1UL << order;
+	while (nr--)
+		(p++)->private = 0;
+}
+
+/**
+ * DOC: Partial shrinking and restoration of a struct ttm_tt.
+ *
+ * Swapout using swap_backup_folio() and swapin using swap_copy_folio() may fail.
+ * The former most likely due to lack of swap-space or memory, the latter due
+ * to lack of memory or because of signal interruption during waits.
+ *
+ * Swapout failure is easily handled by using a ttm_tt pages vector that holds
+ * both swap entries and page pointers. This has to be taken into account when
+ * restoring such a ttm_tt from swap, and when freeing it while swapped out.
+ * When restoring, for simplicity, new pages are actually allocated from the
+ * pool and the contents of any old pages are copied in and then the old pages
+ * are released.
+ *
+ * For swapin failures, the struct ttm_pool_tt_restore holds sufficient state
+ * to be able to resume an interrupted restore, and that structure is freed once
+ * the restoration is complete. If the struct ttm_tt is destroyed while there
+ * is a valid struct ttm_pool_tt_restore attached, that is also properly taken
+ * care of.
+ */
+
+static bool ttm_pool_restore_valid(const struct ttm_pool_tt_restore *restore)
+{
+	return restore && restore->restored_pages < (1 << restore->order);
+}
+
+static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore,
+			   struct ttm_operation_ctx *ctx)
+{
+	unsigned int i, nr = 1 << restore->order;
+	int ret = 0;
+
+	if (!ttm_pool_restore_valid(restore))
+		return 0;
+
+	for (i = restore->restored_pages; i < nr; ++i) {
+		struct page *p = restore->old_pages[i];
+
+		if (ttm_pool_page_ptr_is_swap(p)) {
+			swp_entry_t swap = ttm_pool_page_ptr_to_swap(p);
+
+			if (swap.val == 0)
+				continue;
+
+			ret = swap_copy_folio(swap, restore->first_page[i], 0,
+					      ctx->interruptible);
+			if (ret)
+				break;
+
+			swap_drop_folio(swap);
+		} else if (p) {
+			/*
+			 * We could probably avoid splitting the old page
+			 * using clever logic, but ATM we don't care.
+			 */
+			ttm_pool_split_for_swap(restore->pool, p);
+			copy_highpage(restore->first_page[i], p);
+			__free_pages(p, 0);
+		}
+
+		restore->restored_pages++;
+		restore->old_pages[i] = NULL;
+		cond_resched();
+	}
+
+	return ret;
+}
+
 /* Called when we got a page, either from a pool or newly allocated */
 static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 				   struct page *p, dma_addr_t **dma_addr,
 				   unsigned long *num_pages,
-				   struct page ***pages)
+				   struct page ***pages,
+				   struct ttm_pool_tt_restore *restore)
 {
 	unsigned int i;
 	int r;
@@ -376,6 +508,16 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
 			return r;
 	}
 
+	if (restore) {
+		memcpy(restore->old_pages, *pages,
+		       (1 << order) * sizeof(*restore->old_pages));
+		memset(*pages, 0, (1 << order) * sizeof(**pages));
+		restore->order = order;
+		restore->restored_pages = 0;
+		restore->first_page = *pages;
+		restore->alloced_pages += 1UL << order;
+	}
+
 	*num_pages -= 1 << order;
 	for (i = 1 << order; i; --i, ++(*pages), ++p)
 		**pages = p;
@@ -387,32 +529,48 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
 			    struct page **caching_divide,
 			    enum ttm_caching initial_caching,
 			    enum ttm_caching subseq_caching,
-			    pgoff_t num_pages)
+			    pgoff_t start_page, pgoff_t end_page)
 {
 	enum ttm_caching caching = subseq_caching;
-	struct page **pages = tt->pages;
+	struct page **pages = tt->pages + start_page;
 	unsigned int order;
 	pgoff_t i, nr;
 
 	if (pool && caching_divide)
 		caching = initial_caching;
 
-	for (i = 0; i < num_pages; i += nr, pages += nr) {
+	for (i = start_page; i < end_page; i += nr, pages += nr) {
 		struct ttm_pool_type *pt = NULL;
+		struct page *p = *pages;
 
 		if (unlikely(caching_divide == pages))
 			caching = subseq_caching;
 
-		order = ttm_pool_page_order(pool, *pages);
-		nr = (1UL << order);
-		if (tt->dma_address)
-			ttm_pool_unmap(pool, tt->dma_address[i], nr);
+		if (ttm_pool_page_ptr_is_swap(p)) {
+			swp_entry_t swap = ttm_pool_page_ptr_to_swap(p);
+
+			nr = 1;
+			if (swap.val != 0)
+				swap_drop_folio(swap);
+			continue;
+		}
+
+		if (pool) {
+			order = ttm_pool_page_order(pool, p);
+			nr = (1UL << order);
+			if (tt->dma_address)
+				ttm_pool_unmap(pool, tt->dma_address[i], nr);
+
+			pt = ttm_pool_select_type(pool, caching, order);
+		} else {
+			order = p->private;
+			nr = (1UL << order);
+		}
 
-		pt = ttm_pool_select_type(pool, caching, order);
 		if (pt)
-			ttm_pool_type_give(pt, *pages);
+			ttm_pool_type_give(pt, p);
 		else
-			ttm_pool_free_page(pool, caching, order, *pages);
+			ttm_pool_free_page(pool, caching, order, p);
 	}
 }
 
@@ -467,6 +625,28 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 		gfp_flags |= GFP_HIGHUSER;
 
 	order = ttm_pool_select_order(ttm_pool_orders[0], num_pages);
+
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN) {
+		if (!tt->restore) {
+			tt->restore = kvzalloc(sizeof(*tt->restore),
+					       GFP_KERNEL);
+			if (!tt->restore)
+				return -ENOMEM;
+		} else if (ttm_pool_restore_valid(tt->restore)) {
+			struct ttm_pool_tt_restore *restore = tt->restore;
+
+			num_pages -= restore->alloced_pages;
+			order = ttm_pool_select_order(restore->order, num_pages);
+			pages += restore->alloced_pages;
+			r = ttm_pool_swapin(restore, ctx);
+			if (r)
+				return r;
+			caching = restore->caching_divide;
+		}
+
+		tt->restore->pool = pool;
+	}
+
 	for (; num_pages; order = ttm_pool_select_order(order, num_pages)) {
 		struct ttm_pool_type *pt;
 
@@ -484,11 +664,18 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				r = ttm_pool_page_allocated(pool, order, p,
 							    &dma_addr,
 							    &num_pages,
-							    &pages);
+							    &pages,
+							    tt->restore);
 				if (r)
 					goto error_free_page;
 
 				caching = pages;
+				if (ttm_pool_restore_valid(tt->restore)) {
+					r = ttm_pool_swapin(tt->restore, ctx);
+					if (r)
+						goto error_free_all;
+				}
+
 				if (num_pages < (1 << order))
 					break;
 
@@ -508,9 +695,17 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 				caching = pages;
 			}
 			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
-						    &num_pages, &pages);
+						    &num_pages, &pages,
+						    tt->restore);
 			if (r)
 				goto error_free_page;
+
+			if (ttm_pool_restore_valid(tt->restore)) {
+				r = ttm_pool_swapin(tt->restore, ctx);
+				if (r)
+					goto error_free_all;
+			}
+
 			if (PageHighMem(p))
 				caching = pages;
 		}
@@ -529,15 +724,29 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
 	if (r)
 		goto error_free_all;
 
+	if (tt->restore) {
+		kvfree(tt->restore);
+		tt->restore = NULL;
+	}
+
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN)
+		tt->page_flags &= ~(TTM_TT_FLAG_PRIV_SHRUNKEN |
+				    TTM_TT_FLAG_SWAPPED);
+
 	return 0;
 
 error_free_page:
 	ttm_pool_free_page(pool, page_caching, order, p);
 
 error_free_all:
+	if (tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN) {
+		tt->restore->caching_divide = caching;
+		return r;
+	}
+
 	num_pages = tt->num_pages - num_pages;
 	__ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached,
-			num_pages);
+			0, num_pages);
 
 	return r;
 }
@@ -554,13 +763,177 @@ EXPORT_SYMBOL(ttm_pool_alloc);
 void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
 {
 	__ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching,
-			tt->num_pages);
+			0, tt->num_pages);
 
 	while (atomic_long_read(&allocated_pages) > page_pool_size)
 		ttm_pool_shrink();
 }
 EXPORT_SYMBOL(ttm_pool_free);
 
+/**
+ * ttm_pool_release_shrunken() - Release content of a swapped-out struct ttm_tt
+ * @tt: The struct ttm_tt.
+ *
+ * Release swap entries with associated content or any remaining pages of
+ * a swapped-out struct ttm_tt.
+ */
+void ttm_pool_release_shrunken(struct ttm_tt *tt)
+{
+	struct ttm_pool_tt_restore *restore;
+	struct page **caching_divide = NULL;
+	struct ttm_pool *pool = NULL;
+	pgoff_t i, start_page = 0;
+	swp_entry_t swap;
+
+	if (!(tt->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN))
+		return;
+
+	restore = tt->restore;
+
+	if (ttm_pool_restore_valid(restore)) {
+		pgoff_t nr = 1UL << restore->order;
+
+		for (i = restore->restored_pages; i < nr; ++i) {
+			struct page *p = restore->old_pages[i];
+
+			if (ttm_pool_page_ptr_is_swap(p)) {
+				swap = ttm_pool_page_ptr_to_swap(p);
+				if (swap.val == 0)
+					continue;
+
+				swap_drop_folio(swap);
+			} else if (p) {
+				ttm_pool_split_for_swap(restore->pool, p);
+				__free_pages(p, 0);
+			}
+		}
+	}
+
+	if (restore) {
+		pool = restore->pool;
+		caching_divide = restore->caching_divide;
+		start_page = restore->alloced_pages;
+		/* Pages that might be dma-mapped and non-cached */
+		__ttm_pool_free(pool, tt, caching_divide, tt->caching,
+				ttm_cached, 0, start_page);
+	}
+
+	/* Shrunken pages. Cached and not dma-mapped. */
+	__ttm_pool_free(NULL, tt, NULL, ttm_cached, ttm_cached, start_page,
+			tt->num_pages);
+
+	if (restore) {
+		kvfree(restore);
+		tt->restore = NULL;
+	}
+
+	tt->page_flags &= ~(TTM_TT_FLAG_PRIV_SHRUNKEN | TTM_TT_FLAG_SWAPPED);
+}
+
+/**
+ * ttm_pool_shrink_tt() - Swap out or purge a struct ttm_tt
+ * @pool: The pool used when allocating the struct ttm_tt.
+ * @ttm: The struct ttm_tt.
+ *
+ * Swap out or purge a struct ttm_tt. If @ttm is marked purgeable, then
+ * all pages will be freed directly to the system rather than to the pool
+ * they were allocated from, making the function behave similarly to
+ * ttm_pool_free(). If @ttm is not marked purgeable, the pages will be
+ * inserted into the swap cache instead, exchanged for a swap entry.
+ * A subsequent call to ttm_pool_alloc() will then read back the content and
+ * a subsequent call to ttm_pool_release_shrunken() will drop it.
+ * If swapout of a page fails for whatever reason, @ttm will still be
+ * partially swapped out, retaining those pages for which swapout fails.
+ *
+ * Return: Number of pages actually swapped out or freed, or negative
+ * error code on error.
+ */
+long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm)
+{
+	struct page *page;
+	struct folio *folio;
+	swp_entry_t swap;
+	gfp_t alloc_gfp;
+	gfp_t gfp;
+	int ret = 0;
+	pgoff_t shrunken = 0;
+	pgoff_t i, num_pages;
+	bool purge = ttm_tt_purgeable(ttm);
+
+	if ((!get_nr_swap_pages() && purge) ||
+	    pool->use_dma_alloc ||
+	    (ttm->page_flags & TTM_TT_FLAG_PRIV_SHRUNKEN))
+		return -EBUSY;
+
+#ifdef CONFIG_X86
+	/* Anything returned to the system needs to be cached. */
+	if (ttm->caching != ttm_cached)
+		set_pages_array_wb(ttm->pages, ttm->num_pages);
+#endif
+
+	if (ttm->dma_address || purge) {
+		for (i = 0; i < ttm->num_pages; i += num_pages) {
+			unsigned int order;
+
+			page = ttm->pages[i];
+			if (unlikely(!page))
+				continue;
+
+			order = ttm_pool_page_order(pool, page);
+			num_pages = 1UL << order;
+			if (ttm->dma_address)
+				ttm_pool_unmap(pool, ttm->dma_address[i],
+					       num_pages);
+			if (purge) {
+				shrunken += num_pages;
+				__free_pages(page, order);
+				memset(ttm->pages + i, 0,
+				       num_pages * sizeof(*ttm->pages));
+			}
+		}
+	}
+
+	if (purge)
+		return shrunken;
+
+	if (pool->use_dma32)
+		gfp = GFP_DMA32;
+	else
+		gfp = GFP_HIGHUSER;
+
+	alloc_gfp = GFP_KERNEL | __GFP_HIGH | __GFP_NOWARN;
+	if (current_is_kswapd())
+		alloc_gfp |= __GFP_NOMEMALLOC;
+
+	for (i = 0; i < ttm->num_pages; ++i) {
+		page = ttm->pages[i];
+		if (unlikely(!page))
+			continue;
+
+		ttm_pool_split_for_swap(pool, page);
+
+		folio = page_folio(page);
+		folio_mark_dirty(folio);
+		swap = swap_backup_folio(folio, false, gfp, alloc_gfp);
+		if (swap.val) {
+			ttm->pages[i] = ttm_pool_swap_to_page_ptr(swap);
+			folio_put(folio);
+			shrunken++;
+		} else {
+			/* We allow partially shrunken tts */
+			ret = -ENOMEM;
+			break;
+		}
+		cond_resched();
+	}
+
+	if (shrunken)
+		ttm->page_flags |= (TTM_TT_FLAG_PRIV_SHRUNKEN |
+				    TTM_TT_FLAG_SWAPPED);
+
+	return shrunken ? shrunken : ret;
+}
+
 /**
  * ttm_pool_init - Initialize a pool
  *
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index c63be8f5ed2a..8ac4a9cba34d 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -133,6 +133,8 @@ int ttm_tt_set_dontneed(const struct ttm_device *bdev, struct ttm_tt *tt)
 		fput(tt->swap_storage);
 	tt->swap_storage = NULL;
 
+	ttm_pool_release_shrunken(tt);
+
 	return -EALREADY;
 }
 EXPORT_SYMBOL(ttm_tt_set_dontneed);
@@ -253,6 +255,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
 	ttm->swap_storage = NULL;
 	ttm->sg = bo->sg;
 	ttm->caching = caching;
+	ttm->restore = NULL;
 }
 
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
@@ -277,6 +280,8 @@ void ttm_tt_fini(struct ttm_tt *ttm)
 		fput(ttm->swap_storage);
 	ttm->swap_storage = NULL;
 
+	ttm_pool_release_shrunken(ttm);
+
 	if (ttm->pages)
 		kvfree(ttm->pages);
 	else
@@ -347,6 +352,35 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
 	return ret;
 }
 
+/**
+ * ttm_tt_shrink() - Helper for the driver bo_shrink() method.
+ * @bdev: The TTM device.
+ * @tt: The struct ttm_tt.
+ *
+ * Helper for a TTM driver to use from the bo_shrink() method to shrink
+ * a struct ttm_tt, after it has done the necessary unbinding. This function
+ * will update the page accounting and call ttm_pool_shrink_tt to free pages
+ * or move them to the swap cache.
+ *
+ * Return: Number of pages freed or swapped out, or negative error code on
+ * error.
+ */
+long ttm_tt_shrink(struct ttm_device *bdev, struct ttm_tt *tt)
+{
+	long ret = ttm_pool_shrink_tt(&bdev->pool, tt);
+
+	if (ret > 0) {
+		tt->page_flags &= ~TTM_TT_FLAG_PRIV_POPULATED;
+		if (ttm_tt_purgeable(tt))
+			ttm_tt_mod_shrinkable_pages(0, -(long)tt->num_pages);
+		else
+			ttm_tt_mod_shrinkable_pages(-(long)tt->num_pages, 0);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(ttm_tt_shrink);
+
 /**
  * ttm_tt_swapout - swap out tt object
  * @bdev: TTM device structure.
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index c1200552892e..bfe14138a992 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -86,6 +86,10 @@ void ttm_pool_fini(struct ttm_pool *pool);
 
 int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m);
 
+void ttm_pool_release_shrunken(struct ttm_tt *tt);
+
+long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm);
+
 int ttm_pool_mgr_init(unsigned long num_pages);
 void ttm_pool_mgr_fini(void);
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index abb17527f76c..0fa71292b676 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -37,6 +37,7 @@ struct ttm_tt;
 struct ttm_resource;
 struct ttm_buffer_object;
 struct ttm_operation_ctx;
+struct ttm_pool_tt_restore;
 
 /**
  * struct ttm_tt - This is a structure holding the pages, caching- and aperture
@@ -79,6 +80,10 @@ struct ttm_tt {
 	 *   page_flags = TTM_TT_FLAG_EXTERNAL |
 	 *		  TTM_TT_FLAG_EXTERNAL_MAPPABLE;
 	 *
+	 * TTM_TT_FLAG_PRIV_SHRUNKEN: TTM internal only. This is set if the
+	 * struct ttm_tt has been (possibly partially) swapped out to the
+	 * swap cache.
+	 *
 	 * TTM_TT_FLAG_PRIV_POPULATED: TTM internal only. DO NOT USE. This is
 	 * set by TTM after ttm_tt_populate() has successfully returned, and is
 	 * then unset when TTM calls ttm_tt_unpopulate().
@@ -89,6 +94,7 @@ struct ttm_tt {
 #define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
 #define TTM_TT_FLAG_DONTNEED		BIT(4)
 
+#define TTM_TT_FLAG_PRIV_SHRUNKEN	BIT(30)
 #define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
 	uint32_t page_flags;
 	/** @num_pages: Number of pages in the page array. */
@@ -104,6 +110,8 @@ struct ttm_tt {
 	 * ttm_caching.
 	 */
 	enum ttm_caching caching;
+	/** @restore: Swap restore state. Drivers keep off. */
+	struct ttm_pool_tt_restore *restore;
 };
 
 /**
@@ -226,6 +234,8 @@ void ttm_tt_mgr_fini(void);
 struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
 					    struct ttm_tt *tt);
 
+long ttm_tt_shrink(struct ttm_device *bdev, struct ttm_tt *tt);
+
 /**
  * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is purgeable
  * @tt: The struct ttm_tt to consider.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (13 preceding siblings ...)
  2023-02-15 16:14 ` [RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  2023-02-15 16:14 ` [RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool Thomas Hellström
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Use fault-injection to test partial TTM swapout and interrupted swapin.
Return -EINTR from swapin to test the caller's ability to handle and
restart the swapin, and on swapout perform a partial swapout to test the
swapin and release_shrunken functionality.
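
A minimal way to exercise these paths is to build a debug kernel with, for
example:

  CONFIG_DRM_TTM=y
  CONFIG_DRM_TTM_SHRINK_FAULT_INJECT=y

With the option enabled, roughly every hundredth interruptible swapin
returns -EINTR and swapout only shrinks half of each ttm_tt, so the
restore and release_shrunken paths get exercised during normal testing.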

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/Kconfig        | 10 ++++++++++
 drivers/gpu/drm/ttm/ttm_pool.c | 17 ++++++++++++++++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 1efd33411a92..a78eed9af2c1 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -202,6 +202,16 @@ config DRM_TTM
 	  GPU memory types. Will be enabled automatically if a device driver
 	  uses it.
 
+config DRM_TTM_SHRINK_FAULT_INJECT
+	bool "Enable fault injection during TTM shrinking"
+	depends on DRM_TTM
+	default n
+	help
+	  Inject recoverable failures during TTM shrinking and recovery of
+	  shrunken objects. For DRM driver developers only.
+
+	  If in doubt, choose N.
+
 config DRM_BUDDY
 	tristate
 	depends on DRM
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 319998b4a325..d7c604593689 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -453,6 +453,7 @@ static bool ttm_pool_restore_valid(const struct ttm_pool_tt_restore *restore)
 static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore,
 			   struct ttm_operation_ctx *ctx)
 {
+	static unsigned long __maybe_unused swappedin;
 	unsigned int i, nr = 1 << restore->order;
 	int ret = 0;
 
@@ -468,6 +469,13 @@ static int ttm_pool_swapin(struct ttm_pool_tt_restore *restore,
 			if (swap.val == 0)
 				continue;
 
+			if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT) &&
+			    ctx->interruptible &&
+			    ++swappedin % 100 == 0) {
+				ret = -EINTR;
+				break;
+			}
+
 			ret = swap_copy_folio(swap, restore->first_page[i], 0,
 					      ctx->interruptible);
 			if (ret)
@@ -905,7 +913,14 @@ long ttm_pool_shrink_tt(struct ttm_pool *pool, struct ttm_tt *ttm)
 	if (current_is_kswapd())
 		alloc_gfp |= __GFP_NOMEMALLOC;
 
-	for (i = 0; i < ttm->num_pages; ++i) {
+	num_pages = ttm->num_pages;
+
+	/* Inject a partial-swapout failure by shrinking only half of the pages. */
+
+	if (IS_ENABLED(CONFIG_DRM_TTM_SHRINK_FAULT_INJECT))
+		num_pages = DIV_ROUND_UP(num_pages, 2);
+
+	for (i = 0; i < num_pages; ++i) {
 		page = ttm->pages[i];
 		if (unlikely(!page))
 			continue;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool
  2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
                   ` (14 preceding siblings ...)
  2023-02-15 16:14 ` [RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths Thomas Hellström
@ 2023-02-15 16:14 ` Thomas Hellström
  15 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 16:14 UTC (permalink / raw)
  To: dri-devel
  Cc: Miaohe Lin, Thomas Hellström, David Hildenbrand, NeilBrown,
	Daniel Vetter, intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Christian Koenig,
	Matthew Auld

Remove the external i915 TTM shmem pool and replace it with the
normal TTM page allocation. Also provide a callback for the TTM
shrinker functionality.
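
Assuming the driver callback added earlier in the series is exposed as a
bo_shrink member of struct ttm_device_funcs (the exact hookup isn't visible
in this excerpt), the i915 registration boils down to something like:

static struct ttm_device_funcs i915_ttm_bo_driver = {
	/* ... existing tt create/populate/unpopulate and move callbacks ... */
	/* New: let the TTM shrinker call back into the driver. */
	.bo_shrink = i915_ttm_bo_shrink,
};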

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |   6 -
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 -
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       | 273 +++---------------
 drivers/gpu/drm/i915/i915_gem.c               |   3 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c               |   6 +-
 drivers/gpu/drm/ttm/ttm_tt.c                  |   3 -
 include/drm/ttm/ttm_tt.h                      |  15 +-
 8 files changed, 53 insertions(+), 264 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index f9a8acbba715..f694b5d479e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -282,12 +282,6 @@ i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
 	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
 }
 
-static inline bool
-i915_gem_object_has_self_managed_shrink_list(const struct drm_i915_gem_object *obj)
-{
-	return i915_gem_object_type_has(obj, I915_GEM_OBJECT_SELF_MANAGED_SHRINK_LIST);
-}
-
 static inline bool
 i915_gem_object_is_proxy(const struct drm_i915_gem_object *obj)
 {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 19c9bdd8f905..511dc1384a9c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -544,12 +544,6 @@ struct drm_i915_gem_object {
 		 */
 		atomic_t shrink_pin;
 
-		/**
-		 * @ttm_shrinkable: True when the object is using shmem pages
-		 * underneath. Protected by the object lock.
-		 */
-		bool ttm_shrinkable;
-
 		/**
 		 * @unknown_state: Indicate that the object is effectively
 		 * borked. This is write-once and set if we somehow encounter a
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..c39d45661b84 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -73,7 +73,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 		shrinkable = false;
 	}
 
-	if (shrinkable && !i915_gem_object_has_self_managed_shrink_list(obj)) {
+	if (shrinkable) {
 		struct list_head *list;
 		unsigned long flags;
 
@@ -216,8 +216,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	if (i915_gem_object_is_volatile(obj))
 		obj->mm.madv = I915_MADV_WILLNEED;
 
-	if (!i915_gem_object_has_self_managed_shrink_list(obj))
-		i915_gem_object_make_unshrinkable(obj);
+	i915_gem_object_make_unshrinkable(obj);
 
 	if (obj->mm.mapping) {
 		unmap_object(obj, page_mask_bits(obj->mm.mapping));
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 341b94672abc..f9bd4f50d495 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -3,8 +3,6 @@
  * Copyright © 2021 Intel Corporation
  */
 
-#include <linux/shmem_fs.h>
-
 #include <drm/ttm/ttm_placement.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/drm_buddy.h>
@@ -37,8 +35,6 @@
  * @ttm: The base TTM page vector.
  * @dev: The struct device used for dma mapping and unmapping.
  * @cached_rsgt: The cached scatter-gather table.
- * @is_shmem: Set if using shmem.
- * @filp: The shmem file, if using shmem backend.
  *
  * Note that DMA may be going on right up to the point where the page-
  * vector is unpopulated in delayed destroy. Hence keep the
@@ -50,9 +46,6 @@ struct i915_ttm_tt {
 	struct ttm_tt ttm;
 	struct device *dev;
 	struct i915_refct_sgt cached_rsgt;
-
-	bool is_shmem;
-	struct file *filp;
 };
 
 static const struct ttm_place sys_placement_flags = {
@@ -185,75 +178,6 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj,
 	placement->busy_placement = busy;
 }
 
-static int i915_ttm_tt_shmem_populate(struct ttm_device *bdev,
-				      struct ttm_tt *ttm,
-				      struct ttm_operation_ctx *ctx)
-{
-	struct drm_i915_private *i915 = container_of(bdev, typeof(*i915), bdev);
-	struct intel_memory_region *mr = i915->mm.regions[INTEL_MEMORY_SYSTEM];
-	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
-	const unsigned int max_segment = i915_sg_segment_size(i915->drm.dev);
-	const size_t size = (size_t)ttm->num_pages << PAGE_SHIFT;
-	struct file *filp = i915_tt->filp;
-	struct sgt_iter sgt_iter;
-	struct sg_table *st;
-	struct page *page;
-	unsigned long i;
-	int err;
-
-	if (!filp) {
-		struct address_space *mapping;
-		gfp_t mask;
-
-		filp = shmem_file_setup("i915-shmem-tt", size, VM_NORESERVE);
-		if (IS_ERR(filp))
-			return PTR_ERR(filp);
-
-		mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
-
-		mapping = filp->f_mapping;
-		mapping_set_gfp_mask(mapping, mask);
-		GEM_BUG_ON(!(mapping_gfp_mask(mapping) & __GFP_RECLAIM));
-
-		i915_tt->filp = filp;
-	}
-
-	st = &i915_tt->cached_rsgt.table;
-	err = shmem_sg_alloc_table(i915, st, size, mr, filp->f_mapping,
-				   max_segment);
-	if (err)
-		return err;
-
-	err = dma_map_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL,
-			      DMA_ATTR_SKIP_CPU_SYNC);
-	if (err)
-		goto err_free_st;
-
-	i = 0;
-	for_each_sgt_page(page, sgt_iter, st)
-		ttm->pages[i++] = page;
-
-	if (ttm->page_flags & TTM_TT_FLAG_SWAPPED)
-		ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
-
-	return 0;
-
-err_free_st:
-	shmem_sg_free_table(st, filp->f_mapping, false, false);
-
-	return err;
-}
-
-static void i915_ttm_tt_shmem_unpopulate(struct ttm_tt *ttm)
-{
-	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
-	bool backup = ttm->page_flags & TTM_TT_FLAG_SWAPPED;
-	struct sg_table *st = &i915_tt->cached_rsgt.table;
-
-	shmem_sg_free_table(st, file_inode(i915_tt->filp)->i_mapping,
-			    backup, backup);
-}
-
 static void i915_ttm_tt_release(struct kref *ref)
 {
 	struct i915_ttm_tt *i915_tt =
@@ -292,11 +216,6 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo,
 		page_flags |= TTM_TT_FLAG_ZERO_ALLOC;
 
 	caching = i915_ttm_select_tt_caching(obj);
-	if (i915_gem_object_is_shrinkable(obj) && caching == ttm_cached) {
-		page_flags |= TTM_TT_FLAG_EXTERNAL |
-			      TTM_TT_FLAG_EXTERNAL_MAPPABLE;
-		i915_tt->is_shmem = true;
-	}
 
 	if (i915_gem_object_needs_ccs_pages(obj))
 		ccs_pages = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size,
@@ -325,9 +244,6 @@ static int i915_ttm_tt_populate(struct ttm_device *bdev,
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 
-	if (i915_tt->is_shmem)
-		return i915_ttm_tt_shmem_populate(bdev, ttm, ctx);
-
 	return ttm_pool_alloc(&bdev->pool, ttm, ctx);
 }
 
@@ -339,21 +255,46 @@ static void i915_ttm_tt_unpopulate(struct ttm_device *bdev, struct ttm_tt *ttm)
 	if (st->sgl)
 		dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
 
-	if (i915_tt->is_shmem) {
-		i915_ttm_tt_shmem_unpopulate(ttm);
-	} else {
-		sg_free_table(st);
-		ttm_pool_free(&bdev->pool, ttm);
+	sg_free_table(st);
+	ttm_pool_free(&bdev->pool, ttm);
+}
+
+static long i915_ttm_bo_shrink(struct ttm_buffer_object *bo,
+			       struct ttm_operation_ctx *ctx)
+
+{
+	struct ttm_tt *tt = bo->ttm;
+	struct i915_ttm_tt *i915_tt = container_of(tt, typeof(*i915_tt), ttm);
+	struct sg_table *st = &i915_tt->cached_rsgt.table;
+	long ret;
+
+	if (!i915_ttm_is_ghost_object(bo)) {
+		struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+		long ret = i915_ttm_move_notify(bo);
+
+		if (ret)
+			return ret;
+
+		if (obj->mm.madv == I915_MADV_DONTNEED) {
+			GEM_WARN_ON(!(tt->page_flags & TTM_TT_FLAG_DONTNEED));
+			obj->mm.madv = __I915_MADV_PURGED;
+		}
 	}
+
+	if (st->sgl)
+		dma_unmap_sgtable(i915_tt->dev, st, DMA_BIDIRECTIONAL, 0);
+
+	sg_free_table(st);
+
+	ret = ttm_tt_shrink(bo->bdev, tt);
+
+	return ret;
 }
 
 static void i915_ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
 {
 	struct i915_ttm_tt *i915_tt = container_of(ttm, typeof(*i915_tt), ttm);
 
-	if (i915_tt->filp)
-		fput(i915_tt->filp);
-
 	ttm_tt_fini(ttm);
 	i915_refct_sgt_put(&i915_tt->cached_rsgt);
 }
@@ -366,14 +307,6 @@ static bool i915_ttm_eviction_valuable(struct ttm_buffer_object *bo,
 	if (i915_ttm_is_ghost_object(bo))
 		return false;
 
-	/*
-	 * EXTERNAL objects should never be swapped out by TTM, instead we need
-	 * to handle that ourselves. TTM will already skip such objects for us,
-	 * but we would like to avoid grabbing locks for no good reason.
-	 */
-	if (bo->ttm && bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)
-		return false;
-
 	/* Will do for now. Our pinned objects are still on TTM's LRU lists */
 	if (!i915_gem_object_evictable(obj))
 		return false;
@@ -439,18 +372,6 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj)
 	if (ret)
 		return ret;
 
-	if (bo->ttm && i915_tt->filp) {
-		/*
-		 * The below fput(which eventually calls shmem_truncate) might
-		 * be delayed by worker, so when directly called to purge the
-		 * pages(like by the shrinker) we should try to be more
-		 * aggressive and release the pages immediately.
-		 */
-		shmem_truncate_range(file_inode(i915_tt->filp),
-				     0, (loff_t)-1);
-		fput(fetch_and_zero(&i915_tt->filp));
-	}
-
 	obj->write_domain = 0;
 	obj->read_domains = 0;
 	i915_ttm_adjust_gem_after_move(obj);
@@ -460,53 +381,6 @@ int i915_ttm_purge(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
-static int i915_ttm_shrink(struct drm_i915_gem_object *obj, unsigned int flags)
-{
-	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
-	struct i915_ttm_tt *i915_tt =
-		container_of(bo->ttm, typeof(*i915_tt), ttm);
-	struct ttm_operation_ctx ctx = {
-		.interruptible = true,
-		.no_wait_gpu = flags & I915_GEM_OBJECT_SHRINK_NO_GPU_WAIT,
-	};
-	struct ttm_placement place = {};
-	int ret;
-
-	if (!bo->ttm || i915_ttm_cpu_maps_iomem(bo->resource))
-		return 0;
-
-	GEM_BUG_ON(!i915_tt->is_shmem);
-
-	if (!i915_tt->filp)
-		return 0;
-
-	ret = ttm_bo_wait_ctx(bo, &ctx);
-	if (ret)
-		return ret;
-
-	switch (obj->mm.madv) {
-	case I915_MADV_DONTNEED:
-		return i915_ttm_purge(obj);
-	case __I915_MADV_PURGED:
-		return 0;
-	}
-
-	if (bo->ttm->page_flags & TTM_TT_FLAG_SWAPPED)
-		return 0;
-
-	bo->ttm->page_flags |= TTM_TT_FLAG_SWAPPED;
-	ret = ttm_bo_validate(bo, &place, &ctx);
-	if (ret) {
-		bo->ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
-		return ret;
-	}
-
-	if (flags & I915_GEM_OBJECT_SHRINK_WRITEBACK)
-		__shmem_writeback(obj->base.size, i915_tt->filp->f_mapping);
-
-	return 0;
-}
-
 static void i915_ttm_delete_mem_notify(struct ttm_buffer_object *bo)
 {
 	struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
@@ -765,6 +639,7 @@ static struct ttm_device_funcs i915_ttm_bo_driver = {
 	.io_mem_reserve = i915_ttm_io_mem_reserve,
 	.io_mem_pfn = i915_ttm_io_mem_pfn,
 	.access_memory = i915_ttm_access_memory,
+	.bo_shrink = i915_ttm_bo_shrink,
 };
 
 /**
@@ -931,8 +806,6 @@ void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj)
 	struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
 	struct i915_ttm_tt *i915_tt =
 		container_of(bo->ttm, typeof(*i915_tt), ttm);
-	bool shrinkable =
-		bo->ttm && i915_tt->filp && ttm_tt_is_populated(bo->ttm);
 
 	/*
 	 * Don't manipulate the TTM LRUs while in TTM bo destruction.
@@ -941,54 +814,25 @@ void i915_ttm_adjust_lru(struct drm_i915_gem_object *obj)
 	if (!kref_read(&bo->kref))
 		return;
 
-	/*
-	 * We skip managing the shrinker LRU in set_pages() and just manage
-	 * everything here. This does at least solve the issue with having
-	 * temporary shmem mappings(like with evicted lmem) not being visible to
-	 * the shrinker. Only our shmem objects are shrinkable, everything else
-	 * we keep as unshrinkable.
-	 *
-	 * To make sure everything plays nice we keep an extra shrink pin in TTM
-	 * if the underlying pages are not currently shrinkable. Once we release
-	 * our pin, like when the pages are moved to shmem, the pages will then
-	 * be added to the shrinker LRU, assuming the caller isn't also holding
-	 * a pin.
-	 *
-	 * TODO: consider maybe also bumping the shrinker list here when we have
-	 * already unpinned it, which should give us something more like an LRU.
-	 *
-	 * TODO: There is a small window of opportunity for this function to
-	 * get called from eviction after we've dropped the last GEM refcount,
-	 * but before the TTM deleted flag is set on the object. Avoid
-	 * adjusting the shrinker list in such cases, since the object is
-	 * not available to the shrinker anyway due to its zero refcount.
-	 * To fix this properly we should move to a TTM shrinker LRU list for
-	 * these objects.
-	 */
-	if (kref_get_unless_zero(&obj->base.refcount)) {
-		if (shrinkable != obj->mm.ttm_shrinkable) {
-			if (shrinkable) {
-				if (obj->mm.madv == I915_MADV_WILLNEED)
-					__i915_gem_object_make_shrinkable(obj);
-				else
-					__i915_gem_object_make_purgeable(obj);
-			} else {
-				i915_gem_object_make_unshrinkable(obj);
-			}
-
-			obj->mm.ttm_shrinkable = shrinkable;
-		}
-		i915_gem_object_put(obj);
+	if (bo->ttm) {
+		int ret = 0;
+
+		if (obj->mm.madv == I915_MADV_DONTNEED &&
+		    !ttm_tt_purgeable(bo->ttm))
+			ret = ttm_tt_set_dontneed(bo->bdev, bo->ttm);
+		else if (obj->mm.madv == I915_MADV_WILLNEED &&
+			 ttm_tt_purgeable(bo->ttm))
+			ret = ttm_tt_set_willneed(bo->bdev, bo->ttm);
+
+		if (ret == -EALREADY)
+			obj->mm.madv = __I915_MADV_PURGED;
 	}
 
 	/*
 	 * Put on the correct LRU list depending on the MADV status
 	 */
 	spin_lock(&bo->bdev->lru_lock);
-	if (shrinkable) {
-		/* Try to keep shmem_tt from being considered for shrinking. */
-		bo->priority = TTM_MAX_BO_PRIORITY - 1;
-	} else if (obj->mm.madv != I915_MADV_WILLNEED) {
+	if (obj->mm.madv != I915_MADV_WILLNEED) {
 		bo->priority = I915_TTM_PRIO_PURGE;
 	} else if (!i915_gem_object_has_pages(obj)) {
 		bo->priority = I915_TTM_PRIO_NO_PAGES;
@@ -1226,13 +1070,10 @@ static void i915_ttm_unmap_virtual(struct drm_i915_gem_object *obj)
 
 static const struct drm_i915_gem_object_ops i915_gem_ttm_obj_ops = {
 	.name = "i915_gem_object_ttm",
-	.flags = I915_GEM_OBJECT_IS_SHRINKABLE |
-		 I915_GEM_OBJECT_SELF_MANAGED_SHRINK_LIST,
 
 	.get_pages = i915_ttm_get_pages,
 	.put_pages = i915_ttm_put_pages,
 	.truncate = i915_ttm_truncate,
-	.shrink = i915_ttm_shrink,
 
 	.adjust_lru = i915_ttm_adjust_lru,
 	.delayed_free = i915_ttm_delayed_free,
@@ -1251,18 +1092,6 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
 	mutex_destroy(&obj->ttm.get_io_page.lock);
 
 	if (obj->ttm.created) {
-		/*
-		 * We freely manage the shrinker LRU outide of the mm.pages life
-		 * cycle. As a result when destroying the object we should be
-		 * extra paranoid and ensure we remove it from the LRU, before
-		 * we free the object.
-		 *
-		 * Touching the ttm_shrinkable outside of the object lock here
-		 * should be safe now that the last GEM object ref was dropped.
-		 */
-		if (obj->mm.ttm_shrinkable)
-			i915_gem_object_make_unshrinkable(obj);
-
 		i915_ttm_backup_free(obj);
 
 		/* This releases all gem object bindings to the backend. */
@@ -1318,14 +1147,6 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
 	/* Forcing the page size is kernel internal only */
 	GEM_BUG_ON(page_size && obj->mm.n_placements);
 
-	/*
-	 * Keep an extra shrink pin to prevent the object from being made
-	 * shrinkable too early. If the ttm_tt is ever allocated in shmem, we
-	 * drop the pin. The TTM backend manages the shrinker LRU itself,
-	 * outside of the normal mm.pages life cycle.
-	 */
-	i915_gem_object_make_unshrinkable(obj);
-
 	/*
 	 * If this function fails, it will call the destructor, but
 	 * our caller still owns the object. So no freeing in the
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 35950fa91406..4dff76614347 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1068,8 +1068,7 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 			obj->ops->adjust_lru(obj);
 	}
 
-	if (i915_gem_object_has_pages(obj) ||
-	    i915_gem_object_has_self_managed_shrink_list(obj)) {
+	if (i915_gem_object_has_pages(obj)) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&i915->mm.obj_lock, flags);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 80f106bfe385..7537bc300e34 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -150,10 +150,8 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 	 * (if at all) by redirecting mmap to the exporter.
 	 */
 	if (bo->ttm && (bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL)) {
-		if (!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE)) {
-			dma_resv_unlock(bo->base.resv);
-			return VM_FAULT_SIGBUS;
-		}
+		dma_resv_unlock(bo->base.resv);
+		return VM_FAULT_SIGBUS;
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 8ac4a9cba34d..b0533833d581 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -198,9 +198,6 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
 	if (unlikely(bo->ttm == NULL))
 		return -ENOMEM;
 
-	WARN_ON(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL_MAPPABLE &&
-		!(bo->ttm->page_flags & TTM_TT_FLAG_EXTERNAL));
-
 	return 0;
 }
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 0fa71292b676..0d1d377903e0 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -68,18 +68,6 @@ struct ttm_tt {
 	 * Note that enum ttm_bo_type.ttm_bo_type_sg objects will always enable
 	 * this flag.
 	 *
-	 * TTM_TT_FLAG_EXTERNAL_MAPPABLE: Same behaviour as
-	 * TTM_TT_FLAG_EXTERNAL, but with the reduced restriction that it is
-	 * still valid to use TTM to map the pages directly. This is useful when
-	 * implementing a ttm_tt backend which still allocates driver owned
-	 * pages underneath(say with shmem).
-	 *
-	 * Note that since this also implies TTM_TT_FLAG_EXTERNAL, the usage
-	 * here should always be:
-	 *
-	 *   page_flags = TTM_TT_FLAG_EXTERNAL |
-	 *		  TTM_TT_FLAG_EXTERNAL_MAPPABLE;
-	 *
 	 * TTM_TT_FLAG_PRIV_SHRUNKEN: TTM internal only. This is set if the
 	 * struct ttm_tt has been (possibly partially) swapped out to the
 	 * swap cache.
@@ -91,8 +79,7 @@ struct ttm_tt {
 #define TTM_TT_FLAG_SWAPPED		BIT(0)
 #define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
 #define TTM_TT_FLAG_EXTERNAL		BIT(2)
-#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
-#define TTM_TT_FLAG_DONTNEED		BIT(4)
+#define TTM_TT_FLAG_DONTNEED		BIT(3)
 
 #define TTM_TT_FLAG_PRIV_SHRUNKEN	BIT(30)
 #define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference
  2023-02-15 16:13 ` [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
@ 2023-02-15 17:25   ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2023-02-15 17:25 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	Peter Xu, linux-mm, Dave Hansen, Huang Rui,
	Matthew Wilcox (Oracle),
	linux-graphics-maintainer, Matthew Auld, Ramalingam C,
	Dave Airlie, Philip Yang, Arunpravin Paneer Selvam,
	Anshuman Gupta, intel-gfx, Qiang Yu, Tvrtko Ursulin,
	Felix Kuehling, Johannes Weiner, Alex Deucher, Andrew Morton,
	Nirmoy Das

On 15.02.23 at 17:13, Thomas Hellström wrote:
> The LRU mechanism may look up a resource in the process of being removed
> from an object. The locking rules here are a bit unclear but it looks
> currently like res->bo assignment is protected by the LRU lock, whereas
> bo->resource is protected by the object lock, while *clearing* of
> bo->resource is also protected by the LRU lock. This means that if
> we check that bo->resource points to the LRU resource under the LRU
> lock we should be safe.
> So perform that check before deciding to swap out a bo. That avoids
> dereferencing a NULL bo->resource in ttm_bo_swapout().
>
> Fixes: 6a9b02899402 ("drm/ttm: move the LRU into resource handling v4")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Felix Kuehling <Felix.Kuehling@amd.com>
> Cc: Philip Yang <Philip.Yang@amd.com>
> Cc: Qiang Yu <qiang.yu@amd.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Nirmoy Das <nirmoy.das@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: Anshuman Gupta <anshuman.gupta@intel.com>
> Cc: Ramalingam C <ramalingam.c@intel.com>
> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_device.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index c7a1862f322a..ae2f19dc9f81 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -158,7 +158,7 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>   			struct ttm_buffer_object *bo = res->bo;
>   			uint32_t num_pages;
>   
> -			if (!bo)
> +			if (!bo || bo->resource != res)
>   				continue;
>   
>   			num_pages = PFN_UP(bo->base.size);


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-02-15 16:13 ` [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
@ 2023-02-15 17:31   ` Christian König
  2023-02-15 18:02     ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-15 17:31 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, Huang Rui, linux-graphics-maintainer,
	Peter Xu, Johannes Weiner, Madhav Chauhan, Dave Airlie,
	Andrew Morton, Matthew Auld

On 15.02.23 at 17:13, Thomas Hellström wrote:
> When hitting an error, the error path forgot to unmap dma mappings and

I don't see where this happens?

> could call set_pages_wb() on already uncached pages.

Yeah, but what's the problem?

Regards,
Christian.

>
> Fix this by introducing a common __ttm_pool_free() function that
> does the right thing.
>
> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Madhav Chauhan <madhav.chauhan@amd.com>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Huang Rui <ray.huang@amd.com>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++-------------
>   1 file changed, 45 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index aa116a7bbae3..1cc7591a9542 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct ttm_pool *pool, unsigned int order,
>   	return 0;
>   }
>   
> +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
> +			    struct page **caching_divide,
> +			    enum ttm_caching initial_caching,
> +			    enum ttm_caching subseq_caching,
> +			    pgoff_t num_pages)
> +{
> +	enum ttm_caching caching = subseq_caching;
> +	struct page **pages = tt->pages;
> +	unsigned int order;
> +	pgoff_t i, nr;
> +
> +	if (pool && caching_divide)
> +		caching = initial_caching;
> +
> +	for (i = 0; i < num_pages; i += nr, pages += nr) {
> +		struct ttm_pool_type *pt = NULL;
> +
> +		if (unlikely(caching_divide == pages))
> +			caching = subseq_caching;
> +
> +		order = ttm_pool_page_order(pool, *pages);
> +		nr = (1UL << order);
> +		if (tt->dma_address)
> +			ttm_pool_unmap(pool, tt->dma_address[i], nr);
> +
> +		pt = ttm_pool_select_type(pool, caching, order);
> +		if (pt)
> +			ttm_pool_type_give(pt, *pages);
> +		else
> +			ttm_pool_free_page(pool, caching, order, *pages);
> +	}
> +}
> +
>   /**
>    * ttm_pool_alloc - Fill a ttm_tt object
>    *
> @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	dma_addr_t *dma_addr = tt->dma_address;
>   	struct page **caching = tt->pages;
>   	struct page **pages = tt->pages;
> +	enum ttm_caching page_caching;
>   	gfp_t gfp_flags = GFP_USER;
> -	unsigned int i, order;
> +	unsigned int order;
>   	struct page *p;
>   	int r;
>   
> @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	     order = min_t(unsigned int, order, __fls(num_pages))) {
>   		struct ttm_pool_type *pt;
>   
> +		page_caching = tt->caching;
>   		pt = ttm_pool_select_type(pool, tt->caching, order);
>   		p = pt ? ttm_pool_type_take(pt) : NULL;
>   		if (p) {
> @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   			if (r)
>   				goto error_free_page;
>   
> +			caching = pages;
>   			do {
>   				r = ttm_pool_page_allocated(pool, order, p,
>   							    &dma_addr,
> @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   				if (r)
>   					goto error_free_page;
>   
> +				caching = pages;
>   				if (num_pages < (1 << order))
>   					break;
>   
>   				p = ttm_pool_type_take(pt);
>   			} while (p);
> -			caching = pages;
>   		}
>   
> +		page_caching = ttm_cached;
>   		while (num_pages >= (1 << order) &&
>   		       (p = ttm_pool_alloc_page(pool, gfp_flags, order))) {
>   
> @@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   							   tt->caching);
>   				if (r)
>   					goto error_free_page;
> +				caching = pages;
>   			}
>   			r = ttm_pool_page_allocated(pool, order, p, &dma_addr,
>   						    &num_pages, &pages);
> @@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	return 0;
>   
>   error_free_page:
> -	ttm_pool_free_page(pool, tt->caching, order, p);
> +	ttm_pool_free_page(pool, page_caching, order, p);
>   
>   error_free_all:
>   	num_pages = tt->num_pages - num_pages;
> -	for (i = 0; i < num_pages; ) {
> -		order = ttm_pool_page_order(pool, tt->pages[i]);
> -		ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
> -		i += 1 << order;
> -	}
> +	__ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached,
> +			num_pages);
>   
>   	return r;
>   }
> @@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc);
>    */
>   void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>   {
> -	unsigned int i;
> -
> -	for (i = 0; i < tt->num_pages; ) {
> -		struct page *p = tt->pages[i];
> -		unsigned int order, num_pages;
> -		struct ttm_pool_type *pt;
> -
> -		order = ttm_pool_page_order(pool, p);
> -		num_pages = 1ULL << order;
> -		if (tt->dma_address)
> -			ttm_pool_unmap(pool, tt->dma_address[i], num_pages);
> -
> -		pt = ttm_pool_select_type(pool, tt->caching, order);
> -		if (pt)
> -			ttm_pool_type_give(pt, tt->pages[i]);
> -		else
> -			ttm_pool_free_page(pool, tt->caching, order,
> -					   tt->pages[i]);
> -
> -		i += num_pages;
> -	}
> +	__ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching,
> +			tt->num_pages);
>   
>   	while (atomic_long_read(&allocated_pages) > page_pool_size)
>   		ttm_pool_shrink();


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs
  2023-02-15 16:13 ` [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
@ 2023-02-15 17:33   ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2023-02-15 17:33 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld



On 15.02.23 at 17:13, Thomas Hellström wrote:
> New code is recommended to use the BIT macro instead of the explicit
> shifts. Change the older defines so that we can keep the style consistent
> with upcoming changes.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   include/drm/ttm/ttm_tt.h | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index b7d3f3843f1e..cc54be1912e1 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -83,12 +83,12 @@ struct ttm_tt {
>   	 * set by TTM after ttm_tt_populate() has successfully returned, and is
>   	 * then unset when TTM calls ttm_tt_unpopulate().
>   	 */
> -#define TTM_TT_FLAG_SWAPPED		(1 << 0)
> -#define TTM_TT_FLAG_ZERO_ALLOC		(1 << 1)
> -#define TTM_TT_FLAG_EXTERNAL		(1 << 2)
> -#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	(1 << 3)
> +#define TTM_TT_FLAG_SWAPPED		BIT(0)
> +#define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
> +#define TTM_TT_FLAG_EXTERNAL		BIT(2)
> +#define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
>   
> -#define TTM_TT_FLAG_PRIV_POPULATED  (1U << 31)
> +#define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)

While at it, please just use BIT(4) for this; there is actually nothing
special about it.
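
I.e. just:

	#define TTM_TT_FLAG_PRIV_POPULATED	BIT(4)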

Christian.

>   	uint32_t page_flags;
>   	/** @num_pages: Number of pages in the page array. */
>   	uint32_t num_pages;


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  2023-02-15 16:13 ` [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface Thomas Hellström
@ 2023-02-15 17:39   ` Christian König
  2023-02-15 18:19     ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-15 17:39 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

On 15.02.23 at 17:13, Thomas Hellström wrote:
> Update the TTM swapout interfaces for better compatibility with a shrinker.
> - Replace number-of-pages int return with a long to better match the
>    kernel's shrinker interface.
> - The gfp_flags parameter to ttm_xx_swapout() currently only takes the
>    GFP_KERNEL value and shouldn't really be needed since the shrinker we
>    hook up in upcoming patches sets an allocation context to match reclaim.

> - Introduce a shrink reason enumeration and a driver callback to shrink
>    buffer objects.

Is that really necessary? This is mid-layering once more.

If drivers want to implement driver-specific shrinking, they should
register their own shrinker callback.
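
I.e. something along these lines in the driver (rough sketch only, the
mydrv_* names and helpers are made up just to illustrate the idea):

	static unsigned long mydrv_shrinker_count(struct shrinker *shrink,
						  struct shrink_control *sc)
	{
		/* How many pages could the driver give back right now? */
		return mydrv_count_shrinkable_pages();
	}

	static unsigned long mydrv_shrinker_scan(struct shrinker *shrink,
						 struct shrink_control *sc)
	{
		/* Walk the driver's own LRU, release up to nr_to_scan pages. */
		return mydrv_release_pages(sc->nr_to_scan);
	}

	static struct shrinker mydrv_shrinker = {
		.count_objects	= mydrv_shrinker_count,
		.scan_objects	= mydrv_shrinker_scan,
		.seeks		= DEFAULT_SEEKS,
	};

	/* and at driver init time: */
	err = register_shrinker(&mydrv_shrinker, "drm-mydrv");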

Christian.


>    The TTM_SHRINK_WATERMARK reason is going to still be handled using the
>    existing shmem copy, and will be used by pool types that don't lend
>    themselves well to shrinking (dma_alloc pool) and when drivers explicitly
>    request swapout.
>    The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from a
>    shrinker and are to be handled by a new driver callback, bo_shrink().
>    Helpers for the new driver callback are provided in upcoming patches.
>
> Cc: linux-graphics-maintainer@vmware.com
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c        | 38 ++++++++++++++++----
>   drivers/gpu/drm/ttm/ttm_device.c    | 55 +++++++++++++++++++++--------
>   drivers/gpu/drm/ttm/ttm_tt.c        | 23 ++++++------
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
>   include/drm/ttm/ttm_bo.h            |  4 +--
>   include/drm/ttm/ttm_device.h        | 36 +++++++++++++++++--
>   include/drm/ttm/ttm_tt.h            | 17 +++++++--
>   7 files changed, 136 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 882c2fa346f3..e5c0970564c0 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
>   }
>   EXPORT_SYMBOL(ttm_bo_wait_ctx);
>   
> -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> -		   gfp_t gfp_flags)
> +/**
> + * ttm_bo_swapout() - Swap out or purge a buffer object
> + * @bo: The buffer object.
> + * @ctx: The ttm operation context.
> + * @reason: The swapout reason.
> + *
> + * Try to swap out or purge the contents of a system memory backed buffer
> + * object. The function needs to be called with the device's LRU lock held.
> + *
> + * Return: -EBUSY if the bo lock could not be grabbed or the object was
> + * otherwise busy. Otherwise the number of pages swapped out or negative
> + * error code on error. Iff the function didn't return -EBUSY, the
> + * LRU lock was dropped, and LRU traversal needs to restart.
> + */
> +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> +		    enum ttm_shrink_reason reason)
>   {
>   	struct ttm_place place;
>   	bool locked;
>   	long ret;
>   
> +	lockdep_assert_held(&bo->bdev->lru_lock);
> +
>   	/*
>   	 * While the bo may already reside in SYSTEM placement, set
>   	 * SYSTEM as new placement to cover also the move further below.
> @@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	}
>   
>   	if (bo->deleted) {
> +		long num_pages = bo->ttm->num_pages;
> +
>   		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
>   		ttm_bo_put(bo);
> +		if (!ret)
> +			return num_pages;
>   		return ret == -EBUSY ? -ENOSPC : ret;
>   	}
>   
> @@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
>   	 * Swap out. Buffer will be swapped in again as soon as
>   	 * anyone tries to access a ttm page.
>   	 */
> -	if (bo->bdev->funcs->swap_notify)
> -		bo->bdev->funcs->swap_notify(bo);
> +	if (bo->bdev->funcs->bo_shrink && reason != TTM_SHRINK_WATERMARK) {
> +		ret = bo->bdev->funcs->bo_shrink(bo, ctx);
> +	} else {
> +		if (bo->bdev->funcs->swap_notify)
> +			bo->bdev->funcs->swap_notify(bo);
> +		ret = ttm_tt_swapout(bo->bdev, bo->ttm);
> +		if (!ret)
> +			ret = bo->ttm->num_pages;
> +	}
>   
> -	if (ttm_tt_is_populated(bo->ttm))
> -		ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
>   out:
> -
>   	/*
>   	 * Unreserve without putting on LRU to avoid swapping out an
>   	 * already swapped buffer.
> diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
> index ae2f19dc9f81..7eadea07027f 100644
> --- a/drivers/gpu/drm/ttm/ttm_device.c
> +++ b/drivers/gpu/drm/ttm/ttm_device.c
> @@ -116,19 +116,28 @@ static int ttm_global_init(void)
>   	return ret;
>   }
>   
> -/*
> - * A buffer object shrink method that tries to swap out the first
> - * buffer object on the global::swap_lru list.
> +/**
> + * ttm_global_swapout() - Select and swap out a system-memory-backed bo.
> + * @ctx: The operation context.
> + * @reason: The reason for swapout.
> + *
> + * Select, based on round-robin a TTM device and traverse the LRUs of
> + * that specific device until a suitable bo backed by system memory is found
> + * and swapped-out or purged.
> + *
> + * Return: Positive value or zero indicating the size in pages of the
> + * bo swapped out. Negative error code on error.
>    */
> -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
> +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
> +			enum ttm_shrink_reason reason)
>   {
>   	struct ttm_global *glob = &ttm_glob;
>   	struct ttm_device *bdev;
> -	int ret = 0;
> +	long ret = 0;
>   
>   	mutex_lock(&ttm_global_mutex);
>   	list_for_each_entry(bdev, &glob->device_list, device_list) {
> -		ret = ttm_device_swapout(bdev, ctx, gfp_flags);
> +		ret = ttm_device_swapout(bdev, ctx, reason);
>   		if (ret > 0) {
>   			list_move_tail(&bdev->device_list, &glob->device_list);
>   			break;
> @@ -139,14 +148,29 @@ int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags)
>   }
>   EXPORT_SYMBOL(ttm_global_swapout);
>   
> -int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> -		       gfp_t gfp_flags)
> +/**
> + * ttm_device_swapout() - Select and swap out a system-memory-backed bo.
> + * @bdev: The device whos bos are considered for swapout.
> + * @ctx: The operation context.
> + * @reason: The reason for swapout.
> + *
> + * Traverse the LRUs of a specific device until a suitable bo backed by
> + * system memory is found and swapped-out or purged.
> + *
> + * Return: Positive value or zero indicating the size in pages of the
> + * bo swapped out. Negative error code on error.
> + */
> +long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> +			enum ttm_shrink_reason reason)
>   {
>   	struct ttm_resource_cursor cursor;
>   	struct ttm_resource_manager *man;
>   	struct ttm_resource *res;
>   	unsigned i;
> -	int ret;
> +	long ret;
> +
> +	if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs->bo_shrink)
> +		return 0;
>   
>   	spin_lock(&bdev->lru_lock);
>   	for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
> @@ -156,16 +180,19 @@ int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
>   
>   		ttm_resource_manager_for_each_res(man, &cursor, res) {
>   			struct ttm_buffer_object *bo = res->bo;
> -			uint32_t num_pages;
> +			struct ttm_tt *tt;
>   
>   			if (!bo || bo->resource != res)
>   				continue;
>   
> -			num_pages = PFN_UP(bo->base.size);
> -			ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> +			tt = bo->ttm;
> +			if (!tt || (reason == TTM_SHRINK_PURGE &&
> +				    !ttm_tt_purgeable(tt)))
> +				continue;
> +			ret = ttm_bo_swapout(bo, ctx, reason);
>   			/* ttm_bo_swapout has dropped the lru_lock */
> -			if (!ret)
> -				return num_pages;
> +			if (ret >= 0)
> +				return ret;
>   			if (ret != -EBUSY)
>   				return ret;
>   		}
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index ab725d9d14a6..a68c14de0161 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -239,22 +239,21 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
>   
>   /**
>    * ttm_tt_swapout - swap out tt object
> - *
>    * @bdev: TTM device structure.
>    * @ttm: The struct ttm_tt.
> - * @gfp_flags: Flags to use for memory allocation.
>    *
> - * Swapout a TT object to a shmem_file, return number of pages swapped out or
> - * negative error code.
> + * Swapout a TT object to a shmem_file.
> + *
> + * Return: number of pages swapped out or negative error code on error.
>    */
> -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> -		   gfp_t gfp_flags)
> +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm)
>   {
>   	loff_t size = (loff_t)ttm->num_pages << PAGE_SHIFT;
>   	struct address_space *swap_space;
>   	struct file *swap_storage;
>   	struct page *from_page;
>   	struct page *to_page;
> +	gfp_t gfp_flags;
>   	int i, ret;
>   
>   	swap_storage = shmem_file_setup("ttm swap", size, 0);
> @@ -264,7 +263,7 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
>   	}
>   
>   	swap_space = swap_storage->f_mapping;
> -	gfp_flags &= mapping_gfp_mask(swap_space);
> +	gfp_flags = GFP_KERNEL & mapping_gfp_mask(swap_space);
>   
>   	for (i = 0; i < ttm->num_pages; ++i) {
>   		from_page = ttm->pages[i];
> @@ -315,12 +314,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
>   	while (atomic_long_read(&ttm_pages_allocated) > ttm_pages_limit ||
>   	       atomic_long_read(&ttm_dma32_pages_allocated) >
>   	       ttm_dma32_pages_limit) {
> +		long r = ttm_global_swapout(ctx, TTM_SHRINK_WATERMARK);
>   
> -		ret = ttm_global_swapout(ctx, GFP_KERNEL);
> -		if (ret == 0)
> +		if (!r)
>   			break;
> -		if (ret < 0)
> +		if (r < 0) {
> +			ret = r;
>   			goto error;
> +		}
>   	}
>   
>   	if (bdev->funcs->ttm_tt_populate)
> @@ -379,7 +380,7 @@ static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
>   {
>   	struct ttm_operation_ctx ctx = { false, false };
>   
> -	seq_printf(m, "%d\n", ttm_global_swapout(&ctx, GFP_KERNEL));
> +	seq_printf(m, "%ld\n", ttm_global_swapout(&ctx, TTM_SHRINK_SWAP));
>   	return 0;
>   }
>   DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 2588615a2a38..292c5199d2cc 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -1514,7 +1514,8 @@ static int vmw_pm_freeze(struct device *kdev)
>   	vmw_execbuf_release_pinned_bo(dev_priv);
>   	vmw_resource_evict_all(dev_priv);
>   	vmw_release_device_early(dev_priv);
> -	while (ttm_device_swapout(&dev_priv->bdev, &ctx, GFP_KERNEL) > 0);
> +	while (ttm_device_swapout(&dev_priv->bdev, &ctx, TTM_SHRINK_WATERMARK) > 0)
> +		;
>   	vmw_fifo_resource_dec(dev_priv);
>   	if (atomic_read(&dev_priv->num_fifo_resources) != 0) {
>   		DRM_ERROR("Can't hibernate while 3D resources are active.\n");
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 8b113c384236..6b45e0b639e0 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -375,8 +375,8 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj *map);
>   int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map);
>   void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map);
>   int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo);
> -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> -		   gfp_t gfp_flags);
> +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> +		    enum ttm_shrink_reason reason);
>   void ttm_bo_pin(struct ttm_buffer_object *bo);
>   void ttm_bo_unpin(struct ttm_buffer_object *bo);
>   int ttm_mem_evict_first(struct ttm_device *bdev,
> diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
> index 4f3e81eac6f3..6bd2abf712ab 100644
> --- a/include/drm/ttm/ttm_device.h
> +++ b/include/drm/ttm/ttm_device.h
> @@ -35,6 +35,21 @@ struct ttm_placement;
>   struct ttm_buffer_object;
>   struct ttm_operation_ctx;
>   
> +/**
> + * enum ttm_shrink_reason - Reason for shrinking system memory
> + * @TTM_SHRINK_WATERMARK - A watermark limit was reached. Not from reclaim.
> + * @TTM_SHRINK_PURGE - A request for shrinking only purged objects.
> + * @TTM_SHRINK_SWAP - A request for shrinking any object.
> + *
> + * This enum is intended for the buffer object- and shrink method selection
> + * algorithms. It's not intended to leak to or be used by TTM drivers.
> + */
> +enum ttm_shrink_reason {
> +	TTM_SHRINK_WATERMARK,
> +	TTM_SHRINK_PURGE,
> +	TTM_SHRINK_SWAP,
> +};
> +
>   /**
>    * struct ttm_global - Buffer object driver global data.
>    */
> @@ -207,6 +222,19 @@ struct ttm_device_funcs {
>   	 * adding fences that may force a delayed delete
>   	 */
>   	void (*release_notify)(struct ttm_buffer_object *bo);
> +
> +	/**
> +	 * Shrink the bo's system pages, Either by swapping or by purging.
> +	 * @bo: Bo the system pages of which are to be shrunken.
> +	 * @ctx: Operation ctx. In particular the driver callback should
> +	 *       adhere to the no_wait_gpu and interruptible fields.
> +	 *
> +	 * This is also notifying the driver that the bo is about to be
> +	 * shrunken and the driver should take care to unbind any GPU bindings
> +	 * and to note that the content is purged if @bo->ttm is purgeable.
> +	 */
> +	long (*bo_shrink)(struct ttm_buffer_object *bo,
> +			  struct ttm_operation_ctx *ctx);
>   };
>   
>   /**
> @@ -268,9 +296,11 @@ struct ttm_device {
>   	struct workqueue_struct *wq;
>   };
>   
> -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
> -int ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> -		       gfp_t gfp_flags);
> +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
> +			enum ttm_shrink_reason reason);
> +
> +long ttm_device_swapout(struct ttm_device *bdev, struct ttm_operation_ctx *ctx,
> +			enum ttm_shrink_reason reason);
>   
>   static inline struct ttm_resource_manager *
>   ttm_manager_type(struct ttm_device *bdev, int mem_type)
> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> index cc54be1912e1..627168eba8f6 100644
> --- a/include/drm/ttm/ttm_tt.h
> +++ b/include/drm/ttm/ttm_tt.h
> @@ -87,6 +87,7 @@ struct ttm_tt {
>   #define TTM_TT_FLAG_ZERO_ALLOC		BIT(1)
>   #define TTM_TT_FLAG_EXTERNAL		BIT(2)
>   #define TTM_TT_FLAG_EXTERNAL_MAPPABLE	BIT(3)
> +#define TTM_TT_FLAG_DONTNEED		BIT(4)
>   
>   #define TTM_TT_FLAG_PRIV_POPULATED	BIT(31)
>   	uint32_t page_flags;
> @@ -180,8 +181,8 @@ void ttm_tt_destroy(struct ttm_device *bdev, struct ttm_tt *ttm);
>    * Swap in a previously swap out ttm_tt.
>    */
>   int ttm_tt_swapin(struct ttm_tt *ttm);
> -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> -		   gfp_t gfp_flags);
> +
> +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm);
>   
>   /**
>    * ttm_tt_populate - allocate pages for a ttm
> @@ -223,6 +224,18 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long num_dma32_pages);
>   struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct ttm_kmap_iter_tt *iter_tt,
>   					    struct ttm_tt *tt);
>   
> +/**
> + * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is purgeable
> + * @tt: The struct ttm_tt to consider.
> + *
> + * Return: Whether the contents is purgeable in the sence that the owner
> + * doesn't mind losing it as long as it gets notified.
> + */
> +static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
> +{
> +	return tt->page_flags & TTM_TT_FLAG_DONTNEED;
> +}
> +
>   #if IS_ENABLED(CONFIG_AGP)
>   #include <linux/agp_backend.h>
>   


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 16:13 ` [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
@ 2023-02-15 17:42   ` Christian König
  2023-02-15 18:12     ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-15 17:42 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

On 15.02.23 at 17:13, Thomas Hellström wrote:
> When swapping out, we will split multi-order pages both in order to
> move them to the swap-cache and to be able to return memory to the
> swap cache as soon as possible on a page-by-page basis.
> By reducing the page max order to the system PMD size, we can be nicer
> to the system and avoid splitting gigantic pages.


> On top of this we also
> include the 64K page size in the page sizes tried, since that appears to
> be a common size for GPU applications.

Please completely drop that. This is just nonsense spilling in from the 
Windows drivers.

Christian.

>
> Looking forward to when we might be able to swap out PMD size folios
> without splitting, this will also be a benefit.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++--------
>   1 file changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 1cc7591a9542..8787fb6a218b 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -31,6 +31,8 @@
>    * cause they are rather slow compared to alloc_pages+map.
>    */
>   
> +#define pr_fmt(fmt) "[TTM POOL] " fmt
> +
>   #include <linux/module.h>
>   #include <linux/dma-mapping.h>
>   #include <linux/debugfs.h>
> @@ -47,6 +49,18 @@
>   
>   #include "ttm_module.h"
>   
> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
> +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
> +#undef TTM_MAX_ORDER
> +#define TTM_MAX_ORDER TTM_64K_ORDER
> +#endif
> +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
> +#undef TTM_MAX_ORDER
> +#define TTM_MAX_ORDER (MAX_ORDER - 1)
> +#endif
> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> +
>   /**
>    * struct ttm_pool_dma - Helper object for coherent DMA mappings
>    *
> @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
>   
>   static atomic_long_t allocated_pages;
>   
> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>   
> -static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> +static struct ttm_pool_type global_dma32_write_combined[TTM_DIM_ORDER];
> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>   
>   static spinlock_t shrinker_lock;
>   static struct list_head shrinker_list;
>   static struct shrinker mm_shrinker;
>   
> +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
> +
>   /* Allocate pages of size 1 << order with the given gfp_flags */
>   static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>   					unsigned int order)
> @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt,
>   	}
>   }
>   
> +static unsigned int ttm_pool_select_order(unsigned int order, pgoff_t num_pages)
> +{
> +	unsigned int *cur_order = ttm_pool_orders;
> +
> +	order = min_t(unsigned int, __fls(num_pages), order);
> +	while (order < *cur_order)
> +		++cur_order;
> +
> +	return *cur_order;
> +}
> +
>   /**
>    * ttm_pool_alloc - Fill a ttm_tt object
>    *
> @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
>   	else
>   		gfp_flags |= GFP_HIGHUSER;
>   
> -	for (order = min_t(unsigned int, MAX_ORDER - 1, __fls(num_pages));
> -	     num_pages;
> -	     order = min_t(unsigned int, order, __fls(num_pages))) {
> +	order = ttm_pool_select_order(ttm_pool_orders[0], num_pages);
> +	for (; num_pages; order = ttm_pool_select_order(order, num_pages)) {
>   		struct ttm_pool_type *pt;
>   
>   		page_caching = tt->caching;
> @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
>   
>   	if (use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_init(&pool->caching[i].orders[j],
>   						   pool, i, j);
>   	}
> @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>   
>   	if (pool->use_dma_alloc) {
>   		for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> -			for (j = 0; j < MAX_ORDER; ++j)
> +			for (j = 0; j < TTM_DIM_ORDER; ++j)
>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>   	}
>   
> @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct seq_file *m)
>   	unsigned int i;
>   
>   	seq_puts(m, "\t ");
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " ---%2u---", i);
>   	seq_puts(m, "\n");
>   }
> @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct ttm_pool_type *pt,
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i)
> +	for (i = 0; i < TTM_DIM_ORDER; ++i)
>   		seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>   	seq_puts(m, "\n");
>   }
> @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>   	if (!page_pool_size)
>   		page_pool_size = num_pages;
>   
> +	if (TTM_64K_ORDER < TTM_MAX_ORDER)
> +		ttm_pool_orders[1] = TTM_64K_ORDER;
> +
> +	pr_debug("Used orders are %u %u %u\n", ttm_pool_orders[0],
> +		 ttm_pool_orders[1], ttm_pool_orders[2]);
> +
>   	spin_lock_init(&shrinker_lock);
>   	INIT_LIST_HEAD(&shrinker_list);
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_init(&global_write_combined[i], NULL,
>   				   ttm_write_combined, i);
>   		ttm_pool_type_init(&global_uncached[i], NULL, ttm_uncached, i);
> @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
>   {
>   	unsigned int i;
>   
> -	for (i = 0; i < MAX_ORDER; ++i) {
> +	for (i = 0; i < TTM_DIM_ORDER; ++i) {
>   		ttm_pool_type_fini(&global_write_combined[i]);
>   		ttm_pool_type_fini(&global_uncached[i]);
>   


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-02-15 17:31   ` Christian König
@ 2023-02-15 18:02     ` Thomas Hellström
  2023-02-15 18:26       ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 18:02 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, Huang Rui, linux-graphics-maintainer,
	Peter Xu, Johannes Weiner, Madhav Chauhan, Dave Airlie,
	Andrew Morton, Matthew Auld

On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:
> On 15.02.23 at 17:13, Thomas Hellström wrote:
> > When hitting an error, the error path forgot to unmap dma mappings
> > and
> 
> I don't see where this happens?

From what I can tell, ttm_pool_page_allocated() maps the page for dma.
If we later hit an error, ttm_pool_free_page() will leak the mapping.
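
For reference, the error_free_all path in the current code is just this
(the loop the patch removes):

	for (i = 0; i < num_pages; ) {
		order = ttm_pool_page_order(pool, tt->pages[i]);
		ttm_pool_free_page(pool, tt->caching, order, tt->pages[i]);
		i += 1 << order;
	}

i.e. unlike ttm_pool_free() it never calls ttm_pool_unmap() on
tt->dma_address[i] before handing the pages back.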

> 
> > could call set_pages_wb() on already uncached pages.
> 
> Yeah, but what's the problem?

Umm, at least if you try to set WC on an already WC'd page, the
set_pages_ code will spam dmesg with warnings. 
Not sure if set_pages_wb() on WB pages does the same, nor if it
issues unnecessary global cache / tlb flushes or whether that will
change in the future.
The point of avoiding set_pages_wb() when the pages are already WB is
that you don't have to check, and you don't have to care.

That said, __ttm_pool_free() is also used in upcoming patches.

/Thomas


> 
> Regards,
> Christian.
> 
> > 
> > Fix this by introducing a common __ttm_pool_free() function that
> > does the right thing.
> > 
> > Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: Dave Airlie <airlied@redhat.com>
> > Cc: Madhav Chauhan <madhav.chauhan@amd.com>
> > Cc: Christian Koenig <christian.koenig@amd.com>
> > Cc: Huang Rui <ray.huang@amd.com>
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++--------
> > -----
> >   1 file changed, 45 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index aa116a7bbae3..1cc7591a9542 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
> > ttm_pool *pool, unsigned int order,
> >         return 0;
> >   }
> >   
> > +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt
> > *tt,
> > +                           struct page **caching_divide,
> > +                           enum ttm_caching initial_caching,
> > +                           enum ttm_caching subseq_caching,
> > +                           pgoff_t num_pages)
> > +{
> > +       enum ttm_caching caching = subseq_caching;
> > +       struct page **pages = tt->pages;
> > +       unsigned int order;
> > +       pgoff_t i, nr;
> > +
> > +       if (pool && caching_divide)
> > +               caching = initial_caching;
> > +
> > +       for (i = 0; i < num_pages; i += nr, pages += nr) {
> > +               struct ttm_pool_type *pt = NULL;
> > +
> > +               if (unlikely(caching_divide == pages))
> > +                       caching = subseq_caching;
> > +
> > +               order = ttm_pool_page_order(pool, *pages);
> > +               nr = (1UL << order);
> > +               if (tt->dma_address)
> > +                       ttm_pool_unmap(pool, tt->dma_address[i],
> > nr);
> > +
> > +               pt = ttm_pool_select_type(pool, caching, order);
> > +               if (pt)
> > +                       ttm_pool_type_give(pt, *pages);
> > +               else
> > +                       ttm_pool_free_page(pool, caching, order,
> > *pages);
> > +       }
> > +}
> > +
> >   /**
> >    * ttm_pool_alloc - Fill a ttm_tt object
> >    *
> > @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >         dma_addr_t *dma_addr = tt->dma_address;
> >         struct page **caching = tt->pages;
> >         struct page **pages = tt->pages;
> > +       enum ttm_caching page_caching;
> >         gfp_t gfp_flags = GFP_USER;
> > -       unsigned int i, order;
> > +       unsigned int order;
> >         struct page *p;
> >         int r;
> >   
> > @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >              order = min_t(unsigned int, order, __fls(num_pages)))
> > {
> >                 struct ttm_pool_type *pt;
> >   
> > +               page_caching = tt->caching;
> >                 pt = ttm_pool_select_type(pool, tt->caching,
> > order);
> >                 p = pt ? ttm_pool_type_take(pt) : NULL;
> >                 if (p) {
> > @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >                         if (r)
> >                                 goto error_free_page;
> >   
> > +                       caching = pages;
> >                         do {
> >                                 r = ttm_pool_page_allocated(pool,
> > order, p,
> >                                                            
> > &dma_addr,
> > @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >                                 if (r)
> >                                         goto error_free_page;
> >   
> > +                               caching = pages;
> >                                 if (num_pages < (1 << order))
> >                                         break;
> >   
> >                                 p = ttm_pool_type_take(pt);
> >                         } while (p);
> > -                       caching = pages;
> >                 }
> >   
> > +               page_caching = ttm_cached;
> >                 while (num_pages >= (1 << order) &&
> >                        (p = ttm_pool_alloc_page(pool, gfp_flags,
> > order))) {
> >   
> > @@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >                                                            tt-
> > >caching);
> >                                 if (r)
> >                                         goto error_free_page;
> > +                               caching = pages;
> >                         }
> >                         r = ttm_pool_page_allocated(pool, order, p,
> > &dma_addr,
> >                                                     &num_pages,
> > &pages);
> > @@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >         return 0;
> >   
> >   error_free_page:
> > -       ttm_pool_free_page(pool, tt->caching, order, p);
> > +       ttm_pool_free_page(pool, page_caching, order, p);
> >   
> >   error_free_all:
> >         num_pages = tt->num_pages - num_pages;
> > -       for (i = 0; i < num_pages; ) {
> > -               order = ttm_pool_page_order(pool, tt->pages[i]);
> > -               ttm_pool_free_page(pool, tt->caching, order, tt-
> > >pages[i]);
> > -               i += 1 << order;
> > -       }
> > +       __ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached,
> > +                       num_pages);
> >   
> >         return r;
> >   }
> > @@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc);
> >    */
> >   void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
> >   {
> > -       unsigned int i;
> > -
> > -       for (i = 0; i < tt->num_pages; ) {
> > -               struct page *p = tt->pages[i];
> > -               unsigned int order, num_pages;
> > -               struct ttm_pool_type *pt;
> > -
> > -               order = ttm_pool_page_order(pool, p);
> > -               num_pages = 1ULL << order;
> > -               if (tt->dma_address)
> > -                       ttm_pool_unmap(pool, tt->dma_address[i],
> > num_pages);
> > -
> > -               pt = ttm_pool_select_type(pool, tt->caching,
> > order);
> > -               if (pt)
> > -                       ttm_pool_type_give(pt, tt->pages[i]);
> > -               else
> > -                       ttm_pool_free_page(pool, tt->caching,
> > order,
> > -                                          tt->pages[i]);
> > -
> > -               i += num_pages;
> > -       }
> > +       __ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching,
> > +                       tt->num_pages);
> >   
> >         while (atomic_long_read(&allocated_pages) > page_pool_size)
> >                 ttm_pool_shrink();
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 17:42   ` Christian König
@ 2023-02-15 18:12     ` Thomas Hellström
  2023-02-15 18:30       ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 18:12 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > When swapping out, we will split multi-order pages both in order to
> > move them to the swap-cache and to be able to return memory to the
> > swap cache as soon as possible on a page-by-page basis.
> > By reducing the page max order to the system PMD size, we can be
> > nicer
> > to the system and avoid splitting gigantic pages.
> 
> 
> > On top of this we also
> > include the 64K page size in the page sizes tried, since that
> > appears to
> > be a common size for GPU applications.
> 
> Please completely drop that. 
You mean the 64K page size, or the whole patch?

> This is just nonsense spilling in from the 
> Windows drivers.

Agreed, but IIRC on the last RFC you asked me not to drop the 64K
pages, so that's why they are here. I can remove them if needed.

The only reason for keeping them from a performance point of view is
better efficiency on GPUs with 64K page size if not using a coalescing
IOMMU for dma-mapping.
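
For reference, on a typical x86-64 configuration the orders the patch
ends up using work out roughly like this (my arithmetic, assuming the
usual PAGE_SHIFT/PMD_SHIFT/MAX_ORDER defaults, not numbers stated in the
patch):

/*
 * Worked example, assuming PAGE_SHIFT = 12, PMD_SHIFT = 21 and
 * MAX_ORDER = 11 (x86-64 defaults):
 *
 *   TTM_MAX_ORDER = PMD_SHIFT - PAGE_SHIFT = 9  -> 512 pages, 2 MiB
 *   TTM_64K_ORDER = 16 - PAGE_SHIFT        = 4  ->  16 pages, 64 KiB
 *
 * ttm_pool_orders then becomes {9, 4, 0}, and ttm_pool_select_order()
 * returns the largest of these orders that does not exceed the
 * remaining __fls(num_pages).
 */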

Let me know what you think is best and I'll adjust accordingly.

/Thomas


> 
> Christian.
> 
> > 
> > Looking forward to when we might be able to swap out PMD size
> > folios
> > without splitting, this will also be a benefit.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++---
> > -----
> >   1 file changed, 45 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > b/drivers/gpu/drm/ttm/ttm_pool.c
> > index 1cc7591a9542..8787fb6a218b 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -31,6 +31,8 @@
> >    * cause they are rather slow compared to alloc_pages+map.
> >    */
> >   
> > +#define pr_fmt(fmt) "[TTM POOL] " fmt
> > +
> >   #include <linux/module.h>
> >   #include <linux/dma-mapping.h>
> >   #include <linux/debugfs.h>
> > @@ -47,6 +49,18 @@
> >   
> >   #include "ttm_module.h"
> >   
> > +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> > +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
> > +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
> > +#undef TTM_MAX_ORDER
> > +#define TTM_MAX_ORDER TTM_64K_ORDER
> > +#endif
> > +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
> > +#undef TTM_MAX_ORDER
> > +#define TTM_MAX_ORDER (MAX_ORDER - 1)
> > +#endif
> > +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> > +
> >   /**
> >    * struct ttm_pool_dma - Helper object for coherent DMA mappings
> >    *
> > @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
> >   
> >   static atomic_long_t allocated_pages;
> >   
> > -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> > -static struct ttm_pool_type global_uncached[MAX_ORDER];
> > +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
> > +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
> >   
> > -static struct ttm_pool_type
> > global_dma32_write_combined[MAX_ORDER];
> > -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> > +static struct ttm_pool_type
> > global_dma32_write_combined[TTM_DIM_ORDER];
> > +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
> >   
> >   static spinlock_t shrinker_lock;
> >   static struct list_head shrinker_list;
> >   static struct shrinker mm_shrinker;
> >   
> > +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
> > +
> >   /* Allocate pages of size 1 << order with the given gfp_flags */
> >   static struct page *ttm_pool_alloc_page(struct ttm_pool *pool,
> > gfp_t gfp_flags,
> >                                         unsigned int order)
> > @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool
> > *pool, struct ttm_tt *tt,
> >         }
> >   }
> >   
> > +static unsigned int ttm_pool_select_order(unsigned int order,
> > pgoff_t num_pages)
> > +{
> > +       unsigned int *cur_order = ttm_pool_orders;
> > +
> > +       order = min_t(unsigned int, __fls(num_pages), order);
> > +       while (order < *cur_order)
> > +               ++cur_order;
> > +
> > +       return *cur_order;
> > +}
> > +
> >   /**
> >    * ttm_pool_alloc - Fill a ttm_tt object
> >    *
> > @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > struct ttm_tt *tt,
> >         else
> >                 gfp_flags |= GFP_HIGHUSER;
> >   
> > -       for (order = min_t(unsigned int, MAX_ORDER - 1,
> > __fls(num_pages));
> > -            num_pages;
> > -            order = min_t(unsigned int, order, __fls(num_pages)))
> > {
> > +       order = ttm_pool_select_order(ttm_pool_orders[0],
> > num_pages);
> > +       for (; num_pages; order = ttm_pool_select_order(order,
> > num_pages)) {
> >                 struct ttm_pool_type *pt;
> >   
> >                 page_caching = tt->caching;
> > @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
> > struct device *dev,
> >   
> >         if (use_dma_alloc) {
> >                 for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> > -                       for (j = 0; j < MAX_ORDER; ++j)
> > +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
> >                                 ttm_pool_type_init(&pool-
> > >caching[i].orders[j],
> >                                                    pool, i, j);
> >         }
> > @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
> >   
> >         if (pool->use_dma_alloc) {
> >                 for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> > -                       for (j = 0; j < MAX_ORDER; ++j)
> > +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
> >                                 ttm_pool_type_fini(&pool-
> > >caching[i].orders[j]);
> >         }
> >   
> > @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct
> > seq_file *m)
> >         unsigned int i;
> >   
> >         seq_puts(m, "\t ");
> > -       for (i = 0; i < MAX_ORDER; ++i)
> > +       for (i = 0; i < TTM_DIM_ORDER; ++i)
> >                 seq_printf(m, " ---%2u---", i);
> >         seq_puts(m, "\n");
> >   }
> > @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct
> > ttm_pool_type *pt,
> >   {
> >         unsigned int i;
> >   
> > -       for (i = 0; i < MAX_ORDER; ++i)
> > +       for (i = 0; i < TTM_DIM_ORDER; ++i)
> >                 seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
> >         seq_puts(m, "\n");
> >   }
> > @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long
> > num_pages)
> >         if (!page_pool_size)
> >                 page_pool_size = num_pages;
> >   
> > +       if (TTM_64K_ORDER < TTM_MAX_ORDER)
> > +               ttm_pool_orders[1] = TTM_64K_ORDER;
> > +
> > +       pr_debug("Used orders are %u %u %u\n", ttm_pool_orders[0],
> > +                ttm_pool_orders[1], ttm_pool_orders[2]);
> > +
> >         spin_lock_init(&shrinker_lock);
> >         INIT_LIST_HEAD(&shrinker_list);
> >   
> > -       for (i = 0; i < MAX_ORDER; ++i) {
> > +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
> >                 ttm_pool_type_init(&global_write_combined[i], NULL,
> >                                    ttm_write_combined, i);
> >                 ttm_pool_type_init(&global_uncached[i], NULL,
> > ttm_uncached, i);
> > @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
> >   {
> >         unsigned int i;
> >   
> > -       for (i = 0; i < MAX_ORDER; ++i) {
> > +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
> >                 ttm_pool_type_fini(&global_write_combined[i]);
> >                 ttm_pool_type_fini(&global_uncached[i]);
> >   
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  2023-02-15 17:39   ` Christian König
@ 2023-02-15 18:19     ` Thomas Hellström
  2023-02-15 18:32       ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 18:19 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

On Wed, 2023-02-15 at 18:39 +0100, Christian König wrote:
> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > Update the TTM swapout interfaces for better compatibility with a
> > shrinker.
> > - Replace number-of-pages int return with a long to better match
> > the
> >    kernel's shrinker interface.
> > - The gfp_flags parameter to ttm_xx_swapout() currently only takes
> > the
> >    GFP_KERNEL value and shouldn't really be needed since the
> > shrinker we
> >    hook up in upcoming patches sets an allocation context to match
> > reclaim.
> 
> > - Introduce a shrink reason enumeration and a driver callback to
> > shrink
> >    buffer objects.
> 
> Is that really necessary? This is mid-layering once more.
> 
> If drivers want to implement driver specific shrinking they should 
> register their own shrinker callback.

Yes, a choice needs to be made here. If TTM registers the shrinker, the
driver needs to be called at least to unbind and to remove dma-
mappings.

If the driver registers the shrinker it can still (I think) use the
pool helpers, but needs TTM for LRU traversal and accounting.

I can have a look at the latter if you think that will be a better
solution.
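
For concreteness, a rough sketch of that latter variant. Everything
prefixed mydrv_ (including the device struct members and the accounting
helper) is made up for illustration; ttm_device_swapout(),
TTM_SHRINK_SWAP and struct ttm_operation_ctx are from this series, the
rest is the stock shrinker API:

static unsigned long mydrv_shrink_count(struct shrinker *shrink,
					struct shrink_control *sc)
{
	struct mydrv_device *mdev =
		container_of(shrink, struct mydrv_device, shrinker);

	/* Hypothetical per-driver accounting of shrinkable pages. */
	return mydrv_shrinkable_pages(mdev);
}

static unsigned long mydrv_shrink_scan(struct shrinker *shrink,
				       struct shrink_control *sc)
{
	struct mydrv_device *mdev =
		container_of(shrink, struct mydrv_device, shrinker);
	struct ttm_operation_ctx ctx = { .no_wait_gpu = true };
	unsigned long freed = 0;

	while (freed < sc->nr_to_scan) {
		/* TTM walks the LRUs; for TTM_SHRINK_SWAP this ends up
		 * in the bo_shrink() callback of the candidate bo. */
		long ret = ttm_device_swapout(&mdev->bdev, &ctx,
					      TTM_SHRINK_SWAP);

		if (ret <= 0)
			break;
		freed += ret;
	}

	return freed ?: SHRINK_STOP;
}

/* At device init: */
mdev->shrinker.count_objects = mydrv_shrink_count;
mdev->shrinker.scan_objects = mydrv_shrink_scan;
mdev->shrinker.seeks = DEFAULT_SEEKS;
err = register_shrinker(&mdev->shrinker, "drm-mydrv");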

/Thomas


> 
> Christian.
> 
> 
> >    The TTM_SHRINK_WATERMARK reason is going to still be handled
> > using the
> >    existing shmem copy, and will be used by pool types that don't
> > lend
> >    themselves well to shrinking (dma_alloc pool) and when drivers
> > explicitly
> >    requests swapout.
> >    The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from
> > a
> >    shrinker and is to be handled by a new driver callback,
> > bo_shrink().
> >    Helpers for the new driver callback are provided in upcoming
> > patches.
> > 
> > Cc: linux-graphics-maintainer@vmware.com
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_bo.c        | 38 ++++++++++++++++----
> >   drivers/gpu/drm/ttm/ttm_device.c    | 55 +++++++++++++++++++++---
> > -----
> >   drivers/gpu/drm/ttm/ttm_tt.c        | 23 ++++++------
> >   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
> >   include/drm/ttm/ttm_bo.h            |  4 +--
> >   include/drm/ttm/ttm_device.h        | 36 +++++++++++++++++--
> >   include/drm/ttm/ttm_tt.h            | 17 +++++++--
> >   7 files changed, 136 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> > b/drivers/gpu/drm/ttm/ttm_bo.c
> > index 882c2fa346f3..e5c0970564c0 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > @@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct
> > ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
> >   }
> >   EXPORT_SYMBOL(ttm_bo_wait_ctx);
> >   
> > -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > -                  gfp_t gfp_flags)
> > +/**
> > + * ttm_bo_swapout() - Swap out or purge a buffer object
> > + * @bo: The buffer object.
> > + * @ctx: The ttm operation context.
> > + * @reason: The swapout reason.
> > + *
> > + * Try to swap out or purge the contents of a system memory backed
> > buffer
> > + * object. The function needs to be called with the device's LRU
> > lock held.
> > + *
> > + * Return: -EBUSY if the bo lock could not be grabbed or the
> > object was
> > + * otherwise busy. Otherwise the number of pages swapped out or
> > negative
> > + * error code on error. Iff the function didn't return -EBUSY, the
> > + * LRU lock was dropped, and LRU traversal needs to restart.
> > + */
> > +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > +                   enum ttm_shrink_reason reason)
> >   {
> >         struct ttm_place place;
> >         bool locked;
> >         long ret;
> >   
> > +       lockdep_assert_held(&bo->bdev->lru_lock);
> > +
> >         /*
> >          * While the bo may already reside in SYSTEM placement, set
> >          * SYSTEM as new placement to cover also the move further
> > below.
> > @@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> >         }
> >   
> >         if (bo->deleted) {
> > +               long num_pages = bo->ttm->num_pages;
> > +
> >                 ret = ttm_bo_cleanup_refs(bo, false, false,
> > locked);
> >                 ttm_bo_put(bo);
> > +               if (!ret)
> > +                       return num_pages;
> >                 return ret == -EBUSY ? -ENOSPC : ret;
> >         }
> >   
> > @@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object
> > *bo, struct ttm_operation_ctx *ctx,
> >          * Swap out. Buffer will be swapped in again as soon as
> >          * anyone tries to access a ttm page.
> >          */
> > -       if (bo->bdev->funcs->swap_notify)
> > -               bo->bdev->funcs->swap_notify(bo);
> > +       if (bo->bdev->funcs->bo_shrink && reason !=
> > TTM_SHRINK_WATERMARK) {
> > +               ret = bo->bdev->funcs->bo_shrink(bo, ctx);
> > +       } else {
> > +               if (bo->bdev->funcs->swap_notify)
> > +                       bo->bdev->funcs->swap_notify(bo);
> > +               ret = ttm_tt_swapout(bo->bdev, bo->ttm);
> > +               if (!ret)
> > +                       ret = bo->ttm->num_pages;
> > +       }
> >   
> > -       if (ttm_tt_is_populated(bo->ttm))
> > -               ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
> >   out:
> > -
> >         /*
> >          * Unreserve without putting on LRU to avoid swapping out
> > an
> >          * already swapped buffer.
> > diff --git a/drivers/gpu/drm/ttm/ttm_device.c
> > b/drivers/gpu/drm/ttm/ttm_device.c
> > index ae2f19dc9f81..7eadea07027f 100644
> > --- a/drivers/gpu/drm/ttm/ttm_device.c
> > +++ b/drivers/gpu/drm/ttm/ttm_device.c
> > @@ -116,19 +116,28 @@ static int ttm_global_init(void)
> >         return ret;
> >   }
> >   
> > -/*
> > - * A buffer object shrink method that tries to swap out the first
> > - * buffer object on the global::swap_lru list.
> > +/**
> > + * ttm_global_swapout() - Select and swap out a system-memory-
> > backed bo.
> > + * @ctx: The operation context.
> > + * @reason: The reason for swapout.
> > + *
> > + * Select, based on round-robin, a TTM device and traverse the LRUs
> > of
> > + * that specific device until a suitable bo backed by system
> > memory is found
> > + * and swapped-out or purged.
> > + *
> > + * Return: Positive value or zero indicating the size in pages of
> > the
> > + * bo swapped out. Negative error code on error.
> >    */
> > -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t
> > gfp_flags)
> > +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
> > +                       enum ttm_shrink_reason reason)
> >   {
> >         struct ttm_global *glob = &ttm_glob;
> >         struct ttm_device *bdev;
> > -       int ret = 0;
> > +       long ret = 0;
> >   
> >         mutex_lock(&ttm_global_mutex);
> >         list_for_each_entry(bdev, &glob->device_list, device_list)
> > {
> > -               ret = ttm_device_swapout(bdev, ctx, gfp_flags);
> > +               ret = ttm_device_swapout(bdev, ctx, reason);
> >                 if (ret > 0) {
> >                         list_move_tail(&bdev->device_list, &glob-
> > >device_list);
> >                         break;
> > @@ -139,14 +148,29 @@ int ttm_global_swapout(struct
> > ttm_operation_ctx *ctx, gfp_t gfp_flags)
> >   }
> >   EXPORT_SYMBOL(ttm_global_swapout);
> >   
> > -int ttm_device_swapout(struct ttm_device *bdev, struct
> > ttm_operation_ctx *ctx,
> > -                      gfp_t gfp_flags)
> > +/**
> > + * ttm_device_swapout() - Select and swap out a system-memory-
> > backed bo.
> > + * @bdev: The device whose bos are considered for swapout.
> > + * @ctx: The operation context.
> > + * @reason: The reason for swapout.
> > + *
> > + * Traverse the LRUs of a specific device until a suitable bo
> > backed by
> > + * system memory is found and swapped-out or purged.
> > + *
> > + * Return: Positive value or zero indicating the size in pages of
> > the
> > + * bo swapped out. Negative error code on error.
> > + */
> > +long ttm_device_swapout(struct ttm_device *bdev, struct
> > ttm_operation_ctx *ctx,
> > +                       enum ttm_shrink_reason reason)
> >   {
> >         struct ttm_resource_cursor cursor;
> >         struct ttm_resource_manager *man;
> >         struct ttm_resource *res;
> >         unsigned i;
> > -       int ret;
> > +       long ret;
> > +
> > +       if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs-
> > >bo_shrink)
> > +               return 0;
> >   
> >         spin_lock(&bdev->lru_lock);
> >         for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
> > @@ -156,16 +180,19 @@ int ttm_device_swapout(struct ttm_device
> > *bdev, struct ttm_operation_ctx *ctx,
> >   
> >                 ttm_resource_manager_for_each_res(man, &cursor,
> > res) {
> >                         struct ttm_buffer_object *bo = res->bo;
> > -                       uint32_t num_pages;
> > +                       struct ttm_tt *tt;
> >   
> >                         if (!bo || bo->resource != res)
> >                                 continue;
> >   
> > -                       num_pages = PFN_UP(bo->base.size);
> > -                       ret = ttm_bo_swapout(bo, ctx, gfp_flags);
> > +                       tt = bo->ttm;
> > +                       if (!tt || (reason == TTM_SHRINK_PURGE &&
> > +                                   !ttm_tt_purgeable(tt)))
> > +                               continue;
> > +                       ret = ttm_bo_swapout(bo, ctx, reason);
> >                         /* ttm_bo_swapout has dropped the lru_lock
> > */
> > -                       if (!ret)
> > -                               return num_pages;
> > +                       if (ret >= 0)
> > +                               return ret;
> >                         if (ret != -EBUSY)
> >                                 return ret;
> >                 }
> > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
> > b/drivers/gpu/drm/ttm/ttm_tt.c
> > index ab725d9d14a6..a68c14de0161 100644
> > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > @@ -239,22 +239,21 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
> >   
> >   /**
> >    * ttm_tt_swapout - swap out tt object
> > - *
> >    * @bdev: TTM device structure.
> >    * @ttm: The struct ttm_tt.
> > - * @gfp_flags: Flags to use for memory allocation.
> >    *
> > - * Swapout a TT object to a shmem_file, return number of pages
> > swapped out or
> > - * negative error code.
> > + * Swapout a TT object to a shmem_file.
> > + *
> > + * Return: number of pages swapped out or negative error code on
> > error.
> >    */
> > -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> > -                  gfp_t gfp_flags)
> > +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm)
> >   {
> >         loff_t size = (loff_t)ttm->num_pages << PAGE_SHIFT;
> >         struct address_space *swap_space;
> >         struct file *swap_storage;
> >         struct page *from_page;
> >         struct page *to_page;
> > +       gfp_t gfp_flags;
> >         int i, ret;
> >   
> >         swap_storage = shmem_file_setup("ttm swap", size, 0);
> > @@ -264,7 +263,7 @@ int ttm_tt_swapout(struct ttm_device *bdev,
> > struct ttm_tt *ttm,
> >         }
> >   
> >         swap_space = swap_storage->f_mapping;
> > -       gfp_flags &= mapping_gfp_mask(swap_space);
> > +       gfp_flags = GFP_KERNEL & mapping_gfp_mask(swap_space);
> >   
> >         for (i = 0; i < ttm->num_pages; ++i) {
> >                 from_page = ttm->pages[i];
> > @@ -315,12 +314,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
> >         while (atomic_long_read(&ttm_pages_allocated) >
> > ttm_pages_limit ||
> >                atomic_long_read(&ttm_dma32_pages_allocated) >
> >                ttm_dma32_pages_limit) {
> > +               long r = ttm_global_swapout(ctx,
> > TTM_SHRINK_WATERMARK);
> >   
> > -               ret = ttm_global_swapout(ctx, GFP_KERNEL);
> > -               if (ret == 0)
> > +               if (!r)
> >                         break;
> > -               if (ret < 0)
> > +               if (r < 0) {
> > +                       ret = r;
> >                         goto error;
> > +               }
> >         }
> >   
> >         if (bdev->funcs->ttm_tt_populate)
> > @@ -379,7 +380,7 @@ static int ttm_tt_debugfs_shrink_show(struct
> > seq_file *m, void *data)
> >   {
> >         struct ttm_operation_ctx ctx = { false, false };
> >   
> > -       seq_printf(m, "%d\n", ttm_global_swapout(&ctx,
> > GFP_KERNEL));
> > +       seq_printf(m, "%ld\n", ttm_global_swapout(&ctx,
> > TTM_SHRINK_SWAP));
> >         return 0;
> >   }
> >   DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
> > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> > b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> > index 2588615a2a38..292c5199d2cc 100644
> > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> > @@ -1514,7 +1514,8 @@ static int vmw_pm_freeze(struct device *kdev)
> >         vmw_execbuf_release_pinned_bo(dev_priv);
> >         vmw_resource_evict_all(dev_priv);
> >         vmw_release_device_early(dev_priv);
> > -       while (ttm_device_swapout(&dev_priv->bdev, &ctx,
> > GFP_KERNEL) > 0);
> > +       while (ttm_device_swapout(&dev_priv->bdev, &ctx,
> > TTM_SHRINK_WATERMARK) > 0)
> > +               ;
> >         vmw_fifo_resource_dec(dev_priv);
> >         if (atomic_read(&dev_priv->num_fifo_resources) != 0) {
> >                 DRM_ERROR("Can't hibernate while 3D resources are
> > active.\n");
> > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > index 8b113c384236..6b45e0b639e0 100644
> > --- a/include/drm/ttm/ttm_bo.h
> > +++ b/include/drm/ttm/ttm_bo.h
> > @@ -375,8 +375,8 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj
> > *map);
> >   int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map
> > *map);
> >   void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map
> > *map);
> >   int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct
> > ttm_buffer_object *bo);
> > -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > -                  gfp_t gfp_flags);
> > +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
> > ttm_operation_ctx *ctx,
> > +                   enum ttm_shrink_reason reason);
> >   void ttm_bo_pin(struct ttm_buffer_object *bo);
> >   void ttm_bo_unpin(struct ttm_buffer_object *bo);
> >   int ttm_mem_evict_first(struct ttm_device *bdev,
> > diff --git a/include/drm/ttm/ttm_device.h
> > b/include/drm/ttm/ttm_device.h
> > index 4f3e81eac6f3..6bd2abf712ab 100644
> > --- a/include/drm/ttm/ttm_device.h
> > +++ b/include/drm/ttm/ttm_device.h
> > @@ -35,6 +35,21 @@ struct ttm_placement;
> >   struct ttm_buffer_object;
> >   struct ttm_operation_ctx;
> >   
> > +/**
> > + * enum ttm_shrink_reason - Reason for shrinking system memory
> > + * @TTM_SHRINK_WATERMARK - A watermark limit was reached. Not from
> > reclaim.
> > + * @TTM_SHRINK_PURGE - A request for shrinking only purged
> > objects.
> > + * @TTM_SHRINK_SWAP - A request for shrinking any object.
> > + *
> > + * This enum is intended for the buffer object- and shrink method
> > selection
> > + * algorithms. It's not intended to leak to or be used by TTM
> > drivers.
> > + */
> > +enum ttm_shrink_reason {
> > +       TTM_SHRINK_WATERMARK,
> > +       TTM_SHRINK_PURGE,
> > +       TTM_SHRINK_SWAP,
> > +};
> > +
> >   /**
> >    * struct ttm_global - Buffer object driver global data.
> >    */
> > @@ -207,6 +222,19 @@ struct ttm_device_funcs {
> >          * adding fences that may force a delayed delete
> >          */
> >         void (*release_notify)(struct ttm_buffer_object *bo);
> > +
> > +       /**
> > +        * Shrink the bo's system pages, either by swapping or by
> > purging.
> > +        * @bo: Bo the system pages of which are to be shrunken.
> > +        * @ctx: Operation ctx. In particular the driver callback
> > should
> > +        *       adhere to the no_wait_gpu and interruptible
> > fields.
> > +        *
> > +        * This is also notifying the driver that the bo is about
> > to be
> > +        * shrunken and the driver should take care to unbind any
> > GPU bindings
> > +        * and to note that the content is purged if @bo->ttm is
> > purgeable.
> > +        */
> > +       long (*bo_shrink)(struct ttm_buffer_object *bo,
> > +                         struct ttm_operation_ctx *ctx);
> >   };
> >   
> >   /**
> > @@ -268,9 +296,11 @@ struct ttm_device {
> >         struct workqueue_struct *wq;
> >   };
> >   
> > -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t
> > gfp_flags);
> > -int ttm_device_swapout(struct ttm_device *bdev, struct
> > ttm_operation_ctx *ctx,
> > -                      gfp_t gfp_flags);
> > +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
> > +                       enum ttm_shrink_reason reason);
> > +
> > +long ttm_device_swapout(struct ttm_device *bdev, struct
> > ttm_operation_ctx *ctx,
> > +                       enum ttm_shrink_reason reason);
> >   
> >   static inline struct ttm_resource_manager *
> >   ttm_manager_type(struct ttm_device *bdev, int mem_type)
> > diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> > index cc54be1912e1..627168eba8f6 100644
> > --- a/include/drm/ttm/ttm_tt.h
> > +++ b/include/drm/ttm/ttm_tt.h
> > @@ -87,6 +87,7 @@ struct ttm_tt {
> >   #define TTM_TT_FLAG_ZERO_ALLOC                BIT(1)
> >   #define TTM_TT_FLAG_EXTERNAL          BIT(2)
> >   #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3)
> > +#define TTM_TT_FLAG_DONTNEED           BIT(4)
> >   
> >   #define TTM_TT_FLAG_PRIV_POPULATED    BIT(31)
> >         uint32_t page_flags;
> > @@ -180,8 +181,8 @@ void ttm_tt_destroy(struct ttm_device *bdev,
> > struct ttm_tt *ttm);
> >    * Swap in a previously swap out ttm_tt.
> >    */
> >   int ttm_tt_swapin(struct ttm_tt *ttm);
> > -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
> > -                  gfp_t gfp_flags);
> > +
> > +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm);
> >   
> >   /**
> >    * ttm_tt_populate - allocate pages for a ttm
> > @@ -223,6 +224,18 @@ void ttm_tt_mgr_init(unsigned long num_pages,
> > unsigned long num_dma32_pages);
> >   struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct
> > ttm_kmap_iter_tt *iter_tt,
> >                                             struct ttm_tt *tt);
> >   
> > +/**
> > + * ttm_tt_purgeable() - Whether a struct ttm_tt's contents is
> > purgeable
> > + * @tt: The struct ttm_tt to consider.
> > + *
> > + * Return: Whether the contents is purgeable in the sense that the
> > owner
> > + * doesn't mind losing it as long as it gets notified.
> > + */
> > +static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
> > +{
> > +       return tt->page_flags & TTM_TT_FLAG_DONTNEED;
> > +}
> > +
> >   #if IS_ENABLED(CONFIG_AGP)
> >   #include <linux/agp_backend.h>
> >   
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-02-15 18:02     ` Thomas Hellström
@ 2023-02-15 18:26       ` Christian König
  2023-02-15 18:51         ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-15 18:26 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, Huang Rui, linux-graphics-maintainer,
	Peter Xu, Johannes Weiner, Madhav Chauhan, Dave Airlie,
	Andrew Morton, Matthew Auld

Am 15.02.23 um 19:02 schrieb Thomas Hellström:
> On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:
>> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
>>> When hitting an error, the error path forgot to unmap dma mappings
>>> and
>> I don't see where this happens?
>  From what I can tell, ttm_pool_page_allocated() maps the page for DMA.
> If we later hit an error, ttm_pool_free_page() will leak the mapping.

Ah, I see. Good point.

>
>>> could call set_pages_wb() on already uncached pages.
>> Yeah, but what's the problem?
> Umm, at least if you try to set WC on an already WC'd page, the
> set_pages_ code will spam dmesg with warnings.
> Not sure if set_pages_wb() on WB pages does the same, nor if it
> issues unnecessary global cache / tlb flushes or whether that will
> change in the future.
> The point of avoiding set_pages_wb() on pages that are already WB is
> that you don't have to check, and you don't have to care.

Please just open code the error handling then. That helper function
looks horribly complicated to me.

Alternatively we could have a free function for a range of pages.
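
Something along these lines, maybe (name and signature are only a sketch
of the idea, not something from this series):

/*
 * Free (or give back to the pool) the contiguous run of pages
 * tt->pages[start_page..end_page), undoing any DMA mappings on the way.
 */
static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
				enum ttm_caching caching,
				pgoff_t start_page, pgoff_t end_page);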

Regards,
Christian.


>
> That said, __ttm_pool_free() is also used in upcoming patches.
>
> /Thomas
>
>
>> Regards,
>> Christian.
>>
>>> Fix this by introducing a common __ttm_pool_free() function that
>>> does the right thing.
>>>
>>> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Dave Airlie <airlied@redhat.com>
>>> Cc: Madhav Chauhan <madhav.chauhan@amd.com>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Cc: Huang Rui <ray.huang@amd.com>
>>> Cc: dri-devel@lists.freedesktop.org
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++--------
>>> -----
>>>    1 file changed, 45 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>>> b/drivers/gpu/drm/ttm/ttm_pool.c
>>> index aa116a7bbae3..1cc7591a9542 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>> @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
>>> ttm_pool *pool, unsigned int order,
>>>          return 0;
>>>    }
>>>    
>>> +static void __ttm_pool_free(struct ttm_pool *pool, struct ttm_tt
>>> *tt,
>>> +                           struct page **caching_divide,
>>> +                           enum ttm_caching initial_caching,
>>> +                           enum ttm_caching subseq_caching,
>>> +                           pgoff_t num_pages)
>>> +{
>>> +       enum ttm_caching caching = subseq_caching;
>>> +       struct page **pages = tt->pages;
>>> +       unsigned int order;
>>> +       pgoff_t i, nr;
>>> +
>>> +       if (pool && caching_divide)
>>> +               caching = initial_caching;
>>> +
>>> +       for (i = 0; i < num_pages; i += nr, pages += nr) {
>>> +               struct ttm_pool_type *pt = NULL;
>>> +
>>> +               if (unlikely(caching_divide == pages))
>>> +                       caching = subseq_caching;
>>> +
>>> +               order = ttm_pool_page_order(pool, *pages);
>>> +               nr = (1UL << order);
>>> +               if (tt->dma_address)
>>> +                       ttm_pool_unmap(pool, tt->dma_address[i],
>>> nr);
>>> +
>>> +               pt = ttm_pool_select_type(pool, caching, order);
>>> +               if (pt)
>>> +                       ttm_pool_type_give(pt, *pages);
>>> +               else
>>> +                       ttm_pool_free_page(pool, caching, order,
>>> *pages);
>>> +       }
>>> +}
>>> +
>>>    /**
>>>     * ttm_pool_alloc - Fill a ttm_tt object
>>>     *
>>> @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>          dma_addr_t *dma_addr = tt->dma_address;
>>>          struct page **caching = tt->pages;
>>>          struct page **pages = tt->pages;
>>> +       enum ttm_caching page_caching;
>>>          gfp_t gfp_flags = GFP_USER;
>>> -       unsigned int i, order;
>>> +       unsigned int order;
>>>          struct page *p;
>>>          int r;
>>>    
>>> @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>               order = min_t(unsigned int, order, __fls(num_pages)))
>>> {
>>>                  struct ttm_pool_type *pt;
>>>    
>>> +               page_caching = tt->caching;
>>>                  pt = ttm_pool_select_type(pool, tt->caching,
>>> order);
>>>                  p = pt ? ttm_pool_type_take(pt) : NULL;
>>>                  if (p) {
>>> @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>                          if (r)
>>>                                  goto error_free_page;
>>>    
>>> +                       caching = pages;
>>>                          do {
>>>                                  r = ttm_pool_page_allocated(pool,
>>> order, p,
>>>                                                             
>>> &dma_addr,
>>> @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>                                  if (r)
>>>                                          goto error_free_page;
>>>    
>>> +                               caching = pages;
>>>                                  if (num_pages < (1 << order))
>>>                                          break;
>>>    
>>>                                  p = ttm_pool_type_take(pt);
>>>                          } while (p);
>>> -                       caching = pages;
>>>                  }
>>>    
>>> +               page_caching = ttm_cached;
>>>                  while (num_pages >= (1 << order) &&
>>>                         (p = ttm_pool_alloc_page(pool, gfp_flags,
>>> order))) {
>>>    
>>> @@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>                                                             tt-
>>>> caching);
>>>                                  if (r)
>>>                                          goto error_free_page;
>>> +                               caching = pages;
>>>                          }
>>>                          r = ttm_pool_page_allocated(pool, order, p,
>>> &dma_addr,
>>>                                                      &num_pages,
>>> &pages);
>>> @@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>          return 0;
>>>    
>>>    error_free_page:
>>> -       ttm_pool_free_page(pool, tt->caching, order, p);
>>> +       ttm_pool_free_page(pool, page_caching, order, p);
>>>    
>>>    error_free_all:
>>>          num_pages = tt->num_pages - num_pages;
>>> -       for (i = 0; i < num_pages; ) {
>>> -               order = ttm_pool_page_order(pool, tt->pages[i]);
>>> -               ttm_pool_free_page(pool, tt->caching, order, tt-
>>>> pages[i]);
>>> -               i += 1 << order;
>>> -       }
>>> +       __ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached,
>>> +                       num_pages);
>>>    
>>>          return r;
>>>    }
>>> @@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc);
>>>     */
>>>    void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
>>>    {
>>> -       unsigned int i;
>>> -
>>> -       for (i = 0; i < tt->num_pages; ) {
>>> -               struct page *p = tt->pages[i];
>>> -               unsigned int order, num_pages;
>>> -               struct ttm_pool_type *pt;
>>> -
>>> -               order = ttm_pool_page_order(pool, p);
>>> -               num_pages = 1ULL << order;
>>> -               if (tt->dma_address)
>>> -                       ttm_pool_unmap(pool, tt->dma_address[i],
>>> num_pages);
>>> -
>>> -               pt = ttm_pool_select_type(pool, tt->caching,
>>> order);
>>> -               if (pt)
>>> -                       ttm_pool_type_give(pt, tt->pages[i]);
>>> -               else
>>> -                       ttm_pool_free_page(pool, tt->caching,
>>> order,
>>> -                                          tt->pages[i]);
>>> -
>>> -               i += num_pages;
>>> -       }
>>> +       __ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching,
>>> +                       tt->num_pages);
>>>    
>>>          while (atomic_long_read(&allocated_pages) > page_pool_size)
>>>                  ttm_pool_shrink();


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 18:12     ` Thomas Hellström
@ 2023-02-15 18:30       ` Christian König
  2023-02-15 19:00         ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-15 18:30 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

Am 15.02.23 um 19:12 schrieb Thomas Hellström:
> On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
>> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
>>> When swapping out, we will split multi-order pages both in order to
>>> move them to the swap-cache and to be able to return memory to the
>>> swap cache as soon as possible on a page-by-page basis.
>>> By reducing the page max order to the system PMD size, we can be
>>> nicer
>>> to the system and avoid splitting gigantic pages.
>>
>>> On top of this we also
>>> include the 64K page size in the page sizes tried, since that
>>> appears to
>>> be a common size for GPU applications.
>> Please completely drop that.
> You mean the 64K page size, or the whole patch?

The 64K page size. This was an invention from Microsoft to standardize 
GPU handling ~15-20 years ago.

It turned out to be a complete shipwreck, and by now 2MiB and 1GiB pages 
or just flexible hardware which can handle everything seem to have 
become the standard.

>> This is just nonsense spilling in from the
>> Windows drivers.
> Agreed, but IIRC on the last RFC you asked me not to drop the 64K
> pages, so that's why they are here. I can remove them if needed.

We could keep it if it's in any way beneficial, but I'm pretty sure I 
must have been drunk to ask for that.

> The only reason for keeping them from a performance point of view is
> better efficiency on GPUs with 64K page size if not using a coalescing
> IOMMU for dma-mapping.

Are any of those still produced? As far as I know, neither NVidia, Intel 
nor AMD has assumed that page size in their hardware for quite a while 
now.

Regards,
Christian.

>
> Let me know what you think is best and I'll adjust accordingly.
>
> /Thomas
>
>
>> Christian.
>>
>>> Looking forward to when we might be able to swap out PMD size
>>> folios
>>> without splitting, this will also be a benefit.
>>>
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_pool.c | 58 ++++++++++++++++++++++++++---
>>> -----
>>>    1 file changed, 45 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>>> b/drivers/gpu/drm/ttm/ttm_pool.c
>>> index 1cc7591a9542..8787fb6a218b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>> @@ -31,6 +31,8 @@
>>>     * cause they are rather slow compared to alloc_pages+map.
>>>     */
>>>    
>>> +#define pr_fmt(fmt) "[TTM POOL] " fmt
>>> +
>>>    #include <linux/module.h>
>>>    #include <linux/dma-mapping.h>
>>>    #include <linux/debugfs.h>
>>> @@ -47,6 +49,18 @@
>>>    
>>>    #include "ttm_module.h"
>>>    
>>> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
>>> +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
>>> +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
>>> +#undef TTM_MAX_ORDER
>>> +#define TTM_MAX_ORDER TTM_64K_ORDER
>>> +#endif
>>> +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
>>> +#undef TTM_MAX_ORDER
>>> +#define TTM_MAX_ORDER (MAX_ORDER - 1)
>>> +#endif
>>> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
>>> +
>>>    /**
>>>     * struct ttm_pool_dma - Helper object for coherent DMA mappings
>>>     *
>>> @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
>>>    
>>>    static atomic_long_t allocated_pages;
>>>    
>>> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
>>> -static struct ttm_pool_type global_uncached[MAX_ORDER];
>>> +static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER];
>>> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>>>    
>>> -static struct ttm_pool_type
>>> global_dma32_write_combined[MAX_ORDER];
>>> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>>> +static struct ttm_pool_type
>>> global_dma32_write_combined[TTM_DIM_ORDER];
>>> +static struct ttm_pool_type global_dma32_uncached[TTM_DIM_ORDER];
>>>    
>>>    static spinlock_t shrinker_lock;
>>>    static struct list_head shrinker_list;
>>>    static struct shrinker mm_shrinker;
>>>    
>>> +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
>>> +
>>>    /* Allocate pages of size 1 << order with the given gfp_flags */
>>>    static struct page *ttm_pool_alloc_page(struct ttm_pool *pool,
>>> gfp_t gfp_flags,
>>>                                          unsigned int order)
>>> @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct ttm_pool
>>> *pool, struct ttm_tt *tt,
>>>          }
>>>    }
>>>    
>>> +static unsigned int ttm_pool_select_order(unsigned int order,
>>> pgoff_t num_pages)
>>> +{
>>> +       unsigned int *cur_order = ttm_pool_orders;
>>> +
>>> +       order = min_t(unsigned int, __fls(num_pages), order);
>>> +       while (order < *cur_order)
>>> +               ++cur_order;
>>> +
>>> +       return *cur_order;
>>> +}
>>> +
>>>    /**
>>>     * ttm_pool_alloc - Fill a ttm_tt object
>>>     *
>>> @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>> struct ttm_tt *tt,
>>>          else
>>>                  gfp_flags |= GFP_HIGHUSER;
>>>    
>>> -       for (order = min_t(unsigned int, MAX_ORDER - 1,
>>> __fls(num_pages));
>>> -            num_pages;
>>> -            order = min_t(unsigned int, order, __fls(num_pages)))
>>> {
>>> +       order = ttm_pool_select_order(ttm_pool_orders[0],
>>> num_pages);
>>> +       for (; num_pages; order = ttm_pool_select_order(order,
>>> num_pages)) {
>>>                  struct ttm_pool_type *pt;
>>>    
>>>                  page_caching = tt->caching;
>>> @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
>>> struct device *dev,
>>>    
>>>          if (use_dma_alloc) {
>>>                  for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>                                  ttm_pool_type_init(&pool-
>>>> caching[i].orders[j],
>>>                                                     pool, i, j);
>>>          }
>>> @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>>    
>>>          if (pool->use_dma_alloc) {
>>>                  for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>                                  ttm_pool_type_fini(&pool-
>>>> caching[i].orders[j]);
>>>          }
>>>    
>>> @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct
>>> seq_file *m)
>>>          unsigned int i;
>>>    
>>>          seq_puts(m, "\t ");
>>> -       for (i = 0; i < MAX_ORDER; ++i)
>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>                  seq_printf(m, " ---%2u---", i);
>>>          seq_puts(m, "\n");
>>>    }
>>> @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct
>>> ttm_pool_type *pt,
>>>    {
>>>          unsigned int i;
>>>    
>>> -       for (i = 0; i < MAX_ORDER; ++i)
>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>                  seq_printf(m, " %8u", ttm_pool_type_count(&pt[i]));
>>>          seq_puts(m, "\n");
>>>    }
>>> @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long
>>> num_pages)
>>>          if (!page_pool_size)
>>>                  page_pool_size = num_pages;
>>>    
>>> +       if (TTM_64K_ORDER < TTM_MAX_ORDER)
>>> +               ttm_pool_orders[1] = TTM_64K_ORDER;
>>> +
>>> +       pr_debug("Used orders are %u %u %u\n", ttm_pool_orders[0],
>>> +                ttm_pool_orders[1], ttm_pool_orders[2]);
>>> +
>>>          spin_lock_init(&shrinker_lock);
>>>          INIT_LIST_HEAD(&shrinker_list);
>>>    
>>> -       for (i = 0; i < MAX_ORDER; ++i) {
>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>                  ttm_pool_type_init(&global_write_combined[i], NULL,
>>>                                     ttm_write_combined, i);
>>>                  ttm_pool_type_init(&global_uncached[i], NULL,
>>> ttm_uncached, i);
>>> @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
>>>    {
>>>          unsigned int i;
>>>    
>>> -       for (i = 0; i < MAX_ORDER; ++i) {
>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>                  ttm_pool_type_fini(&global_write_combined[i]);
>>>                  ttm_pool_type_fini(&global_uncached[i]);
>>>    


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface
  2023-02-15 18:19     ` Thomas Hellström
@ 2023-02-15 18:32       ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2023-02-15 18:32 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, linux-graphics-maintainer, Peter Xu,
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

Am 15.02.23 um 19:19 schrieb Thomas Hellström:
> On Wed, 2023-02-15 at 18:39 +0100, Christian König wrote:
>> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
>>> Update the TTM swapout interfaces for better compatibility with a
>>> shrinker.
>>> - Replace number-of-pages int return with a long to better match
>>> the
>>>     kernel's shrinker interface.
>>> - The gfp_flags parameter to ttm_xx_swapout() currently only takes
>>> the
>>>     GFP_KERNEL value and shouldn't really be needed since the
>>> shrinker we
>>>     hook up in upcoming patches sets an allocation context to match
>>> reclaim.
>>> - Introduce a shrink reason enumeration and a driver callback to
>>> shrink
>>>     buffer objects.
>> Is that really necessary? This is mid-layering once more.
>>
>> If drivers want to implement driver specific shrinking they should
>> register their own shrinker callback.
> Yes, a choice needs to be made here. If TTM registers the shrinker, the
> driver needs to be called at least to unbind and to remove dma-
> mappings.
>
> If the driver registers the shrinker it can still (I think) use the
> pool helpers, but needs TTM for LRU traversal and accounting.
>
> I can have a look at the latter if you think that will be a better
> solution.

Yeah, that's what I had in mind as well. Something like: the driver 
registers the shrinker and TTM provides the function to give a candidate 
for eviction.

Christian.

>
> /Thomas
>
>
>> Christian.
>>
>>
>>>     The TTM_SHRINK_WATERMARK reason is going to still be handled
>>> using the
>>>     existing shmem copy, and will be used by pool types that don't
>>> lend
>>>     themselves well to shrinking (dma_alloc pool) and when drivers
>>> explicitly
>>>     requests swapout.
>>>     The TTM_SHRINK_SWAP and TTM_SHRINK_PURGE reasons originate from
>>> a
>>>     shrinker and is to be handled by a new driver callback,
>>> bo_shrink().
>>>     Helpers for the new driver callback are provided in upcoming
>>> patches.
>>>
>>> Cc: linux-graphics-maintainer@vmware.com
>>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/ttm/ttm_bo.c        | 38 ++++++++++++++++----
>>>    drivers/gpu/drm/ttm/ttm_device.c    | 55 +++++++++++++++++++++---
>>> -----
>>>    drivers/gpu/drm/ttm/ttm_tt.c        | 23 ++++++------
>>>    drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  3 +-
>>>    include/drm/ttm/ttm_bo.h            |  4 +--
>>>    include/drm/ttm/ttm_device.h        | 36 +++++++++++++++++--
>>>    include/drm/ttm/ttm_tt.h            | 17 +++++++--
>>>    7 files changed, 136 insertions(+), 40 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 882c2fa346f3..e5c0970564c0 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -1114,13 +1114,29 @@ int ttm_bo_wait_ctx(struct
>>> ttm_buffer_object *bo, struct ttm_operation_ctx *ctx)
>>>    }
>>>    EXPORT_SYMBOL(ttm_bo_wait_ctx);
>>>    
>>> -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct
>>> ttm_operation_ctx *ctx,
>>> -                  gfp_t gfp_flags)
>>> +/**
>>> + * ttm_bo_swapout() - Swap out or purge a buffer object
>>> + * @bo: The buffer object.
>>> + * @ctx: The ttm operation context.
>>> + * @reason: The swapout reason.
>>> + *
>>> + * Try to swap out or purge the contents of a system memory backed
>>> buffer
>>> + * object. The function needs to be called with the device's LRU
>>> lock held.
>>> + *
>>> + * Return: -EBUSY if the bo lock could not be grabbed or the
>>> object was
>>> + * otherwise busy. Otherwise the number of pages swapped out or
>>> negative
>>> + * error code on error. Iff the function didn't return -EBUSY, the
>>> + * LRU lock was dropped, and LRU traversal needs to restart.
>>> + */
>>> +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
>>> ttm_operation_ctx *ctx,
>>> +                   enum ttm_shrink_reason reason)
>>>    {
>>>          struct ttm_place place;
>>>          bool locked;
>>>          long ret;
>>>    
>>> +       lockdep_assert_held(&bo->bdev->lru_lock);
>>> +
>>>          /*
>>>           * While the bo may already reside in SYSTEM placement, set
>>>           * SYSTEM as new placement to cover also the move further
>>> below.
>>> @@ -1142,8 +1158,12 @@ int ttm_bo_swapout(struct ttm_buffer_object
>>> *bo, struct ttm_operation_ctx *ctx,
>>>          }
>>>    
>>>          if (bo->deleted) {
>>> +               long num_pages = bo->ttm->num_pages;
>>> +
>>>                  ret = ttm_bo_cleanup_refs(bo, false, false,
>>> locked);
>>>                  ttm_bo_put(bo);
>>> +               if (!ret)
>>> +                       return num_pages;
>>>                  return ret == -EBUSY ? -ENOSPC : ret;
>>>          }
>>>    
>>> @@ -1184,13 +1204,17 @@ int ttm_bo_swapout(struct ttm_buffer_object
>>> *bo, struct ttm_operation_ctx *ctx,
>>>           * Swap out. Buffer will be swapped in again as soon as
>>>           * anyone tries to access a ttm page.
>>>           */
>>> -       if (bo->bdev->funcs->swap_notify)
>>> -               bo->bdev->funcs->swap_notify(bo);
>>> +       if (bo->bdev->funcs->bo_shrink && reason !=
>>> TTM_SHRINK_WATERMARK) {
>>> +               ret = bo->bdev->funcs->bo_shrink(bo, ctx);
>>> +       } else {
>>> +               if (bo->bdev->funcs->swap_notify)
>>> +                       bo->bdev->funcs->swap_notify(bo);
>>> +               ret = ttm_tt_swapout(bo->bdev, bo->ttm);
>>> +               if (!ret)
>>> +                       ret = bo->ttm->num_pages;
>>> +       }
>>>    
>>> -       if (ttm_tt_is_populated(bo->ttm))
>>> -               ret = ttm_tt_swapout(bo->bdev, bo->ttm, gfp_flags);
>>>    out:
>>> -
>>>          /*
>>>           * Unreserve without putting on LRU to avoid swapping out
>>> an
>>>           * already swapped buffer.
>>> diff --git a/drivers/gpu/drm/ttm/ttm_device.c
>>> b/drivers/gpu/drm/ttm/ttm_device.c
>>> index ae2f19dc9f81..7eadea07027f 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_device.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_device.c
>>> @@ -116,19 +116,28 @@ static int ttm_global_init(void)
>>>          return ret;
>>>    }
>>>    
>>> -/*
>>> - * A buffer object shrink method that tries to swap out the first
>>> - * buffer object on the global::swap_lru list.
>>> +/**
>>> + * ttm_global_swapout() - Select and swap out a system-memory-
>>> backed bo.
>>> + * @ctx: The operation context.
>>> + * @reason: The reason for swapout.
>>> + *
>>> + * Select, based on round-robin, a TTM device and traverse the LRUs
>>> of
>>> + * that specific device until a suitable bo backed by system
>>> memory is found
>>> + * and swapped-out or purged.
>>> + *
>>> + * Return: Positive value or zero indicating the size in pages of
>>> the
>>> + * bo swapped out. Negative error code on error.
>>>     */
>>> -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t
>>> gfp_flags)
>>> +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
>>> +                       enum ttm_shrink_reason reason)
>>>    {
>>>          struct ttm_global *glob = &ttm_glob;
>>>          struct ttm_device *bdev;
>>> -       int ret = 0;
>>> +       long ret = 0;
>>>    
>>>          mutex_lock(&ttm_global_mutex);
>>>          list_for_each_entry(bdev, &glob->device_list, device_list)
>>> {
>>> -               ret = ttm_device_swapout(bdev, ctx, gfp_flags);
>>> +               ret = ttm_device_swapout(bdev, ctx, reason);
>>>                  if (ret > 0) {
>>>                          list_move_tail(&bdev->device_list, &glob-
>>>> device_list);
>>>                          break;
>>> @@ -139,14 +148,29 @@ int ttm_global_swapout(struct
>>> ttm_operation_ctx *ctx, gfp_t gfp_flags)
>>>    }
>>>    EXPORT_SYMBOL(ttm_global_swapout);
>>>    
>>> -int ttm_device_swapout(struct ttm_device *bdev, struct
>>> ttm_operation_ctx *ctx,
>>> -                      gfp_t gfp_flags)
>>> +/**
>>> + * ttm_device_swapout() - Select and swap out a system-memory-
>>> backed bo.
>>> + * @bdev: The device whose bos are considered for swapout.
>>> + * @ctx: The operation context.
>>> + * @reason: The reason for swapout.
>>> + *
>>> + * Traverse the LRUs of a specific device until a suitable bo
>>> backed by
>>> + * system memory is found and swapped-out or purged.
>>> + *
>>> + * Return: Positive value or zero indicating the size in pages of
>>> the
>>> + * bo swapped out. Negative error code on error.
>>> + */
>>> +long ttm_device_swapout(struct ttm_device *bdev, struct
>>> ttm_operation_ctx *ctx,
>>> +                       enum ttm_shrink_reason reason)
>>>    {
>>>          struct ttm_resource_cursor cursor;
>>>          struct ttm_resource_manager *man;
>>>          struct ttm_resource *res;
>>>          unsigned i;
>>> -       int ret;
>>> +       long ret;
>>> +
>>> +       if (reason != TTM_SHRINK_WATERMARK && !bdev->funcs-
>>>> bo_shrink)
>>> +               return 0;
>>>    
>>>          spin_lock(&bdev->lru_lock);
>>>          for (i = TTM_PL_SYSTEM; i < TTM_NUM_MEM_TYPES; ++i) {
>>> @@ -156,16 +180,19 @@ int ttm_device_swapout(struct ttm_device
>>> *bdev, struct ttm_operation_ctx *ctx,
>>>    
>>>                  ttm_resource_manager_for_each_res(man, &cursor,
>>> res) {
>>>                          struct ttm_buffer_object *bo = res->bo;
>>> -                       uint32_t num_pages;
>>> +                       struct ttm_tt *tt;
>>>    
>>>                          if (!bo || bo->resource != res)
>>>                                  continue;
>>>    
>>> -                       num_pages = PFN_UP(bo->base.size);
>>> -                       ret = ttm_bo_swapout(bo, ctx, gfp_flags);
>>> +                       tt = bo->ttm;
>>> +                       if (!tt || (reason == TTM_SHRINK_PURGE &&
>>> +                                   !ttm_tt_purgeable(tt)))
>>> +                               continue;
>>> +                       ret = ttm_bo_swapout(bo, ctx, reason);
>>>                          /* ttm_bo_swapout has dropped the lru_lock
>>> */
>>> -                       if (!ret)
>>> -                               return num_pages;
>>> +                       if (ret >= 0)
>>> +                               return ret;
>>>                          if (ret != -EBUSY)
>>>                                  return ret;
>>>                  }
>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c
>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>> index ab725d9d14a6..a68c14de0161 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>> @@ -239,22 +239,21 @@ int ttm_tt_swapin(struct ttm_tt *ttm)
>>>    
>>>    /**
>>>     * ttm_tt_swapout - swap out tt object
>>> - *
>>>     * @bdev: TTM device structure.
>>>     * @ttm: The struct ttm_tt.
>>> - * @gfp_flags: Flags to use for memory allocation.
>>>     *
>>> - * Swapout a TT object to a shmem_file, return number of pages
>>> swapped out or
>>> - * negative error code.
>>> + * Swap out a TT object to a shmem_file.
>>> + *
>>> + * Return: number of pages swapped out or negative error code on
>>> error.
>>>     */
>>> -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
>>> -                  gfp_t gfp_flags)
>>> +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm)
>>>    {
>>>          loff_t size = (loff_t)ttm->num_pages << PAGE_SHIFT;
>>>          struct address_space *swap_space;
>>>          struct file *swap_storage;
>>>          struct page *from_page;
>>>          struct page *to_page;
>>> +       gfp_t gfp_flags;
>>>          int i, ret;
>>>    
>>>          swap_storage = shmem_file_setup("ttm swap", size, 0);
>>> @@ -264,7 +263,7 @@ int ttm_tt_swapout(struct ttm_device *bdev,
>>> struct ttm_tt *ttm,
>>>          }
>>>    
>>>          swap_space = swap_storage->f_mapping;
>>> -       gfp_flags &= mapping_gfp_mask(swap_space);
>>> +       gfp_flags = GFP_KERNEL & mapping_gfp_mask(swap_space);
>>>    
>>>          for (i = 0; i < ttm->num_pages; ++i) {
>>>                  from_page = ttm->pages[i];
>>> @@ -315,12 +314,14 @@ int ttm_tt_populate(struct ttm_device *bdev,
>>>          while (atomic_long_read(&ttm_pages_allocated) >
>>> ttm_pages_limit ||
>>>                 atomic_long_read(&ttm_dma32_pages_allocated) >
>>>                 ttm_dma32_pages_limit) {
>>> +               long r = ttm_global_swapout(ctx,
>>> TTM_SHRINK_WATERMARK);
>>>    
>>> -               ret = ttm_global_swapout(ctx, GFP_KERNEL);
>>> -               if (ret == 0)
>>> +               if (!r)
>>>                          break;
>>> -               if (ret < 0)
>>> +               if (r < 0) {
>>> +                       ret = r;
>>>                          goto error;
>>> +               }
>>>          }
>>>    
>>>          if (bdev->funcs->ttm_tt_populate)
>>> @@ -379,7 +380,7 @@ static int ttm_tt_debugfs_shrink_show(struct
>>> seq_file *m, void *data)
>>>    {
>>>          struct ttm_operation_ctx ctx = { false, false };
>>>    
>>> -       seq_printf(m, "%d\n", ttm_global_swapout(&ctx,
>>> GFP_KERNEL));
>>> +       seq_printf(m, "%ld\n", ttm_global_swapout(&ctx,
>>> TTM_SHRINK_SWAP));
>>>          return 0;
>>>    }
>>>    DEFINE_SHOW_ATTRIBUTE(ttm_tt_debugfs_shrink);
>>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>>> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>>> index 2588615a2a38..292c5199d2cc 100644
>>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
>>> @@ -1514,7 +1514,8 @@ static int vmw_pm_freeze(struct device *kdev)
>>>          vmw_execbuf_release_pinned_bo(dev_priv);
>>>          vmw_resource_evict_all(dev_priv);
>>>          vmw_release_device_early(dev_priv);
>>> -       while (ttm_device_swapout(&dev_priv->bdev, &ctx,
>>> GFP_KERNEL) > 0);
>>> +       while (ttm_device_swapout(&dev_priv->bdev, &ctx,
>>> TTM_SHRINK_WATERMARK) > 0)
>>> +               ;
>>>          vmw_fifo_resource_dec(dev_priv);
>>>          if (atomic_read(&dev_priv->num_fifo_resources) != 0) {
>>>                  DRM_ERROR("Can't hibernate while 3D resources are
>>> active.\n");
>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>> index 8b113c384236..6b45e0b639e0 100644
>>> --- a/include/drm/ttm/ttm_bo.h
>>> +++ b/include/drm/ttm/ttm_bo.h
>>> @@ -375,8 +375,8 @@ void ttm_bo_kunmap(struct ttm_bo_kmap_obj
>>> *map);
>>>    int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map
>>> *map);
>>>    void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map
>>> *map);
>>>    int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct
>>> ttm_buffer_object *bo);
>>> -int ttm_bo_swapout(struct ttm_buffer_object *bo, struct
>>> ttm_operation_ctx *ctx,
>>> -                  gfp_t gfp_flags);
>>> +long ttm_bo_swapout(struct ttm_buffer_object *bo, struct
>>> ttm_operation_ctx *ctx,
>>> +                   enum ttm_shrink_reason reason);
>>>    void ttm_bo_pin(struct ttm_buffer_object *bo);
>>>    void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>>    int ttm_mem_evict_first(struct ttm_device *bdev,
>>> diff --git a/include/drm/ttm/ttm_device.h
>>> b/include/drm/ttm/ttm_device.h
>>> index 4f3e81eac6f3..6bd2abf712ab 100644
>>> --- a/include/drm/ttm/ttm_device.h
>>> +++ b/include/drm/ttm/ttm_device.h
>>> @@ -35,6 +35,21 @@ struct ttm_placement;
>>>    struct ttm_buffer_object;
>>>    struct ttm_operation_ctx;
>>>    
>>> +/**
>>> + * enum ttm_shrink_reason - Reason for shrinking system memory
>>> + * @TTM_SHRINK_WATERMARK - A watermark limit was reached. Not from
>>> reclaim.
>>> + * @TTM_SHRINK_PURGE - A request for shrinking only purged
>>> objects.
>>> + * @TTM_SHRINK_SWAP - A request for shrinking any object.
>>> + *
>>> + * This enum is intended for the buffer-object and shrink-method
>>> selection
>>> + * algorithms. It's not intended to leak to or be used by TTM
>>> drivers.
>>> + */
>>> +enum ttm_shrink_reason {
>>> +       TTM_SHRINK_WATERMARK,
>>> +       TTM_SHRINK_PURGE,
>>> +       TTM_SHRINK_SWAP,
>>> +};
>>> +
>>>    /**
>>>     * struct ttm_global - Buffer object driver global data.
>>>     */
>>> @@ -207,6 +222,19 @@ struct ttm_device_funcs {
>>>           * adding fences that may force a delayed delete
>>>           */
>>>          void (*release_notify)(struct ttm_buffer_object *bo);
>>> +
>>> +       /**
>>> +        * Shrink the bo's system pages, either by swapping or by
>>> purging.
>>> +        * @bo: The bo whose system pages are to be shrunk.
>>> +        * @ctx: Operation ctx. In particular the driver callback
>>> should
>>> +        *       adhere to the no_wait_gpu and interruptible
>>> fields.
>>> +        *
>>> +        * This also notifies the driver that the bo is about
>>> to be
>>> +        * shrunk, and the driver should take care to unbind any
>>> GPU bindings
>>> +        * and to note that the content is purged if @bo->ttm is
>>> purgeable.
>>> +        */
>>> +       long (*bo_shrink)(struct ttm_buffer_object *bo,
>>> +                         struct ttm_operation_ctx *ctx);
>>>    };
>>>    
>>>    /**
>>> @@ -268,9 +296,11 @@ struct ttm_device {
>>>          struct workqueue_struct *wq;
>>>    };
>>>    
>>> -int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t
>>> gfp_flags);
>>> -int ttm_device_swapout(struct ttm_device *bdev, struct
>>> ttm_operation_ctx *ctx,
>>> -                      gfp_t gfp_flags);
>>> +long ttm_global_swapout(struct ttm_operation_ctx *ctx,
>>> +                       enum ttm_shrink_reason reason);
>>> +
>>> +long ttm_device_swapout(struct ttm_device *bdev, struct
>>> ttm_operation_ctx *ctx,
>>> +                       enum ttm_shrink_reason reason);
>>>    
>>>    static inline struct ttm_resource_manager *
>>>    ttm_manager_type(struct ttm_device *bdev, int mem_type)
>>> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
>>> index cc54be1912e1..627168eba8f6 100644
>>> --- a/include/drm/ttm/ttm_tt.h
>>> +++ b/include/drm/ttm/ttm_tt.h
>>> @@ -87,6 +87,7 @@ struct ttm_tt {
>>>    #define TTM_TT_FLAG_ZERO_ALLOC                BIT(1)
>>>    #define TTM_TT_FLAG_EXTERNAL          BIT(2)
>>>    #define TTM_TT_FLAG_EXTERNAL_MAPPABLE BIT(3)
>>> +#define TTM_TT_FLAG_DONTNEED           BIT(4)
>>>    
>>>    #define TTM_TT_FLAG_PRIV_POPULATED    BIT(31)
>>>          uint32_t page_flags;
>>> @@ -180,8 +181,8 @@ void ttm_tt_destroy(struct ttm_device *bdev,
>>> struct ttm_tt *ttm);
>>>     * Swap in a previously swap out ttm_tt.
>>>     */
>>>    int ttm_tt_swapin(struct ttm_tt *ttm);
>>> -int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm,
>>> -                  gfp_t gfp_flags);
>>> +
>>> +int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm);
>>>    
>>>    /**
>>>     * ttm_tt_populate - allocate pages for a ttm
>>> @@ -223,6 +224,18 @@ void ttm_tt_mgr_init(unsigned long num_pages,
>>> unsigned long num_dma32_pages);
>>>    struct ttm_kmap_iter *ttm_kmap_iter_tt_init(struct
>>> ttm_kmap_iter_tt *iter_tt,
>>>                                              struct ttm_tt *tt);
>>>    
>>> +/**
>>> + * ttm_tt_purgeable() - Whether a struct ttm_tt's content is
>>> purgeable
>>> + * @tt: The struct ttm_tt to consider.
>>> + *
>>> + * Return: Whether the content is purgeable in the sense that the
>>> owner
>>> + * doesn't mind losing it as long as it gets notified.
>>> + */
>>> +static inline bool ttm_tt_purgeable(struct ttm_tt *tt)
>>> +{
>>> +       return tt->page_flags & TTM_TT_FLAG_DONTNEED;
>>> +}
>>> +
>>>    #if IS_ENABLED(CONFIG_AGP)
>>>    #include <linux/agp_backend.h>
>>>    
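As a rough illustration of the bo_shrink() contract documented above, a
driver-side implementation could look something like the sketch below. It is
only a sketch: my_unbind_gpu_bindings() and my_note_content_purged() are
hypothetical driver internals, and ttm_bo_shrink_pages() is a made-up stand-in
for whatever helper the later patches in this series provide for the actual
page laundering; none of these names come from the series itself.

static long my_bo_shrink(struct ttm_buffer_object *bo,
			 struct ttm_operation_ctx *ctx)
{
	long ret;

	/* Tear down GPU bindings first, honouring ctx->interruptible and
	 * ctx->no_wait_gpu as the callback documentation asks for.
	 */
	ret = my_unbind_gpu_bindings(bo, ctx);
	if (ret)
		return ret;

	/* Purgeable content may simply be dropped; note that the backing
	 * store is now undefined so later users of the bo know about it.
	 */
	if (ttm_tt_purgeable(bo->ttm))
		my_note_content_purged(bo);

	/* Hand the system pages back to TTM for swap-out or purge;
	 * hypothetical helper, returning the number of pages shrunk.
	 */
	return ttm_bo_shrink_pages(bo, ctx);
}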


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path
  2023-02-15 18:26       ` Christian König
@ 2023-02-15 18:51         ` Thomas Hellström
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 18:51 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Matthew Wilcox (Oracle),
	linux-mm, Dave Hansen, Huang Rui, linux-graphics-maintainer,
	Peter Xu, Johannes Weiner, Madhav Chauhan, Dave Airlie,
	Andrew Morton, Matthew Auld

On Wed, 2023-02-15 at 19:26 +0100, Christian König wrote:
> Am 15.02.23 um 19:02 schrieb Thomas Hellström:
> > On Wed, 2023-02-15 at 18:31 +0100, Christian König wrote:
> > > Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > > > When hitting an error, the error path forgot to unmap dma
> > > > mappings
> > > > and
> > > I don't see where this happens?
> >  From what I can tell, ttm_pool_page_allocated() maps the page for
> > dma.
> > If we later hit an error, ttm_pool_free_page() will leak the
> > mapping.
> 
> Ah, I see. Good point.
> 
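To spell out the pattern (a schematic of the idea only; demo_cleanup_mapped_page()
is a made-up name, and this is essentially what the new __ttm_pool_free() below
does per chunk): once ttm_pool_page_allocated() has dma-mapped a page, an error
path that frees the page must unmap it first, otherwise the mapping is leaked:

static void demo_cleanup_mapped_page(struct ttm_pool *pool, struct ttm_tt *tt,
				     unsigned int order, pgoff_t i,
				     struct page *p)
{
	/* Undo the dma mapping set up by ttm_pool_page_allocated()... */
	if (tt->dma_address)
		ttm_pool_unmap(pool, tt->dma_address[i], 1UL << order);

	/* ...before handing the page back; the old error path skipped this. */
	ttm_pool_free_page(pool, tt->caching, order, p);
}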
> > 
> > > > could call set_pages_wb() on already uncached pages.
> > > Yeah, but what's the problem?
> > Umm, at least if you try to set WC on an already WC'd page, the
> > set_pages_ code will spam dmesg with warnings.
> > Not sure if set_pages_wb() on WB pages does the same, nor if it
> > issues unnecessary global cache / tlb flushes or whether that will
> > change in the future.
> > The point of avoiding the set_pages_wb() when already WB is you
> > don't
> > have to check, and you don't have to care.
> 
> Please just open code the error handling then. That helper function 
> looks horribly complicated to me.
> 
> Alternatively we could have a free function for a range of pages.

OK, I'll see if this is doable without adding a tremendous amount of
code.

/Thomas


> 
> Regards,
> Christian.
> 
> 
> > 
> > That said, the __ttm_pool_free() is used also in upcoming patches.
> > 
> > /Thomas
> > 
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Fix this by introducing a common __ttm_pool_free() function
> > > > that
> > > > does the right thing.
> > > > 
> > > > Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool
> > > > v3")
> > > > Cc: Christian König <christian.koenig@amd.com>
> > > > Cc: Dave Airlie <airlied@redhat.com>
> > > > Cc: Madhav Chauhan <madhav.chauhan@amd.com>
> > > > Cc: Christian Koenig <christian.koenig@amd.com>
> > > > Cc: Huang Rui <ray.huang@amd.com>
> > > > Cc: dri-devel@lists.freedesktop.org
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >    drivers/gpu/drm/ttm/ttm_pool.c | 74 +++++++++++++++++++++---
> > > > -----
> > > > -----
> > > >    1 file changed, 45 insertions(+), 29 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > index aa116a7bbae3..1cc7591a9542 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > @@ -367,6 +367,39 @@ static int ttm_pool_page_allocated(struct
> > > > ttm_pool *pool, unsigned int order,
> > > >          return 0;
> > > >    }
> > > >    
> > > > +static void __ttm_pool_free(struct ttm_pool *pool, struct
> > > > ttm_tt
> > > > *tt,
> > > > +                           struct page **caching_divide,
> > > > +                           enum ttm_caching initial_caching,
> > > > +                           enum ttm_caching subseq_caching,
> > > > +                           pgoff_t num_pages)
> > > > +{
> > > > +       enum ttm_caching caching = subseq_caching;
> > > > +       struct page **pages = tt->pages;
> > > > +       unsigned int order;
> > > > +       pgoff_t i, nr;
> > > > +
> > > > +       if (pool && caching_divide)
> > > > +               caching = initial_caching;
> > > > +
> > > > +       for (i = 0; i < num_pages; i += nr, pages += nr) {
> > > > +               struct ttm_pool_type *pt = NULL;
> > > > +
> > > > +               if (unlikely(caching_divide == pages))
> > > > +                       caching = subseq_caching;
> > > > +
> > > > +               order = ttm_pool_page_order(pool, *pages);
> > > > +               nr = (1UL << order);
> > > > +               if (tt->dma_address)
> > > > +                       ttm_pool_unmap(pool, tt-
> > > > >dma_address[i],
> > > > nr);
> > > > +
> > > > +               pt = ttm_pool_select_type(pool, caching,
> > > > order);
> > > > +               if (pt)
> > > > +                       ttm_pool_type_give(pt, *pages);
> > > > +               else
> > > > +                       ttm_pool_free_page(pool, caching,
> > > > order,
> > > > *pages);
> > > > +       }
> > > > +}
> > > > +
> > > >    /**
> > > >     * ttm_pool_alloc - Fill a ttm_tt object
> > > >     *
> > > > @@ -386,8 +419,9 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >          dma_addr_t *dma_addr = tt->dma_address;
> > > >          struct page **caching = tt->pages;
> > > >          struct page **pages = tt->pages;
> > > > +       enum ttm_caching page_caching;
> > > >          gfp_t gfp_flags = GFP_USER;
> > > > -       unsigned int i, order;
> > > > +       unsigned int order;
> > > >          struct page *p;
> > > >          int r;
> > > >    
> > > > @@ -410,6 +444,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >               order = min_t(unsigned int, order,
> > > > __fls(num_pages)))
> > > > {
> > > >                  struct ttm_pool_type *pt;
> > > >    
> > > > +               page_caching = tt->caching;
> > > >                  pt = ttm_pool_select_type(pool, tt->caching,
> > > > order);
> > > >                  p = pt ? ttm_pool_type_take(pt) : NULL;
> > > >                  if (p) {
> > > > @@ -418,6 +453,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >                          if (r)
> > > >                                  goto error_free_page;
> > > >    
> > > > +                       caching = pages;
> > > >                          do {
> > > >                                  r =
> > > > ttm_pool_page_allocated(pool,
> > > > order, p,
> > > >                                                             
> > > > &dma_addr,
> > > > @@ -426,14 +462,15 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >                                  if (r)
> > > >                                          goto error_free_page;
> > > >    
> > > > +                               caching = pages;
> > > >                                  if (num_pages < (1 << order))
> > > >                                          break;
> > > >    
> > > >                                  p = ttm_pool_type_take(pt);
> > > >                          } while (p);
> > > > -                       caching = pages;
> > > >                  }
> > > >    
> > > > +               page_caching = ttm_cached;
> > > >                  while (num_pages >= (1 << order) &&
> > > >                         (p = ttm_pool_alloc_page(pool,
> > > > gfp_flags,
> > > > order))) {
> > > >    
> > > > @@ -442,6 +479,7 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >                                                             tt-
> > > > > caching);
> > > >                                  if (r)
> > > >                                          goto error_free_page;
> > > > +                               caching = pages;
> > > >                          }
> > > >                          r = ttm_pool_page_allocated(pool,
> > > > order, p,
> > > > &dma_addr,
> > > >                                                     
> > > > &num_pages,
> > > > &pages);
> > > > @@ -468,15 +506,12 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >          return 0;
> > > >    
> > > >    error_free_page:
> > > > -       ttm_pool_free_page(pool, tt->caching, order, p);
> > > > +       ttm_pool_free_page(pool, page_caching, order, p);
> > > >    
> > > >    error_free_all:
> > > >          num_pages = tt->num_pages - num_pages;
> > > > -       for (i = 0; i < num_pages; ) {
> > > > -               order = ttm_pool_page_order(pool, tt-
> > > > >pages[i]);
> > > > -               ttm_pool_free_page(pool, tt->caching, order,
> > > > tt-
> > > > > pages[i]);
> > > > -               i += 1 << order;
> > > > -       }
> > > > +       __ttm_pool_free(pool, tt, caching, tt->caching,
> > > > ttm_cached,
> > > > +                       num_pages);
> > > >    
> > > >          return r;
> > > >    }
> > > > @@ -492,27 +527,8 @@ EXPORT_SYMBOL(ttm_pool_alloc);
> > > >     */
> > > >    void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
> > > >    {
> > > > -       unsigned int i;
> > > > -
> > > > -       for (i = 0; i < tt->num_pages; ) {
> > > > -               struct page *p = tt->pages[i];
> > > > -               unsigned int order, num_pages;
> > > > -               struct ttm_pool_type *pt;
> > > > -
> > > > -               order = ttm_pool_page_order(pool, p);
> > > > -               num_pages = 1ULL << order;
> > > > -               if (tt->dma_address)
> > > > -                       ttm_pool_unmap(pool, tt-
> > > > >dma_address[i],
> > > > num_pages);
> > > > -
> > > > -               pt = ttm_pool_select_type(pool, tt->caching,
> > > > order);
> > > > -               if (pt)
> > > > -                       ttm_pool_type_give(pt, tt->pages[i]);
> > > > -               else
> > > > -                       ttm_pool_free_page(pool, tt->caching,
> > > > order,
> > > > -                                          tt->pages[i]);
> > > > -
> > > > -               i += num_pages;
> > > > -       }
> > > > +       __ttm_pool_free(pool, tt, NULL, tt->caching, tt-
> > > > >caching,
> > > > +                       tt->num_pages);
> > > >    
> > > >          while (atomic_long_read(&allocated_pages) >
> > > > page_pool_size)
> > > >                  ttm_pool_shrink();
> 
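For reference, the two call sites in the patch show what the extra parameters
are for. The caching_divide pointer marks how far into tt->pages the caching
attribute has already been changed, so pages before it are returned as
tt->caching and pages after it as plain ttm_cached:

	/* error path in ttm_pool_alloc(): pages before 'caching' already
	 * carry tt->caching, the rest are still ttm_cached
	 */
	__ttm_pool_free(pool, tt, caching, tt->caching, ttm_cached, num_pages);

	/* regular free in ttm_pool_free(): no divide, a single caching mode */
	__ttm_pool_free(pool, tt, NULL, tt->caching, tt->caching, tt->num_pages);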


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 18:30       ` Christian König
@ 2023-02-15 19:00         ` Thomas Hellström
  2023-02-16  7:11           ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Hellström @ 2023-02-15 19:00 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:
> Am 15.02.23 um 19:12 schrieb Thomas Hellström:
> > On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
> > > Am 15.02.23 um 17:13 schrieb Thomas Hellström:
> > > > When swapping out, we will split multi-order pages both in
> > > > order to
> > > > move them to the swap-cache and to be able to return memory to
> > > > the
> > > > swap cache as soon as possible on a page-by-page basis.
> > > > By reducing the page max order to the system PMD size, we can
> > > > be
> > > > nicer
> > > > to the system and avoid splitting gigantic pages.
> > > 
> > > > On top of this we also
> > > > include the 64K page size in the page sizes tried, since that
> > > > appears to
> > > > be a common size for GPU applications.
> > > Please completely drop that.
> > You mean the 64K page size, or the whole patch?
> 
> The 64K page size. This was an invention from Microsoft to
> standardize 
> GPU handling ~15-20years ago.
> 
> It turned out to be a complete shipwreck and by now 2MiB and 1GiB
> pages 
> or just flexible hardware which can handle everything seem to become 
> standard.
> 
> > > This is just nonsense spilling in from the
> > > Windows drivers.
> > Agreed, but IIRC on the last RFC you asked me not to drop the 64K
> > pages, so that's why they are here. I can remove them if needed.
> 
> We could keep it if it's in any way beneficial, but I'm pretty sure I
> must have been drunk to ask for that.
> 
> > The only reason for keeping them from a performance point of view
> > is
> > better efficiency on GPUs with 64K page size if not using a
> > coalescing
> > IOMMU for dma-mapping.
> 
> Are any of those still produced? As far as I know neither NVidia,
> Intel 
> nor AMD still assumes that page size in their hardware for quite a
> while 
> now.

Intel still supports 64K PTEs, so we use them where possible, otherwise
falling back to 4K. Typically we have coalescing IOMMU enabled when
testing, so can't really see the impact, but TBH I was surprised by the
number of 64K page allocations TTM spat out with this patch series, so
I definitely think there is a performance impact with !IOMMU, although
I can't quantify it ATM.

So then if it's OK with you I'll keep that size for now.

/Thomas



> 
> Regards,
> Christian.
> 
> > 
> > Let me know what you think is best and I'll adjust accordingly.
> > 
> > /Thomas
> > 
> > 
> > > Christian.
> > > 
> > > > Looking forward to when we might be able to swap out PMD size
> > > > folios
> > > > without splitting, this will also be a benefit.
> > > > 
> > > > Signed-off-by: Thomas Hellström
> > > > <thomas.hellstrom@linux.intel.com>
> > > > ---
> > > >    drivers/gpu/drm/ttm/ttm_pool.c | 58
> > > > ++++++++++++++++++++++++++---
> > > > -----
> > > >    1 file changed, 45 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > index 1cc7591a9542..8787fb6a218b 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > > @@ -31,6 +31,8 @@
> > > >     * cause they are rather slow compared to alloc_pages+map.
> > > >     */
> > > >    
> > > > +#define pr_fmt(fmt) "[TTM POOL] " fmt
> > > > +
> > > >    #include <linux/module.h>
> > > >    #include <linux/dma-mapping.h>
> > > >    #include <linux/debugfs.h>
> > > > @@ -47,6 +49,18 @@
> > > >    
> > > >    #include "ttm_module.h"
> > > >    
> > > > +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
> > > > +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
> > > > +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
> > > > +#undef TTM_MAX_ORDER
> > > > +#define TTM_MAX_ORDER TTM_64K_ORDER
> > > > +#endif
> > > > +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
> > > > +#undef TTM_MAX_ORDER
> > > > +#define TTM_MAX_ORDER (MAX_ORDER - 1)
> > > > +#endif
> > > > +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
> > > > +
> > > >    /**
> > > >     * struct ttm_pool_dma - Helper object for coherent DMA
> > > > mappings
> > > >     *
> > > > @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
> > > >    
> > > >    static atomic_long_t allocated_pages;
> > > >    
> > > > -static struct ttm_pool_type global_write_combined[MAX_ORDER];
> > > > -static struct ttm_pool_type global_uncached[MAX_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_write_combined[TTM_DIM_ORDER];
> > > > +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
> > > >    
> > > > -static struct ttm_pool_type
> > > > global_dma32_write_combined[MAX_ORDER];
> > > > -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_dma32_write_combined[TTM_DIM_ORDER];
> > > > +static struct ttm_pool_type
> > > > global_dma32_uncached[TTM_DIM_ORDER];
> > > >    
> > > >    static spinlock_t shrinker_lock;
> > > >    static struct list_head shrinker_list;
> > > >    static struct shrinker mm_shrinker;
> > > >    
> > > > +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
> > > > +
> > > >    /* Allocate pages of size 1 << order with the given
> > > > gfp_flags */
> > > >    static struct page *ttm_pool_alloc_page(struct ttm_pool
> > > > *pool,
> > > > gfp_t gfp_flags,
> > > >                                          unsigned int order)
> > > > @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct
> > > > ttm_pool
> > > > *pool, struct ttm_tt *tt,
> > > >          }
> > > >    }
> > > >    
> > > > +static unsigned int ttm_pool_select_order(unsigned int order,
> > > > pgoff_t num_pages)
> > > > +{
> > > > +       unsigned int *cur_order = ttm_pool_orders;
> > > > +
> > > > +       order = min_t(unsigned int, __fls(num_pages), order);
> > > > +       while (order < *cur_order)
> > > > +               ++cur_order;
> > > > +
> > > > +       return *cur_order;
> > > > +}
> > > > +
> > > >    /**
> > > >     * ttm_pool_alloc - Fill a ttm_tt object
> > > >     *
> > > > @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
> > > > struct ttm_tt *tt,
> > > >          else
> > > >                  gfp_flags |= GFP_HIGHUSER;
> > > >    
> > > > -       for (order = min_t(unsigned int, MAX_ORDER - 1,
> > > > __fls(num_pages));
> > > > -            num_pages;
> > > > -            order = min_t(unsigned int, order,
> > > > __fls(num_pages)))
> > > > {
> > > > +       order = ttm_pool_select_order(ttm_pool_orders[0],
> > > > num_pages);
> > > > +       for (; num_pages; order = ttm_pool_select_order(order,
> > > > num_pages)) {
> > > >                  struct ttm_pool_type *pt;
> > > >    
> > > >                  page_caching = tt->caching;
> > > > @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
> > > > struct device *dev,
> > > >    
> > > >          if (use_dma_alloc) {
> > > >                  for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> > > > -                       for (j = 0; j < MAX_ORDER; ++j)
> > > > +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
> > > >                                  ttm_pool_type_init(&pool-
> > > > > caching[i].orders[j],
> > > >                                                     pool, i,
> > > > j);
> > > >          }
> > > > @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
> > > >    
> > > >          if (pool->use_dma_alloc) {
> > > >                  for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
> > > > -                       for (j = 0; j < MAX_ORDER; ++j)
> > > > +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
> > > >                                  ttm_pool_type_fini(&pool-
> > > > > caching[i].orders[j]);
> > > >          }
> > > >    
> > > > @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct
> > > > seq_file *m)
> > > >          unsigned int i;
> > > >    
> > > >          seq_puts(m, "\t ");
> > > > -       for (i = 0; i < MAX_ORDER; ++i)
> > > > +       for (i = 0; i < TTM_DIM_ORDER; ++i)
> > > >                  seq_printf(m, " ---%2u---", i);
> > > >          seq_puts(m, "\n");
> > > >    }
> > > > @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct
> > > > ttm_pool_type *pt,
> > > >    {
> > > >          unsigned int i;
> > > >    
> > > > -       for (i = 0; i < MAX_ORDER; ++i)
> > > > +       for (i = 0; i < TTM_DIM_ORDER; ++i)
> > > >                  seq_printf(m, " %8u",
> > > > ttm_pool_type_count(&pt[i]));
> > > >          seq_puts(m, "\n");
> > > >    }
> > > > @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long
> > > > num_pages)
> > > >          if (!page_pool_size)
> > > >                  page_pool_size = num_pages;
> > > >    
> > > > +       if (TTM_64K_ORDER < TTM_MAX_ORDER)
> > > > +               ttm_pool_orders[1] = TTM_64K_ORDER;
> > > > +
> > > > +       pr_debug("Used orders are %u %u %u\n",
> > > > ttm_pool_orders[0],
> > > > +                ttm_pool_orders[1], ttm_pool_orders[2]);
> > > > +
> > > >          spin_lock_init(&shrinker_lock);
> > > >          INIT_LIST_HEAD(&shrinker_list);
> > > >    
> > > > -       for (i = 0; i < MAX_ORDER; ++i) {
> > > > +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
> > > >                  ttm_pool_type_init(&global_write_combined[i],
> > > > NULL,
> > > >                                     ttm_write_combined, i);
> > > >                  ttm_pool_type_init(&global_uncached[i], NULL,
> > > > ttm_uncached, i);
> > > > @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
> > > >    {
> > > >          unsigned int i;
> > > >    
> > > > -       for (i = 0; i < MAX_ORDER; ++i) {
> > > > +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
> > > >                  ttm_pool_type_fini(&global_write_combined[i]);
> > > >                  ttm_pool_type_fini(&global_uncached[i]);
> > > >    
> 
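For a concrete feel of the resulting order selection, take x86-64 with 4K base
pages, so TTM_MAX_ORDER = 9 and TTM_64K_ORDER = 4 (my worked example, not from
the patch): a 1000-page allocation first takes one order-9 chunk (512 pages,
2MiB); for the remaining 488 pages __fls(488) = 8, which the ttm_pool_orders
table rounds down to order 4, giving thirty 64K chunks (480 pages); the last
8 pages fall through to order 0. The pool therefore only ever hands out 2MiB,
64K and 4K chunks.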


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-15 19:00         ` Thomas Hellström
@ 2023-02-16  7:11           ` Christian König
  2023-02-16  7:24             ` Thomas Hellström
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2023-02-16  7:11 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld

Am 15.02.23 um 20:00 schrieb Thomas Hellström:
> On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:
>> Am 15.02.23 um 19:12 schrieb Thomas Hellström:
>>> On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
>>>> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
>>>>> When swapping out, we will split multi-order pages both in
>>>>> order to
>>>>> move them to the swap-cache and to be able to return memory to
>>>>> the
>>>>> swap cache as soon as possible on a page-by-page basis.
>>>>> By reducing the page max order to the system PMD size, we can
>>>>> be
>>>>> nicer
>>>>> to the system and avoid splitting gigantic pages.
>>>>> On top of this we also
>>>>> include the 64K page size in the page sizes tried, since that
>>>>> appears to
>>>>> be a common size for GPU applications.
>>>> Please completely drop that.
>>> You mean the 64K page size, or the whole patch?
>> The 64K page size. This was an invention from Microsoft to
>> standardize
>> GPU handling ~15-20years ago.
>>
>> It turned out to be a complete shipwreck and by now 2MiB and 1GiB
>> pages
>> or just flexible hardware which can handle everything seem to become
>> standard.
>>
>>>> This is just nonsense spilling in from the
>>>> Windows drivers.
>>> Agreed, but IIRC on the last RFC you asked me not to drop the 64K
>>> pages, so that's why they are here. I can remove them if needed.
>> We could keep it if it's in any way beneficial, but I'm pretty sure I
>> must have been drunk to ask for that.
>>
>>> The only reason for keeping them from a performance point of view
>>> is
>>> better efficiency on GPUs with 64K page size if not using a
>>> coalescing
>>> IOMMU for dma-mapping.
>> Are any of those still produced? As far as I know neither NVidia,
>> Intel
>> nor AMD still assumes that page size in their hardware for quite a
>> while
>> now.
> Intel still supports 64K PTEs, so we use them where possible, otherwise
> falling back to 4K. Typically we have coalescing IOMMU enabled when
> testing, so can't really see the impact, but TBH I was surprised by the
> number of 64K page allocations TTM spat out with this patch series, so
> I definitely think there is a performance impact with !IOMMU, although
> I can't quantify it ATM.
>
> So then if it's OK with you I'll keep that size for now.

If it makes 64K pages preferred then this is a pretty clear NAK.

What we can do is to support any page size up to at least 2MiB here.

Christian.

>
> /Thomas
>
>
>
>> Regards,
>> Christian.
>>
>>> Let me know what you think is best and I'll adjust accordingly.
>>>
>>> /Thomas
>>>
>>>
>>>> Christian.
>>>>
>>>>> Looking forward to when we might be able to swap out PMD size
>>>>> folios
>>>>> without splitting, this will also be a benefit.
>>>>>
>>>>> Signed-off-by: Thomas Hellström
>>>>> <thomas.hellstrom@linux.intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/ttm/ttm_pool.c | 58
>>>>> ++++++++++++++++++++++++++---
>>>>> -----
>>>>>     1 file changed, 45 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>>>>> b/drivers/gpu/drm/ttm/ttm_pool.c
>>>>> index 1cc7591a9542..8787fb6a218b 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>>>> @@ -31,6 +31,8 @@
>>>>>      * cause they are rather slow compared to alloc_pages+map.
>>>>>      */
>>>>>     
>>>>> +#define pr_fmt(fmt) "[TTM POOL] " fmt
>>>>> +
>>>>>     #include <linux/module.h>
>>>>>     #include <linux/dma-mapping.h>
>>>>>     #include <linux/debugfs.h>
>>>>> @@ -47,6 +49,18 @@
>>>>>     
>>>>>     #include "ttm_module.h"
>>>>>     
>>>>> +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
>>>>> +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
>>>>> +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
>>>>> +#undef TTM_MAX_ORDER
>>>>> +#define TTM_MAX_ORDER TTM_64K_ORDER
>>>>> +#endif
>>>>> +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
>>>>> +#undef TTM_MAX_ORDER
>>>>> +#define TTM_MAX_ORDER (MAX_ORDER - 1)
>>>>> +#endif
>>>>> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
>>>>> +
>>>>>     /**
>>>>>      * struct ttm_pool_dma - Helper object for coherent DMA
>>>>> mappings
>>>>>      *
>>>>> @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
>>>>>     
>>>>>     static atomic_long_t allocated_pages;
>>>>>     
>>>>> -static struct ttm_pool_type global_write_combined[MAX_ORDER];
>>>>> -static struct ttm_pool_type global_uncached[MAX_ORDER];
>>>>> +static struct ttm_pool_type
>>>>> global_write_combined[TTM_DIM_ORDER];
>>>>> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>>>>>     
>>>>> -static struct ttm_pool_type
>>>>> global_dma32_write_combined[MAX_ORDER];
>>>>> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>>>>> +static struct ttm_pool_type
>>>>> global_dma32_write_combined[TTM_DIM_ORDER];
>>>>> +static struct ttm_pool_type
>>>>> global_dma32_uncached[TTM_DIM_ORDER];
>>>>>     
>>>>>     static spinlock_t shrinker_lock;
>>>>>     static struct list_head shrinker_list;
>>>>>     static struct shrinker mm_shrinker;
>>>>>     
>>>>> +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
>>>>> +
>>>>>     /* Allocate pages of size 1 << order with the given
>>>>> gfp_flags */
>>>>>     static struct page *ttm_pool_alloc_page(struct ttm_pool
>>>>> *pool,
>>>>> gfp_t gfp_flags,
>>>>>                                           unsigned int order)
>>>>> @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct
>>>>> ttm_pool
>>>>> *pool, struct ttm_tt *tt,
>>>>>           }
>>>>>     }
>>>>>     
>>>>> +static unsigned int ttm_pool_select_order(unsigned int order,
>>>>> pgoff_t num_pages)
>>>>> +{
>>>>> +       unsigned int *cur_order = ttm_pool_orders;
>>>>> +
>>>>> +       order = min_t(unsigned int, __fls(num_pages), order);
>>>>> +       while (order < *cur_order)
>>>>> +               ++cur_order;
>>>>> +
>>>>> +       return *cur_order;
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * ttm_pool_alloc - Fill a ttm_tt object
>>>>>      *
>>>>> @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>>>> struct ttm_tt *tt,
>>>>>           else
>>>>>                   gfp_flags |= GFP_HIGHUSER;
>>>>>     
>>>>> -       for (order = min_t(unsigned int, MAX_ORDER - 1,
>>>>> __fls(num_pages));
>>>>> -            num_pages;
>>>>> -            order = min_t(unsigned int, order,
>>>>> __fls(num_pages)))
>>>>> {
>>>>> +       order = ttm_pool_select_order(ttm_pool_orders[0],
>>>>> num_pages);
>>>>> +       for (; num_pages; order = ttm_pool_select_order(order,
>>>>> num_pages)) {
>>>>>                   struct ttm_pool_type *pt;
>>>>>     
>>>>>                   page_caching = tt->caching;
>>>>> @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
>>>>> struct device *dev,
>>>>>     
>>>>>           if (use_dma_alloc) {
>>>>>                   for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>>>                                   ttm_pool_type_init(&pool-
>>>>>> caching[i].orders[j],
>>>>>                                                      pool, i,
>>>>> j);
>>>>>           }
>>>>> @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>>>>     
>>>>>           if (pool->use_dma_alloc) {
>>>>>                   for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>>>                                   ttm_pool_type_fini(&pool-
>>>>>> caching[i].orders[j]);
>>>>>           }
>>>>>     
>>>>> @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct
>>>>> seq_file *m)
>>>>>           unsigned int i;
>>>>>     
>>>>>           seq_puts(m, "\t ");
>>>>> -       for (i = 0; i < MAX_ORDER; ++i)
>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>>>                   seq_printf(m, " ---%2u---", i);
>>>>>           seq_puts(m, "\n");
>>>>>     }
>>>>> @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct
>>>>> ttm_pool_type *pt,
>>>>>     {
>>>>>           unsigned int i;
>>>>>     
>>>>> -       for (i = 0; i < MAX_ORDER; ++i)
>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>>>                   seq_printf(m, " %8u",
>>>>> ttm_pool_type_count(&pt[i]));
>>>>>           seq_puts(m, "\n");
>>>>>     }
>>>>> @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long
>>>>> num_pages)
>>>>>           if (!page_pool_size)
>>>>>                   page_pool_size = num_pages;
>>>>>     
>>>>> +       if (TTM_64K_ORDER < TTM_MAX_ORDER)
>>>>> +               ttm_pool_orders[1] = TTM_64K_ORDER;
>>>>> +
>>>>> +       pr_debug("Used orders are %u %u %u\n",
>>>>> ttm_pool_orders[0],
>>>>> +                ttm_pool_orders[1], ttm_pool_orders[2]);
>>>>> +
>>>>>           spin_lock_init(&shrinker_lock);
>>>>>           INIT_LIST_HEAD(&shrinker_list);
>>>>>     
>>>>> -       for (i = 0; i < MAX_ORDER; ++i) {
>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>>>                   ttm_pool_type_init(&global_write_combined[i],
>>>>> NULL,
>>>>>                                      ttm_write_combined, i);
>>>>>                   ttm_pool_type_init(&global_uncached[i], NULL,
>>>>> ttm_uncached, i);
>>>>> @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
>>>>>     {
>>>>>           unsigned int i;
>>>>>     
>>>>> -       for (i = 0; i < MAX_ORDER; ++i) {
>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>>>                   ttm_pool_type_fini(&global_write_combined[i]);
>>>>>                   ttm_pool_type_fini(&global_uncached[i]);
>>>>>     


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages
  2023-02-16  7:11           ` Christian König
@ 2023-02-16  7:24             ` Thomas Hellström
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Hellström @ 2023-02-16  7:24 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: Miaohe Lin, David Hildenbrand, NeilBrown, Daniel Vetter,
	intel-gfx, Peter Xu, linux-mm, Dave Hansen,
	linux-graphics-maintainer, Matthew Wilcox (Oracle),
	Johannes Weiner, Dave Airlie, Andrew Morton, Matthew Auld


On 2/16/23 08:11, Christian König wrote:
> Am 15.02.23 um 20:00 schrieb Thomas Hellström:
>> On Wed, 2023-02-15 at 19:30 +0100, Christian König wrote:
>>> Am 15.02.23 um 19:12 schrieb Thomas Hellström:
>>>> On Wed, 2023-02-15 at 18:42 +0100, Christian König wrote:
>>>>> Am 15.02.23 um 17:13 schrieb Thomas Hellström:
>>>>>> When swapping out, we will split multi-order pages both in
>>>>>> order to
>>>>>> move them to the swap-cache and to be able to return memory to
>>>>>> the
>>>>>> swap cache as soon as possible on a page-by-page basis.
>>>>>> By reducing the page max order to the system PMD size, we can
>>>>>> be
>>>>>> nicer
>>>>>> to the system and avoid splitting gigantic pages.
>>>>>> On top of this we also
>>>>>> include the 64K page size in the page sizes tried, since that
>>>>>> appears to
>>>>>> be a common size for GPU applications.
>>>>> Please completely drop that.
>>>> You mean the 64K page size, or the whole patch?
>>> The 64K page size. This was an invention from Microsoft to
>>> standardize
>>> GPU handling ~15-20years ago.
>>>
>>> It turned out to be a complete shipwreck and by now 2MiB and 1GiB
>>> pages
>>> or just flexible hardware which can handle everything seem to become
>>> standard.
>>>
>>>>> This is just nonsense spilling in from the
>>>>> Windows drivers.
>>>> Agreed, but IIRC on the last RFC you asked me not to drop the 64K
>>>> pages, so that's why they are here. I can remove them if needed.
>>> We could keep it if it's in any way beneficial, but I'm pretty sure I
>>> must have been drunk to ask for that.
>>>
>>>> The only reason for keeping them from a performance point of view
>>>> is
>>>> better efficiency on GPUs with 64K page size if not using a
>>>> coalescing
>>>> IOMMU for dma-mapping.
>>> Are any of those still produced? As far as I know neither NVidia,
>>> Intel
>>> nor AMD still assumes that page size in their hardware for quite a
>>> while
>>> now.
>> Intel still supports 64K PTEs, so we use them where possible, otherwise
>> falling back to 4K. Typically we have coalescing IOMMU enabled when
>> testing, so can't really see the impact, but TBH I was surprised by the
>> number of 64K page allocations TTM spat out with this patch series, so
>> I definitely think there is a performance impact with !IOMMU, although
>> I can't quantify it ATM.
>>
>> So then if it's OK with you I'll keep that size for now.
>
> If it makes 64K pages preferred then this is a pretty clear NAK.
>
> What we can do is to support any page size up to at least 2MiB here.

OK, I'll use the latter approach then. I don't have any strong 
preferences here, except that the swapin helper wants to keep the max 
page size as low as possible, since it needs to store one page worth of 
4K swap entries.
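As a ballpark, assuming 8-byte swap entries on a 64-bit kernel, one 4K page 
holds 4096 / 8 = 512 entries, and 512 * 4K = 2MiB, i.e. exactly one PMD-sized 
chunk, so a 2MiB cap keeps that bookkeeping to a single page per chunk.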

/Thomas

>
> Christian.
>
>>
>> /Thomas
>>
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Let me know what you think is best and I'll adjust accordingly.
>>>>
>>>> /Thomas
>>>>
>>>>
>>>>> Christian.
>>>>>
>>>>>> Looking forward to when we might be able to swap out PMD size
>>>>>> folios
>>>>>> without splitting, this will also be a benefit.
>>>>>>
>>>>>> Signed-off-by: Thomas Hellström
>>>>>> <thomas.hellstrom@linux.intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/ttm/ttm_pool.c | 58
>>>>>> ++++++++++++++++++++++++++---
>>>>>> -----
>>>>>>     1 file changed, 45 insertions(+), 13 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c
>>>>>> b/drivers/gpu/drm/ttm/ttm_pool.c
>>>>>> index 1cc7591a9542..8787fb6a218b 100644
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>>>>> @@ -31,6 +31,8 @@
>>>>>>      * cause they are rather slow compared to alloc_pages+map.
>>>>>>      */
>>>>>>     +#define pr_fmt(fmt) "[TTM POOL] " fmt
>>>>>> +
>>>>>>     #include <linux/module.h>
>>>>>>     #include <linux/dma-mapping.h>
>>>>>>     #include <linux/debugfs.h>
>>>>>> @@ -47,6 +49,18 @@
>>>>>>         #include "ttm_module.h"
>>>>>>     +#define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT)
>>>>>> +#define TTM_64K_ORDER (16 - PAGE_SHIFT)
>>>>>> +#if (TTM_MAX_ORDER < TTM_64K_ORDER)
>>>>>> +#undef TTM_MAX_ORDER
>>>>>> +#define TTM_MAX_ORDER TTM_64K_ORDER
>>>>>> +#endif
>>>>>> +#if ((MAX_ORDER - 1) < TTM_MAX_ORDER)
>>>>>> +#undef TTM_MAX_ORDER
>>>>>> +#define TTM_MAX_ORDER (MAX_ORDER - 1)
>>>>>> +#endif
>>>>>> +#define TTM_DIM_ORDER (TTM_MAX_ORDER + 1)
>>>>>> +
>>>>>>     /**
>>>>>>      * struct ttm_pool_dma - Helper object for coherent DMA
>>>>>> mappings
>>>>>>      *
>>>>>> @@ -65,16 +79,18 @@ module_param(page_pool_size, ulong, 0644);
>>>>>>         static atomic_long_t allocated_pages;
>>>>>>     -static struct ttm_pool_type global_write_combined[MAX_ORDER];
>>>>>> -static struct ttm_pool_type global_uncached[MAX_ORDER];
>>>>>> +static struct ttm_pool_type
>>>>>> global_write_combined[TTM_DIM_ORDER];
>>>>>> +static struct ttm_pool_type global_uncached[TTM_DIM_ORDER];
>>>>>>     -static struct ttm_pool_type
>>>>>> global_dma32_write_combined[MAX_ORDER];
>>>>>> -static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>>>>>> +static struct ttm_pool_type
>>>>>> global_dma32_write_combined[TTM_DIM_ORDER];
>>>>>> +static struct ttm_pool_type
>>>>>> global_dma32_uncached[TTM_DIM_ORDER];
>>>>>>         static spinlock_t shrinker_lock;
>>>>>>     static struct list_head shrinker_list;
>>>>>>     static struct shrinker mm_shrinker;
>>>>>>     +static unsigned int ttm_pool_orders[] = {TTM_MAX_ORDER, 0, 0};
>>>>>> +
>>>>>>     /* Allocate pages of size 1 << order with the given
>>>>>> gfp_flags */
>>>>>>     static struct page *ttm_pool_alloc_page(struct ttm_pool
>>>>>> *pool,
>>>>>> gfp_t gfp_flags,
>>>>>>                                           unsigned int order)
>>>>>> @@ -400,6 +416,17 @@ static void __ttm_pool_free(struct
>>>>>> ttm_pool
>>>>>> *pool, struct ttm_tt *tt,
>>>>>>           }
>>>>>>     }
>>>>>>     +static unsigned int ttm_pool_select_order(unsigned int order,
>>>>>> pgoff_t num_pages)
>>>>>> +{
>>>>>> +       unsigned int *cur_order = ttm_pool_orders;
>>>>>> +
>>>>>> +       order = min_t(unsigned int, __fls(num_pages), order);
>>>>>> +       while (order < *cur_order)
>>>>>> +               ++cur_order;
>>>>>> +
>>>>>> +       return *cur_order;
>>>>>> +}
>>>>>> +
>>>>>>     /**
>>>>>>      * ttm_pool_alloc - Fill a ttm_tt object
>>>>>>      *
>>>>>> @@ -439,9 +466,8 @@ int ttm_pool_alloc(struct ttm_pool *pool,
>>>>>> struct ttm_tt *tt,
>>>>>>           else
>>>>>>                   gfp_flags |= GFP_HIGHUSER;
>>>>>>     -       for (order = min_t(unsigned int, MAX_ORDER - 1,
>>>>>> __fls(num_pages));
>>>>>> -            num_pages;
>>>>>> -            order = min_t(unsigned int, order,
>>>>>> __fls(num_pages)))
>>>>>> {
>>>>>> +       order = ttm_pool_select_order(ttm_pool_orders[0],
>>>>>> num_pages);
>>>>>> +       for (; num_pages; order = ttm_pool_select_order(order,
>>>>>> num_pages)) {
>>>>>>                   struct ttm_pool_type *pt;
>>>>>>                       page_caching = tt->caching;
>>>>>> @@ -558,7 +584,7 @@ void ttm_pool_init(struct ttm_pool *pool,
>>>>>> struct device *dev,
>>>>>>               if (use_dma_alloc) {
>>>>>>                   for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>>>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>>>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>>>>                                   ttm_pool_type_init(&pool-
>>>>>>> caching[i].orders[j],
>>>>>> pool, i,
>>>>>> j);
>>>>>>           }
>>>>>> @@ -578,7 +604,7 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>>>>>               if (pool->use_dma_alloc) {
>>>>>>                   for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
>>>>>> -                       for (j = 0; j < MAX_ORDER; ++j)
>>>>>> +                       for (j = 0; j < TTM_DIM_ORDER; ++j)
>>>>>>                                   ttm_pool_type_fini(&pool-
>>>>>>> caching[i].orders[j]);
>>>>>>           }
>>>>>>     @@ -632,7 +658,7 @@ static void ttm_pool_debugfs_header(struct
>>>>>> seq_file *m)
>>>>>>           unsigned int i;
>>>>>>               seq_puts(m, "\t ");
>>>>>> -       for (i = 0; i < MAX_ORDER; ++i)
>>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>>>>                   seq_printf(m, " ---%2u---", i);
>>>>>>           seq_puts(m, "\n");
>>>>>>     }
>>>>>> @@ -643,7 +669,7 @@ static void ttm_pool_debugfs_orders(struct
>>>>>> ttm_pool_type *pt,
>>>>>>     {
>>>>>>           unsigned int i;
>>>>>>     -       for (i = 0; i < MAX_ORDER; ++i)
>>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i)
>>>>>>                   seq_printf(m, " %8u",
>>>>>> ttm_pool_type_count(&pt[i]));
>>>>>>           seq_puts(m, "\n");
>>>>>>     }
>>>>>> @@ -749,10 +775,16 @@ int ttm_pool_mgr_init(unsigned long
>>>>>> num_pages)
>>>>>>           if (!page_pool_size)
>>>>>>                   page_pool_size = num_pages;
>>>>>>     +       if (TTM_64K_ORDER < TTM_MAX_ORDER)
>>>>>> +               ttm_pool_orders[1] = TTM_64K_ORDER;
>>>>>> +
>>>>>> +       pr_debug("Used orders are %u %u %u\n",
>>>>>> ttm_pool_orders[0],
>>>>>> +                ttm_pool_orders[1], ttm_pool_orders[2]);
>>>>>> +
>>>>>>           spin_lock_init(&shrinker_lock);
>>>>>>           INIT_LIST_HEAD(&shrinker_list);
>>>>>>     -       for (i = 0; i < MAX_ORDER; ++i) {
>>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>>>>                   ttm_pool_type_init(&global_write_combined[i],
>>>>>> NULL,
>>>>>>                                      ttm_write_combined, i);
>>>>>>                   ttm_pool_type_init(&global_uncached[i], NULL,
>>>>>> ttm_uncached, i);
>>>>>> @@ -785,7 +817,7 @@ void ttm_pool_mgr_fini(void)
>>>>>>     {
>>>>>>           unsigned int i;
>>>>>>     -       for (i = 0; i < MAX_ORDER; ++i) {
>>>>>> +       for (i = 0; i < TTM_DIM_ORDER; ++i) {
>>>>>>                   ttm_pool_type_fini(&global_write_combined[i]);
>>>>>>                   ttm_pool_type_fini(&global_uncached[i]);
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-02-16  7:24 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-15 16:13 [RFC PATCH 00/16] Add a TTM shrinker Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 01/16] drm/ttm: Fix a NULL pointer dereference Thomas Hellström
2023-02-15 17:25   ` Christian König
2023-02-15 16:13 ` [RFC PATCH 02/16] drm/ttm/pool: Fix ttm_pool_alloc error path Thomas Hellström
2023-02-15 17:31   ` Christian König
2023-02-15 18:02     ` Thomas Hellström
2023-02-15 18:26       ` Christian König
2023-02-15 18:51         ` Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 03/16] drm/ttm: Use the BIT macro for the TTM_TT_FLAGs Thomas Hellström
2023-02-15 17:33   ` Christian König
2023-02-15 16:13 ` [RFC PATCH 04/16] drm/ttm, drm/vmwgfx: Update the TTM swapout interface Thomas Hellström
2023-02-15 17:39   ` Christian König
2023-02-15 18:19     ` Thomas Hellström
2023-02-15 18:32       ` Christian König
2023-02-15 16:13 ` [RFC PATCH 05/16] drm/ttm: Unexport ttm_global_swapout() Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 06/16] drm/ttm: Don't use watermark accounting on shrinkable pools Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 07/16] drm/ttm: Reduce the number of used allocation orders for TTM pages Thomas Hellström
2023-02-15 17:42   ` Christian König
2023-02-15 18:12     ` Thomas Hellström
2023-02-15 18:30       ` Christian König
2023-02-15 19:00         ` Thomas Hellström
2023-02-16  7:11           ` Christian König
2023-02-16  7:24             ` Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 08/16] drm/ttm: Add a shrinker and shrinker accounting Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 09/16] drm/ttm: Introduce shrink throttling Thomas Hellström
2023-02-15 16:13 ` [RFC PATCH 10/16] drm/ttm: Remove pinned bos from shrinkable accounting Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 11/16] drm/ttm: Add a simple api to set / clear purgeable ttm_tt content Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 12/16] mm: Add interfaces to back up and recover folio contents using swap Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 13/16] drm/ttm: Make the call to ttm_tt_populate() interruptible when faulting Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 14/16] drm/ttm: Provide helpers for shrinking Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 15/16] drm/ttm: Use fault-injection to test error paths Thomas Hellström
2023-02-15 16:14 ` [RFC PATCH 16/16] drm/i915, drm/ttm: Use the TTM shrinker rather than the external shmem pool Thomas Hellström
