* [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
@ 2016-11-04 15:02 akash.goel
  2016-11-04 15:02 ` [PATCH 2/2] drm/i915: Make GPU pages movable akash.goel
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: akash.goel @ 2016-11-04 15:02 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Hugh Dickins, linux-mm, linux-kernel, Sourab Gupta,
	Akash Goel

From: Chris Wilson <chris@chris-wilson.co.uk>

This provides support for drivers or shmem file owners to register
a set of callbacks, which can be invoked from the address space
operations implemented by shmem.  This allows the file owners to
hook into the shmem address space operations to perform some
extra/custom operations in addition to the default ones.

The private_data field of the address_space struct is used to store the
pointer to the driver-specific ops.  Currently only one ops field is
defined, migratepage, but it can be extended on an as-needed basis.

The need for driver-specific operations arises because some operations
(like migratepage) cannot be handled effectively within shmem alone and
also need some driver-specific handling.
Specifically, i915.ko would like to participate in migratepage().
i915.ko uses shmemfs to provide swappable backing storage for its user
objects, but when those objects are in use by the GPU it must pin the
entire object until the GPU is idle.  As a result, large chunks of memory
can be arbitrarily withdrawn from page migration, resulting in premature
out-of-memory due to fragmentation.  However, if i915.ko can receive the
migratepage() request, it can then flush the object from the GPU, remove
its pin and thus enable the migration.

Since gfx allocations are one of the major consumers of system memory,
it is imperative to have such a mechanism to deal effectively with
fragmentation, hence the provision for initiating driver-specific
actions during address space operations.
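For illustration, the registration and dispatch described above can be modeled in plain userspace C. This is a sketch, not kernel code: the struct and function names mirror the patch below, but the types are reduced to the fields used here, and `my_migratepage` is a hypothetical owner callback.

```c
/*
 * Userspace model of the owner-callback dispatch this patch adds:
 * shmem consults an optional registered callback and falls back to
 * the default migrate_page() when none is set.
 */
#include <assert.h>
#include <stddef.h>

enum migrate_mode { MIGRATE_SYNC };

struct address_space {
	void *private_data;		/* holds the owner's shmem_dev_info */
};

struct shmem_dev_info {
	void *private_data;
	int (*migratepage)(struct address_space *mapping, void *newpage,
			   void *page, enum migrate_mode mode,
			   void *dev_priv_data);
};

static int default_migrate_calls;
static int driver_migrate_calls;

/* Stand-in for mm's default migrate_page() */
static int migrate_page(struct address_space *mapping, void *newpage,
			void *page, enum migrate_mode mode)
{
	(void)mapping; (void)newpage; (void)page; (void)mode;
	default_migrate_calls++;
	return 0;
}

/* Mirrors shmem_set_dev_info() from the patch */
static void shmem_set_dev_info(struct address_space *mapping,
			       struct shmem_dev_info *info)
{
	mapping->private_data = info;
}

/* Mirrors shmem_migratepage() from the patch */
static int shmem_migratepage(struct address_space *mapping, void *newpage,
			     void *page, enum migrate_mode mode)
{
	struct shmem_dev_info *info = mapping->private_data;

	if (info && info->migratepage)
		return info->migratepage(mapping, newpage, page, mode,
					 info->private_data);
	return migrate_page(mapping, newpage, page, mode);
}

/* Hypothetical owner callback: unpin, then defer to the default path */
static int my_migratepage(struct address_space *mapping, void *newpage,
			  void *page, enum migrate_mode mode,
			  void *dev_priv_data)
{
	(void)dev_priv_data;
	driver_migrate_calls++;		/* a real driver would unpin here */
	return migrate_page(mapping, newpage, page, mode);
}
```

A mapping with no registered info takes the default path; after `shmem_set_dev_info()` the owner callback is invoked first, which is exactly the hook i915.ko needs.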

v2:
- Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
- Change the return type of shmem_set_device_op() to void and remove the
  check for pre-existing data. (Joonas)
- Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
  with shmem_dev_info structure. (Joonas)

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/linux/shmem_fs.h | 13 +++++++++++++
 mm/shmem.c               | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7..454c3ba 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -39,11 +39,24 @@ struct shmem_sb_info {
 	unsigned long shrinklist_len; /* Length of shrinklist */
 };
 
+struct shmem_dev_info {
+	void *private_data;
+	int (*migratepage)(struct address_space *mapping,
+			   struct page *newpage, struct page *page,
+			   enum migrate_mode mode, void *dev_priv_data);
+};
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+static inline void shmem_set_dev_info(struct address_space *mapping,
+				      struct shmem_dev_info *info)
+{
+	mapping->private_data = info;
+}
+
 /*
  * Functions in mm/shmem.c called directly from elsewhere:
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d..fce8de3 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	return 0;
 }
 
+#ifdef CONFIG_MIGRATION
+static int shmem_migratepage(struct address_space *mapping,
+			     struct page *newpage, struct page *page,
+			     enum migrate_mode mode)
+{
+	struct shmem_dev_info *dev_info = mapping->private_data;
+
+	if (dev_info && dev_info->migratepage)
+		return dev_info->migratepage(mapping, newpage, page,
+					     mode, dev_info->private_data);
+
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
@@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migratepage,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };
-- 
1.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-04 15:02 [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops akash.goel
@ 2016-11-04 15:02 ` akash.goel
  2016-11-09 11:28   ` Daniel Vetter
  2016-11-10  6:39     ` Hugh Dickins
  2016-11-04 17:15 ` ✓ Fi.CI.BAT: success for series starting with [1/2] shmem: Support for registration of driver/file owner specific ops Patchwork
  2016-11-10  5:36 ` [PATCH 1/2] " Hugh Dickins
  2 siblings, 2 replies; 29+ messages in thread
From: akash.goel @ 2016-11-04 15:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Hugh Dickins, linux-mm, Sourab Gupta, Akash Goel

From: Chris Wilson <chris@chris-wilson.co.uk>

On a long run of more than 2-3 days, physical memory tends to get
severely fragmented, which considerably slows down the system. In such a
scenario the shrinker is also unable to help, as lack of memory is not
the actual problem: it has been observed that there are enough free
pages of order 0. The problem also manifests itself when an individual
zone in the mm runs out of pages; if we cannot migrate pages between
zones, the kernel hits an out-of-memory condition even though there are
free pages (and often all of swap) available.

To address the issue of external fragmentation, the kernel performs
compaction (which involves migration of pages), but its efficacy depends
upon how many pages are marked as MOVABLE, as only those pages can be
migrated.

Currently the backing pages for GPU buffers are allocated from shmemfs
with the __GFP_RECLAIMABLE flag, in units of 4KB pages.  With limited
swap space, it may not always be possible to reclaim or swap out the
pages of all the inactive objects to make free space that would allow
the formation of higher-order groups of physically contiguous pages on
compaction.

Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
pin the pages while they are in use by the GPU, which prevents their
migration. So the migratepage callback in shmem is also hooked up, to get
a notification when the kernel initiates page migration. On that
notification, i915.ko unpins the pages as appropriate.  With this we can
effectively mark the GPU pages as MOVABLE and hence mitigate the
fragmentation problem.
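The pinning rules above imply a set of skip conditions for migration (GPU-active, purgeable, or pinned objects must not be moved). A userspace sketch of that check follows; the field names are simplified stand-ins, not the driver's actual structures:

```c
/*
 * Sketch of the migration skip conditions; mirrors the logic of
 * can_migrate_page() in the patch below.
 */
#include <assert.h>
#include <stdbool.h>

struct obj_state {
	bool active;		/* still in use by the GPU */
	bool purgeable;		/* I915_MADV_DONTNEED */
	int pages_pin_count;
	int bind_count;
	bool vma_pinned;
};

static bool can_migrate(const struct obj_state *o)
{
	if (o->active)
		return false;	/* GPU may still read/write the pages */
	if (o->purgeable)
		return false;	/* truncation would deadlock on the page lock */
	if (o->pages_pin_count > o->bind_count)
		return false;	/* pinned beyond its GGTT bindings */
	if (o->vma_pinned)
		return false;
	return true;
}
```

Only an idle, non-purgeable object whose pin count is fully accounted for by its bindings is eligible; everything else reports busy and stays put.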

v2:
 - Rename the migration routine to gem_shrink_migratepage, move it to the
   shrinker file, and use the existing constructs (Chris)
 - To cleanup, add a new helper function to encapsulate all page migration
   skip conditions (Chris)
 - Add a new local helper function in shrinker file, for dropping the
   backing pages, and call the same from gem_shrink() also (Chris)

v3:
 - Fix/invert the check on the return value of unsafe_drop_pages (Chris)

v4:
 - Minor tidy

v5:
 - Fix unsafe usage of unsafe_drop_pages()
 - Rebase onto vmap-notifier

v6:
- Remove i915_gem_object_get/put across unsafe_drop_pages() as, with
  struct_mutex protection, the object can't disappear. (Chris)

Testcase: igt/gem_shrink
Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |   2 +
 drivers/gpu/drm/i915/i915_gem.c          |   9 ++-
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 132 +++++++++++++++++++++++++++++++
 3 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4735b417..7f2717b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1357,6 +1357,8 @@ struct intel_l3_parity {
 };
 
 struct i915_gem_mm {
+	struct shmem_dev_info shmem_info;
+
 	/** Memory allocator for GTT stolen memory */
 	struct drm_mm stolen;
 	/** Protects the usage of the GTT stolen memory allocator. This is
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1f995ce..f0d4ce7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2164,6 +2164,7 @@ void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
 		if (obj->mm.madv == I915_MADV_WILLNEED)
 			mark_page_accessed(page);
 
+		set_page_private(page, 0);
 		put_page(page);
 	}
 	obj->mm.dirty = false;
@@ -2310,6 +2311,7 @@ static unsigned int swiotlb_max_size(void)
 			sg->length += PAGE_SIZE;
 		}
 		last_pfn = page_to_pfn(page);
+		set_page_private(page, (unsigned long)obj);
 
 		/* Check that the i965g/gm workaround works. */
 		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
@@ -2334,8 +2336,10 @@ static unsigned int swiotlb_max_size(void)
 
 err_pages:
 	sg_mark_end(sg);
-	for_each_sgt_page(page, sgt_iter, st)
+	for_each_sgt_page(page, sgt_iter, st) {
+		set_page_private(page, 0);
 		put_page(page);
+	}
 	sg_free_table(st);
 	kfree(st);
 
@@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
 		goto fail;
 
 	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
+	if (IS_ENABLED(CONFIG_MIGRATION))
+		mask |= __GFP_MOVABLE;
 	if (IS_CRESTLINE(dev_priv) || IS_BROADWATER(dev_priv)) {
 		/* 965gm cannot relocate objects above 4GiB. */
 		mask &= ~__GFP_HIGHMEM;
@@ -4193,6 +4199,7 @@ struct drm_i915_gem_object *
 
 	mapping = obj->base.filp->f_mapping;
 	mapping_set_gfp_mask(mapping, mask);
+	shmem_set_dev_info(mapping, &dev_priv->mm.shmem_info);
 
 	i915_gem_object_init(obj, &i915_gem_object_ops);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index a6fc1bd..051135ac 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -24,6 +24,7 @@
 
 #include <linux/oom.h>
 #include <linux/shmem_fs.h>
+#include <linux/migrate.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/pci.h>
@@ -473,6 +474,132 @@ struct shrinker_lock_uninterruptible {
 	return NOTIFY_DONE;
 }
 
+#ifdef CONFIG_MIGRATION
+static bool can_migrate_page(struct drm_i915_gem_object *obj)
+{
+	/* Avoid migrating the page if it is actively in use by the GPU */
+	if (i915_gem_object_is_active(obj))
+		return false;
+
+	/* Skip the migration for purgeable objects, otherwise there
+	 * will be a deadlock when shmem tries to lock the page for
+	 * truncation, as it was already locked by the caller before
+	 * migration.
+	 */
+	if (obj->mm.madv == I915_MADV_DONTNEED)
+		return false;
+
+	/* Skip the migration for a pinned object */
+	if (atomic_read(&obj->mm.pages_pin_count) > obj->bind_count)
+		return false;
+
+	if (any_vma_pinned(obj))
+		return false;
+
+	return true;
+}
+
+static int do_migrate_page(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	int ret = 0;
+
+	if (!can_migrate_page(obj))
+		return -EBUSY;
+
+	/* HW access would be required for a GGTT-bound object, for which
+	 * the device has to be kept awake. But a deadlock can arise if we
+	 * attempt to resume the device while a suspend or resume operation
+	 * is already happening concurrently on some other path, and that
+	 * path also happens to trigger compaction. So only unbind if the
+	 * device is currently awake.
+	 */
+	if (!intel_runtime_pm_get_if_in_use(dev_priv))
+		return -EBUSY;
+
+	if (!unsafe_drop_pages(obj))
+		ret = -EBUSY;
+
+	intel_runtime_pm_put(dev_priv);
+	return ret;
+}
+
+static int i915_gem_shrinker_migratepage(struct address_space *mapping,
+					 struct page *newpage,
+					 struct page *page,
+					 enum migrate_mode mode,
+					 void *dev_priv_data)
+{
+	struct drm_i915_private *dev_priv = dev_priv_data;
+	struct shrinker_lock_uninterruptible slu;
+	int ret;
+
+	/*
+	 * Clear the private field of the new target page, as it could hold
+	 * a stale value. Otherwise, if this page itself later gets migrated
+	 * without the driver having referenced it in between, the stale
+	 * value would mislead the migratepage callback, since the object
+	 * pointer is derived from it. This should be safe because, at the
+	 * time of migration, the private field of the new page (which is
+	 * actually an independent free 4KB page now) is a don't-care for
+	 * the kernel.
+	 */
+	set_page_private(newpage, 0);
+
+	if (!page_private(page))
+		goto migrate;
+
+	/*
+	 * Check the page count: if the driver also holds a reference, it
+	 * should be more than 2, as shmem holds one reference and one more
+	 * has been taken by the migration path itself. So if the count is
+	 * <= 2, we can directly invoke the migration function.
+	 */
+	if (page_count(page) <= 2)
+		goto migrate;
+
+	/*
+	 * Use a trylock here, with a timeout, for struct_mutex, as
+	 * otherwise there is a possibility of deadlock due to lock
+	 * inversion. This path, which tries to migrate a particular
+	 * page after locking that page, can race with a path which
+	 * truncates/purges pages of the corresponding object (after
+	 * acquiring struct_mutex). Since page truncation will also
+	 * try to lock the page, a deadlock can arise.
+	 */
+	if (!i915_gem_shrinker_lock_uninterruptible(dev_priv, &slu, 10))
+		return -EBUSY;
+
+	ret = 0;
+	if (!PageSwapCache(page) && page_private(page)) {
+		struct drm_i915_gem_object *obj =
+			(struct drm_i915_gem_object *)page_private(page);
+
+		ret = do_migrate_page(obj);
+	}
+
+	i915_gem_shrinker_unlock_uninterruptible(dev_priv, &slu);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ideally we don't expect the page count to be > 2 here, as the
+	 * driver would have dropped its reference, but occasionally it has
+	 * been seen as 3 or 4. This unexpected page count causes migration
+	 * failure with an -EAGAIN error, which then leads to multiple
+	 * attempts by the kernel to migrate the same set of pages. And
+	 * sometimes the repeated attempts prove detrimental to stability.
+	 * Also, since we don't know who the other owner is, or for how long
+	 * it is going to keep the reference, it is better to return -EBUSY.
+	 */
+	if (page_count(page) > 2)
+		return -EBUSY;
+
+migrate:
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 /**
  * i915_gem_shrinker_init - Initialize i915 shrinker
  * @dev_priv: i915 device
@@ -491,6 +618,11 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
 
 	dev_priv->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap;
 	WARN_ON(register_vmap_purge_notifier(&dev_priv->mm.vmap_notifier));
+
+	dev_priv->mm.shmem_info.private_data = dev_priv;
+#ifdef CONFIG_MIGRATION
+	dev_priv->mm.shmem_info.migratepage = i915_gem_shrinker_migratepage;
+#endif
 }
 
 /**
-- 
1.9.2
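The page->private bookkeeping and the page_count() heuristic used in the patch above can be modeled in plain C. Illustrative only: `struct page` here is a two-field stand-in, not the kernel's, and `driver_may_hold_reference` is a hypothetical name for the early-out condition in i915_gem_shrinker_migratepage().

```c
/*
 * Userspace model of the owner-cookie and refcount heuristic:
 * page->private carries the owning object pointer, and with one
 * reference held by shmem plus one taken by the migration path,
 * page_count() <= 2 means the driver holds no extra reference.
 */
#include <assert.h>
#include <stdbool.h>

struct page {
	unsigned long private;	/* owner cookie, set via set_page_private() */
	int count;		/* stand-in for page_count() */
};

static void set_page_private(struct page *page, unsigned long priv)
{
	page->private = priv;
}

static unsigned long page_private(const struct page *page)
{
	return page->private;
}

/* True only when an owner is recorded AND someone beyond shmem and
 * the migration path still holds a reference. */
static bool driver_may_hold_reference(const struct page *page)
{
	return page_private(page) != 0 && page->count > 2;
}
```

When this predicate is false the default migrate_page() path can be taken directly; when true, the driver must first be asked to drop its pin, as the patch does.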



* ✓ Fi.CI.BAT: success for series starting with [1/2] shmem: Support for registration of driver/file owner specific ops
  2016-11-04 15:02 [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops akash.goel
  2016-11-04 15:02 ` [PATCH 2/2] drm/i915: Make GPU pages movable akash.goel
@ 2016-11-04 17:15 ` Patchwork
  2016-11-10  5:36 ` [PATCH 1/2] " Hugh Dickins
  2 siblings, 0 replies; 29+ messages in thread
From: Patchwork @ 2016-11-04 17:15 UTC (permalink / raw)
  To: Akash Goel; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] shmem: Support for registration of driver/file owner specific ops
URL   : https://patchwork.freedesktop.org/series/14845/
State : success

== Summary ==

Series 14845v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/14845/revisions/1/mbox/


fi-bdw-5557u     total:241  pass:226  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:241  pass:201  dwarn:0   dfail:0   fail:0   skip:40 
fi-byt-j1900     total:241  pass:213  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-n2820     total:241  pass:209  dwarn:0   dfail:0   fail:0   skip:32 
fi-hsw-4770      total:241  pass:221  dwarn:0   dfail:0   fail:0   skip:20 
fi-hsw-4770r     total:241  pass:221  dwarn:0   dfail:0   fail:0   skip:20 
fi-ilk-650       total:241  pass:188  dwarn:0   dfail:0   fail:0   skip:53 
fi-ivb-3520m     total:241  pass:219  dwarn:0   dfail:0   fail:0   skip:22 
fi-ivb-3770      total:241  pass:219  dwarn:0   dfail:0   fail:0   skip:22 
fi-kbl-7200u     total:241  pass:219  dwarn:0   dfail:0   fail:0   skip:22 
fi-skl-6260u     total:241  pass:227  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:241  pass:220  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6700k     total:241  pass:219  dwarn:1   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:241  pass:227  dwarn:0   dfail:0   fail:0   skip:14 
fi-snb-2520m     total:241  pass:209  dwarn:0   dfail:0   fail:0   skip:32 
fi-snb-2600      total:241  pass:208  dwarn:0   dfail:0   fail:0   skip:33 

00d2fcf7c84de382bd2ceb5eaf908f76900d0791 drm-intel-nightly: 2016y-11m-04d-15h-43m-43s UTC integration manifest
95d9dd7 drm/i915: Make GPU pages movable
e67b361 shmem: Support for registration of driver/file owner specific ops

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_2907/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-04 15:02 ` [PATCH 2/2] drm/i915: Make GPU pages movable akash.goel
@ 2016-11-09 11:28   ` Daniel Vetter
  2016-11-09 18:36     ` Hugh Dickins
  2016-11-10  6:39     ` Hugh Dickins
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Vetter @ 2016-11-09 11:28 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx, Chris Wilson, Hugh Dickins, linux-mm, Sourab Gupta

On Fri, Nov 04, 2016 at 08:32:56PM +0530, akash.goel@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> [commit message and tags snipped; quoted in full above]

Hi all -mm folks!

Any feedback on these two? It's kinda an intermediate step towards a
full-blown gemfs, and I think useful for that. Or do we need to go
directly to our own backing storage thing? Aside from ack/nack from -mm I
think this is ready for merging.

Thanks, Daniel

> [patch diff snipped; quoted in full above]

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch



* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-09 11:28   ` Daniel Vetter
@ 2016-11-09 18:36     ` Hugh Dickins
  2016-11-22 16:02       ` [Intel-gfx] " Matthew Auld
  0 siblings, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2016-11-09 18:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: akash.goel, intel-gfx, Chris Wilson, Hugh Dickins, linux-mm,
	Sourab Gupta, Andrew Morton

On Wed, 9 Nov 2016, Daniel Vetter wrote:
> 
> Hi all -mm folks!
> 
> Any feedback on these two? It's kinda an intermediate step towards a
> full-blown gemfs, and I think useful for that. Or do we need to go
> directly to our own backing storage thing? Aside from ack/nack from -mm I
> think this is ready for merging.

I'm currently considering them at last: will report back later.

Full-blown gemfs does not come in here, of course; but let me
fire a warning shot since you mention it: if it's going to use swap,
then we shall probably have to nak it in favour of continuing to use 
infrastructure from mm/shmem.c.  I very much understand why you would
love to avoid that dependence, but I doubt it can be safely bypassed.

Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-11-04 15:02 [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops akash.goel
  2016-11-04 15:02 ` [PATCH 2/2] drm/i915: Make GPU pages movable akash.goel
  2016-11-04 17:15 ` ✓ Fi.CI.BAT: success for series starting with [1/2] shmem: Support for registration of driver/file owner specific ops Patchwork
@ 2016-11-10  5:36 ` Hugh Dickins
  2016-11-10 16:22   ` Goel, Akash
  2 siblings, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2016-11-10  5:36 UTC (permalink / raw)
  To: Akash Goel
  Cc: intel-gfx, Chris Wilson, Hugh Dickins, linux-mm, linux-kernel,
	Sourab Gupta

On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> This provides support for the drivers or shmem file owners to register
> a set of callbacks, which can be invoked from the address space
> operations methods implemented by shmem.  This allows the file owners to
> hook into the shmem address space operations to do some extra/custom
> operations in addition to the default ones.
> 
> The private_data field of address_space struct is used to store the
> pointer to driver specific ops.  Currently only one ops field is defined,
> which is migratepage, but can be extended on an as-needed basis.
> 
> The need for driver specific operations arises since some of the
> operations (like migratepage) cannot be handled effectively within
> shmem alone, and would need some driver specific handling as well.
> Specifically, i915.ko would like to participate in migratepage().
> i915.ko uses shmemfs to provide swappable backing storage for its user
> objects, but when those objects are in use by the GPU it must pin the
> entire object until the GPU is idle.  As a result, large chunks of memory
> can be arbitrarily withdrawn from page migration, resulting in premature
> out-of-memory due to fragmentation.  However, if i915.ko can receive the
> migratepage() request, it can then flush the object from the GPU, remove
> its pin and thus enable the migration.
> 
> Since gfx allocations are one of the major consumers of system memory,
> it's imperative to have such a mechanism to effectively deal with
> fragmentation.  Hence the need for such a provision for initiating
> driver specific actions during address space operations.

Thank you for persisting with this, and sorry for all my delay.

> 
> v2:
> - Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
> - Change the return type of shmem_set_device_op() to void and remove the
>   check for pre-existing data. (Joonas)
> - Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
>   with shmem_dev_info structure. (Joonas)
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

That doesn't seem quite right: the From line above implies that Chris
wrote it, and should be first Signer; but perhaps the From line is wrong.

> ---
>  include/linux/shmem_fs.h | 13 +++++++++++++
>  mm/shmem.c               | 17 ++++++++++++++++-
>  2 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index ff078e7..454c3ba 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -39,11 +39,24 @@ struct shmem_sb_info {
>  	unsigned long shrinklist_len; /* Length of shrinklist */
>  };
>  
> +struct shmem_dev_info {
> +	void *private_data;
> +	int (*migratepage)(struct address_space *mapping,
> +			   struct page *newpage, struct page *page,
> +			   enum migrate_mode mode, void *dev_priv_data);

Aren't the private_data field and dev_priv_data arg a little bit
confusing and redundant?  Can't the migratepage() deduce dev_priv
for itself from mapping->private_data (perhaps wrapped by a
shmem_get_dev_info()), by using container_of()?

> +};
> +
>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>  {
>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>  }
>  
> +static inline void shmem_set_dev_info(struct address_space *mapping,
> +				      struct shmem_dev_info *info)
> +{
> +	mapping->private_data = info;

Nit: if this stays as is, I'd prefer dev_info there and above,
since shmem.c uses info all over for its shmem_inode_info pointer.
But in second patch I suggest obj_info may be better than dev_info.

> +}
> +
>  /*
>   * Functions in mm/shmem.c called directly from elsewhere:
>   */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d..fce8de3 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MIGRATION
> +static int shmem_migratepage(struct address_space *mapping,
> +			     struct page *newpage, struct page *page,
> +			     enum migrate_mode mode)
> +{
> +	struct shmem_dev_info *dev_info = mapping->private_data;
> +
> +	if (dev_info && dev_info->migratepage)
> +		return dev_info->migratepage(mapping, newpage, page,
> +					     mode, dev_info->private_data);
> +
> +	return migrate_page(mapping, newpage, page, mode);
> +}
> +#endif
> +
>  #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>  {
> @@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
>  	.write_end	= shmem_write_end,
>  #endif
>  #ifdef CONFIG_MIGRATION
> -	.migratepage	= migrate_page,
> +	.migratepage	= shmem_migratepage,
>  #endif
>  	.error_remove_page = generic_error_remove_page,
>  };
> -- 
> 1.9.2

I didn't like this very much; but every time I tried to "improve" it,
found good reasons why you chose the way you did (modularity of i915,
constness of a_ops, reluctance to copy and modify a_ops, reluctance
to export those shmem methods separately).

I think perhaps later we just add a gem_ops pointer to shmem_inode_info,
for i915 or other gems to fill in as they wish (and shmem divert off to
them if set, as you've done); but for now you're trying to avoid
enlarging the shmem inode, okay.

Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-04 15:02 ` [PATCH 2/2] drm/i915: Make GPU pages movable akash.goel
@ 2016-11-10  6:39     ` Hugh Dickins
  2016-11-10  6:39     ` Hugh Dickins
  1 sibling, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2016-11-10  6:39 UTC (permalink / raw)
  To: Akash Goel; +Cc: intel-gfx, Chris Wilson, Hugh Dickins, linux-mm, Sourab Gupta

On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> On a long run of more than 2-3 days, physical memory tends to get
> fragmented severely, which considerably slows down the system. In such a
> scenario, the shrinker is also unable to help as lack of memory is not
> the actual problem, since it has been observed that there are enough free
> pages of 0 order. This also manifests itself when an individual zone in
> the mm runs out of pages and if we cannot migrate pages between zones,
> the kernel hits an out-of-memory condition even though there are free
> pages (and often all of swap) available.
> 
> To address the issue of external fragmentation, the kernel does compaction
> (which involves migration of pages), but its efficacy depends upon how
> many pages are marked as MOVABLE, as only those pages can be migrated.
> 
> Currently the backing pages for GPU buffers are allocated from shmemfs
> with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
> swap space, it may not be possible always to reclaim or swap-out pages of
> all the inactive objects, to make way for free space allowing formation
> of higher order groups of physically-contiguous pages on compaction.
> 
> Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
> pin the pages if they are in use by GPU, which will prevent their
> migration. So the migratepage callback in shmem is also hooked up to get
> a notification when the kernel initiates page migration. On the
> notification, i915.ko appropriately unpins the pages.  With this we can
> effectively mark the GPU pages as MOVABLE and hence mitigate the
> fragmentation problem.
> 
> v2:
>  - Rename the migration routine to gem_shrink_migratepage, move it to the
>    shrinker file, and use the existing constructs (Chris)
>  - To cleanup, add a new helper function to encapsulate all page migration
>    skip conditions (Chris)
>  - Add a new local helper function in shrinker file, for dropping the
>    backing pages, and call the same from gem_shrink() also (Chris)
> 
> v3:
>  - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
> 
> v4:
>  - Minor tidy
> 
> v5:
>  - Fix unsafe usage of unsafe_drop_pages()
>  - Rebase onto vmap-notifier
> 
> v6:
> - Remove i915_gem_object_get/put across unsafe_drop_pages() as with
>   struct_mutex protection object can't disappear. (Chris)
> 
> Testcase: igt/gem_shrink
> Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

I'm confused!  But perhaps it's gone around and around between you all,
I'm not sure what the rules are then.  I think this sequence implies
that Sourab wrote it originally, then Akash and Chris passed it on
with refinements - but then Chris wouldn't add Reviewed-by.

> ---
>  drivers/gpu/drm/i915/i915_drv.h          |   2 +
>  drivers/gpu/drm/i915/i915_gem.c          |   9 ++-
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 132 +++++++++++++++++++++++++++++++
>  3 files changed, 142 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4735b417..7f2717b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1357,6 +1357,8 @@ struct intel_l3_parity {
>  };
>  
>  struct i915_gem_mm {
> +	struct shmem_dev_info shmem_info;
> +
>  	/** Memory allocator for GTT stolen memory */
>  	struct drm_mm stolen;
>  	/** Protects the usage of the GTT stolen memory allocator. This is
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1f995ce..f0d4ce7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2164,6 +2164,7 @@ void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
>  		if (obj->mm.madv == I915_MADV_WILLNEED)
>  			mark_page_accessed(page);
>  
> +		set_page_private(page, 0);
>  		put_page(page);
>  	}
>  	obj->mm.dirty = false;
> @@ -2310,6 +2311,7 @@ static unsigned int swiotlb_max_size(void)
>  			sg->length += PAGE_SIZE;
>  		}
>  		last_pfn = page_to_pfn(page);
> +		set_page_private(page, (unsigned long)obj);
>  
>  		/* Check that the i965g/gm workaround works. */
>  		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
> @@ -2334,8 +2336,10 @@ static unsigned int swiotlb_max_size(void)
>  
>  err_pages:
>  	sg_mark_end(sg);
> -	for_each_sgt_page(page, sgt_iter, st)
> +	for_each_sgt_page(page, sgt_iter, st) {
> +		set_page_private(page, 0);
>  		put_page(page);
> +	}
>  	sg_free_table(st);
>  	kfree(st);
>  

I think your page_private() games there and below are correct (and
I suspect took a few iterations to get right); and we don't currently
have a use for page_private() in mm/shmem.c that conflicts with what
you're doing here (of course it's used for swap, but you're already
careful to exclude that case).

But I'd nonetheless prefer not to give it away to you: you're welcome
to use mapping->private_data as you have, but I'd rather keep the more
valuable page_private() available for mm or shmem use.

Would it be reasonable to ask you to rework this with the shmem_dev_info
in dev_priv replaced by shmem_obj_info in drm_i915_gem_object?  Then,
IIUC, you can access both it and the object which contains it from
the page->mapping pointer, without needing page->private.

If that's unreasonable to ask of you, would it be reasonable if I
added a third patch to make that change myself?

> @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
>  		goto fail;
>  
>  	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> +	if (IS_ENABLED(MIGRATION))
> +		mask |= __GFP_MOVABLE;

I was going to suggest just make that unconditional,
        mask = GFP_HIGHUSER_MOVABLE | __GFP_RECLAIMABLE;

But then I wondered what that __GFP_RECLAIMABLE actually achieves?
These pages are already __GFP_RECLAIM (inside GFP_HIGHUSER) and on
the LRU.  It affects gfpflags_to_migratetype(), but I'm not familiar
with what that different migratetype will end up doing.

>  	if (IS_CRESTLINE(dev_priv) || IS_BROADWATER(dev_priv)) {
>  		/* 965gm cannot relocate objects above 4GiB. */
>  		mask &= ~__GFP_HIGHMEM;
> @@ -4193,6 +4199,7 @@ struct drm_i915_gem_object *
>  
>  	mapping = obj->base.filp->f_mapping;
>  	mapping_set_gfp_mask(mapping, mask);
> +	shmem_set_dev_info(mapping, &dev_priv->mm.shmem_info);
>  
>  	i915_gem_object_init(obj, &i915_gem_object_ops);
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index a6fc1bd..051135ac 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -24,6 +24,7 @@
>  
>  #include <linux/oom.h>
>  #include <linux/shmem_fs.h>
> +#include <linux/migrate.h>
>  #include <linux/slab.h>
>  #include <linux/swap.h>
>  #include <linux/pci.h>
> @@ -473,6 +474,132 @@ struct shrinker_lock_uninterruptible {
>  	return NOTIFY_DONE;
>  }
>  
> +#ifdef CONFIG_MIGRATION
> +static bool can_migrate_page(struct drm_i915_gem_object *obj)
> +{
> +	/* Avoid the migration of page if being actively used by GPU */
> +	if (i915_gem_object_is_active(obj))
> +		return false;
> +
> +	/* Skip the migration for purgeable objects otherwise there
> +	 * will be a deadlock when shmem will try to lock the page for
> +	 * truncation, which is already locked by the caller before
> +	 * migration.
> +	 */
> +	if (obj->mm.madv == I915_MADV_DONTNEED)
> +		return false;
> +
> +	/* Skip the migration for a pinned object */
> +	if (atomic_read(&obj->mm.pages_pin_count) > obj->bind_count)
> +		return false;
> +
> +	if (any_vma_pinned(obj))
> +		return false;
> +
> +	return true;
> +}
> +
> +static int do_migrate_page(struct drm_i915_gem_object *obj)

do_migrate_page()?  But it does not.  Maybe prepare_for_migrate_page()?

> +{
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> +	int ret = 0;
> +
> +	if (!can_migrate_page(obj))
> +		return -EBUSY;
> +
> +	/* HW access would be required for a GGTT bound object, for which
> +	 * device has to be kept awake. But a deadlock scenario can arise if
> +	 * the attempt is made to resume the device, when either a suspend
> +	 * or a resume operation is already happening concurrently from some
> +	 * other path and that only also triggers compaction. So only unbind
> +	 * if the device is currently awake.
> +	 */
> +	if (!intel_runtime_pm_get_if_in_use(dev_priv))
> +		return -EBUSY;
> +
> +	if (!unsafe_drop_pages(obj))
> +		ret = -EBUSY;
> +
> +	intel_runtime_pm_put(dev_priv);
> +	return ret;
> +}
> +
> +static int i915_gem_shrinker_migratepage(struct address_space *mapping,
> +					 struct page *newpage,
> +					 struct page *page,
> +					 enum migrate_mode mode,
> +					 void *dev_priv_data)
> +{
> +	struct drm_i915_private *dev_priv = dev_priv_data;
> +	struct shrinker_lock_uninterruptible slu;
> +	int ret;
> +
> +	/*
> +	 * Clear the private field of the new target page as it could have a
> +	 * stale value in the private field. Otherwise later on if this page
> +	 * itself gets migrated, without getting referred by the Driver
> +	 * in between, the stale value would cause the i915_migratepage
> +	 * function to go for a toss as object pointer is derived from it.
> +	 * This should be safe since at the time of migration, private field
> +	 * of the new page (which is actually an independent free 4KB page now)
> +	 * should be like a don't care for the kernel.
> +	 */
> +	set_page_private(newpage, 0);
> +
> +	if (!page_private(page))
> +		goto migrate;
> +
> +	/*
> +	 * Check the page count, if Driver also has a reference then it should
> +	 * be more than 2, as shmem will have one reference and one reference
> +	 * would have been taken by the migration path itself. So if reference
> +	 * is <=2, we can directly invoke the migration function.
> +	 */
> +	if (page_count(page) <= 2)
> +		goto migrate;
> +
> +	/*
> +	 * Use trylock here, with a timeout, for struct_mutex as
> +	 * otherwise there is a possibility of deadlock due to lock
> +	 * inversion. This path, which tries to migrate a particular
> +	 * page after locking that page, can race with a path which
> +	 * truncate/purge pages of the corresponding object (after
> +	 * acquiring struct_mutex). Since page truncation will also
> +	 * try to lock the page, a scenario of deadlock can arise.
> +	 */
> +	if (!i915_gem_shrinker_lock_uninterruptible(dev_priv, &slu, 10))
> +		return -EBUSY;
> +
> +	ret = 0;
> +	if (!PageSwapCache(page) && page_private(page)) {
> +		struct drm_i915_gem_object *obj =
> +			(struct drm_i915_gem_object *)page_private(page);
> +
> +		ret = do_migrate_page(obj);
> +	}
> +
> +	i915_gem_shrinker_unlock_uninterruptible(dev_priv, &slu);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Ideally here we don't expect the page count to be > 2, as driver
> +	 * would have dropped its reference, but occasionally it has been seen
> +	 * coming as 3 & 4. This leads to a situation of unexpected page count,
> +	 * causing migration failure, with -EGAIN error. This then leads to

s/EGAIN/EAGAIN/

> +	 * multiple attempts by the kernel to migrate the same set of pages.
> +	 * And sometimes the repeated attempts proves detrimental for stability.
> +	 * Also since we don't know who is the other owner, and for how long its
> +	 * gonna keep the reference, its better to return -EBUSY.

Fair enough, I know those 10 repeats can be quite a waste.  And you've
got a potential 10ms timeout above, which you don't want to multiply by 10.
I can't get too sniffy about your timeout, we have other sources of delay
in there, but it is always sad to add latency to MIGRATE_ASYNC mode.

> +	 */
> +	if (page_count(page) > 2)
> +		return -EBUSY;
> +
> +migrate:
> +	return migrate_page(mapping, newpage, page, mode);
> +}
> +#endif
> +
>  /**
>   * i915_gem_shrinker_init - Initialize i915 shrinker
>   * @dev_priv: i915 device
> @@ -491,6 +618,11 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
>  
>  	dev_priv->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap;
>  	WARN_ON(register_vmap_purge_notifier(&dev_priv->mm.vmap_notifier));
> +
> +	dev_priv->mm.shmem_info.private_data = dev_priv;
> +#ifdef CONFIG_MIGRATION
> +	dev_priv->mm.shmem_info.migratepage = i915_gem_shrinker_migratepage;
> +#endif

If we avoid playing with page_private(), this initialization would go
away, but the equivalent be done near the call to i915_gem_object_init().

>  }
>  
>  /**
> -- 
> 1.9.2


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-10  6:39     ` Hugh Dickins
  (?)
@ 2016-11-10  7:30     ` Goel, Akash
  2016-11-14  7:57       ` akash goel
  -1 siblings, 1 reply; 29+ messages in thread
From: Goel, Akash @ 2016-11-10  7:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: intel-gfx, Chris Wilson, linux-mm, Sourab Gupta, akash.goels, akash.goel



On 11/10/2016 12:09 PM, Hugh Dickins wrote:
> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> On a long run of more than 2-3 days, physical memory tends to get
>> fragmented severely, which considerably slows down the system. In such a
>> scenario, the shrinker is also unable to help as lack of memory is not
>> the actual problem, since it has been observed that there are enough free
>> pages of 0 order. This also manifests itself when an individual zone in
>> the mm runs out of pages and if we cannot migrate pages between zones,
>> the kernel hits an out-of-memory condition even though there are free pages (and
>> often all of swap) available.
>>
>> To address the issue of external fragmentation, the kernel does compaction
>> (which involves migration of pages), but its efficacy depends upon how
>> many pages are marked as MOVABLE, as only those pages can be migrated.
>>
>> Currently the backing pages for GPU buffers are allocated from shmemfs
>> with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
>> swap space, it may not be possible always to reclaim or swap-out pages of
>> all the inactive objects, to make way for free space allowing formation
>> of higher order groups of physically-contiguous pages on compaction.
>>
>> Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
>> pin the pages if they are in use by GPU, which will prevent their
>> migration. So the migratepage callback in shmem is also hooked up to get
>> a notification when the kernel initiates page migration. On the
>> notification, i915.ko appropriately unpins the pages.  With this we can
>> effectively mark the GPU pages as MOVABLE and hence mitigate the
>> fragmentation problem.
>>
>> v2:
>>  - Rename the migration routine to gem_shrink_migratepage, move it to the
>>    shrinker file, and use the existing constructs (Chris)
>>  - To cleanup, add a new helper function to encapsulate all page migration
>>    skip conditions (Chris)
>>  - Add a new local helper function in shrinker file, for dropping the
>>    backing pages, and call the same from gem_shrink() also (Chris)
>>
>> v3:
>>  - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
>>
>> v4:
>>  - Minor tidy
>>
>> v5:
>>  - Fix unsafe usage of unsafe_drop_pages()
>>  - Rebase onto vmap-notifier
>>
>> v6:
>> - Remove i915_gem_object_get/put across unsafe_drop_pages() as with
>>   struct_mutex protection object can't disappear. (Chris)
>>
>> Testcase: igt/gem_shrink
>> Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> I'm confused!  But perhaps it's gone around and around between you all,
> I'm not sure what the rules are then.  I think this sequence implies
> that Sourab wrote it originally, then Akash and Chris passed it on
> with refinements - but then Chris wouldn't add Reviewed-by.
>
Thank you very much for the review and sorry for all the needless confusion.

Chris actually conceived the patches and prepared an initial version of 
them (hence he is the Author).
Sourab and I did the further refinements and fixed issues (all the
page_private stuff).
Chris then reviewed the final patch and also recently did a rebase for it.

>> ---
>>  drivers/gpu/drm/i915/i915_drv.h          |   2 +
>>  drivers/gpu/drm/i915/i915_gem.c          |   9 ++-
>>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 132 +++++++++++++++++++++++++++++++
>>  3 files changed, 142 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 4735b417..7f2717b 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1357,6 +1357,8 @@ struct intel_l3_parity {
>>  };
>>
>>  struct i915_gem_mm {
>> +	struct shmem_dev_info shmem_info;
>> +
>>  	/** Memory allocator for GTT stolen memory */
>>  	struct drm_mm stolen;
>>  	/** Protects the usage of the GTT stolen memory allocator. This is
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 1f995ce..f0d4ce7 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2164,6 +2164,7 @@ void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
>>  		if (obj->mm.madv == I915_MADV_WILLNEED)
>>  			mark_page_accessed(page);
>>
>> +		set_page_private(page, 0);
>>  		put_page(page);
>>  	}
>>  	obj->mm.dirty = false;
>> @@ -2310,6 +2311,7 @@ static unsigned int swiotlb_max_size(void)
>>  			sg->length += PAGE_SIZE;
>>  		}
>>  		last_pfn = page_to_pfn(page);
>> +		set_page_private(page, (unsigned long)obj);
>>
>>  		/* Check that the i965g/gm workaround works. */
>>  		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
>> @@ -2334,8 +2336,10 @@ static unsigned int swiotlb_max_size(void)
>>
>>  err_pages:
>>  	sg_mark_end(sg);
>> -	for_each_sgt_page(page, sgt_iter, st)
>> +	for_each_sgt_page(page, sgt_iter, st) {
>> +		set_page_private(page, 0);
>>  		put_page(page);
>> +	}
>>  	sg_free_table(st);
>>  	kfree(st);
>>
>
> I think your page_private() games there and below are correct (and
> I suspect took a few iterations to get right); and we don't currently
> have a use for page_private() in mm/shmem.c that conflicts with what
> you're doing here (of course it's used for swap, but you're already
> careful to exclude that case).
>
> But I'd nonetheless prefer not to give it away to you: you're welcome
> to use mapping->private_data as you have, but I'd rather keep the more
> valuable page_private() available for mm or shmem use.
>
> Would it be reasonable to ask you to rework this with the shmem_dev_info
> in dev_priv replaced by shmem_obj_info in drm_i915_gem_object?  Then,
> IIUC, you can access both it and the object which contains it from
> the page->mapping pointer, without needing page->private.
>
If I understood your suggestion correctly, instead of page_private the 
object pointer can be derived from mapping->private_data (by having 
shmem_obj_info embedded inside drm_i915_gem_object instead of dev_priv).

Will definitely try to rework the patch as per your suggestions.

> It that's unreasonable to ask of you, would it be reasonable if I
> added a third patch to make that change myself?
>
>> @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
>>  		goto fail;
>>
>>  	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
>> +	if (IS_ENABLED(MIGRATION))
>> +		mask |= __GFP_MOVABLE;
>
> I was going to suggest just make that unconditional,
>         mask = GFP_HIGHUSER_MOVABLE | __GFP_RECLAIMABLE;
>
> But then I wondered what that __GFP_RECLAIMABLE actually achieves?
> These pages are already __GFP_RECLAIM (inside GFP_HIGHUSER) and on
> the LRU.  It affects gfpflags_to_migratetype(), but I'm not familiar
> with what that different migratetype will end up doing.
>

Will check for this.

>>  	if (IS_CRESTLINE(dev_priv) || IS_BROADWATER(dev_priv)) {
>>  		/* 965gm cannot relocate objects above 4GiB. */
>>  		mask &= ~__GFP_HIGHMEM;
>> @@ -4193,6 +4199,7 @@ struct drm_i915_gem_object *
>>
>>  	mapping = obj->base.filp->f_mapping;
>>  	mapping_set_gfp_mask(mapping, mask);
>> +	shmem_set_dev_info(mapping, &dev_priv->mm.shmem_info);
>>
>>  	i915_gem_object_init(obj, &i915_gem_object_ops);
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> index a6fc1bd..051135ac 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>> @@ -24,6 +24,7 @@
>>
>>  #include <linux/oom.h>
>>  #include <linux/shmem_fs.h>
>> +#include <linux/migrate.h>
>>  #include <linux/slab.h>
>>  #include <linux/swap.h>
>>  #include <linux/pci.h>
>> @@ -473,6 +474,132 @@ struct shrinker_lock_uninterruptible {
>>  	return NOTIFY_DONE;
>>  }
>>
>> +#ifdef CONFIG_MIGRATION
>> +static bool can_migrate_page(struct drm_i915_gem_object *obj)
>> +{
>> +	/* Avoid the migration of page if being actively used by GPU */
>> +	if (i915_gem_object_is_active(obj))
>> +		return false;
>> +
>> +	/* Skip the migration for purgeable objects otherwise there
>> +	 * will be a deadlock when shmem will try to lock the page for
>> +	 * truncation, which is already locked by the caller before
>> +	 * migration.
>> +	 */
>> +	if (obj->mm.madv == I915_MADV_DONTNEED)
>> +		return false;
>> +
>> +	/* Skip the migration for a pinned object */
>> +	if (atomic_read(&obj->mm.pages_pin_count) > obj->bind_count)
>> +		return false;
>> +
>> +	if (any_vma_pinned(obj))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static int do_migrate_page(struct drm_i915_gem_object *obj)
>
> do_migrate_page()?  But it does not.  Maybe prepare_for_migrate_page()?
>
fine..

>> +{
>> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>> +	int ret = 0;
>> +
>> +	if (!can_migrate_page(obj))
>> +		return -EBUSY;
>> +
>> +	/* HW access would be required for a GGTT bound object, for which
>> +	 * device has to be kept awake. But a deadlock scenario can arise if
>> +	 * the attempt is made to resume the device, when either a suspend
>> +	 * or a resume operation is already happening concurrently from some
>> +	 * other path and that only also triggers compaction. So only unbind
>> +	 * if the device is currently awake.
>> +	 */
>> +	if (!intel_runtime_pm_get_if_in_use(dev_priv))
>> +		return -EBUSY;
>> +
>> +	if (!unsafe_drop_pages(obj))
>> +		ret = -EBUSY;
>> +
>> +	intel_runtime_pm_put(dev_priv);
>> +	return ret;
>> +}
>> +
>> +static int i915_gem_shrinker_migratepage(struct address_space *mapping,
>> +					 struct page *newpage,
>> +					 struct page *page,
>> +					 enum migrate_mode mode,
>> +					 void *dev_priv_data)
>> +{
>> +	struct drm_i915_private *dev_priv = dev_priv_data;
>> +	struct shrinker_lock_uninterruptible slu;
>> +	int ret;
>> +
>> +	/*
>> +	 * Clear the private field of the new target page as it could have a
>> +	 * stale value in the private field. Otherwise later on if this page
>> +	 * itself gets migrated, without getting referred by the Driver
>> +	 * in between, the stale value would cause the i915_migratepage
>> +	 * function to go for a toss as object pointer is derived from it.
>> +	 * This should be safe since at the time of migration, private field
>> +	 * of the new page (which is actually an independent free 4KB page now)
>> +	 * should be like a don't care for the kernel.
>> +	 */
>> +	set_page_private(newpage, 0);
>> +
>> +	if (!page_private(page))
>> +		goto migrate;
>> +
>> +	/*
>> +	 * Check the page count, if Driver also has a reference then it should
>> +	 * be more than 2, as shmem will have one reference and one reference
>> +	 * would have been taken by the migration path itself. So if reference
>> +	 * is <=2, we can directly invoke the migration function.
>> +	 */
>> +	if (page_count(page) <= 2)
>> +		goto migrate;
>> +
>> +	/*
>> +	 * Use trylock here, with a timeout, for struct_mutex as
>> +	 * otherwise there is a possibility of deadlock due to lock
>> +	 * inversion. This path, which tries to migrate a particular
>> +	 * page after locking that page, can race with a path which
>> +	 * truncate/purge pages of the corresponding object (after
>> +	 * acquiring struct_mutex). Since page truncation will also
>> +	 * try to lock the page, a scenario of deadlock can arise.
>> +	 */
>> +	if (!i915_gem_shrinker_lock_uninterruptible(dev_priv, &slu, 10))
>> +		return -EBUSY;
>> +
>> +	ret = 0;
>> +	if (!PageSwapCache(page) && page_private(page)) {
>> +		struct drm_i915_gem_object *obj =
>> +			(struct drm_i915_gem_object *)page_private(page);
>> +
>> +		ret = do_migrate_page(obj);
>> +	}
>> +
>> +	i915_gem_shrinker_unlock_uninterruptible(dev_priv, &slu);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/*
>> +	 * Ideally here we don't expect the page count to be > 2, as driver
>> +	 * would have dropped its reference, but occasionally it has been seen
>> +	 * coming as 3 & 4. This leads to a situation of unexpected page count,
>> +	 * causing migration failure, with -EGAIN error. This then leads to
>
> s/EGAIN/EAGAIN/
Fine.
>
>> +	 * multiple attempts by the kernel to migrate the same set of pages.
>> +	 * And sometimes the repeated attempts proves detrimental for stability.
>> +	 * Also since we don't know who is the other owner, and for how long its
>> +	 * gonna keep the reference, its better to return -EBUSY.
>
> Fair enough, I know those 10 repeats can be quite a waste.  And you've
> got a potential 10ms timeout above, which you don't want to multiply by 10.
> I can't get too sniffy about your timeout, we have other sources of delay
> in there, but it is always sad to add latency to MIGRATE_ASYNC mode.
>
>> +	 */
>> +	if (page_count(page) > 2)
>> +		return -EBUSY;
>> +
>> +migrate:
>> +	return migrate_page(mapping, newpage, page, mode);
>> +}
>> +#endif
>> +
>>  /**
>>   * i915_gem_shrinker_init - Initialize i915 shrinker
>>   * @dev_priv: i915 device
>> @@ -491,6 +618,11 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
>>
>>  	dev_priv->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap;
>>  	WARN_ON(register_vmap_purge_notifier(&dev_priv->mm.vmap_notifier));
>> +
>> +	dev_priv->mm.shmem_info.private_data = dev_priv;
>> +#ifdef CONFIG_MIGRATION
>> +	dev_priv->mm.shmem_info.migratepage = i915_gem_shrinker_migratepage;
>> +#endif
>
> If we avoid playing with page_private(), this initialization would go
> away, but the equivalent be done near the call to i915_gem_object_init().
>
Agree.

Best regards
Akash

>>  }
>>
>>  /**
>> --
>> 1.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-11-10  5:36 ` [PATCH 1/2] " Hugh Dickins
@ 2016-11-10 16:22   ` Goel, Akash
  2016-11-11 13:50       ` Chris Wilson
  0 siblings, 1 reply; 29+ messages in thread
From: Goel, Akash @ 2016-11-10 16:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: intel-gfx, Chris Wilson, linux-mm, linux-kernel, Sourab Gupta,
	akash.goels, akash.goel



On 11/10/2016 11:06 AM, Hugh Dickins wrote:
> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> This provides support for the drivers or shmem file owners to register
>> a set of callbacks, which can be invoked from the address space
>> operations methods implemented by shmem.  This allows the file owners to
>> hook into the shmem address space operations to do some extra/custom
>> operations in addition to the default ones.
>>
>> The private_data field of address_space struct is used to store the
>> pointer to driver specific ops.  Currently only one ops field is defined,
>> which is migratepage, but can be extended on an as-needed basis.
>>
>> The need for driver specific operations arises since some of the
>> operations (like migratepage) may not be handled completely within shmem,
>> so as to be effective, and would need some driver specific handling also.
>> Specifically, i915.ko would like to participate in migratepage().
>> i915.ko uses shmemfs to provide swappable backing storage for its user
>> objects, but when those objects are in use by the GPU it must pin the
>> entire object until the GPU is idle.  As a result, large chunks of memory
>> can be arbitrarily withdrawn from page migration, resulting in premature
>> out-of-memory due to fragmentation.  However, if i915.ko can receive the
>> migratepage() request, it can then flush the object from the GPU, remove
>> its pin and thus enable the migration.
>>
>> Since gfx allocations are one of the major consumers of system memory, it's
>> imperative to have such a mechanism to effectively deal with
>> fragmentation, and hence the need for such a provision for initiating
>> driver specific actions during address space operations.
>
> Thank you for persisting with this, and sorry for all my delay.
>
>>
>> v2:
>> - Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
>> - Change the return type of shmem_set_device_op() to void and remove the
>>   check for pre-existing data. (Joonas)
>> - Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
>>   with shmem_dev_info structure. (Joonas)
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.linux.org
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>
> That doesn't seem quite right: the From line above implies that Chris
> wrote it, and should be first Signer; but perhaps the From line is wrong.
>
Chris only wrote this patch initially, will do the required correction.

>> ---
>>  include/linux/shmem_fs.h | 13 +++++++++++++
>>  mm/shmem.c               | 17 ++++++++++++++++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index ff078e7..454c3ba 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -39,11 +39,24 @@ struct shmem_sb_info {
>>  	unsigned long shrinklist_len; /* Length of shrinklist */
>>  };
>>
>> +struct shmem_dev_info {
>> +	void *private_data;
>> +	int (*migratepage)(struct address_space *mapping,
>> +			   struct page *newpage, struct page *page,
>> +			   enum migrate_mode mode, void *dev_priv_data);
>
> Aren't the private_data field and dev_priv_data arg a little bit
> confusing and redundant?  Can't the migratepage() deduce dev_priv
> for itself from mapping->private_data (perhaps wrapped by a
> shmem_get_dev_info()), by using container_of()?
>
Yes, it looks like migratepage() can deduce dev_priv from mapping->private_data.
Can we keep the private_data field as a placeholder? Will 
s/dev_priv_data/private_data/.

As per your suggestion, in the other patch, object pointer can be 
derived from mapping->private_data (container_of) and dev_priv in turn 
can be derived from object pointer.

>> +};
>> +
>>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>>  {
>>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>>  }
>>
>> +static inline void shmem_set_dev_info(struct address_space *mapping,
>> +				      struct shmem_dev_info *info)
>> +{
>> +	mapping->private_data = info;
>
> Nit: if this stays as is, I'd prefer dev_info there and above,
> since shmem.c uses info all over for its shmem_inode_info pointer.
> But in second patch I suggest obj_info may be better than dev_info.
>
Fine will s/info/dev_info.

Best regards
Akash

>> +}
>> +
>>  /*
>>   * Functions in mm/shmem.c called directly from elsewhere:
>>   */
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index ad7813d..fce8de3 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>>  	return 0;
>>  }
>>
>> +#ifdef CONFIG_MIGRATION
>> +static int shmem_migratepage(struct address_space *mapping,
>> +			     struct page *newpage, struct page *page,
>> +			     enum migrate_mode mode)
>> +{
>> +	struct shmem_dev_info *dev_info = mapping->private_data;
>> +
>> +	if (dev_info && dev_info->migratepage)
>> +		return dev_info->migratepage(mapping, newpage, page,
>> +					     mode, dev_info->private_data);
>> +
>> +	return migrate_page(mapping, newpage, page, mode);
>> +}
>> +#endif
>> +
>>  #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
>>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>>  {
>> @@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
>>  	.write_end	= shmem_write_end,
>>  #endif
>>  #ifdef CONFIG_MIGRATION
>> -	.migratepage	= migrate_page,
>> +	.migratepage	= shmem_migratepage,
>>  #endif
>>  	.error_remove_page = generic_error_remove_page,
>>  };
>> --
>> 1.9.2
>
> I didn't like this very much; but every time I tried to "improve" it,
> found good reasons why you chose the way you did (modularity of i915,
> constness of a_ops, reluctance to copy and modify a_ops, reluctance
> to export those shmem methods separately).
>
> I think perhaps later we just add a gem_ops pointer to shmem_inode_info,
> for i915 or other gems to fill in as they wish (and shmem divert off to
> them if set, as you've done); but for now you're trying to avoid
> enlarging the shmem inode, okay.
>
> Hugh
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-11-10 16:22   ` Goel, Akash
@ 2016-11-11 13:50       ` Chris Wilson
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Wilson @ 2016-11-11 13:50 UTC (permalink / raw)
  To: Goel, Akash
  Cc: Hugh Dickins, intel-gfx, linux-mm, linux-kernel, Sourab Gupta,
	akash.goels

On Thu, Nov 10, 2016 at 09:52:34PM +0530, Goel, Akash wrote:
> 
> 
> On 11/10/2016 11:06 AM, Hugh Dickins wrote:
> >On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> >>Cc: Hugh Dickins <hughd@google.com>
> >>Cc: linux-mm@kvack.org
> >>Cc: linux-kernel@vger.linux.org
> >>Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> >>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >That doesn't seem quite right: the From line above implies that Chris
> >wrote it, and should be first Signer; but perhaps the From line is wrong.
> >
> Chris only wrote this patch initially, will do the required correction.

Akash is being modest. I gave him an idea I had been toying with to help
reduce premature oom; he is the one who deserves credit for turning it
into a functional patch and putting it into production.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
@ 2016-11-11 13:50       ` Chris Wilson
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Wilson @ 2016-11-11 13:50 UTC (permalink / raw)
  To: Goel, Akash; +Cc: intel-gfx, Hugh Dickins, linux-mm, Sourab Gupta, linux-kernel

On Thu, Nov 10, 2016 at 09:52:34PM +0530, Goel, Akash wrote:
> 
> 
> On 11/10/2016 11:06 AM, Hugh Dickins wrote:
> >On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> >>Cc: Hugh Dickins <hughd@google.com>
> >>Cc: linux-mm@kvack.org
> >>Cc: linux-kernel@vger.linux.org
> >>Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> >>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> >
> >That doesn't seem quite right: the From line above implies that Chris
> >wrote it, and should be first Signer; but perhaps the From line is wrong.
> >
> Chris only wrote this patch initially, will do the required correction.

Akash is being modest. I gave him an idea I had been toying with to help
reduce premature oom; he is the one who deserves credit for turning it
into a functional patch and putting it into production.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-10  7:30     ` Goel, Akash
@ 2016-11-14  7:57       ` akash goel
  2016-11-16  1:25           ` Hugh Dickins
  0 siblings, 1 reply; 29+ messages in thread
From: akash goel @ 2016-11-14  7:57 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: intel-gfx, linux-mm, Chris Wilson, akash goel, Sourab Gupta

On Thu, Nov 10, 2016 at 1:00 PM, Goel, Akash <akash.goel@intel.com> wrote:
>
>
> On 11/10/2016 12:09 PM, Hugh Dickins wrote:
>>
>> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
>>>
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> On a long run of more than 2-3 days, physical memory tends to get
>>> fragmented severely, which considerably slows down the system. In such a
>>> scenario, the shrinker is also unable to help as lack of memory is not
>>> the actual problem, since it has been observed that there are enough free
>>> pages of 0 order. This also manifests itself when an individual zone in
>>> the mm runs out of pages and if we cannot migrate pages between zones,
>>> the kernel hits an out-of-memory condition even though there are free pages (and
>>> often all of swap) available.
>>>
>>> To address the issue of external fragmentation, the kernel does compaction
>>> (which involves migration of pages), but its efficacy depends upon how
>>> many pages are marked as MOVABLE, as only those pages can be migrated.
>>>
>>> Currently the backing pages for GPU buffers are allocated from shmemfs
>>> with GFP_RECLAIMABLE flag, in units of 4KB pages.  In the case of limited
>>> swap space, it may not be possible always to reclaim or swap-out pages of
>>> all the inactive objects, to make way for free space allowing formation
>>> of higher order groups of physically-contiguous pages on compaction.
>>>
>>> Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to
>>> pin the pages if they are in use by GPU, which will prevent their
>>> migration. So the migratepage callback in shmem is also hooked up to get
>>> a notification when the kernel initiates page migration. On the
>>> notification, i915.ko appropriately unpins the pages.  With this we can
>>> effectively mark the GPU pages as MOVABLE and hence mitigate the
>>> fragmentation problem.
>>>
>>> v2:
>>>  - Rename the migration routine to gem_shrink_migratepage, move it to the
>>>    shrinker file, and use the existing constructs (Chris)
>>>  - To cleanup, add a new helper function to encapsulate all page
>>> migration
>>>    skip conditions (Chris)
>>>  - Add a new local helper function in shrinker file, for dropping the
>>>    backing pages, and call the same from gem_shrink() also (Chris)
>>>
>>> v3:
>>>  - Fix/invert the check on the return value of unsafe_drop_pages (Chris)
>>>
>>> v4:
>>>  - Minor tidy
>>>
>>> v5:
>>>  - Fix unsafe usage of unsafe_drop_pages()
>>>  - Rebase onto vmap-notifier
>>>
>>> v6:
>>> - Remove i915_gem_object_get/put across unsafe_drop_pages() as with
>>>   struct_mutex protection object can't disappear. (Chris)
>>>
>>> Testcase: igt/gem_shrink
>>> Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: linux-mm@kvack.org
>>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
>>
>>
>> I'm confused!  But perhaps it's gone around and around between you all,
>> I'm not sure what the rules are then.  I think this sequence implies
>> that Sourab wrote it originally, then Akash and Chris passed it on
>> with refinements - but then Chris wouldn't add Reviewed-by.
>>
> Thank you very much for the review and sorry for all the needless confusion.
>
> Chris actually conceived the patches and prepared an initial version of them
> (hence he is the Author).
> Sourab and I did the further refinements and fixed issues (all the
> page_private stuff).
> Chris then reviewed the final patch and also recently did a rebase for it.
>
>
>>> ---
>>>  drivers/gpu/drm/i915/i915_drv.h          |   2 +
>>>  drivers/gpu/drm/i915/i915_gem.c          |   9 ++-
>>>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 132
>>> +++++++++++++++++++++++++++++++
>>>  3 files changed, 142 insertions(+), 1 deletion(-)
>>>
snip
>>
>>> @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
>>>                 goto fail;
>>>
>>>         mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
>>> +       if (IS_ENABLED(MIGRATION))
>>> +               mask |= __GFP_MOVABLE;
>>
>>
>> I was going to suggest just make that unconditional,
>>         mask = GFP_HIGHUSER_MOVABLE | __GFP_RECLAIMABLE;
>>
>> But then I wondered what that __GFP_RECLAIMABLE actually achieves?
>> These pages are already __GFP_RECLAIM (inside GFP_HIGHUSER) and on
>> the LRU.  It affects gfpflags_to_migratetype(), but I'm not familiar
>> with what that different migratetype will end up doing.
>>
>
> Will check for this.
>

The anti-fragmentation technique used by the kernel is based on the idea
of grouping pages with identical mobility (UNMOVABLE, RECLAIMABLE,
MOVABLE) together.
__GFP_RECLAIMABLE, like __GFP_MOVABLE, specifies the mobility/migration
type of the page and so serves a different purpose than __GFP_RECLAIM.

Also, as per the below snippet from gfpflags_to_migratetype(), it looks
like __GFP_MOVABLE & __GFP_RECLAIMABLE can't be used together, which
makes sense.
/* Convert GFP flags to their corresponding migrate type */
#define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE | __GFP_MOVABLE)
static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
{
        VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
.....

So probably would need to update the mask like this,
       mask = GFP_HIGHUSER;
       if (IS_ENABLED(MIGRATION))
             mask |= __GFP_MOVABLE;
       else
             mask |=  __GFP_RECLAIMABLE;

Please kindly let us know if this looks fine to you or not.

Best regards
Akash

>
>>>         if (IS_CRESTLINE(dev_priv) || IS_BROADWATER(dev_priv)) {
>>>                 /* 965gm cannot relocate objects above 4GiB. */
>>>                 mask &= ~__GFP_HIGHMEM;
>>> @@ -4193,6 +4199,7 @@ struct drm_i915_gem_object *
>>>
>>>         mapping = obj->base.filp->f_mapping;
>>>         mapping_set_gfp_mask(mapping, mask);
>>> +       shmem_set_dev_info(mapping, &dev_priv->mm.shmem_info);
>>>
>>>         i915_gem_object_init(obj, &i915_gem_object_ops);
>>>
>>>  }
>>>
>>>  /**
>>> --
>>> 1.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-14  7:57       ` akash goel
@ 2016-11-16  1:25           ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2016-11-16  1:25 UTC (permalink / raw)
  To: akash goel; +Cc: Hugh Dickins, intel-gfx, linux-mm, Chris Wilson, Sourab Gupta

On Mon, 14 Nov 2016, akash goel wrote:
> On Thu, Nov 10, 2016 at 1:00 PM, Goel, Akash <akash.goel@intel.com> wrote:
> > On 11/10/2016 12:09 PM, Hugh Dickins wrote:
> >> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
> >>> @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
> >>>
> >>>         mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> >>> +       if (IS_ENABLED(MIGRATION))

Oh, I knew I'd seen a line like that recently, and it was bugging me
that I ought to search my mailboxes for it; but now I'm glad to find
it again.  If that condition stays, it would really need to say
              if (IS_ENABLED(CONFIG_MIGRATION))
wouldn't it?

> >>> +               mask |= __GFP_MOVABLE;
> >>
> >>
> >> I was going to suggest just make that unconditional,
> >>         mask = GFP_HIGHUSER_MOVABLE | __GFP_RECLAIMABLE;
> >>
> >> But then I wondered what that __GFP_RECLAIMABLE actually achieves?
> >> These pages are already __GFP_RECLAIM (inside GFP_HIGHUSER) and on
> >> the LRU.  It affects gfpflags_to_migratetype(), but I'm not familiar
> >> with what that different migratetype will end up doing.
> >>
> >
> > Will check for this.
> >
> 
> The anti-fragmentation technique used by kernel is based on the idea
> of grouping pages with identical mobility (UNMOVABLE, RECLAIMABLE,
> MOVABLE) together.

Yes.

> __GFP_RECLAIMABLE, like  __GFP_MOVABLE, specifies the
> mobility/migration type of the page and serves a different purpose
> than __GFP_RECLAIM.

Yes, I was wrong to mention __GFP_RECLAIM above: it describes what
to do when in difficulty allocating a page, but says nothing at all
about the nature of the page to be allocated.

> 
> Also as per the below snippet from gfpflags_to_migratetype(), looks
> like __GFP_MOVABLE &  __GFP_RECLAIMABLE can't be used together, which
> makes sense.
> /* Convert GFP flags to their corresponding migrate type */
> #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE | __GFP_MOVABLE)
> static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
> {
>         VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
> .....

You're right, that does exclude them from being used together.  And it
makes sense inasmuch as they're expected to be applied to quite different
uses of a page (lru pages versus slab pages).

The comment on __GFP_MOVABLE says "or can be reclaimed"; and
the comment on __GFP_RECLAIMABLE says "used for slab allocations...".
Though it does not say "used for allocations not put on a reclaimable
lru", I think that is the intention; whereas shmem allocations are put
on a reclaimable lru (though they might need your shrinker to unpin them).

> 
> So probably would need to update the mask like this,
>        mask = GFP_HIGHUSER;
>        if (IS_ENABLED(MIGRATION))
>              mask |= __GFP_MOVABLE;
>        else
>              mask |=  __GFP_RECLAIMABLE;
> 
> Please kindly let us know if this looks fine to you or not.

Thanks for looking into it more deeply.  You leave me thinking that
it should simply say

        mask = GFP_HIGHUSER_MOVABLE;

Which is the default anyway, but it then has the Crestline+Broadwater
condition to modify the mask further, so it's probably clearest to
leave the mask = GFP_HIGHUSER_MOVABLE explicit.

GFP_HIGHUSER_MOVABLE is used in many places, and includes __GFP_MOVABLE
without any condition on CONFIG_MIGRATION - because the migratetype is
irrelevant if there is no migration, I presume.

Would you lose something by not or'ing in __GFP_RECLAIMABLE when
CONFIG_MIGRATION=n?  No, because __GFP_RECLAIMABLE is not used for
anything but the migratetype, and the migratetype is then irrelevant.
(I didn't study the code closely enough to say whether the grouping
can still happen even when migration is disabled, but even if it
does still happen, I can't see that it would have any benefit.)

Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-16  1:25           ` Hugh Dickins
@ 2016-11-16  5:13             ` akash goel
  -1 siblings, 0 replies; 29+ messages in thread
From: akash goel @ 2016-11-16  5:13 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: intel-gfx, linux-mm, Chris Wilson, Sourab Gupta, akash goel

On Wed, Nov 16, 2016 at 6:55 AM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 14 Nov 2016, akash goel wrote:
>> On Thu, Nov 10, 2016 at 1:00 PM, Goel, Akash <akash.goel@intel.com> wrote:
>> > On 11/10/2016 12:09 PM, Hugh Dickins wrote:
>> >> On Fri, 4 Nov 2016, akash.goel@intel.com wrote:
>> >>> @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object *
>> >>>
>> >>>         mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
>> >>> +       if (IS_ENABLED(MIGRATION))
>
> Oh, I knew I'd seen a line like that recently, and it was bugging me
> that I ought to search my mailboxes for it; but now I'm glad to find
> it again.  If that condition stays, it would really need to say
>               if (IS_ENABLED(CONFIG_MIGRATION))
> wouldn't it?
>
Sorry, this was a blooper; it should have been
             if (IS_ENABLED(CONFIG_MIGRATION))

>> >>> +               mask |= __GFP_MOVABLE;
>> >>
>> >>
>> >> I was going to suggest just make that unconditional,
>> >>         mask = GFP_HIGHUSER_MOVABLE | __GFP_RECLAIMABLE;
>> >>
>> >> But then I wondered what that __GFP_RECLAIMABLE actually achieves?
>> >> These pages are already __GFP_RECLAIM (inside GFP_HIGHUSER) and on
>> >> the LRU.  It affects gfpflags_to_migratetype(), but I'm not familiar
>> >> with what that different migratetype will end up doing.
>> >>
>> >
>> > Will check for this.
>> >
>>
>> The anti-fragmentation technique used by kernel is based on the idea
>> of grouping pages with identical mobility (UNMOVABLE, RECLAIMABLE,
>> MOVABLE) together.
>
> Yes.
>
>> __GFP_RECLAIMABLE, like  __GFP_MOVABLE, specifies the
>> mobility/migration type of the page and serves a different purpose
>> than __GFP_RECLAIM.
>
> Yes, I was wrong to mention __GFP_RECLAIM above: it describes what
> to do when in difficulty allocating a page, but says nothing at all
> about the nature of the page to be allocated.
>
Right, nicely phrased, thanks.

>>
>> Also as per the below snippet from gfpflags_to_migratetype(), looks
>> like __GFP_MOVABLE &  __GFP_RECLAIMABLE can't be used together, which
>> makes sense.
>> /* Convert GFP flags to their corresponding migrate type */
>> #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE | __GFP_MOVABLE)
>> static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
>> {
>>         VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK);
>> .....
>
> You're right, that does exclude them from being used together.  And it
> makes sense inasmuch as they're expected to be appled to quite different
> uses of a page (lru pages versus slab pages).
>
> The comment on __GFP_MOVABLE says "or can be reclaimed"; and
> the comment on __GFP_RECLAIMABLE says "used for slab allocations...".
> Though it does not say "used for allocations not put on a reclaimable
> lru", I think that is the intention; whereas shmem allocations are put
> on a reclaimable lru (though they might need your shrinker to unpin them).
>

As per my understanding, pages of both the __GFP_MOVABLE and
__GFP_RECLAIMABLE types get added to the LRU list for reclaiming:
a shmem page ends up on the LRU irrespective of whether it was
allocated as __GFP_MOVABLE or as __GFP_RECLAIMABLE.

>>
>> So probably would need to update the mask like this,
>>        mask = GFP_HIGHUSER;
>>        if (IS_ENABLED(MIGRATION))
>>              mask |= __GFP_MOVABLE;
>>        else
>>              mask |=  __GFP_RECLAIMABLE;
>>
>> Please kindly let us know if this looks fine to you or not.
>
> Thanks for looking into it more deeply.  You leave me thinking that
> it should simply say
>
>         mask = GFP_HIGHUSER_MOVABLE;
>
> Which is the default anyway, but it then has the Crestline+Broadwater
> condition to modify the mask further, so it's probably clearest to
> leave the mask = GFP_HIGHUSER_MOVABLE explicit.
>
> GFP_HIGHUSER_MOVABLE is used in many places, and includes __GFP_MOVABLE
> without any condition on CONFIG_MIGRATION - because the migratetype is
> irrelevant if there is no migration, I presume.
>
> Would you lose something by not or'ing in __GFP_RECLAIMABLE when
> CONFIG_MIGRATION=n?  No, because __GFP_RECLAIMABLE is not used for
> anything but the migratetype, and the migratetype is then irrelevant.
> (I didn't study the code closely enough to say whether the grouping
> can still happen even when migration is disabled, but even if it
> does still happen, I can't see that it would have any benefit.)
>
The free lists in the buddy allocator, for a particular order, are
always organized by migrate type.
From <mmzone.h>:
struct free_area {
        struct list_head free_list[MIGRATE_TYPES];
        unsigned long nr_free;
};
And the grouping of pages by migrate type is also always done by the
kernel.

It's just that when CONFIG_MIGRATION=n the actual migration, which
leverages that page grouping, does not take place, so effectively no
compaction happens.
I think that even without migration the page grouping could still be
beneficial in limiting fragmentation to some extent, as at least the
non-movable pages will not be scattered/dispersed randomly.
But yes, when CONFIG_MIGRATION=n, __GFP_MOVABLE and __GFP_RECLAIMABLE
are equivalent to each other, as reclaiming is then the only way to
facilitate the formation of higher-order free pages.
So, as you suggested, it looks like we can simply set
              mask = GFP_HIGHUSER_MOVABLE;

Many thanks for all your inputs.

Best regards
Akash

> Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-09 18:36     ` Hugh Dickins
@ 2016-11-22 16:02       ` Matthew Auld
  2016-11-23  5:26           ` Hugh Dickins
  0 siblings, 1 reply; 29+ messages in thread
From: Matthew Auld @ 2016-11-22 16:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Daniel Vetter, Intel Graphics Development, Sourab Gupta,
	linux-mm, akash.goel

On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> On Wed, 9 Nov 2016, Daniel Vetter wrote:
>>
>> Hi all -mm folks!
>>
>> Any feedback on these two? It's kinda an intermediate step towards a
>> full-blown gemfs, and I think useful for that. Or do we need to go
>> directly to our own backing storage thing? Aside from ack/nack from -mm I
>> think this is ready for merging.
>
> I'm currently considering them at last: will report back later.
>
> Full-blown gemfs does not come in here, of course; but let me
> fire a warning shot since you mention it: if it's going to use swap,
> then we shall probably have to nak it in favour of continuing to use
> infrastructure from mm/shmem.c.  I very much understand why you would
> love to avoid that dependence, but I doubt it can be safely bypassed.
Could you please elaborate on what specifically you don't like about
gemfs implementing swap, just to make sure I'm following?

Thanks,
Matt


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-22 16:02       ` [Intel-gfx] " Matthew Auld
@ 2016-11-23  5:26           ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2016-11-23  5:26 UTC (permalink / raw)
  To: Matthew Auld
  Cc: Hugh Dickins, Daniel Vetter, Intel Graphics Development,
	Sourab Gupta, linux-mm, akash.goel

On Tue, 22 Nov 2016, Matthew Auld wrote:
> On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> > On Wed, 9 Nov 2016, Daniel Vetter wrote:
> >>
> >> Hi all -mm folks!
> >>
> >> Any feedback on these two? It's kinda an intermediate step towards a
> >> full-blown gemfs, and I think useful for that. Or do we need to go
> >> directly to our own backing storage thing? Aside from ack/nack from -mm I
> >> think this is ready for merging.
> >
> > I'm currently considering them at last: will report back later.
> >
> > Full-blown gemfs does not come in here, of course; but let me
> > fire a warning shot since you mention it: if it's going to use swap,
> > then we shall probably have to nak it in favour of continuing to use
> > infrastructure from mm/shmem.c.  I very much understand why you would
> > love to avoid that dependence, but I doubt it can be safely bypassed.
>
> Could you please elaborate on what specifically you don't like about
> gemfs implementing swap, just to make sure I'm following?

If we're talking about swap as implemented in mm/swapfile.c, and
managed for tmpfs mainly through shmem_getpage_gfp(): that's slippery
stuff, private to mm, and I would not want such trickiness duplicated
somewhere down in drivers/gpu/drm, where mm developers and drm developers
will keep on forgetting to keep it working correctly.

But you write of gemfs "implementing" swap (and I see Daniel wrote of
"our own backing storage"): perhaps you intend a disk or slow-mem file
of your own, dedicated to paging gemfs objects according to your own
rules, poked from memory reclaim via a shrinker.  I certainly don't
have the same quick objection to that: it may be a good way forward,
though I'm not at all sure (and would prefer a name distinct from
swap, so we wouldn't get confused - maybe gemswap).

Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-23  5:26           ` Hugh Dickins
@ 2016-11-23  8:36             ` Daniel Vetter
  -1 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2016-11-23  8:36 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Matthew Auld, Daniel Vetter, Intel Graphics Development,
	Sourab Gupta, linux-mm, akash.goel

On Tue, Nov 22, 2016 at 09:26:11PM -0800, Hugh Dickins wrote:
> On Tue, 22 Nov 2016, Matthew Auld wrote:
> > On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> > > On Wed, 9 Nov 2016, Daniel Vetter wrote:
> > >>
> > >> Hi all -mm folks!
> > >>
> > >> Any feedback on these two? It's kinda an intermediate step towards a
> > >> full-blown gemfs, and I think useful for that. Or do we need to go
> > >> directly to our own backing storage thing? Aside from ack/nack from -mm I
> > >> think this is ready for merging.
> > >
> > > I'm currently considering them at last: will report back later.
> > >
> > > Full-blown gemfs does not come in here, of course; but let me
> > > fire a warning shot since you mention it: if it's going to use swap,
> > > then we shall probably have to nak it in favour of continuing to use
> > > infrastructure from mm/shmem.c.  I very much understand why you would
> > > love to avoid that dependence, but I doubt it can be safely bypassed.
> >
> > Could you please elaborate on what specifically you don't like about
> > gemfs implementing swap, just to make sure I'm following?
> 
> If we're talking about swap as implemented in mm/swapfile.c, and
> managed for tmpfs mainly through shmem_getpage_gfp(): that's slippery
> stuff, private to mm, and I would not want such trickiness duplicated
> somewhere down in drivers/gpu/drm, where mm developers and drm developers
> will keep on forgetting to keep it working correctly.
> 
> But you write of gemfs "implementing" swap (and I see Daniel wrote of
> "our own backing storage"): perhaps you intend a disk or slow-mem file
> of your own, dedicated to paging gemfs objects according to your own
> rules, poked from memory reclaim via a shrinker.  I certainly don't
> have the same quick objection to that: it may be a good way forward,
> though I'm not at all sure (and would prefer a name distinct from
> swap, so we wouldn't get confused - maybe gemswap).

"our backing storage" was from the pov of the gpu, which is just
memory (and then normal swap). I think that's exactly the part you don't
like ;-)

Anwyway, objections noted, we'll go and beef up the interfaces exposed by
shmem in the style of this patch series here. What I'll expect in the
future beyong the migrate callback so we can unpin pages is asking shmem
to move the pages around to a different numa node, and also asking for
hugepages (if available). Thanks a lot for your feedback meanwhile.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
@ 2016-11-23  8:36             ` Daniel Vetter
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Vetter @ 2016-11-23  8:36 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Intel Graphics Development, Sourab Gupta, linux-mm, akash.goel

On Tue, Nov 22, 2016 at 09:26:11PM -0800, Hugh Dickins wrote:
> On Tue, 22 Nov 2016, Matthew Auld wrote:
> > On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> > > On Wed, 9 Nov 2016, Daniel Vetter wrote:
> > >>
> > >> Hi all -mm folks!
> > >>
> > >> Any feedback on these two? It's kinda an intermediate step towards a
> > >> full-blown gemfs, and I think useful for that. Or do we need to go
> > >> directly to our own backing storage thing? Aside from ack/nack from -mm I
> > >> think this is ready for merging.
> > >
> > > I'm currently considering them at last: will report back later.
> > >
> > > Full-blown gemfs does not come in here, of course; but let me
> > > fire a warning shot since you mention it: if it's going to use swap,
> > > then we shall probably have to nak it in favour of continuing to use
> > > infrastructure from mm/shmem.c.  I very much understand why you would
> > > love to avoid that dependence, but I doubt it can be safely bypassed.
> >
> > Could you please elaborate on what specifically you don't like about
> > gemfs implementing swap, just to make sure I'm following?
> 
> If we're talking about swap as implemented in mm/swapfile.c, and
> managed for tmpfs mainly through shmem_getpage_gfp(): that's slippery
> stuff, private to mm, and I would not want such trickiness duplicated
> somewhere down in drivers/gpu/drm, where mm developers and drm developers
> will keep on forgetting to keep it working correctly.
> 
> But you write of gemfs "implementing" swap (and I see Daniel wrote of
> "our own backing storage"): perhaps you intend a disk or slow-mem file
> of your own, dedicated to paging gemfs objects according to your own
> rules, poked from memory reclaim via a shrinker.  I certainly don't
> have the same quick objection to that: it may be a good way forward,
> though I'm not at all sure (and would prefer a name distinct from
> swap, so we wouldn't get confused - maybe gemswap).

"our backing storage" was from the pov of the gpu, which is just
memory (and then normal swap). I think that's exactly the part you don't
like ;-)

Anyway, objections noted, we'll go and beef up the interfaces exposed by
shmem in the style of this patch series. What I expect in the
future, beyond the migrate callback so we can unpin pages, is asking shmem
to move the pages around to a different NUMA node, and also asking for
hugepages (if available). Thanks a lot for your feedback meanwhile.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915: Make GPU pages movable
  2016-11-23  8:36             ` Daniel Vetter
@ 2016-11-23 19:00               ` Hugh Dickins
  -1 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2016-11-23 19:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Hugh Dickins, Matthew Auld, Intel Graphics Development,
	Sourab Gupta, linux-mm, akash.goel

On Wed, 23 Nov 2016, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 09:26:11PM -0800, Hugh Dickins wrote:
> > On Tue, 22 Nov 2016, Matthew Auld wrote:
> > > On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> > > > On Wed, 9 Nov 2016, Daniel Vetter wrote:
> > > >>
> > > >> Hi all -mm folks!
> > > >>
> > > >> Any feedback on these two? It's kinda an intermediate step towards a
> > > >> full-blown gemfs, and I think useful for that. Or do we need to go
> > > >> directly to our own backing storage thing? Aside from ack/nack from -mm I
> > > >> think this is ready for merging.
> > > >
> > > > I'm currently considering them at last: will report back later.
> > > >
> > > > Full-blown gemfs does not come in here, of course; but let me
> > > > fire a warning shot since you mention it: if it's going to use swap,
> > > > then we shall probably have to nak it in favour of continuing to use
> > > > infrastructure from mm/shmem.c.  I very much understand why you would
> > > > love to avoid that dependence, but I doubt it can be safely bypassed.
> > >
> > > Could you please elaborate on what specifically you don't like about
> > > gemfs implementing swap, just to make sure I'm following?
> > 
> > If we're talking about swap as implemented in mm/swapfile.c, and
> > managed for tmpfs mainly through shmem_getpage_gfp(): that's slippery
> > stuff, private to mm, and I would not want such trickiness duplicated
> > somewhere down in drivers/gpu/drm, where mm developers and drm developers
> > will keep on forgetting to keep it working correctly.
> > 
> > But you write of gemfs "implementing" swap (and I see Daniel wrote of
> > "our own backing storage"): perhaps you intend a disk or slow-mem file
> > of your own, dedicated to paging gemfs objects according to your own
> > rules, poked from memory reclaim via a shrinker.  I certainly don't
> > have the same quick objection to that: it may be a good way forward,
> > though I'm not at all sure (and would prefer a name distinct from
> > swap, so we wouldn't get confused - maybe gemswap).
> 
> "our backing storage" was from the pov of the gpu, which is just
> memory (and then normal swap). I think that's exactly the part you don't
> like ;-)

Yes ;) but never mind, reassuring answer below...

> 
> Anyway, objections noted, we'll go and beef up the interfaces exposed by
> shmem in the style of this patch series. What I expect in the
> future, beyond the migrate callback so we can unpin pages, is asking shmem
> to move the pages around to a different NUMA node, and also asking for
> hugepages (if available). Thanks a lot for your feedback meanwhile.

Migration callback, NUMA improvements, hugepages: all are reasonable
things to be asking of shmem.  I expect we'll have some to and fro on
how best to fit whatever interface you want with how those are already
handled, but none of those requests is reason to replace shmem by an
independently backed gemfs.

Hugh


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] drm/i915: Make GPU pages movable
@ 2016-11-23 19:00               ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2016-11-23 19:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Hugh Dickins, Intel Graphics Development, Sourab Gupta, linux-mm,
	akash.goel

On Wed, 23 Nov 2016, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 09:26:11PM -0800, Hugh Dickins wrote:
> > On Tue, 22 Nov 2016, Matthew Auld wrote:
> > > On 9 November 2016 at 18:36, Hugh Dickins <hughd@google.com> wrote:
> > > > On Wed, 9 Nov 2016, Daniel Vetter wrote:
> > > >>
> > > >> Hi all -mm folks!
> > > >>
> > > >> Any feedback on these two? It's kinda an intermediate step towards a
> > > >> full-blown gemfs, and I think useful for that. Or do we need to go
> > > >> directly to our own backing storage thing? Aside from ack/nack from -mm I
> > > >> think this is ready for merging.
> > > >
> > > > I'm currently considering them at last: will report back later.
> > > >
> > > > Full-blown gemfs does not come in here, of course; but let me
> > > > fire a warning shot since you mention it: if it's going to use swap,
> > > > then we shall probably have to nak it in favour of continuing to use
> > > > infrastructure from mm/shmem.c.  I very much understand why you would
> > > > love to avoid that dependence, but I doubt it can be safely bypassed.
> > >
> > > Could you please elaborate on what specifically you don't like about
> > > gemfs implementing swap, just to make sure I'm following?
> > 
> > If we're talking about swap as implemented in mm/swapfile.c, and
> > managed for tmpfs mainly through shmem_getpage_gfp(): that's slippery
> > stuff, private to mm, and I would not want such trickiness duplicated
> > somewhere down in drivers/gpu/drm, where mm developers and drm developers
> > will keep on forgetting to keep it working correctly.
> > 
> > But you write of gemfs "implementing" swap (and I see Daniel wrote of
> > "our own backing storage"): perhaps you intend a disk or slow-mem file
> > of your own, dedicated to paging gemfs objects according to your own
> > rules, poked from memory reclaim via a shrinker.  I certainly don't
> > have the same quick objection to that: it may be a good way forward,
> > though I'm not at all sure (and would prefer a name distinct from
> > swap, so we wouldn't get confused - maybe gemswap).
> 
> "our backing storage" was from the pov of the gpu, which is just
> memory (and then normal swap). I think that's exactly the part you don't
> like ;-)

Yes ;) but never mind, reassuring answer below...

> 
> Anyway, objections noted, we'll go and beef up the interfaces exposed by
> shmem in the style of this patch series. What I expect in the
> future, beyond the migrate callback so we can unpin pages, is asking shmem
> to move the pages around to a different NUMA node, and also asking for
> hugepages (if available). Thanks a lot for your feedback meanwhile.

Migration callback, NUMA improvements, hugepages: all are reasonable
things to be asking of shmem.  I expect we'll have some to and fro on
how best to fit whatever interface you want with how those are already
handled, but none of those requests is reason to replace shmem by an
independently backed gemfs.

Hugh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/2] shmem: Support for registration of driver/file owner specific ops
  2016-10-20 15:15     ` Joonas Lahtinen
@ 2016-11-04 12:48       ` akash.goel
  0 siblings, 0 replies; 29+ messages in thread
From: akash.goel @ 2016-11-04 12:48 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Hugh Dickins, linux-mm, linux-kernel, Sourab Gupta,
	Akash Goel

From: Chris Wilson <chris@chris-wilson.co.uk>

This provides support for drivers or shmem file owners to register
a set of callbacks, which can be invoked from the address space
operations methods implemented by shmem.  This allows the file owners to
hook into the shmem address space operations to do some extra/custom
operations in addition to the default ones.

The private_data field of address_space struct is used to store the
pointer to driver specific ops.  Currently only one ops field is defined,
which is migratepage, but can be extended on an as-needed basis.

The need for driver specific operations arises since some operations
(like migratepage) cannot be handled effectively within shmem alone,
and would need some driver specific handling as well.
Specifically, i915.ko would like to participate in migratepage().
i915.ko uses shmemfs to provide swappable backing storage for its user
objects, but when those objects are in use by the GPU it must pin the
entire object until the GPU is idle.  As a result, large chunks of memory
can be arbitrarily withdrawn from page migration, resulting in premature
out-of-memory due to fragmentation.  However, if i915.ko can receive the
migratepage() request, it can then flush the object from the GPU, remove
its pin and thus enable the migration.

Since gfx allocations are one of the major consumers of system memory, it's
imperative to have such a mechanism to effectively deal with
fragmentation.  Hence the need for such a provision for initiating
driver specific actions during address space operations.

v2:
- Drop dev_ prefix from the members of shmem_dev_info structure. (Joonas)
- Change the return type of shmem_set_device_op() to void and remove the
  check for pre-existing data. (Joonas)
- Rename shmem_set_device_op() to shmem_set_dev_info() to be consistent
  with shmem_dev_info structure. (Joonas)

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.linux.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/linux/shmem_fs.h | 13 +++++++++++++
 mm/shmem.c               | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7..22796a0 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -39,11 +39,24 @@ struct shmem_sb_info {
 	unsigned long shrinklist_len; /* Length of shrinklist */
 };
 
+struct shmem_dev_info {
+	void *private_data;
+	int (*migratepage)(struct address_space *mapping,
+			   struct page *newpage, struct page *page,
+			   enum migrate_mode mode, void *dev_priv_data);
+};
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+static inline void shmem_set_dev_info(struct address_space *mapping,
+					 struct shmem_dev_info *info)
+{
+	mapping->private_data = info;
+}
+
 /*
  * Functions in mm/shmem.c called directly from elsewhere:
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index ad7813d..bf71ddd 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1290,6 +1290,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	return 0;
 }
 
+#ifdef CONFIG_MIGRATION
+static int shmem_migratepage(struct address_space *mapping,
+			     struct page *newpage, struct page *page,
+			     enum migrate_mode mode)
+{
+	struct shmem_dev_info *dev_info = mapping->private_data;
+
+	if (dev_info && dev_info->migratepage)
+		return dev_info->migratepage(mapping, newpage, page,
+				mode, dev_info->private_data);
+
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 #if defined(CONFIG_NUMA) && defined(CONFIG_TMPFS)
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
@@ -3654,7 +3669,7 @@ static void shmem_destroy_inodecache(void)
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migratepage,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };
-- 
1.9.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of Driver/file owner specific ops
  2016-10-19 15:11   ` akash goel
@ 2016-10-20 15:15     ` Joonas Lahtinen
  2016-11-04 12:48       ` [PATCH 1/2] shmem: Support for registration of driver/file " akash.goel
  0 siblings, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2016-10-20 15:15 UTC (permalink / raw)
  To: akash goel, Chris Wilson
  Cc: Goel, Akash, linux-mm, intel-gfx, Hugh Dickins, Sourab Gupta

On ke, 2016-10-19 at 20:41 +0530, akash goel wrote:
> On Thu, Mar 24, 2016 at 5:41 PM, Joonas Lahtinen
> > <joonas.lahtinen@linux.intel.com> wrote:
> > On ke, 2016-03-23 at 11:39 +0530, akash.goel@intel.com wrote:
> > > @@ -34,11 +34,28 @@ struct shmem_sb_info {
> > >       struct mempolicy *mpol;     /* default memory policy for mappings */
> > >  };
> > > 
> > > +struct shmem_dev_info {
> > > +     void *dev_private_data;
> > > +     int (*dev_migratepage)(struct address_space *mapping,
> > > +                            struct page *newpage, struct page *page,
> > > +                            enum migrate_mode mode, void *dev_priv_data);
> > 
> > One might want to have a separate shmem_dev_operations struct or
> > similar.
> > 
> Sorry for the very late turnaround.
> 
> Sorry, I couldn't get your point here. Are you suggesting renaming the
> structure to shmem_dev_operations?

I'm pretty sure I was after putting the migratepage function pointer in
a shmem_dev_operations struct, but I think that can be done once there
are more functions.

s/dev_private_data/private_data/ and s/dev_priv_data/private_data/
might be in order, too. It should be obvious from context.

> > > +};
> > > +
> > >  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
> > >  {
> > >       return container_of(inode, struct shmem_inode_info, vfs_inode);
> > >  }
> > > 
> > > +static inline int shmem_set_device_ops(struct address_space *mapping,
> > > +                             struct shmem_dev_info *info)
> > > +{

This name could be shmem_set_dev_info, if there will be separate _ops
struct in future.

> > > +     if (mapping->private_data != NULL)
> > > +             return -EEXIST;
> > > +
> > 
> > I did a quick random peek and most set functions are just void and
> > override existing data. I'd suggest the same.
> > 
> > > 
> > > +     mapping->private_data = info;
> > 
> Fine will change the return type to void and remove the check.
> 
> > 
> > Also, doesn't this kinda steal the mapping->private_data, might that be
> > unexpected for the user? I notice currently it's not being touched at
> > all.
> > 
> Sorry, by user do you mean the shmem client that called shmem_file_setup()?
> It seems clients are not expected to touch mapping->private_data, and
> so shmemfs can safely use it.

If it's not used by others, it should be fine. Not sure if a WARN would be
in place, Chris?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of Driver/file owner specific ops
  2016-03-24 12:11 ` Joonas Lahtinen
@ 2016-10-19 15:11   ` akash goel
  2016-10-20 15:15     ` Joonas Lahtinen
  0 siblings, 1 reply; 29+ messages in thread
From: akash goel @ 2016-10-19 15:11 UTC (permalink / raw)
  To: Joonas Lahtinen, Chris Wilson
  Cc: Goel, Akash, linux-mm, intel-gfx, Hugh Dickins, Sourab Gupta

On Thu, Mar 24, 2016 at 5:41 PM, Joonas Lahtinen
<joonas.lahtinen@linux.intel.com> wrote:
> On ke, 2016-03-23 at 11:39 +0530, akash.goel@intel.com wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> This provides support for the Drivers or shmem file owners to register
>> a set of callbacks, which can be invoked from the address space operations
>> methods implemented by shmem.
>> This allow the file owners to hook into the shmem address space operations
>> to do some extra/custom operations in addition to the default ones.
>>
>> The private_data field of address_space struct is used to store the pointer
>> to driver specific ops.
>> Currently only one ops field is defined, which is migratepage, but can be
>> extended on need basis.
>>
>> The need for driver specific operations arises since some of the operations
>> (like migratepage) may not be handled completely within shmem, so as to be
>> effective, and would need some driver specific handling also.
>>
>> Specifically, i915.ko would like to participate in migratepage().
>> i915.ko uses shmemfs to provide swappable backing storage for its user
>> objects, but when those objects are in use by the GPU it must pin the entire
>> object until the GPU is idle. As a result, large chunks of memory can be
>> arbitrarily withdrawn from page migration, resulting in premature
>> out-of-memory due to fragmentation. However, if i915.ko can receive the
>> migratepage() request, it can then flush the object from the GPU, remove
>> its pin and thus enable the migration.
>>
>> Since Gfx allocations are one of the major consumer of system memory, its
>> imperative to have such a mechanism to effectively deal with fragmentation.
>> And therefore the need for such a provision for initiating driver specific
>> actions during address space operations.
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> ---
>>  include/linux/shmem_fs.h | 17 +++++++++++++++++
>>  mm/shmem.c               | 17 ++++++++++++++++-
>>  2 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index 4d4780c..6cfa76a 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -34,11 +34,28 @@ struct shmem_sb_info {
>>       struct mempolicy *mpol;     /* default memory policy for mappings */
>>  };
>>
>> +struct shmem_dev_info {
>> +     void *dev_private_data;
>> +     int (*dev_migratepage)(struct address_space *mapping,
>> +                            struct page *newpage, struct page *page,
>> +                            enum migrate_mode mode, void *dev_priv_data);
>
> One might want to have a separate shmem_dev_operations struct or
> similar.
>
Sorry for the very late turnaround.

Sorry, I couldn't get your point here. Are you suggesting renaming the
structure to shmem_dev_operations?

>> +};
>> +
>>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>>  {
>>       return container_of(inode, struct shmem_inode_info, vfs_inode);
>>  }
>>
>> +static inline int shmem_set_device_ops(struct address_space *mapping,
>> +                             struct shmem_dev_info *info)
>> +{
>> +     if (mapping->private_data != NULL)
>> +             return -EEXIST;
>> +
>
> I did a quick random peek and most set functions are just void and
> override existing data. I'd suggest the same.
>
>> +     mapping->private_data = info;
>
Fine, will change the return type to void and remove the check.

> Also, doesn't this kinda steal the mapping->private_data, might that be
> unexpected for the user? I notice currently it's not being touched at
> all.
>
Sorry, by user do you mean the shmem client that called shmem_file_setup()?
It seems clients are not expected to touch mapping->private_data, and
so shmemfs can safely use it.

Best regards
Akash

>> +     return 0;
>> +}
>> +
>>  /*
>>   * Functions in mm/shmem.c called directly from elsewhere:
>>   */
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 440e2a7..f8625c4 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -952,6 +952,21 @@ redirty:
>>       return 0;
>>  }
>>
>> +#ifdef CONFIG_MIGRATION
>> +static int shmem_migratepage(struct address_space *mapping,
>> +                          struct page *newpage, struct page *page,
>> +                          enum migrate_mode mode)
>> +{
>> +     struct shmem_dev_info *dev_info = mapping->private_data;
>> +
>> +     if (dev_info && dev_info->dev_migratepage)
>> +             return dev_info->dev_migratepage(mapping, newpage, page,
>> +                             mode, dev_info->dev_private_data);
>> +
>> +     return migrate_page(mapping, newpage, page, mode);
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_NUMA
>>  #ifdef CONFIG_TMPFS
>>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
>> @@ -3168,7 +3183,7 @@ static const struct address_space_operations shmem_aops = {
>>       .write_end      = shmem_write_end,
>>  #endif
>>  #ifdef CONFIG_MIGRATION
>> -     .migratepage    = migrate_page,
>> +     .migratepage    = shmem_migratepage,
>>  #endif
>>       .error_remove_page = generic_error_remove_page,
>>  };
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/2] shmem: Support for registration of Driver/file owner specific ops
  2016-03-23  6:09 [PATCH 1/2] shmem: Support for registration of Driver/file " akash.goel
@ 2016-03-24 12:11 ` Joonas Lahtinen
  2016-10-19 15:11   ` akash goel
  0 siblings, 1 reply; 29+ messages in thread
From: Joonas Lahtinen @ 2016-03-24 12:11 UTC (permalink / raw)
  To: akash.goel, intel-gfx; +Cc: linux-mm, Sourab Gupta, Hugh Dickins

On ke, 2016-03-23 at 11:39 +0530, akash.goel@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> This provides support for the Drivers or shmem file owners to register
> a set of callbacks, which can be invoked from the address space operations
> methods implemented by shmem.
> This allow the file owners to hook into the shmem address space operations
> to do some extra/custom operations in addition to the default ones.
> 
> The private_data field of address_space struct is used to store the pointer
> to driver specific ops.
> Currently only one ops field is defined, which is migratepage, but can be
> extended on need basis.
> 
> The need for driver specific operations arises since some of the operations
> (like migratepage) may not be handled completely within shmem, so as to be
> effective, and would need some driver specific handling also.
> 
> Specifically, i915.ko would like to participate in migratepage().
> i915.ko uses shmemfs to provide swappable backing storage for its user
> objects, but when those objects are in use by the GPU it must pin the entire
> object until the GPU is idle. As a result, large chunks of memory can be
> arbitrarily withdrawn from page migration, resulting in premature
> out-of-memory due to fragmentation. However, if i915.ko can receive the
> migratepage() request, it can then flush the object from the GPU, remove
> its pin and thus enable the migration.
> 
> Since Gfx allocations are one of the major consumer of system memory, its
> imperative to have such a mechanism to effectively deal with fragmentation.
> And therefore the need for such a provision for initiating driver specific
> actions during address space operations.
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  include/linux/shmem_fs.h | 17 +++++++++++++++++
>  mm/shmem.c               | 17 ++++++++++++++++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 4d4780c..6cfa76a 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -34,11 +34,28 @@ struct shmem_sb_info {
>  	struct mempolicy *mpol;     /* default memory policy for mappings */
>  };
>  
> +struct shmem_dev_info {
> +	void *dev_private_data;
> +	int (*dev_migratepage)(struct address_space *mapping,
> +			       struct page *newpage, struct page *page,
> +			       enum migrate_mode mode, void *dev_priv_data);

One might want to have a separate shmem_dev_operations struct or
similar.

> +};
> +
>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
>  {
>  	return container_of(inode, struct shmem_inode_info, vfs_inode);
>  }
>  
> +static inline int shmem_set_device_ops(struct address_space *mapping,
> +				struct shmem_dev_info *info)
> +{
> +	if (mapping->private_data != NULL)
> +		return -EEXIST;
> +

I did a quick random peek and most set functions are just void and
override existing data. I'd suggest the same.

> +	mapping->private_data = info;

Also, doesn't this kinda steal the mapping->private_data, might that be
unexpected for the user? I notice currently it's not being touched at
all.

> +	return 0;
> +}
> +
>  /*
>   * Functions in mm/shmem.c called directly from elsewhere:
>   */
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 440e2a7..f8625c4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -952,6 +952,21 @@ redirty:
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MIGRATION
> +static int shmem_migratepage(struct address_space *mapping,
> +			     struct page *newpage, struct page *page,
> +			     enum migrate_mode mode)
> +{
> +	struct shmem_dev_info *dev_info = mapping->private_data;
> +
> +	if (dev_info && dev_info->dev_migratepage)
> +		return dev_info->dev_migratepage(mapping, newpage, page,
> +				mode, dev_info->dev_private_data);
> +
> +	return migrate_page(mapping, newpage, page, mode);
> +}
> +#endif
> +
>  #ifdef CONFIG_NUMA
>  #ifdef CONFIG_TMPFS
>  static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
> @@ -3168,7 +3183,7 @@ static const struct address_space_operations shmem_aops = {
>  	.write_end	= shmem_write_end,
>  #endif
>  #ifdef CONFIG_MIGRATION
> -	.migratepage	= migrate_page,
> +	.migratepage	= shmem_migratepage,
>  #endif
>  	.error_remove_page = generic_error_remove_page,
>  };
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/2] shmem: Support for registration of Driver/file owner specific ops
@ 2016-03-23  6:09 akash.goel
  2016-03-24 12:11 ` Joonas Lahtinen
  0 siblings, 1 reply; 29+ messages in thread
From: akash.goel @ 2016-03-23  6:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson, Hugh Dickins, linux-mm, Sourab Gupta, Akash Goel

From: Chris Wilson <chris@chris-wilson.co.uk>

This provides support for drivers or shmem file owners to register
a set of callbacks, which can be invoked from the address space operations
methods implemented by shmem.
This allows the file owners to hook into the shmem address space operations
to do some extra/custom operations in addition to the default ones.

The private_data field of address_space struct is used to store the pointer
to driver specific ops.
Currently only one ops field is defined, which is migratepage, but it can
be extended on an as-needed basis.

The need for driver specific operations arises since some operations
(like migratepage) cannot be handled effectively within shmem alone,
and would need some driver specific handling as well.

Specifically, i915.ko would like to participate in migratepage().
i915.ko uses shmemfs to provide swappable backing storage for its user
objects, but when those objects are in use by the GPU it must pin the entire
object until the GPU is idle. As a result, large chunks of memory can be
arbitrarily withdrawn from page migration, resulting in premature
out-of-memory due to fragmentation. However, if i915.ko can receive the
migratepage() request, it can then flush the object from the GPU, remove
its pin and thus enable the migration.

Since gfx allocations are one of the major consumers of system memory, it's
imperative to have such a mechanism to effectively deal with fragmentation.
Hence the need for such a provision for initiating driver specific
actions during address space operations.

Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 include/linux/shmem_fs.h | 17 +++++++++++++++++
 mm/shmem.c               | 17 ++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 4d4780c..6cfa76a 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -34,11 +34,28 @@ struct shmem_sb_info {
 	struct mempolicy *mpol;     /* default memory policy for mappings */
 };
 
+struct shmem_dev_info {
+	void *dev_private_data;
+	int (*dev_migratepage)(struct address_space *mapping,
+			       struct page *newpage, struct page *page,
+			       enum migrate_mode mode, void *dev_priv_data);
+};
+
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
 {
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+static inline int shmem_set_device_ops(struct address_space *mapping,
+				struct shmem_dev_info *info)
+{
+	if (mapping->private_data != NULL)
+		return -EEXIST;
+
+	mapping->private_data = info;
+	return 0;
+}
+
 /*
  * Functions in mm/shmem.c called directly from elsewhere:
  */
diff --git a/mm/shmem.c b/mm/shmem.c
index 440e2a7..f8625c4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -952,6 +952,21 @@ redirty:
 	return 0;
 }
 
+#ifdef CONFIG_MIGRATION
+static int shmem_migratepage(struct address_space *mapping,
+			     struct page *newpage, struct page *page,
+			     enum migrate_mode mode)
+{
+	struct shmem_dev_info *dev_info = mapping->private_data;
+
+	if (dev_info && dev_info->dev_migratepage)
+		return dev_info->dev_migratepage(mapping, newpage, page,
+				mode, dev_info->dev_private_data);
+
+	return migrate_page(mapping, newpage, page, mode);
+}
+#endif
+
 #ifdef CONFIG_NUMA
 #ifdef CONFIG_TMPFS
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
@@ -3168,7 +3183,7 @@ static const struct address_space_operations shmem_aops = {
 	.write_end	= shmem_write_end,
 #endif
 #ifdef CONFIG_MIGRATION
-	.migratepage	= migrate_page,
+	.migratepage	= shmem_migratepage,
 #endif
 	.error_remove_page = generic_error_remove_page,
 };
-- 
1.9.2
