dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [RESUBMIT][PATCH 1/2] drm/i915/gem: Avoid taking runtime-pm under the shrinker
@ 2022-07-29 10:56 Janusz Krzysztofik
  2022-07-29 10:56 ` [RESUBMIT][PATCH 2/2] drm/i915/gem: Perform active shrinking from a background thread Janusz Krzysztofik
  0 siblings, 1 reply; 2+ messages in thread
From: Janusz Krzysztofik @ 2022-07-29 10:56 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, dri-devel, Chris Wilson,
	Matthew Auld, Janusz Krzysztofik

From: Chris Wilson <chris@chris-wilson.co.uk>

Inside the shrinker, we cannot wake the device as that may cause
recursion into fs-reclaim, so instead we only unbind vma if the device
is currently awake. (In order to provide reclaim while asleep, we do
wake the device up during kswapd -- we probably want to limit that wake
up if we have anything to shrink though!)

To avoid the same fs_reclaim recursion potential during
i915_gem_object_unbind, we acquire a wakeref there, see commit
3e817471a34c ("drm/i915/gem: Take runtime-pm wakeref prior to unbinding").
However, we use i915_gem_object_unbind from the shrinker path to make the
object available for shrinking and so we must make the wakeref acquisition
here conditional.

<4> [437.542172] ======================================================
<4> [437.542174] WARNING: possible circular locking dependency detected
<4> [437.542176] 5.19.0-rc6-CI_DRM_11876-g2305e0d00665+ #1 Tainted: G     U
<4> [437.542179] ------------------------------------------------------
<4> [437.542181] kswapd0/93 is trying to acquire lock:
<4> [437.542183] ffffffff827a7608 (acpi_wakeup_lock){+.+.}-{3:3}, at: acpi_device_wakeup_disable+0x12/0x50
<4> [437.542191]
but task is already holding lock:
<4> [437.542194] ffffffff8275d360 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x91/0x5c0
<4> [437.542199]
which lock already depends on the new lock.
<4> [437.542202]
the existing dependency chain (in reverse order) is:
<4> [437.542204]
-> #2 (fs_reclaim){+.+.}-{0:0}:
<4> [437.542207]        fs_reclaim_acquire+0x9d/0xd0
<4> [437.542211]        kmem_cache_alloc_trace+0x2a/0x250
<4> [437.542214]        __acpi_device_add+0x263/0x3a0
<4> [437.542217]        acpi_add_single_object+0x3ea/0x710
<4> [437.542220]        acpi_bus_check_add+0xf7/0x240
<4> [437.542222]        acpi_bus_scan+0x34/0xf0
<4> [437.542224]        acpi_scan_init+0xf5/0x241
<4> [437.542228]        acpi_init+0x449/0x4aa
<4> [437.542230]        do_one_initcall+0x53/0x2e0
<4> [437.542233]        kernel_init_freeable+0x18f/0x1dd
<4> [437.542236]        kernel_init+0x11/0x110
<4> [437.542239]        ret_from_fork+0x1f/0x30
<4> [437.542241]
-> #1 (acpi_device_lock){+.+.}-{3:3}:
<4> [437.542245]        __mutex_lock+0x97/0xf20
<4> [437.542246]        acpi_enable_wakeup_device_power+0x30/0xf0
<4> [437.542249]        __acpi_device_wakeup_enable+0x31/0x110
<4> [437.542252]        acpi_pm_set_device_wakeup+0x55/0x100
<4> [437.542254]        __pci_enable_wake+0x5e/0xa0
<4> [437.542257]        pci_finish_runtime_suspend+0x32/0x70
<4> [437.542259]        pci_pm_runtime_suspend+0xa3/0x160
<4> [437.542262]        __rpm_callback+0x3d/0x110
<4> [437.542265]        rpm_callback+0x54/0x60
<4> [437.542268]        rpm_suspend.part.10+0x105/0x5a0
<4> [437.542270]        pm_runtime_work+0x7d/0x1e0
<4> [437.542273]        process_one_work+0x272/0x5c0
<4> [437.542276]        worker_thread+0x37/0x370
<4> [437.542278]        kthread+0xed/0x120
<4> [437.542280]        ret_from_fork+0x1f/0x30
<4> [437.542282]
-> #0 (acpi_wakeup_lock){+.+.}-{3:3}:
<4> [437.542285]        __lock_acquire+0x15ad/0x2940
<4> [437.542288]        lock_acquire+0xd3/0x310
<4> [437.542291]        __mutex_lock+0x97/0xf20
<4> [437.542293]        acpi_device_wakeup_disable+0x12/0x50
<4> [437.542295]        acpi_pm_set_device_wakeup+0x6e/0x100
<4> [437.542297]        __pci_enable_wake+0x73/0xa0
<4> [437.542300]        pci_pm_runtime_resume+0x45/0x90
<4> [437.542302]        __rpm_callback+0x3d/0x110
<4> [437.542304]        rpm_callback+0x54/0x60
<4> [437.542307]        rpm_resume+0x54f/0x750
<4> [437.542309]        __pm_runtime_resume+0x42/0x80
<4> [437.542311]        __intel_runtime_pm_get+0x19/0x80 [i915]
<4> [437.542386]        i915_gem_object_unbind+0x8f/0x3b0 [i915]
<4> [437.542487]        i915_gem_shrink+0x634/0x850 [i915]
<4> [437.542584]        i915_gem_shrinker_scan+0x3a/0xc0 [i915]
<4> [437.542679]        shrink_slab.constprop.97+0x1a4/0x4f0
<4> [437.542684]        shrink_node+0x21e/0x420
<4> [437.542687]        balance_pgdat+0x241/0x5c0
<4> [437.542690]        kswapd+0x229/0x4f0
<4> [437.542694]        kthread+0xed/0x120
<4> [437.542697]        ret_from_fork+0x1f/0x30
<4> [437.542701]
other info that might help us debug this:
<4> [437.542705] Chain exists of:
  acpi_wakeup_lock --> acpi_device_lock --> fs_reclaim
<4> [437.542713]  Possible unsafe locking scenario:
<4> [437.542716]        CPU0                    CPU1
<4> [437.542719]        ----                    ----
<4> [437.542721]   lock(fs_reclaim);
<4> [437.542725]                                lock(acpi_device_lock);
<4> [437.542728]                                lock(fs_reclaim);
<4> [437.542732]   lock(acpi_wakeup_lock);
<4> [437.542736]
 *** DEADLOCK ***

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6449
Fixes: 3e817471a34c ("drm/i915/gem: Take runtime-pm wakeref prior to unbinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
Resubmit reason: keep in series with the other patch while dropping RFC
label from it.

 drivers/gpu/drm/i915/i915_gem.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 702e5b89be22..910a6fde5726 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -119,8 +119,8 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
 {
 	struct intel_runtime_pm *rpm = &to_i915(obj->base.dev)->runtime_pm;
 	bool vm_trylock = !!(flags & I915_GEM_OBJECT_UNBIND_VM_TRYLOCK);
+	intel_wakeref_t wakeref = 0;
 	LIST_HEAD(still_in_list);
-	intel_wakeref_t wakeref;
 	struct i915_vma *vma;
 	int ret;
 
@@ -135,7 +135,8 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
 	 * as they are required by the shrinker. Ergo, we wake the device up
 	 * first just in case.
 	 */
-	wakeref = intel_runtime_pm_get(rpm);
+	if (!(flags & I915_GEM_OBJECT_UNBIND_TEST))
+		wakeref = intel_runtime_pm_get(rpm);
 
 try_again:
 	ret = 0;
@@ -200,7 +201,8 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
 		goto try_again;
 	}
 
-	intel_runtime_pm_put(rpm, wakeref);
+	if (wakeref)
+		intel_runtime_pm_put(rpm, wakeref);
 
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [RESUBMIT][PATCH 2/2] drm/i915/gem: Perform active shrinking from a background thread
  2022-07-29 10:56 [RESUBMIT][PATCH 1/2] drm/i915/gem: Avoid taking runtime-pm under the shrinker Janusz Krzysztofik
@ 2022-07-29 10:56 ` Janusz Krzysztofik
  0 siblings, 0 replies; 2+ messages in thread
From: Janusz Krzysztofik @ 2022-07-29 10:56 UTC (permalink / raw)
  To: intel-gfx
  Cc: Thomas Hellström, Tvrtko Ursulin, dri-devel, Chris Wilson,
	Matthew Auld, Janusz Krzysztofik

From: Chris Wilson <chris@chris-wilson.co.uk>

i915 is very greedy and will retain system pages for as long as the user
requires them; once acquired they will be only returned when the object
is freed. In order to respond to system memory pressure, i915 hooks into
the shrinker subsystem, designed to prune the filesystem caches, to
unbind and return system pages. However, we can only do so if the device
is active at that moment, as we cannot resume the device from inside
direct reclaim to unbind pages from the GPU, nor do we want to delay
random processes with unbound waits trying to reclaim active pages. To
workaround that quandary, what we avoided in direct reclaim we
delegated to kswapd, as that is run from process context outside of
direct reclaim and able to sleep and resume the device.

In practice, kswapd also uses fs_reclaim_acquire() around its
shrink_slab calls, prohibiting runtime resume. If we cannot wake the
device from idle, we will retain system memory indefinitely.

As we cannot take advantage of kswapd's decoupled process context to
perform an active reclaim of bound pages, spawn our own kthread to wait
under our wakeref. Similar to kswapd, there is no direct dependency on
the background task to direct reclaim (other than failure to promptly
return pages will implicitly result in oom), as such the task itself does
not inherit the fs-reclaim context. A page reclaimed by i915 will
typically not immediately be available for re-use, as it will require
writeback, and so only a future allocation attempt may benefit.
Concurrent page allocation attempts do not wait for either kswapd or our
own swapper task.

We mark our kthread as a memallocator (allowed to dip into memory
reserves, but not allowed to trigger direct reclaim) and mark up
the call to the shrinker with a fs_reclaim critical section. This
should prevent us from accidentally abusing the background swapper task,
and so the swapper kthread behaves like kswapd with the exception of
being allowed to wake the device up, and being decoupled from the
shrinker_rwsem.

Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/6449
Fixes: 178a30c90ac7 ("drm/i915: Unbind objects in shrinker only if device is runtime active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
Resubmit reason: drop RFC label.

 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 134 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_drv.h              |  15 +++
 2 files changed, 135 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 1030053571a2..bc6c1978e64a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
@@ -310,6 +310,113 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	return count;
 }
 
+static unsigned long run_swapper(struct drm_i915_private *i915,
+				 unsigned long target,
+				 unsigned long *nr_scanned)
+{
+	return i915_gem_shrink(NULL, i915,
+			       target, nr_scanned,
+			       I915_SHRINK_ACTIVE |
+			       I915_SHRINK_BOUND |
+			       I915_SHRINK_UNBOUND |
+			       I915_SHRINK_WRITEBACK);
+}
+
+static int swapper(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	atomic_long_t *target = &i915->mm.swapper.target;
+	unsigned int noreclaim_state;
+
+	/*
+	 * For us to be running the swapper implies that the system is under
+	 * enough memory pressure to be swapping. At that point, we both want
+	 * to ensure we make forward progress in order to reclaim pages from
+	 * the device and not contribute further to direct reclaim pressure. We
+	 * mark ourselves as a memalloc task in order to not trigger direct
+	 * reclaim ourselves, but dip into the system memory reserves for
+	 * shrinkers.
+	 */
+	noreclaim_state = memalloc_noreclaim_save();
+
+	do {
+		intel_wakeref_t wakeref;
+
+		___wait_var_event(target,
+				  atomic_long_read(target) ||
+				  kthread_should_stop(),
+				  TASK_IDLE, 0, 0, schedule());
+		if (kthread_should_stop())
+			break;
+
+		with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+			unsigned long nr_scan = atomic_long_xchg(target, 0);
+
+			/*
+			 * Now that we have woken up the device hierarchy,
+			 * act as a normal shrinker. Our shrinker is primarily
+			 * focussed on supporting direct reclaim (low latency,
+			 * avoiding contention that may lead to more reclaim,
+			 * or prevent that reclaim from making forward progress)
+			 * and we wish to continue that good practice even
+			 * here where we could accidentally sleep holding locks.
+			 *
+			 * Let lockdep know and warn us about any bad practice
+			 * that may lead to high latency in direct reclaim, or
+			 * anywhere else.
+			 *
+			 * While the swapper is active, direct reclaim from
+			 * other threads will also be running in parallel
+			 * through i915_gem_shrink(), scouring for idle pages.
+			 */
+			fs_reclaim_acquire(GFP_KERNEL);
+			run_swapper(i915, nr_scan, &nr_scan);
+			fs_reclaim_release(GFP_KERNEL);
+		}
+	} while (1);
+
+	memalloc_noreclaim_restore(noreclaim_state);
+	return 0;
+}
+
+static void start_swapper(struct drm_i915_private *i915)
+{
+	i915->mm.swapper.tsk = kthread_run(swapper, i915, "i915-swapd");
+	if (IS_ERR(i915->mm.swapper.tsk))
+		drm_err(&i915->drm,
+			"Failed to launch swapper; memory reclaim may be degraded\n");
+}
+
+static unsigned long kick_swapper(struct drm_i915_private *i915,
+				  unsigned long nr_scan,
+				  unsigned long *scanned)
+{
+	/* Run immediately under kswap if disabled */
+	if (IS_ERR_OR_NULL(i915->mm.swapper.tsk))
+		/*
+		 * Note that as we are still inside kswapd, we are still
+		 * inside a fs_reclaim context and cannot forcibly wake the
+		 * device and so can only opportunitiscally reclaim bound
+		 * memory.
+		 */
+		return run_swapper(i915, nr_scan, scanned);
+
+	if (!atomic_long_fetch_add(nr_scan, &i915->mm.swapper.target))
+		wake_up_var(&i915->mm.swapper.target);
+
+	return 0;
+}
+
+static void stop_swapper(struct drm_i915_private *i915)
+{
+	struct task_struct *tsk = fetch_and_zero(&i915->mm.swapper.tsk);
+
+	if (IS_ERR_OR_NULL(tsk))
+		return;
+
+	kthread_stop(tsk);
+}
+
 static unsigned long
 i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 {
@@ -318,27 +425,22 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	unsigned long freed;
 
 	sc->nr_scanned = 0;
-
 	freed = i915_gem_shrink(NULL, i915,
 				sc->nr_to_scan,
 				&sc->nr_scanned,
 				I915_SHRINK_BOUND |
 				I915_SHRINK_UNBOUND);
-	if (sc->nr_scanned < sc->nr_to_scan && current_is_kswapd()) {
-		intel_wakeref_t wakeref;
+	if (!sc->nr_scanned) /* nothing left to reclaim */
+		return SHRINK_STOP;
 
-		with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
-			freed += i915_gem_shrink(NULL, i915,
-						 sc->nr_to_scan - sc->nr_scanned,
-						 &sc->nr_scanned,
-						 I915_SHRINK_ACTIVE |
-						 I915_SHRINK_BOUND |
-						 I915_SHRINK_UNBOUND |
-						 I915_SHRINK_WRITEBACK);
-		}
-	}
+	/* Pages still bound and system is failing with direct reclaim? */
+	if (sc->nr_scanned < sc->nr_to_scan && current_is_kswapd())
+		/* Defer high latency tasks to a background thread. */
+		freed += kick_swapper(i915,
+				      sc->nr_to_scan - sc->nr_scanned,
+				      &sc->nr_scanned);
 
-	return sc->nr_scanned ? freed : SHRINK_STOP;
+	return freed;
 }
 
 static int
@@ -434,10 +536,14 @@ void i915_gem_driver_register__shrinker(struct drm_i915_private *i915)
 	i915->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap;
 	drm_WARN_ON(&i915->drm,
 		    register_vmap_purge_notifier(&i915->mm.vmap_notifier));
+
+	start_swapper(i915);
 }
 
 void i915_gem_driver_unregister__shrinker(struct drm_i915_private *i915)
 {
+	stop_swapper(i915);
+
 	drm_WARN_ON(&i915->drm,
 		    unregister_vmap_purge_notifier(&i915->mm.vmap_notifier));
 	drm_WARN_ON(&i915->drm,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3364a6e5169b..976983ab67a9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -283,6 +283,21 @@ struct i915_gem_mm {
 	/* shrinker accounting, also useful for userland debugging */
 	u64 shrink_memory;
 	u32 shrink_count;
+
+	/* background task for returning bound system pages */
+	struct {
+		struct task_struct *tsk; /* our kswapd equivalent */
+
+		/*
+		 * Track the number of pages do_shrink_slab() has asked us
+		 * to reclaim, and we have failed to find. This count of
+		 * outstanding reclaims is passed to the swapper thread,
+		 * which then blocks as it tries to achieve that goal.
+		 * It is likely that the target overshoots due to the
+		 * latency between our thread and kswapd making new requests.
+		 */
+		atomic_long_t target;
+	} swapper;
 };
 
 #define I915_IDLE_ENGINES_TIMEOUT (200) /* in ms */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-07-29 10:57 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-29 10:56 [RESUBMIT][PATCH 1/2] drm/i915/gem: Avoid taking runtime-pm under the shrinker Janusz Krzysztofik
2022-07-29 10:56 ` [RESUBMIT][PATCH 2/2] drm/i915/gem: Perform active shrinking from a background thread Janusz Krzysztofik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).