* [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks
@ 2016-10-31 10:26 Chris Wilson
  2016-10-31 10:26 ` [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
still have objects being freed as we continue to teardown the driver -
in particular, the recently released rings may race with the memory
manager shutdown resulting in sporadic:

[  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
[  142.217187] Memory manager not clean during takedown.
[  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
[  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
[  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
[  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
[  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
[  142.217206] Call Trace:
[  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
[  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
[  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
[  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
[  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
[  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
[  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
[  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
[  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
[  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
[  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
[  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
[  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
[  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
[  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  142.217292] ---[ end trace 6fd164859c154772 ]---
[  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
                [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
                [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
                [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
                [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
                [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
                [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
                [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
                [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
                [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
                [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
                [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
                [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
                [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
                [<ffffffff8157baee>] __driver_attach+0xde/0xe0

In particular, note that the node was being poisoned as we inspected the
list, a clear indication that the object was still being freed as we
made the assertion.

Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 839ce2ae38fa..ed01421e3be7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -544,8 +544,10 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
 	i915_gem_context_fini(&dev_priv->drm);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	synchronize_rcu();
-	flush_work(&dev_priv->mm.free_work);
+	do {
+		rcu_barrier();
+		flush_work(&dev_priv->mm.free_work);
+	} while (!llist_empty(&dev_priv->mm.free_list));
 
 	WARN_ON(!list_empty(&dev_priv->context_list));
 }
-- 
2.10.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
@ 2016-10-31 10:26 ` Chris Wilson
  2016-10-31 17:35   ` Tvrtko Ursulin
  2016-10-31 10:26 ` [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

Whilst waiting on a request, we may do so without holding any locks or
any guards beyond a reference to the request. In order to avoid taking
locks within request deallocation, we drop references to its timeline
(via the context and ppgtt) upon retirement. We should avoid chasing
such pointers outside of their control, in particular we inspect the
request->timeline to see if we may restore the RPS waitboost for a
client. If we instead look at the engine->timeline, we will have similar
behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
amount of reward we give towards stalling clients (i.e. only if the
client stalls and the GPU is uncontended does it reclaim its boost).
This restores the pre-timelines behaviour, whilst fixing:

[  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
[  645.078577] Read of size 4 by task gem_exec_schedu/28408
[  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
[  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
[  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
[  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
[  645.079345] Call Trace:
[  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
[  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
[  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
[  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
[  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
[  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
[  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
[  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
[  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
[  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
[  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
[  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
[  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
[  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
[  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
[  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
[  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
[  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
[  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
[  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
[  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
[  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
[  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
[  645.082431] Allocated:
[  645.082480] PID = 28408
[  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
[  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
[  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
[  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
[  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[  645.083551] Freed:
[  645.083599] PID = 27629
[  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
[  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
[  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
[  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
[  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
[  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
[  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
[  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
[  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
[  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
[  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
[  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
[  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
[  645.085811] Memory state around the buggy address:
[  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086138]                                ^
[  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 73cb97010d4f ("drm/i915: Combine seqno + tracking into a global timeline struct")
Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 7 +++----
 drivers/gpu/drm/i915/i915_gem.c         | 4 ++--
 drivers/gpu/drm/i915/i915_gpu_error.c   | 2 +-
 drivers/gpu/drm/i915/i915_irq.c         | 2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h | 5 +++++
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9bef6f55f99d..bf19192dcc3b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1348,9 +1348,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x]\n",
-			   engine->hangcheck.seqno,
-			   seqno[id],
-			   engine->timeline->last_submitted_seqno);
+			   engine->hangcheck.seqno, seqno[id],
+			   intel_engine_last_submit(engine));
 		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
 			   yesno(intel_engine_has_waiter(engine)),
 			   yesno(test_bit(engine->id,
@@ -3167,7 +3166,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 		seq_printf(m, "%s\n", engine->name);
 		seq_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [score %d]\n",
 			   intel_engine_get_seqno(engine),
-			   engine->timeline->last_submitted_seqno,
+			   intel_engine_last_submit(engine),
 			   engine->hangcheck.seqno,
 			   engine->hangcheck.score);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1e5d2bf777e4..b9f540b16a45 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -371,7 +371,7 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 	if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
 		i915_gem_request_retire_upto(rq);
 
-	if (rps && rq->fence.seqno == rq->timeline->last_submitted_seqno) {
+	if (rps && rq->global_seqno == intel_engine_last_submit(rq->engine)) {
 		/* The GPU is now idle and this client has stalled.
 		 * Since no other client has submitted a request in the
 		 * meantime, assume that this client is the only one
@@ -2674,7 +2674,7 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
 	 * reset it.
 	 */
 	intel_engine_init_global_seqno(engine,
-				       engine->timeline->last_submitted_seqno);
+				       intel_engine_last_submit(engine));
 
 	/*
 	 * Clear the execlists queue up before freeing the requests, as those
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7ba40487e345..204093f3eaa5 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1120,7 +1120,7 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
-	ee->last_seqno = engine->timeline->last_submitted_seqno;
+	ee->last_seqno = intel_engine_last_submit(engine);
 	ee->start = I915_READ_START(engine);
 	ee->head = I915_READ_HEAD(engine);
 	ee->tail = I915_READ_TAIL(engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 90d0905592f2..4dda2b1eefdb 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3168,7 +3168,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		acthd = intel_engine_get_active_head(engine);
 		seqno = intel_engine_get_seqno(engine);
-		submit = READ_ONCE(engine->timeline->last_submitted_seqno);
+		submit = intel_engine_last_submit(engine);
 
 		if (engine->hangcheck.seqno == seqno) {
 			if (i915_seqno_passed(seqno, submit)) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 43f61aa24ed7..a089e11ee575 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -496,6 +496,11 @@ static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
 	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
 }
 
+static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->timeline->last_submitted_seqno);
+}
+
 int init_workarounds_ring(struct intel_engine_cs *engine);
 
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
-- 
2.10.2



* [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
  2016-10-31 10:26 ` [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime Chris Wilson
@ 2016-10-31 10:26 ` Chris Wilson
  2016-11-01  8:39   ` Tvrtko Ursulin
  2016-10-31 10:26 ` [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk Chris Wilson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36fa5 ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a work, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug whereby the pages were
no longer being unpinned if they were marked as unneeded.

[  231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[  231.800405] Modules linked in:
[  231.800406]  snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[  231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G     U          4.9.0-rc2-CI-CI_DRM_1780+ #1
[  231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[  231.800456] Workqueue: events __i915_gem_free_work [i915]
[  231.800459]  ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[  231.800465]  ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[  231.800469]  ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[  231.800474] Call Trace:
[  231.800479]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
[  231.800484]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[  231.800487]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[  231.800491]  [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[  231.800520]  [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[  231.800548]  [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[  231.800552]  [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[  231.800555]  [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[  231.800558]  [<ffffffff8109c789>] worker_thread+0x49/0x490
[  231.800561]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800563]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[  231.800566]  [<ffffffff810a2aab>] kthread+0xeb/0x110
[  231.800569]  [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[  231.800573]  [<ffffffff818164a7>] ret_from_fork+0x27/0x40

Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h        |  6 ++++++
 drivers/gpu/drm/i915/i915_gem.c        | 21 +++++++++++++--------
 drivers/gpu/drm/i915/i915_gem_tiling.c |  9 +++++++--
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 42a499681966..7a18bf66f797 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2304,6 +2304,12 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		/**
+		 * This is set if the object has been pinned due to unknown
+		 * swizzling.
+		 */
+		bool quirked:1;
 	} mm;
 
 	/** Breadcrumb of last rendering to the buffer.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b9f540b16a45..c58b7cabe87b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2324,8 +2324,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 		i915_gem_object_do_bit_17_swizzle(obj, st);
 
 	if (i915_gem_object_is_tiled(obj) &&
-	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
+	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
 		__i915_gem_object_pin_pages(obj);
+		obj->mm.quirked = true;
+	}
 
 	return st;
 
@@ -4091,10 +4093,15 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 	if (obj->mm.pages &&
 	    i915_gem_object_is_tiled(obj) &&
 	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
-		if (obj->mm.madv == I915_MADV_WILLNEED)
+		if (obj->mm.madv == I915_MADV_WILLNEED) {
+			GEM_BUG_ON(!obj->mm.quirked);
 			__i915_gem_object_unpin_pages(obj);
-		if (args->madv == I915_MADV_WILLNEED)
+			obj->mm.quirked = false;
+		}
+		if (args->madv == I915_MADV_WILLNEED) {
 			__i915_gem_object_pin_pages(obj);
+			obj->mm.quirked = true;
+		}
 	}
 
 	if (obj->mm.madv != __I915_MADV_PURGED)
@@ -4335,14 +4342,12 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 {
 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
 
+	if (obj->mm.quirked)
+		__i915_gem_object_unpin_pages(obj);
+
 	if (discard_backing_storage(obj))
 		obj->mm.madv = I915_MADV_DONTNEED;
 
-	if (obj->mm.pages && obj->mm.madv == I915_MADV_WILLNEED &&
-	    to_i915(obj->base.dev)->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
-	    i915_gem_object_is_tiled(obj))
-		__i915_gem_object_unpin_pages(obj);
-
 	/* Before we free the object, make sure any pure RCU-only
 	 * read-side critical sections are complete, e.g.
 	 * i915_gem_busy_ioctl(). For the corresponding synchronized
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 6395e62bd9e4..1577e7810cd6 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -263,10 +263,15 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 			if (obj->mm.pages &&
 			    obj->mm.madv == I915_MADV_WILLNEED &&
 			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
-				if (args->tiling_mode == I915_TILING_NONE)
+				if (args->tiling_mode == I915_TILING_NONE) {
+					GEM_BUG_ON(!obj->mm.quirked);
 					__i915_gem_object_unpin_pages(obj);
-				if (!i915_gem_object_is_tiled(obj))
+					obj->mm.quirked = false;
+				}
+				if (!i915_gem_object_is_tiled(obj)) {
 					__i915_gem_object_pin_pages(obj);
+					obj->mm.quirked = true;
+				}
 			}
 			mutex_unlock(&obj->mm.lock);
 
-- 
2.10.2



* [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
  2016-10-31 10:26 ` [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime Chris Wilson
  2016-10-31 10:26 ` [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk Chris Wilson
@ 2016-10-31 10:26 ` Chris Wilson
  2016-11-01  8:29   ` Tvrtko Ursulin
  2016-10-31 10:26 ` [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking Chris Wilson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

In the shrinker, we can safely remove an empty object (obj->mm.pages ==
NULL) after having discarded the pages because we are holding the
struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 0241658af16b..b504ba091c4f 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -226,6 +226,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 				mutex_lock(&obj->mm.lock);
 				if (!obj->mm.pages) {
 					__i915_gem_object_invalidate(obj);
+					list_del_init(&obj->global_list);
 					count += obj->base.size >> PAGE_SHIFT;
 				}
 				mutex_unlock(&obj->mm.lock);
-- 
2.10.2



* [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
                   ` (2 preceding siblings ...)
  2016-10-31 10:26 ` [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk Chris Wilson
@ 2016-10-31 10:26 ` Chris Wilson
  2016-10-31 15:26   ` Joonas Lahtinen
  2016-10-31 10:26 ` [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object Chris Wilson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

During shrinking, we walk over the list of objects searching for
victims. Any that are not removed are put back into the global list.
Currently, they are put back in order (at the front) which means they
will be first to be scanned again. If we instead move them to the rear
of the list, we will scan new potential victims on the next pass and
waste less time rescanning unshrinkable objects. Normally the lists are
kept in rough order of shrinkability (with the least frequently used
objects at the start); by moving just-scanned objects to the rear we
acknowledge that they are still in use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index b504ba091c4f..f4a1515737bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -232,7 +232,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
 				mutex_unlock(&obj->mm.lock);
 			}
 		}
-		list_splice(&still_in_list, phase->list);
+		list_splice_tail(&still_in_list, phase->list);
 	}
 
 	if (flags & I915_SHRINK_BOUND)
-- 
2.10.2



* [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
                   ` (3 preceding siblings ...)
  2016-10-31 10:26 ` [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking Chris Wilson
@ 2016-10-31 10:26 ` Chris Wilson
  2016-11-01  8:41   ` Tvrtko Ursulin
  2016-11-01  9:43   ` Tvrtko Ursulin
  2016-10-31 11:16 ` ✗ Fi.CI.BAT: failure for series starting with [1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Patchwork
  2016-10-31 17:15 ` [PATCH 1/6] " Tvrtko Ursulin
  6 siblings, 2 replies; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 10:26 UTC (permalink / raw)
  To: intel-gfx

With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worst-case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lg N) lookups (instead of O(1) for the
rhashtable). In the common case such a tree will be very shallow and so
not much slower, and it still scales much, much better than the current
unsorted list.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  1 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 3 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7a18bf66f797..e923d6596cac 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2230,6 +2230,7 @@ struct drm_i915_gem_object {
 
 	/** List of VMAs backed by this object */
 	struct list_head vma_list;
+	struct rb_root vma_tree;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
 	struct drm_mm_node *stolen;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e7afad585929..aa2d21c41091 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3399,6 +3399,7 @@ void i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(!i915_vma_is_closed(vma));
 	GEM_BUG_ON(vma->fence);
 
+	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
 	list_del(&vma->vm_link);
 	if (!i915_vma_is_ggtt(vma))
 		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
@@ -3416,12 +3417,33 @@ void i915_vma_close(struct i915_vma *vma)
 		WARN_ON(i915_vma_unbind(vma));
 }
 
+static inline int vma_compare(struct i915_vma *vma,
+			      struct i915_address_space *vm,
+			      const struct i915_ggtt_view *view)
+{
+	GEM_BUG_ON(view && !i915_vma_is_ggtt(vma));
+
+	if (vma->vm != vm)
+		return vma->vm - vm;
+
+	if (!view)
+		return vma->ggtt_view.type;
+
+	if (vma->ggtt_view.type != view->type)
+		return vma->ggtt_view.type - view->type;
+
+	return memcmp(&vma->ggtt_view.params,
+		      &view->params,
+		      sizeof(view->params));
+}
+
 static struct i915_vma *
 __i915_vma_create(struct drm_i915_gem_object *obj,
 		  struct i915_address_space *vm,
 		  const struct i915_ggtt_view *view)
 {
 	struct i915_vma *vma;
+	struct rb_node *rb, **p;
 	int i;
 
 	GEM_BUG_ON(vm->closed);
@@ -3455,33 +3477,28 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
 
 	if (i915_is_ggtt(vm)) {
 		vma->flags |= I915_VMA_GGTT;
+		list_add(&vma->obj_link, &obj->vma_list);
 	} else {
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
+		list_add_tail(&vma->obj_link, &obj->vma_list);
 	}
 
-	list_add_tail(&vma->obj_link, &obj->vma_list);
-	return vma;
-}
+	rb = NULL;
+	p = &obj->vma_tree.rb_node;
+	while (*p) {
+		struct i915_vma *pos;
 
-static inline bool vma_matches(struct i915_vma *vma,
-			       struct i915_address_space *vm,
-			       const struct i915_ggtt_view *view)
-{
-	if (vma->vm != vm)
-		return false;
-
-	if (!i915_vma_is_ggtt(vma))
-		return true;
-
-	if (!view)
-		return vma->ggtt_view.type == 0;
-
-	if (vma->ggtt_view.type != view->type)
-		return false;
+		rb = *p;
+		pos = rb_entry(rb, struct i915_vma, obj_node);
+		if (vma_compare(pos, vm, view) < 0)
+			p = &rb->rb_right;
+		else
+			p = &rb->rb_left;
+	}
+	rb_link_node(&vma->obj_node, rb, p);
+	rb_insert_color(&vma->obj_node, &obj->vma_tree);
 
-	return memcmp(&vma->ggtt_view.params,
-		      &view->params,
-		      sizeof(view->params)) == 0;
+	return vma;
 }
 
 struct i915_vma *
@@ -3501,11 +3518,22 @@ i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
 		    const struct i915_ggtt_view *view)
 {
-	struct i915_vma *vma;
+	struct rb_node *rb;
+
+	rb = obj->vma_tree.rb_node;
+	while (rb) {
+		struct i915_vma *vma;
+		int cmp;
 
-	list_for_each_entry_reverse(vma, &obj->vma_list, obj_link)
-		if (vma_matches(vma, vm, view))
+		vma = rb_entry(rb, struct i915_vma, obj_node);
+		cmp = vma_compare(vma, vm, view);
+		if (cmp == 0)
 			return vma;
+		else if (cmp < 0)
+			rb = rb->rb_right;
+		else
+			rb = rb->rb_left;
+	}
 
 	return NULL;
 }
@@ -3521,8 +3549,10 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 
 	vma = i915_gem_obj_to_vma(obj, vm, view);
-	if (!vma)
+	if (!vma) {
 		vma = __i915_vma_create(obj, vm, view);
+		GEM_BUG_ON(vma != i915_gem_obj_to_vma(obj, vm, view));
+	}
 
 	GEM_BUG_ON(i915_vma_is_closed(vma));
 	return vma;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 518e75b64290..c23ef9db1f53 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -227,6 +227,7 @@ struct i915_vma {
 	struct list_head vm_link;
 
 	struct list_head obj_link; /* Link in the object's VMA list */
+	struct rb_node obj_node;
 
 	/** This vma's place in the batchbuffer or on the eviction list */
 	struct list_head exec_list;
-- 
2.10.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [1/6] drm/i915: Use the full hammer when shutting down the rcu tasks
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
                   ` (4 preceding siblings ...)
  2016-10-31 10:26 ` [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object Chris Wilson
@ 2016-10-31 11:16 ` Patchwork
  2016-10-31 17:15 ` [PATCH 1/6] " Tvrtko Ursulin
  6 siblings, 0 replies; 25+ messages in thread
From: Patchwork @ 2016-10-31 11:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/6] drm/i915: Use the full hammer when shutting down the rcu tasks
URL   : https://patchwork.freedesktop.org/series/14606/
State : failure

== Summary ==

Series 14606v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/14606/revisions/1/mbox/

Test drv_module_reload_basic:
                pass       -> SKIP       (fi-skl-6770hq)
Test gem_exec_suspend:
        Subgroup basic-s3:
                pass       -> DMESG-WARN (fi-skl-6770hq)
Test gem_ringfill:
        Subgroup basic-default-hang:
                incomplete -> PASS       (fi-hsw-4770r)
Test kms_flip:
        Subgroup basic-flip-vs-wf_vblank:
                pass       -> SKIP       (fi-byt-n2820)
Test kms_frontbuffer_tracking:
        Subgroup basic:
                pass       -> FAIL       (fi-skl-6770hq)
Test kms_pipe_crc_basic:
        Subgroup nonblocking-crc-pipe-a-frame-sequence:
                pass       -> DMESG-WARN (fi-ilk-650)

fi-bdw-5557u     total:241  pass:226  dwarn:0   dfail:0   fail:0   skip:15 
fi-bsw-n3050     total:241  pass:201  dwarn:0   dfail:0   fail:0   skip:40 
fi-bxt-t5700     total:241  pass:213  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-j1900     total:241  pass:213  dwarn:0   dfail:0   fail:0   skip:28 
fi-byt-n2820     total:241  pass:208  dwarn:0   dfail:0   fail:0   skip:33 
fi-hsw-4770      total:241  pass:221  dwarn:0   dfail:0   fail:0   skip:20 
fi-hsw-4770r     total:241  pass:220  dwarn:0   dfail:0   fail:0   skip:21 
fi-ilk-650       total:241  pass:186  dwarn:1   dfail:0   fail:0   skip:54 
fi-ivb-3520m     total:241  pass:218  dwarn:0   dfail:0   fail:0   skip:23 
fi-ivb-3770      total:241  pass:218  dwarn:0   dfail:0   fail:0   skip:23 
fi-kbl-7200u     total:241  pass:219  dwarn:0   dfail:0   fail:0   skip:22 
fi-skl-6260u     total:241  pass:227  dwarn:0   dfail:0   fail:0   skip:14 
fi-skl-6700hq    total:241  pass:220  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6700k     total:241  pass:219  dwarn:1   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:241  pass:224  dwarn:1   dfail:0   fail:1   skip:15 
fi-snb-2520m     total:241  pass:208  dwarn:0   dfail:0   fail:0   skip:33 
fi-snb-2600      total:241  pass:207  dwarn:0   dfail:0   fail:0   skip:34 

bbb625b5b79bdbdefd87e68e15edaa120fe70d4f drm-intel-nightly: 2016y-10m-29d-12h-45m-44s UTC integration manifest
59c8ae4 drm/i915: Store the vma in an rbtree under the object
89bc99d drm/i915: Move the recently scanned objects to the tail after shrinking
39da91c drm/i915: Discard objects from mm global_list after being shrunk
c63d866 drm/i915: Track pages pinned due to swizzling quirk
002d32d drm/i915: Avoid accessing request->timeline outside of its lifetime
b814cf8 drm/i915: Use the full hammer when shutting down the rcu tasks

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_2862/


* Re: [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking
  2016-10-31 10:26 ` [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking Chris Wilson
@ 2016-10-31 15:26   ` Joonas Lahtinen
  0 siblings, 0 replies; 25+ messages in thread
From: Joonas Lahtinen @ 2016-10-31 15:26 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On ma, 2016-10-31 at 10:26 +0000, Chris Wilson wrote:
> During shrinking, we walk over the list of objects searching for
> victims. Any that are not removed are put back into the global list.
> Currently, they are put back in order (at the front) which means they
> will be first to be scanned again. If we instead move them to the rear
> of the list, we will scan new potential victims on the next pass and
> waste less time rescanning unshrinkable objects. Normally the lists are
> kept in rough order for shrinking (with the least frequently used
> objects at the start); by moving just-scanned objects to the rear we
> acknowledge that they are still in use.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


* Re: [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks
  2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
                   ` (5 preceding siblings ...)
  2016-10-31 11:16 ` ✗ Fi.CI.BAT: failure for series starting with [1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Patchwork
@ 2016-10-31 17:15 ` Tvrtko Ursulin
  2016-10-31 21:05   ` Chris Wilson
  6 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-10-31 17:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
> to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
> still have objects being freed as we continue to teardown the driver -
> in particular, the recently released rings may race with the memory
> manager shutdown resulting in sporadic:
>
> [  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
> [  142.217187] Memory manager not clean during takedown.
> [  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
> [  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
> [  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
> [  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
> [  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
> [  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
> [  142.217206] Call Trace:
> [  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
> [  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
> [  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
> [  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
> [  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
> [  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
> [  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
> [  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
> [  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
> [  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
> [  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
> [  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
> [  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
> [  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
> [  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
> [  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
> [  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
> [  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  142.217292] ---[ end trace 6fd164859c154772 ]---
> [  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
>                 [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
>                 [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
>                 [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
>                 [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
>                 [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
>                 [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
>                 [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
>                 [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
>                 [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
>                 [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
>                 [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
>                 [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
>                 [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
>                 [<ffffffff8157baee>] __driver_attach+0xde/0xe0
>
> In particular note that the node was being poisoned as we inspected the
> list, a clear indication that the object is being freed as we make the
> assertion.
>
> Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 839ce2ae38fa..ed01421e3be7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -544,8 +544,10 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
>  	i915_gem_context_fini(&dev_priv->drm);
>  	mutex_unlock(&dev_priv->drm.struct_mutex);
>
> -	synchronize_rcu();
> -	flush_work(&dev_priv->mm.free_work);
> +	do {
> +		rcu_barrier();
> +		flush_work(&dev_priv->mm.free_work);
> +	} while (!llist_empty(&dev_priv->mm.free_list));
>
>  	WARN_ON(!list_empty(&dev_priv->context_list));
>  }
>

Not sure that the loop is required - after the rcu_barrier all the 
queued up additions have been added to the list and workers queued. So 
flush_work afterwards handles those and we are done, no?

If I am right it could be written as:

rcu_barrier();
flush_work(...);
WARN_ON(!llist_empty(...));

If I am not right then it sounds pretty fishy since it implies new 
objects are being freed in parallel to i915_gem_fini. Hm?

It fixes the CI fire so r-b anyway if you want to merge it while 
considering the above:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko



* Re: [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime
  2016-10-31 10:26 ` [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime Chris Wilson
@ 2016-10-31 17:35   ` Tvrtko Ursulin
  2016-10-31 21:03     ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-10-31 17:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> Whilst waiting on a request, we may do so without holding any locks or
> any guards beyond a reference to the request. In order to avoid taking
> locks within request deallocation, we drop references to its timeline
> (via the context and ppgtt) upon retirement. We should avoid chasing

Couldn't find that there is a reference taken (or dropped) on the 
timeline when stored in a request. It looks like a borrowed pointer to me?

> such pointers outside of their control, in particular we inspect the
> request->timeline to see if we may restore the RPS waitboost for a
> client. If we instead look at the engine->timeline, we will have similar
> behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
> amount of reward we give towards stalling clients (i.e. only if the
> client stalls and the GPU is uncontended does it reclaim its boost).
> This restores behaviour back to pre-timelines, whilst fixing:
>
> [  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
> [  645.078577] Read of size 4 by task gem_exec_schedu/28408
> [  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
> [  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
> [  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
> [  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
> [  645.079345] Call Trace:
> [  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
> [  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
> [  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
> [  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
> [  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
> [  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
> [  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
> [  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
> [  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
> [  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
> [  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
> [  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
> [  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
> [  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
> [  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
> [  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
> [  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
> [  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
> [  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
> [  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
> [  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
> [  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
> [  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
> [  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
> [  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
> [  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
> [  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
> [  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
> [  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
> [  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
> [  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
> [  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
> [  645.082431] Allocated:
> [  645.082480] PID = 28408
> [  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
> [  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
> [  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
> [  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
> [  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
> [  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
> [  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
> [  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
> [  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
> [  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> [  645.083551] Freed:
> [  645.083599] PID = 27629
> [  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
> [  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
> [  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
> [  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
> [  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
> [  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
> [  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
> [  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
> [  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
> [  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
> [  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
> [  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
> [  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
> [  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
> [  645.085811] Memory state around the buggy address:
> [  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  645.086138]                                ^
> [  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> Fixes: 73cb97010d4f ("drm/i915: Combine seqno + tracking into a global timeline struct")
> Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     | 7 +++----
>  drivers/gpu/drm/i915/i915_gem.c         | 4 ++--
>  drivers/gpu/drm/i915/i915_gpu_error.c   | 2 +-
>  drivers/gpu/drm/i915/i915_irq.c         | 2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 5 +++++
>  5 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 9bef6f55f99d..bf19192dcc3b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1348,9 +1348,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>
>  		seq_printf(m, "%s:\n", engine->name);
>  		seq_printf(m, "\tseqno = %x [current %x, last %x]\n",
> -			   engine->hangcheck.seqno,
> -			   seqno[id],
> -			   engine->timeline->last_submitted_seqno);
> +			   engine->hangcheck.seqno, seqno[id],
> +			   intel_engine_last_submit(engine));
>  		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
>  			   yesno(intel_engine_has_waiter(engine)),
>  			   yesno(test_bit(engine->id,
> @@ -3167,7 +3166,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>  		seq_printf(m, "%s\n", engine->name);
>  		seq_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [score %d]\n",
>  			   intel_engine_get_seqno(engine),
> -			   engine->timeline->last_submitted_seqno,
> +			   intel_engine_last_submit(engine),
>  			   engine->hangcheck.seqno,
>  			   engine->hangcheck.score);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1e5d2bf777e4..b9f540b16a45 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -371,7 +371,7 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>  	if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
>  		i915_gem_request_retire_upto(rq);

You could have stuck with req or request when adding new code. Now we 
have three names for requests in the code base. :(

>
> -	if (rps && rq->fence.seqno == rq->timeline->last_submitted_seqno) {
> +	if (rps && rq->global_seqno == intel_engine_last_submit(rq->engine)) {

Okay so not touching the rq->timeline here and rq->global_seqno is from 
the same seqno-space as the engine timeline seqnos I assume.

>  		/* The GPU is now idle and this client has stalled.
>  		 * Since no other client has submitted a request in the
>  		 * meantime, assume that this client is the only one
> @@ -2674,7 +2674,7 @@ static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
>  	 * reset it.
>  	 */
>  	intel_engine_init_global_seqno(engine,
> -				       engine->timeline->last_submitted_seqno);
> +				       intel_engine_last_submit(engine));
>
>  	/*
>  	 * Clear the execlists queue up before freeing the requests, as those
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 7ba40487e345..204093f3eaa5 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1120,7 +1120,7 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
>  	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>  	ee->acthd = intel_engine_get_active_head(engine);
>  	ee->seqno = intel_engine_get_seqno(engine);
> -	ee->last_seqno = engine->timeline->last_submitted_seqno;
> +	ee->last_seqno = intel_engine_last_submit(engine);
>  	ee->start = I915_READ_START(engine);
>  	ee->head = I915_READ_HEAD(engine);
>  	ee->tail = I915_READ_TAIL(engine);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 90d0905592f2..4dda2b1eefdb 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3168,7 +3168,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>
>  		acthd = intel_engine_get_active_head(engine);
>  		seqno = intel_engine_get_seqno(engine);
> -		submit = READ_ONCE(engine->timeline->last_submitted_seqno);
> +		submit = intel_engine_last_submit(engine);
>
>  		if (engine->hangcheck.seqno == seqno) {
>  			if (i915_seqno_passed(seqno, submit)) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 43f61aa24ed7..a089e11ee575 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -496,6 +496,11 @@ static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
>  	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
>  }
>
> +static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
> +{
> +	return READ_ONCE(engine->timeline->last_submitted_seqno);
> +}
> +

Don't like that READ_ONCE gets sprinkled all over the place via call 
sites. It should be extremely well defined and controlled from where it 
is used. Otherwise it suggests READ_ONCE is not really appropriate.

>  int init_workarounds_ring(struct intel_engine_cs *engine);
>
>  void intel_engine_get_instdone(struct intel_engine_cs *engine,
>

Regards,

Tvrtko


* Re: [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime
  2016-10-31 17:35   ` Tvrtko Ursulin
@ 2016-10-31 21:03     ` Chris Wilson
  2016-11-01  8:43       ` Tvrtko Ursulin
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 21:03 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Oct 31, 2016 at 05:35:50PM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 10:26, Chris Wilson wrote:
> >Whilst waiting on a request, we may do so without holding any locks or
> >any guards beyond a reference to the request. In order to avoid taking
> >locks within request deallocation, we drop references to its timeline
> >(via the context and ppgtt) upon retirement. We should avoid chasing
> 
> Couldn't find that there is a reference taken (or dropped) on the
> timeline when stored in a request. It looks like a borrowed pointer
> to me?

The timeline is owned by the address space which is owned by either the
context or the device. The request holds a reference on the context (and
so indirectly onto the timeline, except for the device's which outlives
the request) up until we retire the request. (Retiring holds
struct_mutex so is earlier than freeing.)

> >+static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
> >+{
> >+	return READ_ONCE(engine->timeline->last_submitted_seqno);
> >+}
> >+
> 
> Don't like that READ_ONCE gets sprinkled all over the place via call
> sites. It should be extremely well defined and controlled from where
> it is used. Otherwise it suggests READ_ONCE is not really
> appropriate.

It actually is appropriate. last_submitted_seqno is under the timeline
spinlock, and none of the callers take the appropriate guard. This is
trying to document that these callers don't care about fully
synchronising the last_submitted_seqno with the request list or last
request pointer. And we don't care to go full seqlock, since we really
are only peeking.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks
  2016-10-31 17:15 ` [PATCH 1/6] " Tvrtko Ursulin
@ 2016-10-31 21:05   ` Chris Wilson
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2016-10-31 21:05 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Oct 31, 2016 at 05:15:45PM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 10:26, Chris Wilson wrote:
> >To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
> >to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
> >still have objects being freed as we continue to teardown the driver -
> >in particular, the recently released rings may race with the memory
> >manager shutdown resulting in sporadic:
> >
> >[  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
> >[  142.217187] Memory manager not clean during takedown.
> >[  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
> >[  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
> >[  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
> >[  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
> >[  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
> >[  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
> >[  142.217206] Call Trace:
> >[  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
> >[  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
> >[  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
> >[  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
> >[  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
> >[  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
> >[  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
> >[  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
> >[  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
> >[  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
> >[  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
> >[  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
> >[  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
> >[  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
> >[  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
> >[  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
> >[  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
> >[  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
> >[  142.217292] ---[ end trace 6fd164859c154772 ]---
> >[  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
> >                [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
> >                [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
> >                [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
> >                [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
> >                [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
> >                [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
> >                [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
> >                [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
> >                [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
> >                [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
> >                [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
> >                [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
> >                [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
> >                [<ffffffff8157baee>] __driver_attach+0xde/0xe0
> >
> >In particular note that the node was being poisoned as we inspected the
> >list, a clear indication that the object is being freed as we make the
> >assertion.
> >
> >Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >---
> > drivers/gpu/drm/i915/i915_drv.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >index 839ce2ae38fa..ed01421e3be7 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.c
> >+++ b/drivers/gpu/drm/i915/i915_drv.c
> >@@ -544,8 +544,10 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
> > 	i915_gem_context_fini(&dev_priv->drm);
> > 	mutex_unlock(&dev_priv->drm.struct_mutex);
> >
> >-	synchronize_rcu();
> >-	flush_work(&dev_priv->mm.free_work);
> >+	do {
> >+		rcu_barrier();
> >+		flush_work(&dev_priv->mm.free_work);
> >+	} while (!llist_empty(&dev_priv->mm.free_list));
> >
> > 	WARN_ON(!list_empty(&dev_priv->context_list));
> > }
> >
> 
> Not sure that the loop is required - after the rcu_barrier all the
> queued up additions have been added to the list and workers queued.
> So flush_work afterwards handles those and we are done, no?

We don't need the loop, it was an overreaction. A subsequent WARN would
be good enough to detect further issues. That WARN would be better
placed before freeing the object kmem_cache.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk
  2016-10-31 10:26 ` [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk Chris Wilson
@ 2016-11-01  8:29   ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> In the shrinker, we can safely remove an empty object (obj->mm.pages ==
> NULL) after having discarded the pages because we are holding the
> struct_mutex.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 0241658af16b..b504ba091c4f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -226,6 +226,7 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>  				mutex_lock(&obj->mm.lock);
>  				if (!obj->mm.pages) {
>  					__i915_gem_object_invalidate(obj);
> +					list_del_init(&obj->global_list);
>  					count += obj->base.size >> PAGE_SHIFT;
>  				}
>  				mutex_unlock(&obj->mm.lock);
>

This matches the same loop a bit higher up, so from that POV:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk
  2016-10-31 10:26 ` [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk Chris Wilson
@ 2016-11-01  8:39   ` Tvrtko Ursulin
  2016-11-01  8:48     ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> If we have a tiled object and an unknown CPU swizzle pattern, we pin the
> pages to prevent the object from being swapped out (and us corrupting
> the contents as we do not know the access pattern and so cannot convert
> it to linear and back to tiled on reuse). This requires us to remember
> to drop the extra pinning when freeing the object, or else we trigger
> warnings about the pin leak. In commit fbbd37b36fa5 ("drm/i915: Move
> object release to a freelist + worker"), the object free path was
> deferred to a work, but the unpinning of the quirk, along with marking
> the object as reclaimable, was left on the immediate path (so that if
> required we could reclaim the pages under memory pressure as early as
> possible). However, this split introduced a bug where the pages we no
> longer being unpinned if they were marked as unneeded.

Last sentence is broken.

>
> [  231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
> [  231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
> [  231.800405] Modules linked in:
> [  231.800406]  snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
> [  231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G     U          4.9.0-rc2-CI-CI_DRM_1780+ #1
> [  231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
> [  231.800456] Workqueue: events __i915_gem_free_work [i915]
> [  231.800459]  ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
> [  231.800465]  ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
> [  231.800469]  ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
> [  231.800474] Call Trace:
> [  231.800479]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
> [  231.800484]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
> [  231.800487]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
> [  231.800491]  [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
> [  231.800520]  [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
> [  231.800548]  [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
> [  231.800552]  [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
> [  231.800555]  [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
> [  231.800558]  [<ffffffff8109c789>] worker_thread+0x49/0x490
> [  231.800561]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
> [  231.800563]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
> [  231.800566]  [<ffffffff810a2aab>] kthread+0xeb/0x110
> [  231.800569]  [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
> [  231.800573]  [<ffffffff818164a7>] ret_from_fork+0x27/0x40
>
> Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h        |  6 ++++++
>  drivers/gpu/drm/i915/i915_gem.c        | 21 +++++++++++++--------
>  drivers/gpu/drm/i915/i915_gem_tiling.c |  9 +++++++--
>  3 files changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 42a499681966..7a18bf66f797 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2304,6 +2304,12 @@ struct drm_i915_gem_object {
>  		 * pages were last acquired.
>  		 */
>  		bool dirty:1;
> +
> +		/**
> +		 * This is set if the object has been pinned due to unknown
> +		 * swizzling.
> +		 */
> +		bool quirked:1;
>  	} mm;
>
>  	/** Breadcrumb of last rendering to the buffer.
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b9f540b16a45..c58b7cabe87b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2324,8 +2324,10 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
>  		i915_gem_object_do_bit_17_swizzle(obj, st);
>
>  	if (i915_gem_object_is_tiled(obj) &&
> -	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES)
> +	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
>  		__i915_gem_object_pin_pages(obj);
> +		obj->mm.quirked = true;
> +	}
>
>  	return st;
>
> @@ -4091,10 +4093,15 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
>  	if (obj->mm.pages &&
>  	    i915_gem_object_is_tiled(obj) &&
>  	    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> -		if (obj->mm.madv == I915_MADV_WILLNEED)
> +		if (obj->mm.madv == I915_MADV_WILLNEED) {
> +			GEM_BUG_ON(!obj->mm.quirked);
>  			__i915_gem_object_unpin_pages(obj);
> -		if (args->madv == I915_MADV_WILLNEED)
> +			obj->mm.quirked = false;
> +		}
> +		if (args->madv == I915_MADV_WILLNEED) {
>  			__i915_gem_object_pin_pages(obj);
> +			obj->mm.quirked = true;
> +		}
>  	}
>
>  	if (obj->mm.madv != __I915_MADV_PURGED)
> @@ -4335,14 +4342,12 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
>  {
>  	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>
> +	if (obj->mm.quirked)
> +		__i915_gem_object_unpin_pages(obj);
> +
>  	if (discard_backing_storage(obj))
>  		obj->mm.madv = I915_MADV_DONTNEED;
>
> -	if (obj->mm.pages && obj->mm.madv == I915_MADV_WILLNEED &&
> -	    to_i915(obj->base.dev)->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
> -	    i915_gem_object_is_tiled(obj))
> -		__i915_gem_object_unpin_pages(obj);
> -

This reordering would not have been enough to fix this?

>  	/* Before we free the object, make sure any pure RCU-only
>  	 * read-side critical sections are complete, e.g.
>  	 * i915_gem_busy_ioctl(). For the corresponding synchronized
> diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
> index 6395e62bd9e4..1577e7810cd6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_tiling.c
> +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
> @@ -263,10 +263,15 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>  			if (obj->mm.pages &&
>  			    obj->mm.madv == I915_MADV_WILLNEED &&
>  			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> -				if (args->tiling_mode == I915_TILING_NONE)
> +				if (args->tiling_mode == I915_TILING_NONE) {
> +					GEM_BUG_ON(!obj->mm.quirked);
>  					__i915_gem_object_unpin_pages(obj);
> -				if (!i915_gem_object_is_tiled(obj))
> +					obj->mm.quirked = false;
> +				}
> +				if (!i915_gem_object_is_tiled(obj)) {
>  					__i915_gem_object_pin_pages(obj);
> +					obj->mm.quirked = true;
> +				}
>  			}
>  			mutex_unlock(&obj->mm.lock);
>
>

Regards,

Tvrtko

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-10-31 10:26 ` [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object Chris Wilson
@ 2016-11-01  8:41   ` Tvrtko Ursulin
  2016-11-01  8:50     ` Chris Wilson
  2016-11-01  9:43   ` Tvrtko Ursulin
  1 sibling, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> underneath the object. For execbuf there is merit in having a very fast
> direct lookup of ctx:handle to the vma using a hashtree, but that still
> leaves a large number of other lookups. One way to speed up the lookup
> would be to use a rhashtable, but that requires extra allocations and
> may exhibit poor worst-case behaviour. An alternative is to use an
> embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> but at the slight cost of O(lgN) lookups (instead of O(1) for
> rhashtable). In practice such a tree will be very shallow and so not much
> slower, and still scales much, much better than the current unsorted
> list.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I suggest leaving this out of the mini-series which fixes the recently 
introduced bugs.

Regards,

Tvrtko


> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  1 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
>  3 files changed, 57 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7a18bf66f797..e923d6596cac 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2230,6 +2230,7 @@ struct drm_i915_gem_object {
>
>  	/** List of VMAs backed by this object */
>  	struct list_head vma_list;
> +	struct rb_root vma_tree;
>
>  	/** Stolen memory for this object, instead of being backed by shmem. */
>  	struct drm_mm_node *stolen;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e7afad585929..aa2d21c41091 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3399,6 +3399,7 @@ void i915_vma_destroy(struct i915_vma *vma)
>  	GEM_BUG_ON(!i915_vma_is_closed(vma));
>  	GEM_BUG_ON(vma->fence);
>
> +	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
>  	list_del(&vma->vm_link);
>  	if (!i915_vma_is_ggtt(vma))
>  		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
> @@ -3416,12 +3417,33 @@ void i915_vma_close(struct i915_vma *vma)
>  		WARN_ON(i915_vma_unbind(vma));
>  }
>
> +static inline int vma_compare(struct i915_vma *vma,
> +			      struct i915_address_space *vm,
> +			      const struct i915_ggtt_view *view)
> +{
> +	GEM_BUG_ON(view && !i915_vma_is_ggtt(vma));
> +
> +	if (vma->vm != vm)
> +		return vma->vm - vm;
> +
> +	if (!view)
> +		return vma->ggtt_view.type;
> +
> +	if (vma->ggtt_view.type != view->type)
> +		return vma->ggtt_view.type - view->type;
> +
> +	return memcmp(&vma->ggtt_view.params,
> +		      &view->params,
> +		      sizeof(view->params));
> +}
> +
>  static struct i915_vma *
>  __i915_vma_create(struct drm_i915_gem_object *obj,
>  		  struct i915_address_space *vm,
>  		  const struct i915_ggtt_view *view)
>  {
>  	struct i915_vma *vma;
> +	struct rb_node *rb, **p;
>  	int i;
>
>  	GEM_BUG_ON(vm->closed);
> @@ -3455,33 +3477,28 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
>
>  	if (i915_is_ggtt(vm)) {
>  		vma->flags |= I915_VMA_GGTT;
> +		list_add(&vma->obj_link, &obj->vma_list);
>  	} else {
>  		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> +		list_add_tail(&vma->obj_link, &obj->vma_list);
>  	}
>
> -	list_add_tail(&vma->obj_link, &obj->vma_list);
> -	return vma;
> -}
> +	rb = NULL;
> +	p = &obj->vma_tree.rb_node;
> +	while (*p) {
> +		struct i915_vma *pos;
>
> -static inline bool vma_matches(struct i915_vma *vma,
> -			       struct i915_address_space *vm,
> -			       const struct i915_ggtt_view *view)
> -{
> -	if (vma->vm != vm)
> -		return false;
> -
> -	if (!i915_vma_is_ggtt(vma))
> -		return true;
> -
> -	if (!view)
> -		return vma->ggtt_view.type == 0;
> -
> -	if (vma->ggtt_view.type != view->type)
> -		return false;
> +		rb = *p;
> +		pos = rb_entry(rb, struct i915_vma, obj_node);
> +		if (vma_compare(pos, vm, view) < 0)
> +			p = &rb->rb_right;
> +		else
> +			p = &rb->rb_left;
> +	}
> +	rb_link_node(&vma->obj_node, rb, p);
> +	rb_insert_color(&vma->obj_node, &obj->vma_tree);
>
> -	return memcmp(&vma->ggtt_view.params,
> -		      &view->params,
> -		      sizeof(view->params)) == 0;
> +	return vma;
>  }
>
>  struct i915_vma *
> @@ -3501,11 +3518,22 @@ i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
>  		    struct i915_address_space *vm,
>  		    const struct i915_ggtt_view *view)
>  {
> -	struct i915_vma *vma;
> +	struct rb_node *rb;
> +
> +	rb = obj->vma_tree.rb_node;
> +	while (rb) {
> +		struct i915_vma *vma;
> +		int cmp;
>
> -	list_for_each_entry_reverse(vma, &obj->vma_list, obj_link)
> -		if (vma_matches(vma, vm, view))
> +		vma = rb_entry(rb, struct i915_vma, obj_node);
> +		cmp = vma_compare(vma, vm, view);
> +		if (cmp == 0)
>  			return vma;
> +		else if (cmp < 0)
> +			rb = rb->rb_right;
> +		else
> +			rb = rb->rb_left;
> +	}
>
>  	return NULL;
>  }
> @@ -3521,8 +3549,10 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
>  	GEM_BUG_ON(view && !i915_is_ggtt(vm));
>
>  	vma = i915_gem_obj_to_vma(obj, vm, view);
> -	if (!vma)
> +	if (!vma) {
>  		vma = __i915_vma_create(obj, vm, view);
> +		GEM_BUG_ON(vma != i915_gem_obj_to_vma(obj, vm, view));
> +	}
>
>  	GEM_BUG_ON(i915_vma_is_closed(vma));
>  	return vma;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 518e75b64290..c23ef9db1f53 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -227,6 +227,7 @@ struct i915_vma {
>  	struct list_head vm_link;
>
>  	struct list_head obj_link; /* Link in the object's VMA list */
> +	struct rb_node obj_node;
>
>  	/** This vma's place in the batchbuffer or on the eviction list */
>  	struct list_head exec_list;
>

* Re: [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime
  2016-10-31 21:03     ` Chris Wilson
@ 2016-11-01  8:43       ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 21:03, Chris Wilson wrote:
> On Mon, Oct 31, 2016 at 05:35:50PM +0000, Tvrtko Ursulin wrote:
>>
>> On 31/10/2016 10:26, Chris Wilson wrote:
>>> Whilst waiting on a request, we may do so without holding any locks or
>>> any guards beyond a reference to the request. In order to avoid taking
>>> locks within request deallocation, we drop references to its timeline
>>> (via the context and ppgtt) upon retirement. We should avoid chasing
>>
>> Couldn't find that there is a reference taken (or dropped) on the
>> timeline when stored in a request. It looks like a borrowed pointer
>> to me?
>
> The timeline is owned by the address space which is owned by either the
> context or the device. The request holds a reference on the context (and
> so indirectly onto the timeline, except for the device's which outlives
> the request) up until we retire the request. (Retiring holds
> struct_mutex so is earlier than freeing.)
>
>>> +static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
>>> +{
>>> +	return READ_ONCE(engine->timeline->last_submitted_seqno);
>>> +}
>>> +
>>
>> Don't like that READ_ONCE gets sprinkled all over the place via call
>> sites. It should be extremely well defined and controlled from where
>> it is used. Otherwise it suggests READ_ONCE is not really
>> appropriate.
>
> It actually is appropriate. last_submitted_seqno is under the timeline
> spinlock, none of the callers take the appropriate guard. This is
> trying to document that these callers don't call about fully
> synchronising the last_submitted_seqno with the request list or last
> request pointer. And we don't care to go full seqlock, since we really
> are only peeking.

Ah my bad, I thought it was actually req->timeline->last_submitted_seqno.

In this case, assuming the seqno namespaces are correct:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk
  2016-11-01  8:39   ` Tvrtko Ursulin
@ 2016-11-01  8:48     ` Chris Wilson
  2016-11-01  8:52       ` Tvrtko Ursulin
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-11-01  8:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Nov 01, 2016 at 08:39:14AM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 10:26, Chris Wilson wrote:
> >If we have a tiled object and an unknown CPU swizzle pattern, we pin the
> >pages to prevent the object from being swapped out (and us corrupting
> >the contents as we do not know the access pattern and so cannot convert
> >it to linear and back to tiled on reuse). This requires us to remember
> >to drop the extra pinning when freeing the object, or else we trigger
> >warnings about the pin leak. In commit fbbd37b36fa5 ("drm/i915: Move
> >object release to a freelist + worker"), the object free path was
> >deferred to a work, but the unpinning of the quirk, along with marking
> >the object as reclaimable, was left on the immediate path (so that if
> >required we could reclaim the pages under memory pressure as early as
> >possible). However, this split introduced a bug where the pages we no
> >longer being unpinned if they were marked as unneeded.
> 
> Last sentence is broken.

Almost the right words in the wrong order.

> > 	if (obj->mm.madv != __I915_MADV_PURGED)
> >@@ -4335,14 +4342,12 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
> > {
> > 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
> >
> >+	if (obj->mm.quirked)
> >+		__i915_gem_object_unpin_pages(obj);
> >+
> > 	if (discard_backing_storage(obj))
> > 		obj->mm.madv = I915_MADV_DONTNEED;
> >
> >-	if (obj->mm.pages && obj->mm.madv == I915_MADV_WILLNEED &&
> >-	    to_i915(obj->base.dev)->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
> >-	    i915_gem_object_is_tiled(obj))
> >-		__i915_gem_object_unpin_pages(obj);
> >-
> 
> This reordering would not have been enough to fix this?

Yes. I was trying to avoid the repeated if() conditions and thought a
flag would eliminate a few of them. It only managed to kill this one and
provide assertions elsewhere.

Still, we reduce the 3 [of 4] tests done to one for all devices but some
odd gen3/gen4 (and gain a sanity check for uncommon code paths).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  8:41   ` Tvrtko Ursulin
@ 2016-11-01  8:50     ` Chris Wilson
  2016-11-01  8:54       ` Tvrtko Ursulin
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-11-01  8:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 10:26, Chris Wilson wrote:
> >With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >underneath the object. For execbuf there is merit in having a very fast
> >direct lookup of ctx:handle to the vma using a hashtree, but that still
> >leaves a large number of other lookups. One way to speed up the lookup
> >would be to use a rhashtable, but that requires extra allocations and
> >may exhibit poor worst-case behaviour. An alternative is to use an
> >embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >but at the slight cost of O(lgN) lookups (instead of O(1) for
> >rhashtable). In practice such a tree will be very shallow and so not much
> >slower, and still scales much, much better than the current unsorted
> >list.
> >
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> I suggest leaving this out of the mini-series which fixes the
> recently introduced bugs.

Just review it now, it's a two year old bug that was impacting the test
cases I was running... :-p Then we won't have to review it next week ;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk
  2016-11-01  8:48     ` Chris Wilson
@ 2016-11-01  8:52       ` Tvrtko Ursulin
  0 siblings, 0 replies; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/11/2016 08:48, Chris Wilson wrote:
> On Tue, Nov 01, 2016 at 08:39:14AM +0000, Tvrtko Ursulin wrote:
>>
>> On 31/10/2016 10:26, Chris Wilson wrote:
>>> If we have a tiled object and an unknown CPU swizzle pattern, we pin the
>>> pages to prevent the object from being swapped out (and us corrupting
>>> the contents as we do not know the access pattern and so cannot convert
>>> it to linear and back to tiled on reuse). This requires us to remember
>>> to drop the extra pinning when freeing the object, or else we trigger
>>> warnings about the pin leak. In commit fbbd37b36fa5 ("drm/i915: Move
>>> object release to a freelist + worker"), the object free path was
>>> deferred to a work, but the unpinning of the quirk, along with marking
>>> the object as reclaimable, was left on the immediate path (so that if
>>> required we could reclaim the pages under memory pressure as early as
>>> possible). However, this split introduced a bug where the pages we no
>>> longer being unpinned if they were marked as unneeded.
>>
>> Last sentence is broken.
>
> Almost the right words in the wrong order.
>
>>> 	if (obj->mm.madv != __I915_MADV_PURGED)
>>> @@ -4335,14 +4342,12 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
>>> {
>>> 	struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
>>>
>>> +	if (obj->mm.quirked)
>>> +		__i915_gem_object_unpin_pages(obj);
>>> +
>>> 	if (discard_backing_storage(obj))
>>> 		obj->mm.madv = I915_MADV_DONTNEED;
>>>
>>> -	if (obj->mm.pages && obj->mm.madv == I915_MADV_WILLNEED &&
>>> -	    to_i915(obj->base.dev)->quirks & QUIRK_PIN_SWIZZLED_PAGES &&
>>> -	    i915_gem_object_is_tiled(obj))
>>> -		__i915_gem_object_unpin_pages(obj);
>>> -
>>
>> This reordering would not have been enough to fix this?
>
> Yes. I was trying to avoid the repeated if() conditions and thought a
> flag would eliminate a few of them. It only managed to kill this one and
> provide assertions elsewhere.
>
> Still we reduce the 3 [of 4] tests done to one for all devices but some
> odd gen3/gen4 (and gain a sanity check for uncommon code paths).

Okay then, with the commit message fix:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  8:50     ` Chris Wilson
@ 2016-11-01  8:54       ` Tvrtko Ursulin
  2016-11-01  9:06         ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  8:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/11/2016 08:50, Chris Wilson wrote:
> On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
>>
>> On 31/10/2016 10:26, Chris Wilson wrote:
>>> With full-ppgtt one of the main bottlenecks is the lookup of the VMA
>>> underneath the object. For execbuf there is merit in having a very fast
>>> direct lookup of ctx:handle to the vma using a hashtree, but that still
>>> leaves a large number of other lookups. One way to speed up the lookup
>>> would be to use a rhashtable, but that requires extra allocations and
>>> may exhibit poor worst-case behaviour. An alternative is to use an
>>> embedded rbtree, i.e. no extra allocations and deterministic behaviour,
>>> but at the slight cost of O(lgN) lookups (instead of O(1) for
>>> rhashtable). In practice such a tree will be very shallow and so not much
>>> slower, and still scales much, much better than the current unsorted
>>> list.
>>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> I suggest leaving this out of the mini-series which fixes the
>> recently introduced bugs.
>
> Just review it now, it's a two year old bug that was impacting the test
> cases I was running... :-p Then we won't have to review it next week ;)

I'll review it, just split it out please so the CI fire can be 
extinguished first.

Regards,

Tvrtko

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  8:54       ` Tvrtko Ursulin
@ 2016-11-01  9:06         ` Chris Wilson
  2016-11-01  9:20           ` Tvrtko Ursulin
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2016-11-01  9:06 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Nov 01, 2016 at 08:54:09AM +0000, Tvrtko Ursulin wrote:
> 
> On 01/11/2016 08:50, Chris Wilson wrote:
> >On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 31/10/2016 10:26, Chris Wilson wrote:
> >>>With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >>>underneath the object. For execbuf there is merit in having a very fast
> >>>direct lookup of ctx:handle to the vma using a hashtree, but that still
> >>>leaves a large number of other lookups. One way to speed up the lookup
> >>>would be to use a rhashtable, but that requires extra allocations and
> >>>may exhibit poor worst-case behaviour. An alternative is to use an
> >>>embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >>>but at the slight cost of O(lgN) lookups (instead of O(1) for
> >>>rhashtable). In practice such a tree will be very shallow and so not much
> >>>slower, and still scales much, much better than the current unsorted
> >>>list.
> >>>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>
> >>I suggest leaving this out of the mini-series which fixes the
> >>recently introduced bugs.
> >
> >Just review it now, it's a two year old bug that was impacting the test
> >cases I was running... :-p Then we won't have to review it next week ;)
> 
> I'll review it, just split it out please so the CI fire can be
> extinguished first.

Sure, I'm pushing the fixes as soon as they've been baked. There's a
regression fix that builds upon this patch for multiple timelines (the
linear walk in reservation_object is particularly nasty for scaling).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  9:06         ` Chris Wilson
@ 2016-11-01  9:20           ` Tvrtko Ursulin
  2016-11-01  9:45             ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  9:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/11/2016 09:06, Chris Wilson wrote:
> On Tue, Nov 01, 2016 at 08:54:09AM +0000, Tvrtko Ursulin wrote:
>>
>> On 01/11/2016 08:50, Chris Wilson wrote:
>>> On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 31/10/2016 10:26, Chris Wilson wrote:
>>>>> With full-ppgtt one of the main bottlenecks is the lookup of the VMA
>>>>> underneath the object. For execbuf there is merit in having a very fast
>>>>> direct lookup of ctx:handle to the vma using a hashtree, but that still
>>>>> leaves a large number of other lookups. One way to speed up the lookup
>>>>> would be to use a rhashtable, but that requires extra allocations and
>>>>> may exhibit poor worst-case behaviour. An alternative is to use an
>>>>> embedded rbtree, i.e. no extra allocations and deterministic behaviour,
>>>>> but at the slight cost of O(lgN) lookups (instead of O(1) for
>>>>> rhashtable). The majority of such a tree will be very shallow and so not much
>>>>> slower, and still scales much, much better than the current unsorted
>>>>> list.
>>>>>
>>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> I suggest leaving this out of the mini-series which fixes the
>>>> recently introduced bugs.
>>>
>>> Just review it now, it's a two year old bug that was impacting the test
>>> cases I was running... :-p Then we won't have to review it next week ;)
>>
>> I'll review it, just split it out please so the CI fire can be
>> extinguished first.
>
> Sure, I'm pushing the fixes as soon as they've been baked. There's a
> regression fix that builds upon this patch for multiple timelines (the
> linear walk in reservation_object is particularly nasty for scaling).

Which one is that, the reservation_object_get_fences_rcu in 
i915_gem_object_wait_reservation?

Regards,

Tvrtko


* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-10-31 10:26 ` [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object Chris Wilson
  2016-11-01  8:41   ` Tvrtko Ursulin
@ 2016-11-01  9:43   ` Tvrtko Ursulin
  2016-11-01  9:56     ` Chris Wilson
  1 sibling, 1 reply; 25+ messages in thread
From: Tvrtko Ursulin @ 2016-11-01  9:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 31/10/2016 10:26, Chris Wilson wrote:
> With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> underneath the object. For execbuf there is merit in having a very fast
> direct lookup of ctx:handle to the vma using a hashtree, but that still
> leaves a large number of other lookups. One way to speed up the lookup
> would be to use a rhashtable, but that requires extra allocations and
> may exhibit poor worst-case behaviour. An alternative is to use an
> embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> but at the slight cost of O(lgN) lookups (instead of O(1) for
> rhashtable). The majority of such a tree will be very shallow and so not much
> slower, and still scales much, much better than the current unsorted
> list.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  1 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
>  3 files changed, 57 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 7a18bf66f797..e923d6596cac 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2230,6 +2230,7 @@ struct drm_i915_gem_object {
>
>  	/** List of VMAs backed by this object */
>  	struct list_head vma_list;
> +	struct rb_root vma_tree;
>
>  	/** Stolen memory for this object, instead of being backed by shmem. */
>  	struct drm_mm_node *stolen;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e7afad585929..aa2d21c41091 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3399,6 +3399,7 @@ void i915_vma_destroy(struct i915_vma *vma)
>  	GEM_BUG_ON(!i915_vma_is_closed(vma));
>  	GEM_BUG_ON(vma->fence);
>
> +	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
>  	list_del(&vma->vm_link);
>  	if (!i915_vma_is_ggtt(vma))
>  		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
> @@ -3416,12 +3417,33 @@ void i915_vma_close(struct i915_vma *vma)
>  		WARN_ON(i915_vma_unbind(vma));
>  }
>
> +static inline int vma_compare(struct i915_vma *vma,
> +			      struct i915_address_space *vm,
> +			      const struct i915_ggtt_view *view)
> +{
> +	GEM_BUG_ON(view && !i915_vma_is_ggtt(vma));
> +
> +	if (vma->vm != vm)
> +		return vma->vm - vm;

This pointer subtraction can theoretically overflow an int, so I think long should be the return type.
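For illustration, an overflow-safe three-way compare could avoid the subtraction entirely (a standalone sketch, not the i915 code; ptr_cmp is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Subtracting two pointers and truncating to int can overflow, so the
 * sign of the difference is not a reliable ordering.  Comparing the
 * addresses explicitly and returning -1/0/1 sidesteps the problem. */
static long ptr_cmp(const void *a, const void *b)
{
	uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;

	if (ua < ub)
		return -1;
	if (ua > ub)
		return 1;
	return 0;
}
```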

> +
> +	if (!view)
> +		return vma->ggtt_view.type;
> +
> +	if (vma->ggtt_view.type != view->type)
> +		return vma->ggtt_view.type - view->type;
> +
> +	return memcmp(&vma->ggtt_view.params,
> +		      &view->params,
> +		      sizeof(view->params));
> +}
> +
>  static struct i915_vma *
>  __i915_vma_create(struct drm_i915_gem_object *obj,
>  		  struct i915_address_space *vm,
>  		  const struct i915_ggtt_view *view)
>  {
>  	struct i915_vma *vma;
> +	struct rb_node *rb, **p;
>  	int i;
>
>  	GEM_BUG_ON(vm->closed);
> @@ -3455,33 +3477,28 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
>
>  	if (i915_is_ggtt(vm)) {
>  		vma->flags |= I915_VMA_GGTT;
> +		list_add(&vma->obj_link, &obj->vma_list);
>  	} else {
>  		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> +		list_add_tail(&vma->obj_link, &obj->vma_list);
>  	}

Will this make a difference to anything since it is not used for 
lookups? I suppose it makes a nicer debugfs output if nothing else so 
not complaining.

>
> -	list_add_tail(&vma->obj_link, &obj->vma_list);
> -	return vma;
> -}
> +	rb = NULL;
> +	p = &obj->vma_tree.rb_node;
> +	while (*p) {
> +		struct i915_vma *pos;
>
> -static inline bool vma_matches(struct i915_vma *vma,
> -			       struct i915_address_space *vm,
> -			       const struct i915_ggtt_view *view)
> -{
> -	if (vma->vm != vm)
> -		return false;
> -
> -	if (!i915_vma_is_ggtt(vma))
> -		return true;
> -
> -	if (!view)
> -		return vma->ggtt_view.type == 0;
> -
> -	if (vma->ggtt_view.type != view->type)
> -		return false;
> +		rb = *p;
> +		pos = rb_entry(rb, struct i915_vma, obj_node);
> +		if (vma_compare(pos, vm, view) < 0)
> +			p = &rb->rb_right;
> +		else
> +			p = &rb->rb_left;
> +	}
> +	rb_link_node(&vma->obj_node, rb, p);
> +	rb_insert_color(&vma->obj_node, &obj->vma_tree);
>
> -	return memcmp(&vma->ggtt_view.params,
> -		      &view->params,
> -		      sizeof(view->params)) == 0;
> +	return vma;
>  }
>
>  struct i915_vma *
> @@ -3501,11 +3518,22 @@ i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
>  		    struct i915_address_space *vm,
>  		    const struct i915_ggtt_view *view)
>  {
> -	struct i915_vma *vma;
> +	struct rb_node *rb;
> +
> +	rb = obj->vma_tree.rb_node;
> +	while (rb) {
> +		struct i915_vma *vma;
> +		int cmp;
>
> -	list_for_each_entry_reverse(vma, &obj->vma_list, obj_link)
> -		if (vma_matches(vma, vm, view))
> +		vma = rb_entry(rb, struct i915_vma, obj_node);
> +		cmp = vma_compare(vma, vm, view);
> +		if (cmp == 0)
>  			return vma;
> +		else if (cmp < 0)
> +			rb = rb->rb_right;
> +		else
> +			rb = rb->rb_left;
> +	}
>
>  	return NULL;
>  }
> @@ -3521,8 +3549,10 @@ i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
>  	GEM_BUG_ON(view && !i915_is_ggtt(vm));
>
>  	vma = i915_gem_obj_to_vma(obj, vm, view);
> -	if (!vma)
> +	if (!vma) {
>  		vma = __i915_vma_create(obj, vm, view);
> +		GEM_BUG_ON(vma != i915_gem_obj_to_vma(obj, vm, view));

Nice touch which makes the review so much easier! :)

> +	}
>
>  	GEM_BUG_ON(i915_vma_is_closed(vma));
>  	return vma;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 518e75b64290..c23ef9db1f53 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -227,6 +227,7 @@ struct i915_vma {
>  	struct list_head vm_link;
>
>  	struct list_head obj_link; /* Link in the object's VMA list */
> +	struct rb_node obj_node;
>
>  	/** This vma's place in the batchbuffer or on the eviction list */
>  	struct list_head exec_list;
>

LGTM in general.

Just the comparison return type (and accompanying locals) and then:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  9:20           ` Tvrtko Ursulin
@ 2016-11-01  9:45             ` Chris Wilson
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2016-11-01  9:45 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Nov 01, 2016 at 09:20:33AM +0000, Tvrtko Ursulin wrote:
> 
> On 01/11/2016 09:06, Chris Wilson wrote:
> >On Tue, Nov 01, 2016 at 08:54:09AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 01/11/2016 08:50, Chris Wilson wrote:
> >>>On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 31/10/2016 10:26, Chris Wilson wrote:
> >>>>>With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >>>>>underneath the object. For execbuf there is merit in having a very fast
> >>>>>direct lookup of ctx:handle to the vma using a hashtree, but that still
> >>>>>leaves a large number of other lookups. One way to speed up the lookup
> >>>>>would be to use a rhashtable, but that requires extra allocations and
> >>>>>may exhibit poor worst-case behaviour. An alternative is to use an
> >>>>>embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >>>>>but at the slight cost of O(lgN) lookups (instead of O(1) for
> >>>>>rhashtable). The majority of such a tree will be very shallow and so not much
> >>>>>slower, and still scales much, much better than the current unsorted
> >>>>>list.
> >>>>>
> >>>>>References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>
> >>>>I suggest leaving this out of the mini-series which fixes the
> >>>>recently introduced bugs.
> >>>
> >>>Just review it now, it's a two year old bug that was impacting the test
> >>>cases I was running... :-p Then we won't have to review it next week ;)
> >>
> >>I'll review it, just split it out please so the CI fire can be
> >>extinguished first.
> >
> >Sure, I'm pushing the fixes as soon as they've been baked. There's a
> >regression fix that builds upon this patch for multiple timelines (the
> >linear walk in reservation_object is particularly nasty for scaling).
> 
> Which one is that, the reservation_object_get_fences_rcu in
> i915_gem_object_wait_reservation?

Mostly reservation_object_add_shared_fence()

The reservation object is not autopruning, those fences stay there until
replaced. So as an object is used by more and more contexts (such as
many GL clients sharing the same resource) that shared array keeps on
growing, and we have to walk over it on every execbuf, both to update
the fence and indeed often to compute the await.
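As a rough sketch of the scaling problem (hypothetical types, not the kernel's reservation_object): every submission scans the whole shared-fence array for its context's slot, so the cost grows linearly with the number of contexts sharing the object.

```c
#include <assert.h>

/* Hypothetical model of the linear walk: one slot per context,
 * replaced in place if found, appended otherwise. */
struct fake_fence { unsigned int context; };

struct shared_fences {
	unsigned int count;
	struct fake_fence *fences[64];
};

/* O(count) on every submission. */
static void add_shared(struct shared_fences *s, struct fake_fence *f)
{
	unsigned int i;

	for (i = 0; i < s->count; i++) {
		if (s->fences[i]->context == f->context) {
			s->fences[i] = f;	/* replace the old fence */
			return;
		}
	}
	s->fences[s->count++] = f;		/* new context: append */
}
```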

We could use a radixtree/idr or a rhashtable here, since they support the
RCU-safe lookup required for reservation_object (and couple in the seqlock
for update tracking). The stumbling point is that
reservation_object_reserve_shared_fence() guarantees the next
add_shared_fence() will not fail (ENOMEM). Both idr and radixtree have
preload, but both use per-cpu caches and so require preemption to be
disabled from the point of reservation until use; i.e. not suitable.
Changing them to preload onto a specific tree seems reasonably easy,
except for convincing core of the merit. (Of the two, changing idr looks
easier.) radixtree is easier to use than idr, but iterating the
radixtree (for get_fences_rcu) is not as cheap as I expected. :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
  2016-11-01  9:43   ` Tvrtko Ursulin
@ 2016-11-01  9:56     ` Chris Wilson
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2016-11-01  9:56 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Nov 01, 2016 at 09:43:53AM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 10:26, Chris Wilson wrote:
> >With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >underneath the object. For execbuf there is merit in having a very fast
> >direct lookup of ctx:handle to the vma using a hashtree, but that still
> >leaves a large number of other lookups. One way to speed up the lookup
> >would be to use a rhashtable, but that requires extra allocations and
> >may exhibit poor worst-case behaviour. An alternative is to use an
> >embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >but at the slight cost of O(lgN) lookups (instead of O(1) for
> >rhashtable). The majority of such a tree will be very shallow and so not much
> >slower, and still scales much, much better than the current unsorted
> >list.
> >
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_drv.h     |  1 +
> > drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++++++------------
> > drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
> > 3 files changed, 57 insertions(+), 25 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 7a18bf66f797..e923d6596cac 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2230,6 +2230,7 @@ struct drm_i915_gem_object {
> >
> > 	/** List of VMAs backed by this object */
> > 	struct list_head vma_list;
> >+	struct rb_root vma_tree;
> >
> > 	/** Stolen memory for this object, instead of being backed by shmem. */
> > 	struct drm_mm_node *stolen;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >index e7afad585929..aa2d21c41091 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >@@ -3399,6 +3399,7 @@ void i915_vma_destroy(struct i915_vma *vma)
> > 	GEM_BUG_ON(!i915_vma_is_closed(vma));
> > 	GEM_BUG_ON(vma->fence);
> >
> >+	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
> > 	list_del(&vma->vm_link);
> > 	if (!i915_vma_is_ggtt(vma))
> > 		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
> >@@ -3416,12 +3417,33 @@ void i915_vma_close(struct i915_vma *vma)
> > 		WARN_ON(i915_vma_unbind(vma));
> > }
> >
> >+static inline int vma_compare(struct i915_vma *vma,
> >+			      struct i915_address_space *vm,
> >+			      const struct i915_ggtt_view *view)
> >+{
> >+	GEM_BUG_ON(view && !i915_vma_is_ggtt(vma));
> >+
> >+	if (vma->vm != vm)
> >+		return vma->vm - vm;
> 
> This can theoretically overflow so I think long should be the return type.

Yeah, the same thought crossed my mind. gcc should be smart enough
to only use the condition-code flags, but still...

I miss the spaceship operator.
 
> >+
> >+	if (!view)
> >+		return vma->ggtt_view.type;
> >+
> >+	if (vma->ggtt_view.type != view->type)
> >+		return vma->ggtt_view.type - view->type;
> >+
> >+	return memcmp(&vma->ggtt_view.params,
> >+		      &view->params,
> >+		      sizeof(view->params));
> >+}
> >+
> > static struct i915_vma *
> > __i915_vma_create(struct drm_i915_gem_object *obj,
> > 		  struct i915_address_space *vm,
> > 		  const struct i915_ggtt_view *view)
> > {
> > 	struct i915_vma *vma;
> >+	struct rb_node *rb, **p;
> > 	int i;
> >
> > 	GEM_BUG_ON(vm->closed);
> >@@ -3455,33 +3477,28 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
> >
> > 	if (i915_is_ggtt(vm)) {
> > 		vma->flags |= I915_VMA_GGTT;
> >+		list_add(&vma->obj_link, &obj->vma_list);
> > 	} else {
> > 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> >+		list_add_tail(&vma->obj_link, &obj->vma_list);
> > 	}
> 
> Will this make a difference to anything since it is not used for
> lookups? I suppose it makes a nicer debugfs output if nothing else
> so not complaining.

It does. The code assumes GGTT entries come first (some of my patches
date from before the regression was introduced by Daniel, who claimed
the ordering made no difference but had not benchmarked it!).
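As a minimal illustration of that ordering (plain C, not the kernel list API): list_add() prepends while list_add_tail() appends, so GGTT VMAs cluster at the head of obj->vma_list and ppGTT VMAs go to the tail.

```c
#include <assert.h>
#include <stddef.h>

/* Toy circular doubly-linked list mimicking the head/tail insertion
 * semantics the patch relies on. */
struct node {
	int is_ggtt;
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add(struct node *head, struct node *n)		/* prepend */
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_add_tail(struct node *head, struct node *n)	/* append */
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}
```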
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

end of thread, other threads:[~2016-11-01  9:56 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-31 10:26 [PATCH 1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Chris Wilson
2016-10-31 10:26 ` [PATCH 2/6] drm/i915: Avoid accessing request->timeline outside of its lifetime Chris Wilson
2016-10-31 17:35   ` Tvrtko Ursulin
2016-10-31 21:03     ` Chris Wilson
2016-11-01  8:43       ` Tvrtko Ursulin
2016-10-31 10:26 ` [PATCH 3/6] drm/i915: Track pages pinned due to swizzling quirk Chris Wilson
2016-11-01  8:39   ` Tvrtko Ursulin
2016-11-01  8:48     ` Chris Wilson
2016-11-01  8:52       ` Tvrtko Ursulin
2016-10-31 10:26 ` [PATCH 4/6] drm/i915: Discard objects from mm global_list after being shrunk Chris Wilson
2016-11-01  8:29   ` Tvrtko Ursulin
2016-10-31 10:26 ` [PATCH 5/6] drm/i915: Move the recently scanned objects to the tail after shrinking Chris Wilson
2016-10-31 15:26   ` Joonas Lahtinen
2016-10-31 10:26 ` [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object Chris Wilson
2016-11-01  8:41   ` Tvrtko Ursulin
2016-11-01  8:50     ` Chris Wilson
2016-11-01  8:54       ` Tvrtko Ursulin
2016-11-01  9:06         ` Chris Wilson
2016-11-01  9:20           ` Tvrtko Ursulin
2016-11-01  9:45             ` Chris Wilson
2016-11-01  9:43   ` Tvrtko Ursulin
2016-11-01  9:56     ` Chris Wilson
2016-10-31 11:16 ` ✗ Fi.CI.BAT: failure for series starting with [1/6] drm/i915: Use the full hammer when shutting down the rcu tasks Patchwork
2016-10-31 17:15 ` [PATCH 1/6] " Tvrtko Ursulin
2016-10-31 21:05   ` Chris Wilson
