All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues
@ 2020-05-18 18:17 Janusz Krzysztofik
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-18 18:17 UTC (permalink / raw)
  To: intel-gfx

The idea is to revoke DMA mappings on driver remove in order to work
around intel-iommu judging late unmapping on driver release after an
open device is removed as bugs.  That also resolves runtime power
management warnings on late object removal.

Janusz Krzysztofik (4):
  drm/i915: Drop user contexts on driver remove
  drm/i915: Release GT resources on driver remove
  drm/i915: Move GGTT cleanup from driver_release to _remove
  drm/i915: Move UC firmware cleanup from driver_release to _remove

 drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c        | 13 ++++---
 drivers/gpu/drm/i915/gt/intel_gt.c          |  2 ++
 drivers/gpu/drm/i915/gt/intel_gtt.h         |  1 +
 drivers/gpu/drm/i915/i915_drv.c             |  2 ++
 drivers/gpu/drm/i915/i915_gem.c             |  5 ++-
 7 files changed, 57 insertions(+), 5 deletions(-)

-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
@ 2020-05-18 18:17 ` Janusz Krzysztofik
  2020-05-27 14:23   ` Michał Winiarski
  2020-05-28 10:14   ` Tvrtko Ursulin
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources " Janusz Krzysztofik
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-18 18:17 UTC (permalink / raw)
  To: intel-gfx

Contexts associated with open device file descriptors together with
their assigned address spaces are now closed on device file close.  On
address space closure its associated DMA mappings are revoked.  If the
device is removed while being open, subsequent attempts to revoke
those mappings while closing the device file descriptor may may be
judged by intel-iommu code as a bug and result in kernel panic.

Since user contexts become useless after the device is no longer
available, drop them on device removal.

<4> [36.900985] ------------[ cut here ]------------
<2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
<4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
<4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
<4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
<4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
<4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
<4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
<4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
<4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
<4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
<4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
<4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
<4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
<4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
<4> [36.901408] Call Trace:
<4> [36.901418]  ? process_one_work+0x1de/0x600
<4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
<4> [36.901573]  free_pd+0x9/0x20 [i915]
<4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
<4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
<4> [36.901733]  process_one_work+0x268/0x600
<4> [36.901744]  ? __schedule+0x307/0x8d0
<4> [36.901756]  worker_thread+0x37/0x380
<4> [36.901766]  ? process_one_work+0x600/0x600
<4> [36.901775]  kthread+0x140/0x160
<4> [36.901783]  ? kthread_park+0x80/0x80
<4> [36.901792]  ret_from_fork+0x24/0x50
<4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
<4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c             |  2 ++
 3 files changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 900ea8b7fc8f..0096a69fbfd3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
 	rcu_barrier(); /* and flush the left over RCU frees */
 }
 
+void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
+{
+	struct i915_gem_context *ctx, *cn;
+
+	list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
+		struct drm_i915_file_private *file_priv = ctx->file_priv;
+		struct i915_gem_context *entry;
+		unsigned long int id;
+
+		if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
+			continue;
+
+		xa_for_each(&file_priv->context_xa, id, entry) {
+			struct i915_address_space *vm;
+			unsigned long int idx;
+
+			if (entry != ctx)
+				continue;
+
+			xa_erase(&file_priv->context_xa, id);
+
+			if (id)
+				break;
+
+			xa_for_each(&file_priv->vm_xa, idx, vm) {
+				xa_erase(&file_priv->vm_xa, idx);
+				i915_vm_put(vm);
+			}
+
+			break;
+		}
+
+		context_close(ctx);
+	}
+
+	i915_gem_driver_release__contexts(i915);
+}
+
 static int gem_context_register(struct i915_gem_context *ctx,
 				struct drm_i915_file_private *fpriv,
 				u32 *id)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 3702b2fb27ab..62808bea9239 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -110,6 +110,7 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
 
 /* i915_gem_context.c */
 void i915_gem_init__contexts(struct drm_i915_private *i915);
+void i915_gem_driver_remove__contexts(struct drm_i915_private *i915);
 void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
 
 int i915_gem_context_open(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0cbcb9f54e7d..87d3c4f5b6c6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1189,6 +1189,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
 	/* Flush any outstanding unpin_work. */
 	i915_gem_drain_workqueue(dev_priv);
 
+	i915_gem_driver_remove__contexts(dev_priv);
+
 	i915_gem_drain_freed_objects(dev_priv);
 }
 
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources on driver remove
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
@ 2020-05-18 18:17 ` Janusz Krzysztofik
  2020-05-27 18:35   ` Michał Winiarski
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove Janusz Krzysztofik
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-18 18:17 UTC (permalink / raw)
  To: intel-gfx

GT scratch page is now released and its DMA mappings revoked on driver
release.  If a device is removed while its file descriptor is still
open, the driver is not released until last device file descriptor
closure.  In that case intel-iommu code may judge late DMA unmapping as
a bug and kernel panic may occur.

Since DMA mapped address space may be no longer usable after device
removal, release GT resources including scratch page as well as a
reference to its address space on driver remove.  Implement that by
just calling intel_gt_driver_release() on GT remove as that function
has been already made safe to be called again on driver release even if
already called before, e.g. on GEM initialization failure.

<4> [39.201062] ------------[ cut here ]------------
<2> [39.201074] kernel BUG at drivers/iommu/intel-iommu.c:3717!
<4> [39.201154] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4> [39.201162] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
<4> [39.201172] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3175.A00.1904261428 04/26/2019
<4> [39.201243] Workqueue: i915 __i915_gem_free_work [i915]
<4> [39.201252] RIP: 0010:intel_unmap+0x1f5/0x230
<4> [39.201260] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
<4> [39.201278] RSP: 0018:ffffc900000dbc98 EFLAGS: 00010246
<4> [39.201285] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea0021d30000
<4> [39.201293] RDX: 000000000005f000 RSI: 00000000fed00000 RDI: ffff888889eec000
<4> [39.201301] RBP: ffff888889eec0b0 R08: 0000000000000000 R09: 00000000fffffffe
<4> [39.201309] R10: 00000000458139fc R11: 00000000f6c6d8b2 R12: 0000000000000025
<4> [39.201318] R13: 00000000fed00000 R14: 000000000005f000 R15: 0000000000000025
<4> [39.201326] FS:  0000000000000000(0000) GS:ffff888890100000(0000) knlGS:0000000000000000
<4> [39.201335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [39.201342] CR2: 0000560f1308e148 CR3: 0000000881972002 CR4: 0000000000760ee0
<4> [39.201350] PKRU: 55555554
<4> [39.201355] Call Trace:
<4> [39.201361]  intel_unmap_sg+0x7b/0x180
<4> [39.201412]  shmem_put_pages+0x43/0x250 [i915]
<4> [39.201472]  ? __i915_gem_object_unset_pages.part.12+0x11b/0x1d0 [i915]
<4> [39.201531]  ? __i915_gem_object_unset_pages.part.12+0x133/0x1d0 [i915]
<4> [39.201590]  __i915_gem_object_put_pages+0x81/0xc0 [i915]
<4> [39.201646]  __i915_gem_free_objects.isra.21+0x1a7/0x4b0 [i915]
<4> [39.201658]  process_one_work+0x268/0x600
<4> [39.201666]  ? __schedule+0x307/0x8d0
<4> [39.201675]  worker_thread+0x1d0/0x380
<4> [39.201682]  ? process_one_work+0x600/0x600
<4> [39.201689]  kthread+0x140/0x160
<4> [39.201695]  ? kthread_park+0x80/0x80
<4> [39.201703]  ret_from_fork+0x24/0x50
<4> [39.201712] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_intel_dspcfg snd_hda_codec e1000e ax88179_178a usbnet snd_hwdep mii snd_hda_core ghash_clmulni_intel snd_pcm ptp pps_core mei_me mei intel_lpss_pci prime_numbers
<4> [39.201764] ---[ end trace f3ec1bae3de04509 ]---

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f069551e412f..5771e80e85a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -599,6 +599,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
 	intel_uc_driver_remove(&gt->uc);
 
 	intel_engines_release(gt);
+
+	intel_gt_driver_release(gt);
 }
 
 void intel_gt_driver_unregister(struct intel_gt *gt)
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources " Janusz Krzysztofik
@ 2020-05-18 18:17 ` Janusz Krzysztofik
  2020-05-27 19:12   ` Michał Winiarski
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware " Janusz Krzysztofik
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-18 18:17 UTC (permalink / raw)
  To: intel-gfx

GGTT including its scratch page is not cleaned up until driver release.
Since DMA mappings still exist before scratch page cleanup, unmapping
them on last device close after the driver has been already removed may
be judged by intel-iommu code as a bug and result in kernel panic.

Since we abort requests and block user access to hardware on device
removal, there seems not much sense in still keeping GGTT.  Clean it up
on driver remove.  However, skip GGTT address space cleanup as its
mutext may still be needed if there are objects to be freed.  That
cleanup is always called on address space release after all.

[   81.290284] ------------[ cut here ]------------
[   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
[   81.293558] invalid opcode: 0000 [#1] PREEMPT SMP
[   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G     U            5.4.17 #2
[   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   81.297959] RIP: 0010:intel_unmap+0x200/0x230
[   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 19 65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff <0f> 0b e8 19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
[   81.303425] RSP: 0018:ffffc9000013fda0 EFLAGS: 00010246
[   81.304683] RAX: 0000000000000000 RBX: ffff8882228dd0b0 RCX: 0000000000000000
[   81.306384] RDX: 0000000000001000 RSI: 00000000af801000 RDI: ffff8882228dd0b0
[   81.308086] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   81.309788] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000af801000
[   81.311489] R13: ffff888223a00000 R14: 0000000000001000 R15: ffff888223a0a2e8
[   81.313191] FS:  00007f5408e3c940(0000) GS:ffff888228600000(0000) knlGS:0000000000000000
[   81.315116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.316495] CR2: 0000000001fc0048 CR3: 000000022464a000 CR4: 00000000000006b0
[   81.318196] Call Trace:
[   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
[   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
[   81.321717]  i915_driver_release+0x33/0x90 [i915]
[   81.322856]  drm_release+0xbc/0xd0
[   81.323691]  __fput+0xcd/0x260
[   81.324447]  task_work_run+0x90/0xc0
[   81.325323]  do_syscall_64+0x3da/0x560
[   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   81.327457] RIP: 0033:0x7f54096ecba3
[   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
[   81.332741] RSP: 002b:00007ffcc5165698 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[   81.334546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f54096ecba3
[   81.336247] RDX: 00000000005cc5d0 RSI: 0000000000000005 RDI: 0000000000000004
[   81.337949] RBP: 0000000000000003 R08: 00000000005b8014 R09: 0000000000000004
[   81.339650] R10: 00000000005cc650 R11: 0000000000000246 R12: 00000000004022f0
[   81.341352] R13: 00007ffcc5165850 R14: 0000000000000000 R15: 0000000000000000
[   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
[   81.345015] ---[ end trace 010aae55e56f8998 ]---

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Defer GGTT vm address space fini to vm release

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +++++++++----
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
 drivers/gpu/drm/i915/i915_drv.c      |  2 ++
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 66165b10256e..ff2b4f74149a 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
 	ggtt->vm.cleanup(&ggtt->vm);
 
 	mutex_unlock(&ggtt->vm.mutex);
-	i915_address_space_fini(&ggtt->vm);
 
 	arch_phys_wc_del(ggtt->mtrr);
 
@@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
 		io_mapping_fini(&ggtt->iomap);
 }
 
+void i915_ggtt_driver_remove(struct drm_i915_private *i915)
+{
+	struct i915_ggtt *ggtt = &i915->ggtt;
+
+	fini_aliasing_ppgtt(ggtt);
+
+	ggtt_cleanup_hw(ggtt);
+}
+
 /**
  * i915_ggtt_driver_release - Clean up GGTT hardware initialization
  * @i915: i915 device
@@ -718,10 +726,7 @@ void i915_ggtt_driver_release(struct drm_i915_private *i915)
 	struct i915_ggtt *ggtt = &i915->ggtt;
 	struct pagevec *pvec;
 
-	fini_aliasing_ppgtt(ggtt);
-
 	intel_ggtt_fini_fences(ggtt);
-	ggtt_cleanup_hw(ggtt);
 
 	pvec = &i915->mm.wc_stash.pvec;
 	if (pvec->nr) {
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index d93ebdf3fa0e..f140ce5c171a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -501,6 +501,7 @@ int i915_ggtt_enable_hw(struct drm_i915_private *i915);
 void i915_ggtt_enable_guc(struct i915_ggtt *ggtt);
 void i915_ggtt_disable_guc(struct i915_ggtt *ggtt);
 int i915_init_ggtt(struct drm_i915_private *i915);
+void i915_ggtt_driver_remove(struct drm_i915_private *i915);
 void i915_ggtt_driver_release(struct drm_i915_private *i915);
 
 static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 34ee12f3f02d..e4d9f0f6f183 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -752,6 +752,7 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 err_mem_regions:
 	intel_memory_regions_driver_release(dev_priv);
 err_ggtt:
+	i915_ggtt_driver_remove(dev_priv);
 	i915_ggtt_driver_release(dev_priv);
 err_perf:
 	i915_perf_fini(dev_priv);
@@ -768,6 +769,7 @@ static void i915_driver_hw_remove(struct drm_i915_private *dev_priv)
 
 	i915_perf_fini(dev_priv);
 
+	i915_ggtt_driver_remove(dev_priv);
 	if (pdev->msi_enabled)
 		pci_disable_msi(pdev);
 
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware cleanup from driver_release to _remove
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
                   ` (2 preceding siblings ...)
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove Janusz Krzysztofik
@ 2020-05-18 18:17 ` Janusz Krzysztofik
  2020-05-27 19:15   ` Michał Winiarski
  2020-05-18 23:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Resolve device hotunplug issues Patchwork
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-18 18:17 UTC (permalink / raw)
  To: intel-gfx

UC firmware is stored in a GEM object.  Clean it up on driver remove to
avoid intel-iommu triggered kernel panic on late DMA unmapping or even
an RPM related warning on object late removal in no IOMMU setups.

<4> [93.335282] ------------[ cut here ]------------
<4> [93.335515] pm_runtime_get_sync() failed: -13
<4> [93.336056] WARNING: CPU: 6 PID: 200 at drivers/gpu/drm/i915/intel_runtime_pm.c:361 __intel_runtime_pm_get+0x4d/0x60 [i915]
<4> [93.336104] Modules linked in: snd_hda_codec_hdmi mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel cdc_ether snd_intel_dspcfg usbnet snd_hda_codec mii snd_hwdep snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei intel_lpss_pci prime_numbers
<4> [93.336268] CPU: 6 PID: 200 Comm: kworker/u16:3 Tainted: G     U            5.7.0-rc4-CI-Trybot_6405+ #1
<4> [93.336289] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
<4> [93.336811] Workqueue: i915 __i915_gem_free_work [i915]
<4> [93.337296] RIP: 0010:__intel_runtime_pm_get+0x4d/0x60 [i915]
<4> [93.337332] Code: ff ff 48 89 df 5b 5d e9 a1 fa ff ff 80 3d 4b 7a 2e 00 00 75 e1 89 c6 48 c7 c7 a8 2d 40 a0 c6 05 39 7a 2e 00 01 e8 53 fc e9 e0 <0f> 0b eb c8 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
<4> [93.337357] RSP: 0018:ffffc9000144bdd8 EFLAGS: 00010282
<4> [93.337384] RAX: 0000000000000000 RBX: ffff88838ee5bc40 RCX: 0000000000000001
<4> [93.337409] RDX: 0000000080000001 RSI: ffff88839d264928 RDI: 00000000ffffffff
<4> [93.337440] RBP: 0000000000000001 R08: ffff88839d264928 R09: 0000000000000000
<4> [93.337467] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88838ee5bc40
<4> [93.337493] R13: 0000000000000006 R14: ffffffff82769a30 R15: ffff88839376bab0
<4> [93.337515] FS:  0000000000000000(0000) GS:ffff8883a4100000(0000) knlGS:0000000000000000
<4> [93.337542] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [93.337563] CR2: 000055bc19b16ff8 CR3: 00000003a11c4005 CR4: 0000000000760ee0
<4> [93.337583] PKRU: 55555554
<4> [93.337605] Call Trace:
<4> [93.338148]  i915_gem_object_release_mmap+0x23/0x70 [i915]
<4> [93.338665]  __i915_gem_free_objects.isra.21+0x10a/0x4b0 [i915]
<4> [93.338741]  process_one_work+0x268/0x600
<4> [93.338785]  ? __schedule+0x307/0x8d0
<4> [93.338878]  worker_thread+0x37/0x380
<4> [93.338929]  ? process_one_work+0x600/0x600
<4> [93.338963]  kthread+0x140/0x160
<4> [93.339006]  ? kthread_park+0x80/0x80
<4> [93.339057]  ret_from_fork+0x24/0x50
<4> [93.339181] irq event stamp: 204220
<4> [93.339219] hardirqs last  enabled at (204219): [<ffffffff8113399d>] console_unlock+0x4cd/0x5a0
<4> [93.339250] hardirqs last disabled at (204220): [<ffffffff81001d50>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [93.339277] softirqs last  enabled at (204208): [<ffffffff81e00395>] __do_softirq+0x395/0x49e
<4> [93.339307] softirqs last disabled at (204197): [<ffffffff810bbc9a>] irq_exit+0xba/0xc0
<4> [93.339330] ---[ end trace f066187622b8c484 ]---

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 87d3c4f5b6c6..f9d37c7e6d6f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1191,6 +1191,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
 
 	i915_gem_driver_remove__contexts(dev_priv);
 
+	intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
+
 	i915_gem_drain_freed_objects(dev_priv);
 }
 
@@ -1202,7 +1204,6 @@ void i915_gem_driver_release(struct drm_i915_private *dev_priv)
 
 	intel_wa_list_free(&dev_priv->gt_wa_list);
 
-	intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
 	i915_gem_cleanup_userptr(dev_priv);
 
 	i915_gem_drain_freed_objects(dev_priv);
-- 
2.21.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Resolve device hotunplug issues
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
                   ` (3 preceding siblings ...)
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware " Janusz Krzysztofik
@ 2020-05-18 23:07 ` Patchwork
  2020-05-18 23:08 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-18 23:07 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Resolve device hotunplug issues
URL   : https://patchwork.freedesktop.org/series/77372/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4d3df739b858 drm/i915: Drop user contexts on driver remove
-:19: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#19: 
<4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1

-:66: WARNING:UNNECESSARY_INT: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
#66: FILE: drivers/gpu/drm/i915/gem/i915_gem_context.c:937:
+		unsigned long int id;

-:73: WARNING:UNNECESSARY_INT: Prefer 'unsigned long' over 'unsigned long int' as the int is unnecessary
#73: FILE: drivers/gpu/drm/i915/gem/i915_gem_context.c:944:
+			unsigned long int idx;

total: 0 errors, 3 warnings, 0 checks, 59 lines checked
8fd9c3d7066d drm/i915: Release GT resources on driver remove
-:22: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#22: 
<4> [39.201162] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1

total: 0 errors, 1 warnings, 0 checks, 8 lines checked
6790fa9abcae drm/i915: Move GGTT cleanup from driver_release to _remove
-:57: WARNING:BAD_SIGN_OFF: Duplicate signature
#57: 
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

total: 0 errors, 1 warnings, 0 checks, 53 lines checked
721e7aad8fe5 drm/i915: Move UC firmware cleanup from driver_release to _remove
-:13: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#13: 
<4> [93.336056] WARNING: CPU: 6 PID: 200 at drivers/gpu/drm/i915/intel_runtime_pm.c:361 __intel_runtime_pm_get+0x4d/0x60 [i915]

total: 0 errors, 1 warnings, 0 checks, 15 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Resolve device hotunplug issues
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
                   ` (4 preceding siblings ...)
  2020-05-18 23:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Resolve device hotunplug issues Patchwork
@ 2020-05-18 23:08 ` Patchwork
  2020-05-18 23:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-05-19  0:54 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-18 23:08 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Resolve device hotunplug issues
URL   : https://patchwork.freedesktop.org/series/77372/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1222:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1225:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1228:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1231:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2312:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2313:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2314:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2315:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2316:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2317:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/intel_reset.c:1310:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/sysfs_engines.c:61:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:62:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:66:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gvt/mmio.c:287:23: warning: memcpy with byte count of 279040
+drivers/gpu/drm/i915/i915_perf.c:1425:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1479:15: warning: memset with byte count of 16777216
+./include/linux/compiler.h:199:9: warning: context imbalance in 'engines_sample' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Resolve device hotunplug issues
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
                   ` (5 preceding siblings ...)
  2020-05-18 23:08 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-05-18 23:29 ` Patchwork
  2020-05-19  0:54 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-18 23:29 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Resolve device hotunplug issues
URL   : https://patchwork.freedesktop.org/series/77372/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8498 -> Patchwork_17701
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/index.html

Known issues
------------

  Here are the changes found in Patchwork_17701 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_chamelium@dp-crc-fast:
    - fi-icl-u2:          [PASS][1] -> [FAIL][2] ([i915#262])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/fi-icl-u2/igt@kms_chamelium@dp-crc-fast.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/fi-icl-u2/igt@kms_chamelium@dp-crc-fast.html

  
  [i915#262]: https://gitlab.freedesktop.org/drm/intel/issues/262


Participating hosts (52 -> 45)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8498 -> Patchwork_17701

  CI-20190529: 20190529
  CI_DRM_8498: 1493c649ae92207a758afa50a639275bd6c80e2e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5659: 66ab5e42811fee3dea8c21ab29e70e323a0650de @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17701: 721e7aad8fe545ebf5b6c946880abbc18b0838e3 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

721e7aad8fe5 drm/i915: Move UC firmware cleanup from driver_release to _remove
6790fa9abcae drm/i915: Move GGTT cleanup from driver_release to _remove
8fd9c3d7066d drm/i915: Release GT resources on driver remove
4d3df739b858 drm/i915: Drop user contexts on driver remove

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Resolve device hotunplug issues
  2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
                   ` (6 preceding siblings ...)
  2020-05-18 23:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-05-19  0:54 ` Patchwork
  7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2020-05-19  0:54 UTC (permalink / raw)
  To: Janusz Krzysztofik; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Resolve device hotunplug issues
URL   : https://patchwork.freedesktop.org/series/77372/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8498_full -> Patchwork_17701_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_17701_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-apl:          [PASS][1] -> [DMESG-WARN][2] ([i915#180]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl6/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl6/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_cursor_legacy@all-pipes-torture-bo:
    - shard-snb:          [PASS][3] -> [DMESG-WARN][4] ([i915#128])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-snb5/igt@kms_cursor_legacy@all-pipes-torture-bo.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-snb4/igt@kms_cursor_legacy@all-pipes-torture-bo.html

  * igt@kms_flip_tiling@flip-changes-tiling:
    - shard-apl:          [PASS][5] -> [FAIL][6] ([i915#95])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl8/igt@kms_flip_tiling@flip-changes-tiling.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl8/igt@kms_flip_tiling@flip-changes-tiling.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [PASS][7] -> [FAIL][8] ([i915#1188]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-skl7/igt@kms_hdr@bpc-switch-dpms.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-skl7/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-skl:          [PASS][9] -> [INCOMPLETE][10] ([i915#69])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-skl8/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-skl6/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
    - shard-kbl:          [PASS][11] -> [DMESG-WARN][12] ([i915#180]) +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-kbl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-kbl3/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [PASS][13] -> [FAIL][14] ([fdo#108145] / [i915#265]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-skl3/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr@psr2_sprite_render:
    - shard-iclb:         [PASS][15] -> [SKIP][16] ([fdo#109441]) +2 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-iclb2/igt@kms_psr@psr2_sprite_render.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-iclb4/igt@kms_psr@psr2_sprite_render.html

  
#### Possible fixes ####

  * igt@gen9_exec_parse@allowed-all:
    - shard-apl:          [DMESG-WARN][17] ([i915#1436] / [i915#716]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl1/igt@gen9_exec_parse@allowed-all.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl3/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_selftest@live@execlists:
    - shard-skl:          [INCOMPLETE][19] ([i915#1795] / [i915#1874]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-skl2/igt@i915_selftest@live@execlists.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-skl5/igt@i915_selftest@live@execlists.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][21] ([i915#180]) -> [PASS][22] +4 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-kbl2/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
    - shard-iclb:         [SKIP][23] ([fdo#109349]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-iclb5/igt@kms_dp_dsc@basic-dsc-enable-edp.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-iclb2/igt@kms_dp_dsc@basic-dsc-enable-edp.html

  * {igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1}:
    - shard-glk:          [FAIL][25] ([i915#79]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-glk9/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-glk2/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1.html

  * {igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1}:
    - shard-skl:          [FAIL][27] ([i915#79]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-skl1/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-skl10/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html

  * {igt@kms_flip@flip-vs-suspend-interruptible@c-dp1}:
    - shard-apl:          [DMESG-WARN][29] ([i915#180]) -> [PASS][30] +2 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl8/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl8/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_psr@no_drrs:
    - shard-iclb:         [FAIL][31] ([i915#173]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-iclb1/igt@kms_psr@no_drrs.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-iclb2/igt@kms_psr@no_drrs.html

  * igt@kms_psr@psr2_cursor_render:
    - shard-iclb:         [SKIP][33] ([fdo#109441]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-iclb1/igt@kms_psr@psr2_cursor_render.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-iclb2/igt@kms_psr@psr2_cursor_render.html

  * {igt@perf@polling-parameterized}:
    - shard-tglb:         [FAIL][35] ([i915#1542]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-tglb1/igt@perf@polling-parameterized.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-tglb3/igt@perf@polling-parameterized.html

  
#### Warnings ####

  * igt@i915_pm_dc@dc6-psr:
    - shard-tglb:         [SKIP][37] ([i915#468]) -> [FAIL][38] ([i915#454])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-tglb2/igt@i915_pm_dc@dc6-psr.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-tglb1/igt@i915_pm_dc@dc6-psr.html

  * igt@kms_content_protection@atomic:
    - shard-apl:          [DMESG-FAIL][39] ([fdo#110321]) -> [TIMEOUT][40] ([i915#1319])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl7/igt@kms_content_protection@atomic.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl3/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-apl:          [TIMEOUT][41] ([i915#1319]) -> [FAIL][42] ([fdo#110321] / [fdo#110336])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl1/igt@kms_content_protection@atomic-dpms.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl6/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@legacy:
    - shard-apl:          [FAIL][43] ([fdo#110321] / [fdo#110336]) -> [TIMEOUT][44] ([i915#1319])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-apl6/igt@kms_content_protection@legacy.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-apl2/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@lic:
    - shard-kbl:          [FAIL][45] ([fdo#110321]) -> [TIMEOUT][46] ([i915#1319])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8498/shard-kbl3/igt@kms_content_protection@lic.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/shard-kbl4/igt@kms_content_protection@lic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109349]: https://bugs.freedesktop.org/show_bug.cgi?id=109349
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110321]: https://bugs.freedesktop.org/show_bug.cgi?id=110321
  [fdo#110336]: https://bugs.freedesktop.org/show_bug.cgi?id=110336
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#128]: https://gitlab.freedesktop.org/drm/intel/issues/128
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#173]: https://gitlab.freedesktop.org/drm/intel/issues/173
  [i915#1795]: https://gitlab.freedesktop.org/drm/intel/issues/1795
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1874]: https://gitlab.freedesktop.org/drm/intel/issues/1874
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#468]: https://gitlab.freedesktop.org/drm/intel/issues/468
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8498 -> Patchwork_17701

  CI-20190529: 20190529
  CI_DRM_8498: 1493c649ae92207a758afa50a639275bd6c80e2e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5659: 66ab5e42811fee3dea8c21ab29e70e323a0650de @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17701: 721e7aad8fe545ebf5b6c946880abbc18b0838e3 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17701/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
@ 2020-05-27 14:23   ` Michał Winiarski
  2020-05-28 10:14   ` Tvrtko Ursulin
  1 sibling, 0 replies; 20+ messages in thread
From: Michał Winiarski @ 2020-05-27 14:23 UTC (permalink / raw)
  To: intel-gfx

Quoting Janusz Krzysztofik (2020-05-18 20:17:17)
> Contexts associated with open device file descriptors together with
> their assigned address spaces are now closed on device file close.  On
> address space closure its associated DMA mappings are revoked.  If the
> device is removed while being open, subsequent attempts to revoke
> those mappings while closing the device file descriptor may may be
> judged by intel-iommu code as a bug and result in kernel panic.
> 
> Since user contexts become useless after the device is no longer
> available, drop them on device removal.
> 
> <4> [36.900985] ------------[ cut here ]------------
> <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
> <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
> <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
> <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
> <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
> <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
> <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
> <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
> <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
> <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
> <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
> <4> [36.901408] Call Trace:
> <4> [36.901418]  ? process_one_work+0x1de/0x600
> <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
> <4> [36.901573]  free_pd+0x9/0x20 [i915]
> <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
> <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
> <4> [36.901733]  process_one_work+0x268/0x600
> <4> [36.901744]  ? __schedule+0x307/0x8d0
> <4> [36.901756]  worker_thread+0x37/0x380
> <4> [36.901766]  ? process_one_work+0x600/0x600
> <4> [36.901775]  kthread+0x140/0x160
> <4> [36.901783]  ? kthread_park+0x80/0x80
> <4> [36.901792]  ret_from_fork+0x24/0x50
> <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
> <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
>  drivers/gpu/drm/i915/i915_gem.c             |  2 ++
>  3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 900ea8b7fc8f..0096a69fbfd3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
>         rcu_barrier(); /* and flush the left over RCU frees */
>  }
>  
> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
> +{
> +       struct i915_gem_context *ctx, *cn;
> +
> +       list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {

You're not removing ctx from gem.contexts.list inside this loop.

> +               struct drm_i915_file_private *file_priv = ctx->file_priv;
> +               struct i915_gem_context *entry;
> +               unsigned long int id;
> +
> +               if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
> +                       continue;
> +
> +               xa_for_each(&file_priv->context_xa, id, entry) {

We're iterating over contexts?
I thought we were already doing that by going over i915->gem.contexts.list.

> +                       struct i915_address_space *vm;
> +                       unsigned long int idx;
> +
> +                       if (entry != ctx)
> +                               continue;
> +
> +                       xa_erase(&file_priv->context_xa, id);
> +
> +                       if (id)
> +                               break;

Ok... So we're exiting early for !default contexts?

> +
> +                       xa_for_each(&file_priv->vm_xa, idx, vm) {
> +                               xa_erase(&file_priv->vm_xa, idx);
> +                               i915_vm_put(vm);
> +                       }

And for each vm as yet another inner loop?
Why? After the first round of going over contexts this is going to be empty.

> +
> +                       break;

Ah... But there won't be another round...

Yeah, the intent behind those loops is not clear to me.

Can this all be expressed as (note: pseudocode):

/* for all fds */
list_for_each(&dev.filelist) {
	/* for each contexts registered to this file */
	xa_for_each(&file_priv->context_xa) {
		/* drop ctx */
	}
	/* for each VM registered to this file */
	xa_for_each(&file_priv->vm_xa) {
		/* drop VM */
	}
}

Or am I missing something important here?

-Michał

> +               }
> +
> +               context_close(ctx);
> +       }
> +
> +       i915_gem_driver_release__contexts(i915);
> +}
> +
>  static int gem_context_register(struct i915_gem_context *ctx,
>                                 struct drm_i915_file_private *fpriv,
>                                 u32 *id)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index 3702b2fb27ab..62808bea9239 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -110,6 +110,7 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
>  
>  /* i915_gem_context.c */
>  void i915_gem_init__contexts(struct drm_i915_private *i915);
> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915);
>  void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
>  
>  int i915_gem_context_open(struct drm_i915_private *i915,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0cbcb9f54e7d..87d3c4f5b6c6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1189,6 +1189,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
>         /* Flush any outstanding unpin_work. */
>         i915_gem_drain_workqueue(dev_priv);
>  
> +       i915_gem_driver_remove__contexts(dev_priv);
> +
>         i915_gem_drain_freed_objects(dev_priv);
>  }
>  
> -- 
> 2.21.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources on driver remove
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources " Janusz Krzysztofik
@ 2020-05-27 18:35   ` Michał Winiarski
  0 siblings, 0 replies; 20+ messages in thread
From: Michał Winiarski @ 2020-05-27 18:35 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx

Quoting Janusz Krzysztofik (2020-05-18 20:17:18)
> GT scratch page is now released and its DMA mappings revoked on driver
> release.  If a device is removed while its file descriptor is still
> open, the driver is not released until last device file descriptor
> closure.  In that case intel-iommu code may judge late DMA unmapping as
> a bug and kernel panic may occur.
> 
> Since DMA mapped address space may be no longer usable after device
> removal, release GT resources including scratch page as well as a
> reference to its address space on driver remove.  Implement that by
> just calling intel_gt_driver_release() on GT remove as that function
> has been already made safe to be called again on driver release even if
> already called before, e.g. on GEM initialization failure.

Do you mean:
	if (vm) /* FIXME being called twice on error paths :( */
		i915_vm_put(vm);

?

We're not fixing that... We're adding more :(
Unfortunately I don't have a clear answer on how to rework our init / cleanup to
be unplug friendly, and this fixes a bug, so...

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

-Michał


> 
> <4> [39.201062] ------------[ cut here ]------------
> <2> [39.201074] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> <4> [39.201154] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4> [39.201162] CPU: 6 PID: 7 Comm: kworker/u16:0 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> <4> [39.201172] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3175.A00.1904261428 04/26/2019
> <4> [39.201243] Workqueue: i915 __i915_gem_free_work [i915]
> <4> [39.201252] RIP: 0010:intel_unmap+0x1f5/0x230
> <4> [39.201260] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> <4> [39.201278] RSP: 0018:ffffc900000dbc98 EFLAGS: 00010246
> <4> [39.201285] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffea0021d30000
> <4> [39.201293] RDX: 000000000005f000 RSI: 00000000fed00000 RDI: ffff888889eec000
> <4> [39.201301] RBP: ffff888889eec0b0 R08: 0000000000000000 R09: 00000000fffffffe
> <4> [39.201309] R10: 00000000458139fc R11: 00000000f6c6d8b2 R12: 0000000000000025
> <4> [39.201318] R13: 00000000fed00000 R14: 000000000005f000 R15: 0000000000000025
> <4> [39.201326] FS:  0000000000000000(0000) GS:ffff888890100000(0000) knlGS:0000000000000000
> <4> [39.201335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [39.201342] CR2: 0000560f1308e148 CR3: 0000000881972002 CR4: 0000000000760ee0
> <4> [39.201350] PKRU: 55555554
> <4> [39.201355] Call Trace:
> <4> [39.201361]  intel_unmap_sg+0x7b/0x180
> <4> [39.201412]  shmem_put_pages+0x43/0x250 [i915]
> <4> [39.201472]  ? __i915_gem_object_unset_pages.part.12+0x11b/0x1d0 [i915]
> <4> [39.201531]  ? __i915_gem_object_unset_pages.part.12+0x133/0x1d0 [i915]
> <4> [39.201590]  __i915_gem_object_put_pages+0x81/0xc0 [i915]
> <4> [39.201646]  __i915_gem_free_objects.isra.21+0x1a7/0x4b0 [i915]
> <4> [39.201658]  process_one_work+0x268/0x600
> <4> [39.201666]  ? __schedule+0x307/0x8d0
> <4> [39.201675]  worker_thread+0x1d0/0x380
> <4> [39.201682]  ? process_one_work+0x600/0x600
> <4> [39.201689]  kthread+0x140/0x160
> <4> [39.201695]  ? kthread_park+0x80/0x80
> <4> [39.201703]  ret_from_fork+0x24/0x50
> <4> [39.201712] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_intel_dspcfg snd_hda_codec e1000e ax88179_178a usbnet snd_hwdep mii snd_hda_core ghash_clmulni_intel snd_pcm ptp pps_core mei_me mei intel_lpss_pci prime_numbers
> <4> [39.201764] ---[ end trace f3ec1bae3de04509 ]---
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index f069551e412f..5771e80e85a6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -599,6 +599,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
>         intel_uc_driver_remove(&gt->uc);
>  
>         intel_engines_release(gt);
> +
> +       intel_gt_driver_release(gt);
>  }
>  
>  void intel_gt_driver_unregister(struct intel_gt *gt)
> -- 
> 2.21.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove Janusz Krzysztofik
@ 2020-05-27 19:12   ` Michał Winiarski
  2020-05-28  9:12     ` Janusz Krzysztofik
  0 siblings, 1 reply; 20+ messages in thread
From: Michał Winiarski @ 2020-05-27 19:12 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx

Quoting Janusz Krzysztofik (2020-05-18 20:17:19)
> GGTT including its scratch page is not cleaned up until driver release.
> Since DMA mappings still exist before scratch page cleanup, unmapping
> them on last device close after the driver has been already removed may
> be judged by intel-iommu code as a bug and result in kernel panic.
> 
> Since we abort requests and block user access to hardware on device
> removal, there seems not much sense in still keeping GGTT.  Clean it up
> on driver remove.  However, skip GGTT address space cleanup as its
> mutext may still be needed if there are objects to be freed.  That
> cleanup is always called on address space release after all.
> 
> [   81.290284] ------------[ cut here ]------------
> [   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
> [   81.293558] invalid opcode: 0000 [#1] PREEMPT SMP
> [   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G     U            5.4.17 #2
> [   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [   81.297959] RIP: 0010:intel_unmap+0x200/0x230
> [   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 19 65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff <0f> 0b e8 19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
> [   81.303425] RSP: 0018:ffffc9000013fda0 EFLAGS: 00010246
> [   81.304683] RAX: 0000000000000000 RBX: ffff8882228dd0b0 RCX: 0000000000000000
> [   81.306384] RDX: 0000000000001000 RSI: 00000000af801000 RDI: ffff8882228dd0b0
> [   81.308086] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [   81.309788] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000af801000
> [   81.311489] R13: ffff888223a00000 R14: 0000000000001000 R15: ffff888223a0a2e8
> [   81.313191] FS:  00007f5408e3c940(0000) GS:ffff888228600000(0000) knlGS:0000000000000000
> [   81.315116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   81.316495] CR2: 0000000001fc0048 CR3: 000000022464a000 CR4: 00000000000006b0
> [   81.318196] Call Trace:
> [   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
> [   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
> [   81.321717]  i915_driver_release+0x33/0x90 [i915]
> [   81.322856]  drm_release+0xbc/0xd0
> [   81.323691]  __fput+0xcd/0x260
> [   81.324447]  task_work_run+0x90/0xc0
> [   81.325323]  do_syscall_64+0x3da/0x560
> [   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   81.327457] RIP: 0033:0x7f54096ecba3
> [   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> [   81.332741] RSP: 002b:00007ffcc5165698 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> [   81.334546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f54096ecba3
> [   81.336247] RDX: 00000000005cc5d0 RSI: 0000000000000005 RDI: 0000000000000004
> [   81.337949] RBP: 0000000000000003 R08: 00000000005b8014 R09: 0000000000000004
> [   81.339650] R10: 00000000005cc650 R11: 0000000000000246 R12: 00000000004022f0
> [   81.341352] R13: 00007ffcc5165850 R14: 0000000000000000 R15: 0000000000000000
> [   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
> [   81.345015] ---[ end trace 010aae55e56f8998 ]---
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> 
> drm/i915: Defer GGTT vm address space fini to vm release

Hum?

> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +++++++++----
>  drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
>  drivers/gpu/drm/i915/i915_drv.c      |  2 ++
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 66165b10256e..ff2b4f74149a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
>         ggtt->vm.cleanup(&ggtt->vm);
>  
>         mutex_unlock(&ggtt->vm.mutex);
> -       i915_address_space_fini(&ggtt->vm);

Ok, so this was defered to release. Where are we going to drop the final ref?
And also - I can see that we have a:

	GEM_BUG_ON(i915_is_ggtt(vm));

in i915_vm_release().
Which means that we probably don't drop the final ref and don't ever call
i915_address_space_fini for ggtt.

-Michał

>  
>         arch_phys_wc_del(ggtt->mtrr);
>  
> @@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
>                 io_mapping_fini(&ggtt->iomap);
>  }
>  
> +void i915_ggtt_driver_remove(struct drm_i915_private *i915)
> +{
> +       struct i915_ggtt *ggtt = &i915->ggtt;
> +
> +       fini_aliasing_ppgtt(ggtt);
> +
> +       ggtt_cleanup_hw(ggtt);
> +}
> +
>  /**
>   * i915_ggtt_driver_release - Clean up GGTT hardware initialization
>   * @i915: i915 device
> @@ -718,10 +726,7 @@ void i915_ggtt_driver_release(struct drm_i915_private *i915)
>         struct i915_ggtt *ggtt = &i915->ggtt;
>         struct pagevec *pvec;
>  
> -       fini_aliasing_ppgtt(ggtt);
> -
>         intel_ggtt_fini_fences(ggtt);
> -       ggtt_cleanup_hw(ggtt);
>  
>         pvec = &i915->mm.wc_stash.pvec;
>         if (pvec->nr) {
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> index d93ebdf3fa0e..f140ce5c171a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> @@ -501,6 +501,7 @@ int i915_ggtt_enable_hw(struct drm_i915_private *i915);
>  void i915_ggtt_enable_guc(struct i915_ggtt *ggtt);
>  void i915_ggtt_disable_guc(struct i915_ggtt *ggtt);
>  int i915_init_ggtt(struct drm_i915_private *i915);
> +void i915_ggtt_driver_remove(struct drm_i915_private *i915);
>  void i915_ggtt_driver_release(struct drm_i915_private *i915);
>  
>  static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 34ee12f3f02d..e4d9f0f6f183 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -752,6 +752,7 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
>  err_mem_regions:
>         intel_memory_regions_driver_release(dev_priv);
>  err_ggtt:
> +       i915_ggtt_driver_remove(dev_priv);
>         i915_ggtt_driver_release(dev_priv);
>  err_perf:
>         i915_perf_fini(dev_priv);
> @@ -768,6 +769,7 @@ static void i915_driver_hw_remove(struct drm_i915_private *dev_priv)
>  
>         i915_perf_fini(dev_priv);
>  
> +       i915_ggtt_driver_remove(dev_priv);
>         if (pdev->msi_enabled)
>                 pci_disable_msi(pdev);
>  
> -- 
> 2.21.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware cleanup from driver_release to _remove
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware " Janusz Krzysztofik
@ 2020-05-27 19:15   ` Michał Winiarski
  2020-05-28  9:13     ` Janusz Krzysztofik
  0 siblings, 1 reply; 20+ messages in thread
From: Michał Winiarski @ 2020-05-27 19:15 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx

Quoting Janusz Krzysztofik (2020-05-18 20:17:20)
> UC firmware is stored in a GEM object.  Clean it up on driver remove to
					 ^ double whitespace
> avoid intel-iommu triggered kernel panic on late DMA unmapping or even
> an RPM related warning on object late removal in no IOMMU setups.

This is no longer the case after:
drm/i915/gem: Lazily acquire the device wakeref for freeing objects

Right?

> 
> <4> [93.335282] ------------[ cut here ]------------
> <4> [93.335515] pm_runtime_get_sync() failed: -13
> <4> [93.336056] WARNING: CPU: 6 PID: 200 at drivers/gpu/drm/i915/intel_runtime_pm.c:361 __intel_runtime_pm_get+0x4d/0x60 [i915]
> <4> [93.336104] Modules linked in: snd_hda_codec_hdmi mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel cdc_ether snd_intel_dspcfg usbnet snd_hda_codec mii snd_hwdep snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei intel_lpss_pci prime_numbers
> <4> [93.336268] CPU: 6 PID: 200 Comm: kworker/u16:3 Tainted: G     U            5.7.0-rc4-CI-Trybot_6405+ #1
> <4> [93.336289] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> <4> [93.336811] Workqueue: i915 __i915_gem_free_work [i915]
> <4> [93.337296] RIP: 0010:__intel_runtime_pm_get+0x4d/0x60 [i915]
> <4> [93.337332] Code: ff ff 48 89 df 5b 5d e9 a1 fa ff ff 80 3d 4b 7a 2e 00 00 75 e1 89 c6 48 c7 c7 a8 2d 40 a0 c6 05 39 7a 2e 00 01 e8 53 fc e9 e0 <0f> 0b eb c8 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
> <4> [93.337357] RSP: 0018:ffffc9000144bdd8 EFLAGS: 00010282
> <4> [93.337384] RAX: 0000000000000000 RBX: ffff88838ee5bc40 RCX: 0000000000000001
> <4> [93.337409] RDX: 0000000080000001 RSI: ffff88839d264928 RDI: 00000000ffffffff
> <4> [93.337440] RBP: 0000000000000001 R08: ffff88839d264928 R09: 0000000000000000
> <4> [93.337467] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88838ee5bc40
> <4> [93.337493] R13: 0000000000000006 R14: ffffffff82769a30 R15: ffff88839376bab0
> <4> [93.337515] FS:  0000000000000000(0000) GS:ffff8883a4100000(0000) knlGS:0000000000000000
> <4> [93.337542] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [93.337563] CR2: 000055bc19b16ff8 CR3: 00000003a11c4005 CR4: 0000000000760ee0
> <4> [93.337583] PKRU: 55555554
> <4> [93.337605] Call Trace:
> <4> [93.338148]  i915_gem_object_release_mmap+0x23/0x70 [i915]
> <4> [93.338665]  __i915_gem_free_objects.isra.21+0x10a/0x4b0 [i915]
> <4> [93.338741]  process_one_work+0x268/0x600
> <4> [93.338785]  ? __schedule+0x307/0x8d0
> <4> [93.338878]  worker_thread+0x37/0x380
> <4> [93.338929]  ? process_one_work+0x600/0x600
> <4> [93.338963]  kthread+0x140/0x160
> <4> [93.339006]  ? kthread_park+0x80/0x80
> <4> [93.339057]  ret_from_fork+0x24/0x50
> <4> [93.339181] irq event stamp: 204220
> <4> [93.339219] hardirqs last  enabled at (204219): [<ffffffff8113399d>] console_unlock+0x4cd/0x5a0
> <4> [93.339250] hardirqs last disabled at (204220): [<ffffffff81001d50>] trace_hardirqs_off_thunk+0x1a/0x1c
> <4> [93.339277] softirqs last  enabled at (204208): [<ffffffff81e00395>] __do_softirq+0x395/0x49e
> <4> [93.339307] softirqs last disabled at (204197): [<ffffffff810bbc9a>] irq_exit+0xba/0xc0
> <4> [93.339330] ---[ end trace f066187622b8c484 ]---
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

-Michał

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 87d3c4f5b6c6..f9d37c7e6d6f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1191,6 +1191,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
>  
>         i915_gem_driver_remove__contexts(dev_priv);
>  
> +       intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
> +
>         i915_gem_drain_freed_objects(dev_priv);
>  }
>  
> @@ -1202,7 +1204,6 @@ void i915_gem_driver_release(struct drm_i915_private *dev_priv)
>  
>         intel_wa_list_free(&dev_priv->gt_wa_list);
>  
> -       intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
>         i915_gem_cleanup_userptr(dev_priv);
>  
>         i915_gem_drain_freed_objects(dev_priv);
> -- 
> 2.21.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove
  2020-05-27 19:12   ` Michał Winiarski
@ 2020-05-28  9:12     ` Janusz Krzysztofik
  0 siblings, 0 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-28  9:12 UTC (permalink / raw)
  To: Michał Winiarski, intel-gfx

On Wed, 2020-05-27 at 21:12 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-05-18 20:17:19)
> > GGTT including its scratch page is not cleaned up until driver release.
> > Since DMA mappings still exist before scratch page cleanup, unmapping
> > them on last device close after the driver has been already removed may
> > be judged by intel-iommu code as a bug and result in kernel panic.
> > 
> > Since we abort requests and block user access to hardware on device
> > removal, there seems not much sense in still keeping GGTT.  Clean it up
> > on driver remove.  However, skip GGTT address space cleanup as its
> > mutext may still be needed if there are objects to be freed.  That
> > cleanup is always called on address space release after all.
> > 
> > [   81.290284] ------------[ cut here ]------------
> > [   81.291602] kernel BUG at drivers/iommu/intel-iommu.c:3591!
> > [   81.293558] invalid opcode: 0000 [#1] PREEMPT SMP
> > [   81.294695] CPU: 0 PID: 207 Comm: core_hotunplug Tainted: G     U            5.4.17 #2
> > [   81.296579] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [   81.297959] RIP: 0010:intel_unmap+0x200/0x230
> > [   81.299015] Code: 00 e8 e4 45 c5 ff 85 c0 74 09 80 3d 2b 84 c0 00 00 74 19 65 ff 0d 78 9a b2 7e 0f 85 fa fe ff ff e8 95 57 b1 ff e9 f0 fe ff ff <0f> 0b e8 19 4c c5 ff 85 c0 75 de 48 c7 c2 48 d2 e1 81 be 57 00 00
> > [   81.303425] RSP: 0018:ffffc9000013fda0 EFLAGS: 00010246
> > [   81.304683] RAX: 0000000000000000 RBX: ffff8882228dd0b0 RCX: 0000000000000000
> > [   81.306384] RDX: 0000000000001000 RSI: 00000000af801000 RDI: ffff8882228dd0b0
> > [   81.308086] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [   81.309788] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000af801000
> > [   81.311489] R13: ffff888223a00000 R14: 0000000000001000 R15: ffff888223a0a2e8
> > [   81.313191] FS:  00007f5408e3c940(0000) GS:ffff888228600000(0000) knlGS:0000000000000000
> > [   81.315116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   81.316495] CR2: 0000000001fc0048 CR3: 000000022464a000 CR4: 00000000000006b0
> > [   81.318196] Call Trace:
> > [   81.318967]  cleanup_scratch_page+0x44/0x80 [i915]
> > [   81.320281]  i915_ggtt_driver_release+0x15b/0x220 [i915]
> > [   81.321717]  i915_driver_release+0x33/0x90 [i915]
> > [   81.322856]  drm_release+0xbc/0xd0
> > [   81.323691]  __fput+0xcd/0x260
> > [   81.324447]  task_work_run+0x90/0xc0
> > [   81.325323]  do_syscall_64+0x3da/0x560
> > [   81.326240]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [   81.327457] RIP: 0033:0x7f54096ecba3
> > [   81.328332] Code: 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
> > [   81.332741] RSP: 002b:00007ffcc5165698 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
> > [   81.334546] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f54096ecba3
> > [   81.336247] RDX: 00000000005cc5d0 RSI: 0000000000000005 RDI: 0000000000000004
> > [   81.337949] RBP: 0000000000000003 R08: 00000000005b8014 R09: 0000000000000004
> > [   81.339650] R10: 00000000005cc650 R11: 0000000000000246 R12: 00000000004022f0
> > [   81.341352] R13: 00007ffcc5165850 R14: 0000000000000000 R15: 0000000000000000
> > [   81.343059] Modules linked in: i915 mfd_core intel_gtt prime_numbers
> > [   81.345015] ---[ end trace 010aae55e56f8998 ]---
> > 
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > 
> > drm/i915: Defer GGTT vm address space fini to vm release
> 
> Hum?

That's a patch squashing left-over I apparently missed, sorry.

> 
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ggtt.c | 13 +++++++++----
> >  drivers/gpu/drm/i915/gt/intel_gtt.h  |  1 +
> >  drivers/gpu/drm/i915/i915_drv.c      |  2 ++
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > index 66165b10256e..ff2b4f74149a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> > @@ -701,7 +701,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> >         ggtt->vm.cleanup(&ggtt->vm);
> >  
> >         mutex_unlock(&ggtt->vm.mutex);
> > -       i915_address_space_fini(&ggtt->vm);
> 
> Ok, so this was defered to release. Where are we going to drop the final ref?
> And also - I can see that we have a:
> 
> 	GEM_BUG_ON(i915_is_ggtt(vm));
> 
> in i915_vm_release().
> Which means that we probably don't drop the final ref and don't ever call
> i915_address_space_fini for ggtt.

Uh, that renders my solution invalid.  Let me take another approach.

Thanks,
Janusz

> 
> -Michał
> 
> >  
> >         arch_phys_wc_del(ggtt->mtrr);
> >  
> > @@ -709,6 +708,15 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
> >                 io_mapping_fini(&ggtt->iomap);
> >  }
> >  
> > +void i915_ggtt_driver_remove(struct drm_i915_private *i915)
> > +{
> > +       struct i915_ggtt *ggtt = &i915->ggtt;
> > +
> > +       fini_aliasing_ppgtt(ggtt);
> > +
> > +       ggtt_cleanup_hw(ggtt);
> > +}
> > +
> >  /**
> >   * i915_ggtt_driver_release - Clean up GGTT hardware initialization
> >   * @i915: i915 device
> > @@ -718,10 +726,7 @@ void i915_ggtt_driver_release(struct drm_i915_private *i915)
> >         struct i915_ggtt *ggtt = &i915->ggtt;
> >         struct pagevec *pvec;
> >  
> > -       fini_aliasing_ppgtt(ggtt);
> > -
> >         intel_ggtt_fini_fences(ggtt);
> > -       ggtt_cleanup_hw(ggtt);
> >  
> >         pvec = &i915->mm.wc_stash.pvec;
> >         if (pvec->nr) {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > index d93ebdf3fa0e..f140ce5c171a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
> > @@ -501,6 +501,7 @@ int i915_ggtt_enable_hw(struct drm_i915_private *i915);
> >  void i915_ggtt_enable_guc(struct i915_ggtt *ggtt);
> >  void i915_ggtt_disable_guc(struct i915_ggtt *ggtt);
> >  int i915_init_ggtt(struct drm_i915_private *i915);
> > +void i915_ggtt_driver_remove(struct drm_i915_private *i915);
> >  void i915_ggtt_driver_release(struct drm_i915_private *i915);
> >  
> >  static inline bool i915_ggtt_has_aperture(const struct i915_ggtt *ggtt)
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 34ee12f3f02d..e4d9f0f6f183 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -752,6 +752,7 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
> >  err_mem_regions:
> >         intel_memory_regions_driver_release(dev_priv);
> >  err_ggtt:
> > +       i915_ggtt_driver_remove(dev_priv);
> >         i915_ggtt_driver_release(dev_priv);
> >  err_perf:
> >         i915_perf_fini(dev_priv);
> > @@ -768,6 +769,7 @@ static void i915_driver_hw_remove(struct drm_i915_private *dev_priv)
> >  
> >         i915_perf_fini(dev_priv);
> >  
> > +       i915_ggtt_driver_remove(dev_priv);
> >         if (pdev->msi_enabled)
> >                 pci_disable_msi(pdev);
> >  
> > -- 
> > 2.21.1
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware cleanup from driver_release to _remove
  2020-05-27 19:15   ` Michał Winiarski
@ 2020-05-28  9:13     ` Janusz Krzysztofik
  0 siblings, 0 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-28  9:13 UTC (permalink / raw)
  To: Michał Winiarski, intel-gfx

Hi Michał,

On Wed, 2020-05-27 at 21:15 +0200, Michał Winiarski wrote:
> Quoting Janusz Krzysztofik (2020-05-18 20:17:20)
> > UC firmware is stored in a GEM object.  Clean it up on driver remove to
> 					 ^ double whitespace

That's a result of my addiction to an old-fashioned style of separating
sentences with double space.  BTW, that's how my vi editor still joins
two lines when first of them ends with a full stop.

> > avoid intel-iommu triggered kernel panic on late DMA unmapping or even
> > an RPM related warning on object late removal in no IOMMU setups.
> 
> This is no longer the case after:
> drm/i915/gem: Lazily acquire the device wakeref for freeing objects
> 
> Right?

Not really.  That was only a first step in that direction, some
operations called from that processing path still try to acquire the
wakaref unconditionally.

> 
> > <4> [93.335282] ------------[ cut here ]------------
> > <4> [93.335515] pm_runtime_get_sync() failed: -13
> > <4> [93.336056] WARNING: CPU: 6 PID: 200 at drivers/gpu/drm/i915/intel_runtime_pm.c:361 __intel_runtime_pm_get+0x4d/0x60 [i915]
> > <4> [93.336104] Modules linked in: snd_hda_codec_hdmi mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel cdc_ether snd_intel_dspcfg usbnet snd_hda_codec mii snd_hwdep snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei intel_lpss_pci prime_numbers
> > <4> [93.336268] CPU: 6 PID: 200 Comm: kworker/u16:3 Tainted: G     U            5.7.0-rc4-CI-Trybot_6405+ #1
> > <4> [93.336289] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> > <4> [93.336811] Workqueue: i915 __i915_gem_free_work [i915]
> > <4> [93.337296] RIP: 0010:__intel_runtime_pm_get+0x4d/0x60 [i915]
> > <4> [93.337332] Code: ff ff 48 89 df 5b 5d e9 a1 fa ff ff 80 3d 4b 7a 2e 00 00 75 e1 89 c6 48 c7 c7 a8 2d 40 a0 c6 05 39 7a 2e 00 01 e8 53 fc e9 e0 <0f> 0b eb c8 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 41 57 41
> > <4> [93.337357] RSP: 0018:ffffc9000144bdd8 EFLAGS: 00010282
> > <4> [93.337384] RAX: 0000000000000000 RBX: ffff88838ee5bc40 RCX: 0000000000000001
> > <4> [93.337409] RDX: 0000000080000001 RSI: ffff88839d264928 RDI: 00000000ffffffff
> > <4> [93.337440] RBP: 0000000000000001 R08: ffff88839d264928 R09: 0000000000000000
> > <4> [93.337467] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88838ee5bc40
> > <4> [93.337493] R13: 0000000000000006 R14: ffffffff82769a30 R15: ffff88839376bab0
> > <4> [93.337515] FS:  0000000000000000(0000) GS:ffff8883a4100000(0000) knlGS:0000000000000000
> > <4> [93.337542] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4> [93.337563] CR2: 000055bc19b16ff8 CR3: 00000003a11c4005 CR4: 0000000000760ee0
> > <4> [93.337583] PKRU: 55555554
> > <4> [93.337605] Call Trace:
> > <4> [93.338148]  i915_gem_object_release_mmap+0x23/0x70 [i915]
> > <4> [93.338665]  __i915_gem_free_objects.isra.21+0x10a/0x4b0 [i915]
> > <4> [93.338741]  process_one_work+0x268/0x600
> > <4> [93.338785]  ? __schedule+0x307/0x8d0
> > <4> [93.338878]  worker_thread+0x37/0x380
> > <4> [93.338929]  ? process_one_work+0x600/0x600
> > <4> [93.338963]  kthread+0x140/0x160
> > <4> [93.339006]  ? kthread_park+0x80/0x80
> > <4> [93.339057]  ret_from_fork+0x24/0x50
> > <4> [93.339181] irq event stamp: 204220
> > <4> [93.339219] hardirqs last  enabled at (204219): [<ffffffff8113399d>] console_unlock+0x4cd/0x5a0
> > <4> [93.339250] hardirqs last disabled at (204220): [<ffffffff81001d50>] trace_hardirqs_off_thunk+0x1a/0x1c
> > <4> [93.339277] softirqs last  enabled at (204208): [<ffffffff81e00395>] __do_softirq+0x395/0x49e
> > <4> [93.339307] softirqs last disabled at (204197): [<ffffffff810bbc9a>] irq_exit+0xba/0xc0
> > <4> [93.339330] ---[ end trace f066187622b8c484 ]---
> > 
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> 
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

Thanks,
Janusz

> 
> -Michał
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 87d3c4f5b6c6..f9d37c7e6d6f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1191,6 +1191,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
> >  
> >         i915_gem_driver_remove__contexts(dev_priv);
> >  
> > +       intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
> > +
> >         i915_gem_drain_freed_objects(dev_priv);
> >  }
> >  
> > @@ -1202,7 +1204,6 @@ void i915_gem_driver_release(struct drm_i915_private *dev_priv)
> >  
> >         intel_wa_list_free(&dev_priv->gt_wa_list);
> >  
> > -       intel_uc_cleanup_firmwares(&dev_priv->gt.uc);
> >         i915_gem_cleanup_userptr(dev_priv);
> >  
> >         i915_gem_drain_freed_objects(dev_priv);
> > -- 
> > 2.21.1
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
  2020-05-27 14:23   ` Michał Winiarski
@ 2020-05-28 10:14   ` Tvrtko Ursulin
  2020-05-28 12:10     ` Janusz Krzysztofik
  1 sibling, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-05-28 10:14 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx


On 18/05/2020 19:17, Janusz Krzysztofik wrote:
> Contexts associated with open device file descriptors together with
> their assigned address spaces are now closed on device file close.  On

i915_gem_driver_remove looks like module unload to me, not device file 
close. So..

> address space closure its associated DMA mappings are revoked.  If the
> device is removed while being open, subsequent attempts to revoke
> those mappings while closing the device file descriptor may may be
> judged by intel-iommu code as a bug and result in kernel panic.
> 
> Since user contexts become useless after the device is no longer
> available, drop them on device removal.
> 
> <4> [36.900985] ------------[ cut here ]------------
> <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
> <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
> <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
> <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
> <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
> <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
> <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
> <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
> <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
> <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
> <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
> <4> [36.901408] Call Trace:
> <4> [36.901418]  ? process_one_work+0x1de/0x600
> <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
> <4> [36.901573]  free_pd+0x9/0x20 [i915]
> <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
> <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
> <4> [36.901733]  process_one_work+0x268/0x600
> <4> [36.901744]  ? __schedule+0x307/0x8d0
> <4> [36.901756]  worker_thread+0x37/0x380
> <4> [36.901766]  ? process_one_work+0x600/0x600
> <4> [36.901775]  kthread+0x140/0x160
> <4> [36.901783]  ? kthread_park+0x80/0x80
> <4> [36.901792]  ret_from_fork+0x24/0x50
> <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
> <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
>   drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
>   drivers/gpu/drm/i915/i915_gem.c             |  2 ++
>   3 files changed, 41 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 900ea8b7fc8f..0096a69fbfd3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
>   	rcu_barrier(); /* and flush the left over RCU frees */
>   }
>   
> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
> +{
> +	struct i915_gem_context *ctx, *cn;
> +
> +	list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
> +		struct drm_i915_file_private *file_priv = ctx->file_priv;
> +		struct i915_gem_context *entry;
> +		unsigned long int id;
> +
> +		if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
> +			continue;
> +
> +		xa_for_each(&file_priv->context_xa, id, entry) {

... how is driver unload possible with open drm file descriptors, or 
active contexts? If something is going wrong sounds like something else.

drm postclose -> i915_gem_context_close -> closes all contexts and puts 
all vm. What can remain dangling? An active context? But there is idling 
via i915_gem_driver_remove -> i915_gem_suspend_late.

Regards,

Tvrtko

> +			struct i915_address_space *vm;
> +			unsigned long int idx;
> +
> +			if (entry != ctx)
> +				continue;
> +
> +			xa_erase(&file_priv->context_xa, id);
> +
> +			if (id)
> +				break;
> +
> +			xa_for_each(&file_priv->vm_xa, idx, vm) {
> +				xa_erase(&file_priv->vm_xa, idx);
> +				i915_vm_put(vm);
> +			}
> +
> +			break;
> +		}
> +
> +		context_close(ctx);
> +	}
> +
> +	i915_gem_driver_release__contexts(i915);
> +}
> +
>   static int gem_context_register(struct i915_gem_context *ctx,
>   				struct drm_i915_file_private *fpriv,
>   				u32 *id)
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index 3702b2fb27ab..62808bea9239 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -110,6 +110,7 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
>   
>   /* i915_gem_context.c */
>   void i915_gem_init__contexts(struct drm_i915_private *i915);
> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915);
>   void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
>   
>   int i915_gem_context_open(struct drm_i915_private *i915,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0cbcb9f54e7d..87d3c4f5b6c6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1189,6 +1189,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
>   	/* Flush any outstanding unpin_work. */
>   	i915_gem_drain_workqueue(dev_priv);
>   
> +	i915_gem_driver_remove__contexts(dev_priv);
> +
>   	i915_gem_drain_freed_objects(dev_priv);
>   }
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-28 10:14   ` Tvrtko Ursulin
@ 2020-05-28 12:10     ` Janusz Krzysztofik
  2020-05-28 13:34       ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-28 12:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Hi Tvrtko,

On Thu, 2020-05-28 at 11:14 +0100, Tvrtko Ursulin wrote:
> On 18/05/2020 19:17, Janusz Krzysztofik wrote:
> > Contexts associated with open device file descriptors together with
> > their assigned address spaces are now closed on device file close.  On
> 
> i915_gem_driver_remove looks like module unload to me, not device file 
> close. So..

Not only module unload ...

> 
> > address space closure its associated DMA mappings are revoked.  If the
> > device is removed while being open, subsequent attempts to revoke
> > those mappings while closing the device file descriptor may may be
> > judged by intel-iommu code as a bug and result in kernel panic.
> > 
> > Since user contexts become useless after the device is no longer
> > available, drop them on device removal.
> > 
> > <4> [36.900985] ------------[ cut here ]------------
> > <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> > <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> > <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
> > <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
> > <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
> > <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> > <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
> > <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
> > <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
> > <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
> > <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
> > <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
> > <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
> > <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
> > <4> [36.901408] Call Trace:
> > <4> [36.901418]  ? process_one_work+0x1de/0x600
> > <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
> > <4> [36.901573]  free_pd+0x9/0x20 [i915]
> > <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
> > <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
> > <4> [36.901733]  process_one_work+0x268/0x600
> > <4> [36.901744]  ? __schedule+0x307/0x8d0
> > <4> [36.901756]  worker_thread+0x37/0x380
> > <4> [36.901766]  ? process_one_work+0x600/0x600
> > <4> [36.901775]  kthread+0x140/0x160
> > <4> [36.901783]  ? kthread_park+0x80/0x80
> > <4> [36.901792]  ret_from_fork+0x24/0x50
> > <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
> > <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
> > 
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
> >   drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c             |  2 ++
> >   3 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 900ea8b7fc8f..0096a69fbfd3 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
> >   	rcu_barrier(); /* and flush the left over RCU frees */
> >   }
> >   
> > +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
> > +{
> > +	struct i915_gem_context *ctx, *cn;
> > +
> > +	list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
> > +		struct drm_i915_file_private *file_priv = ctx->file_priv;
> > +		struct i915_gem_context *entry;
> > +		unsigned long int id;
> > +
> > +		if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
> > +			continue;
> > +
> > +		xa_for_each(&file_priv->context_xa, id, entry) {
> 
> ... how is driver unload possible with open drm file descriptors, or 
> active contexts? 

... but also PCI driver unbind or PCI device remove, with the module
still loaded.  That may perfectly happen even if a device file
descriptor is still kept open.

> If something is going wrong sounds like something else.

I think we might consider that "something" as intel-iommu code, but see
also the last paragraph of my response below.

> 
> drm postclose -> i915_gem_context_close -> closes all contexts and puts 
> all vm. What can remain dangling? An active context? But there is idling 
> via i915_gem_driver_remove -> i915_gem_suspend_late.

Idling doesn't release DMA mappings which may then be released as late
as on i915_gem_driver_release.  If that doesn't happen on device remove
only later on last device close, intel-iommu triggers a bug.

I reported that issue to intel-iommu maintainers several months ago[1].
They even agreed that was something that might need to be fixed[2], I
provided them with information how the issue could be reproduced
easily[3] but no progress has been reported since then.  If you think
the issue should be definitely fixed on intel-iommu side, please help
me to convince intel-iommu maintainers to take care of that.

However, please note DMA-API also complains about DMA mappings still
left active when device is removed.  That may indicate we should rather
fix that on our side.

Thanks,
Janusz

[1] https://marc.info/?l=linux-kernel&m=156689857510234&w=2
[2] https://marc.info/?l=linux-kernel&m=156706978013967&w=2
[3] https://marc.info/?l=linux-kernel&m=156749651025416&w=2

> 
> Regards,
> 
> Tvrtko
> 
> > +			struct i915_address_space *vm;
> > +			unsigned long int idx;
> > +
> > +			if (entry != ctx)
> > +				continue;
> > +
> > +			xa_erase(&file_priv->context_xa, id);
> > +
> > +			if (id)
> > +				break;
> > +
> > +			xa_for_each(&file_priv->vm_xa, idx, vm) {
> > +				xa_erase(&file_priv->vm_xa, idx);
> > +				i915_vm_put(vm);
> > +			}
> > +
> > +			break;
> > +		}
> > +
> > +		context_close(ctx);
> > +	}
> > +
> > +	i915_gem_driver_release__contexts(i915);
> > +}
> > +
> >   static int gem_context_register(struct i915_gem_context *ctx,
> >   				struct drm_i915_file_private *fpriv,
> >   				u32 *id)
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > index 3702b2fb27ab..62808bea9239 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > @@ -110,6 +110,7 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
> >   
> >   /* i915_gem_context.c */
> >   void i915_gem_init__contexts(struct drm_i915_private *i915);
> > +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915);
> >   void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
> >   
> >   int i915_gem_context_open(struct drm_i915_private *i915,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0cbcb9f54e7d..87d3c4f5b6c6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1189,6 +1189,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
> >   	/* Flush any outstanding unpin_work. */
> >   	i915_gem_drain_workqueue(dev_priv);
> >   
> > +	i915_gem_driver_remove__contexts(dev_priv);
> > +
> >   	i915_gem_drain_freed_objects(dev_priv);
> >   }
> >   
> > 

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-28 12:10     ` Janusz Krzysztofik
@ 2020-05-28 13:34       ` Tvrtko Ursulin
  2020-05-28 13:41         ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2020-05-28 13:34 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx


On 28/05/2020 13:10, Janusz Krzysztofik wrote:
> Hi Tvrtko,
> 
> On Thu, 2020-05-28 at 11:14 +0100, Tvrtko Ursulin wrote:
>> On 18/05/2020 19:17, Janusz Krzysztofik wrote:
>>> Contexts associated with open device file descriptors together with
>>> their assigned address spaces are now closed on device file close.  On
>>
>> i915_gem_driver_remove looks like module unload to me, not device file
>> close. So..
> 
> Not only module unload ...
> 
>>
>>> address space closure its associated DMA mappings are revoked.  If the
>>> device is removed while being open, subsequent attempts to revoke
>>> those mappings while closing the device file descriptor may may be
>>> judged by intel-iommu code as a bug and result in kernel panic.
>>>
>>> Since user contexts become useless after the device is no longer
>>> available, drop them on device removal.
>>>
>>> <4> [36.900985] ------------[ cut here ]------------
>>> <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
>>> <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>> <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
>>> <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
>>> <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
>>> <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
>>> <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
>>> <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
>>> <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
>>> <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
>>> <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
>>> <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
>>> <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
>>> <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
>>> <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
>>> <4> [36.901408] Call Trace:
>>> <4> [36.901418]  ? process_one_work+0x1de/0x600
>>> <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
>>> <4> [36.901573]  free_pd+0x9/0x20 [i915]
>>> <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
>>> <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
>>> <4> [36.901733]  process_one_work+0x268/0x600
>>> <4> [36.901744]  ? __schedule+0x307/0x8d0
>>> <4> [36.901756]  worker_thread+0x37/0x380
>>> <4> [36.901766]  ? process_one_work+0x600/0x600
>>> <4> [36.901775]  kthread+0x140/0x160
>>> <4> [36.901783]  ? kthread_park+0x80/0x80
>>> <4> [36.901792]  ret_from_fork+0x24/0x50
>>> <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
>>> <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
>>>
>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
>>>    drivers/gpu/drm/i915/i915_gem.c             |  2 ++
>>>    3 files changed, 41 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 900ea8b7fc8f..0096a69fbfd3 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
>>>    	rcu_barrier(); /* and flush the left over RCU frees */
>>>    }
>>>    
>>> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
>>> +{
>>> +	struct i915_gem_context *ctx, *cn;
>>> +
>>> +	list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
>>> +		struct drm_i915_file_private *file_priv = ctx->file_priv;
>>> +		struct i915_gem_context *entry;
>>> +		unsigned long int id;
>>> +
>>> +		if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
>>> +			continue;
>>> +
>>> +		xa_for_each(&file_priv->context_xa, id, entry) {
>>
>> ... how is driver unload possible with open drm file descriptors, or
>> active contexts?
> 
> ... but also PCI driver unbind or PCI device remove, with the module
> still loaded.  That may perfectly happen even if a device file
> descriptor is still kept open.

I see. What do we do, or plan to do, with those left open drm fds after 
the driver is unbound from the device? Is there a path connected to keep 
saying -ENODEV (Or is a different errno standard for this case, like 
ENXIO?) from that point onward for everything done with that fd. So 
userspace couldn't do anything more with it, attempt to create a new 
context etc. Is the DRM core handling this?

Regards,

Tvrtko

> 
>> If something is going wrong sounds like something else.
> 
> I think we might consider that "something" as intel-iommu code, but see
> also the last paragraph of my response below.
> 
>>
>> drm postclose -> i915_gem_context_close -> closes all contexts and puts
>> all vm. What can remain dangling? An active context? But there is idling
>> via i915_gem_driver_remove -> i915_gem_suspend_late.
> 
> Idling doesn't release DMA mappings which may then be released as late
> as on i915_gem_driver_release.  If that doesn't happen on device remove
> only later on last device close, intel-iommu triggers a bug.
> 
> I reported that issue to intel-iommu maintainers several months ago[1].
> They even agreed that was something that might need to be fixed[2], I
> provided them with information how the issue could be reproduced
> easily[3] but no progress has been reported since then.  If you think
> the issue should be definitely fixed on intel-iommu side, please help
> me to convince intel-iommu maintainers to take care of that.
> 
> However, please note DMA-API also complains about DMA mappings still
> left active when device is removed.  That may indicate we should rather
> fix that on our side.
> 
> Thanks,
> Janusz
> 
> [1] https://marc.info/?l=linux-kernel&m=156689857510234&w=2
> [2] https://marc.info/?l=linux-kernel&m=156706978013967&w=2
> [3] https://marc.info/?l=linux-kernel&m=156749651025416&w=2
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>> +			struct i915_address_space *vm;
>>> +			unsigned long int idx;
>>> +
>>> +			if (entry != ctx)
>>> +				continue;
>>> +
>>> +			xa_erase(&file_priv->context_xa, id);
>>> +
>>> +			if (id)
>>> +				break;
>>> +
>>> +			xa_for_each(&file_priv->vm_xa, idx, vm) {
>>> +				xa_erase(&file_priv->vm_xa, idx);
>>> +				i915_vm_put(vm);
>>> +			}
>>> +
>>> +			break;
>>> +		}
>>> +
>>> +		context_close(ctx);
>>> +	}
>>> +
>>> +	i915_gem_driver_release__contexts(i915);
>>> +}
>>> +
>>>    static int gem_context_register(struct i915_gem_context *ctx,
>>>    				struct drm_i915_file_private *fpriv,
>>>    				u32 *id)
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> index 3702b2fb27ab..62808bea9239 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> @@ -110,6 +110,7 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
>>>    
>>>    /* i915_gem_context.c */
>>>    void i915_gem_init__contexts(struct drm_i915_private *i915);
>>> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915);
>>>    void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
>>>    
>>>    int i915_gem_context_open(struct drm_i915_private *i915,
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 0cbcb9f54e7d..87d3c4f5b6c6 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1189,6 +1189,8 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
>>>    	/* Flush any outstanding unpin_work. */
>>>    	i915_gem_drain_workqueue(dev_priv);
>>>    
>>> +	i915_gem_driver_remove__contexts(dev_priv);
>>> +
>>>    	i915_gem_drain_freed_objects(dev_priv);
>>>    }
>>>    
>>>
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-28 13:34       ` Tvrtko Ursulin
@ 2020-05-28 13:41         ` Chris Wilson
  2020-05-28 14:00           ` Janusz Krzysztofik
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2020-05-28 13:41 UTC (permalink / raw)
  To: Janusz Krzysztofik, Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-28 14:34:42)
> 
> On 28/05/2020 13:10, Janusz Krzysztofik wrote:
> > Hi Tvrtko,
> > 
> > On Thu, 2020-05-28 at 11:14 +0100, Tvrtko Ursulin wrote:
> >> On 18/05/2020 19:17, Janusz Krzysztofik wrote:
> >>> Contexts associated with open device file descriptors together with
> >>> their assigned address spaces are now closed on device file close.  On
> >>
> >> i915_gem_driver_remove looks like module unload to me, not device file
> >> close. So..
> > 
> > Not only module unload ...
> > 
> >>
> >>> address space closure its associated DMA mappings are revoked.  If the
> >>> device is removed while being open, subsequent attempts to revoke
> >>> those mappings while closing the device file descriptor may may be
> >>> judged by intel-iommu code as a bug and result in kernel panic.
> >>>
> >>> Since user contexts become useless after the device is no longer
> >>> available, drop them on device removal.
> >>>
> >>> <4> [36.900985] ------------[ cut here ]------------
> >>> <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> >>> <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> >>> <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> >>> <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
> >>> <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
> >>> <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
> >>> <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> >>> <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
> >>> <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
> >>> <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
> >>> <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
> >>> <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
> >>> <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
> >>> <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
> >>> <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
> >>> <4> [36.901408] Call Trace:
> >>> <4> [36.901418]  ? process_one_work+0x1de/0x600
> >>> <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
> >>> <4> [36.901573]  free_pd+0x9/0x20 [i915]
> >>> <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
> >>> <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
> >>> <4> [36.901733]  process_one_work+0x268/0x600
> >>> <4> [36.901744]  ? __schedule+0x307/0x8d0
> >>> <4> [36.901756]  worker_thread+0x37/0x380
> >>> <4> [36.901766]  ? process_one_work+0x600/0x600
> >>> <4> [36.901775]  kthread+0x140/0x160
> >>> <4> [36.901783]  ? kthread_park+0x80/0x80
> >>> <4> [36.901792]  ret_from_fork+0x24/0x50
> >>> <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
> >>> <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
> >>>
> >>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
> >>>    drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
> >>>    drivers/gpu/drm/i915/i915_gem.c             |  2 ++
> >>>    3 files changed, 41 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> index 900ea8b7fc8f..0096a69fbfd3 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
> >>>     rcu_barrier(); /* and flush the left over RCU frees */
> >>>    }
> >>>    
> >>> +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
> >>> +{
> >>> +   struct i915_gem_context *ctx, *cn;
> >>> +
> >>> +   list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
> >>> +           struct drm_i915_file_private *file_priv = ctx->file_priv;
> >>> +           struct i915_gem_context *entry;
> >>> +           unsigned long int id;
> >>> +
> >>> +           if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
> >>> +                   continue;
> >>> +
> >>> +           xa_for_each(&file_priv->context_xa, id, entry) {
> >>
> >> ... how is driver unload possible with open drm file descriptors, or
> >> active contexts?
> > 
> > ... but also PCI driver unbind or PCI device remove, with the module
> > still loaded.  That may perfectly happen even if a device file
> > descriptor is still kept open.
> 
> I see. What do we do, or plan to do, with those left open drm fds after 
> the driver is unbound from the device? Is there a path connected to keep 
> saying -ENODEV (Or is a different errno standard for this case, like 
> ENXIO?) from that point onward for everything done with that fd. So 
> userspace couldn't do anything more with it, attempt to create a new 
> context etc. Is the DRM core handling this?

We mark the device as wedged, so any attempt to use the GPU results in
-EIO. There's a loose edge in the faulthandler, but marking the ggtt as
closed and SIGBUS on trying to fault in an non-existent page takes care
of that.

There's a very expensive srcu in every drm_ioctl for checking unplugged
that we don't need because we already do the -EIO.

In particular this patch is worthless, or worse misdirection. The issue
is not the contexts or page directories and this is not fixing the
problem and only a few of the symptoms.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove
  2020-05-28 13:41         ` Chris Wilson
@ 2020-05-28 14:00           ` Janusz Krzysztofik
  0 siblings, 0 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2020-05-28 14:00 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx

Hi Chris,

On Thu, 2020-05-28 at 14:41 +0100, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-28 14:34:42)
> > On 28/05/2020 13:10, Janusz Krzysztofik wrote:
> > > Hi Tvrtko,
> > > 
> > > On Thu, 2020-05-28 at 11:14 +0100, Tvrtko Ursulin wrote:
> > > > On 18/05/2020 19:17, Janusz Krzysztofik wrote:
> > > > > Contexts associated with open device file descriptors together with
> > > > > their assigned address spaces are now closed on device file close.  On
> > > > 
> > > > i915_gem_driver_remove looks like module unload to me, not device file
> > > > close. So..
> > > 
> > > Not only module unload ...
> > > 
> > > > > address space closure its associated DMA mappings are revoked.  If the
> > > > > device is removed while being open, subsequent attempts to revoke
> > > > > those mappings while closing the device file descriptor may may be
> > > > > judged by intel-iommu code as a bug and result in kernel panic.
> > > > > 
> > > > > Since user contexts become useless after the device is no longer
> > > > > available, drop them on device removal.
> > > > > 
> > > > > <4> [36.900985] ------------[ cut here ]------------
> > > > > <2> [36.901005] kernel BUG at drivers/iommu/intel-iommu.c:3717!
> > > > > <4> [36.901105] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > > <4> [36.901117] CPU: 0 PID: 39 Comm: kworker/u8:1 Tainted: G     U  W         5.7.0-rc5-CI-CI_DRM_8485+ #1
> > > > > <4> [36.901133] Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.1484.A00.1911290833 11/29/2019
> > > > > <4> [36.901250] Workqueue: i915 __i915_vm_release [i915]
> > > > > <4> [36.901264] RIP: 0010:intel_unmap+0x1f5/0x230
> > > > > <4> [36.901274] Code: 01 e8 9f bc a9 ff 85 c0 74 09 80 3d df 60 09 01 00 74 19 65 ff 0d 13 12 97 7e 0f 85 fc fe ff ff e8 82 b0 95 ff e9 f2 fe ff ff <0f> 0b e8 d4 bd a9 ff 85 c0 75 de 48 c7 c2 10 84 2c 82 be 54 00 00
> > > > > <4> [36.901302] RSP: 0018:ffffc900001ebdc0 EFLAGS: 00010246
> > > > > <4> [36.901313] RAX: 0000000000000000 RBX: ffff8882561dd000 RCX: 0000000000000000
> > > > > <4> [36.901324] RDX: 0000000000001000 RSI: 00000000ffd9c000 RDI: ffff888274c94000
> > > > > <4> [36.901336] RBP: ffff888274c940b0 R08: 0000000000000000 R09: 0000000000000001
> > > > > <4> [36.901348] R10: 000000000a25d812 R11: 00000000112af2d4 R12: ffff888252c70200
> > > > > <4> [36.901360] R13: 00000000ffd9c000 R14: 0000000000001000 R15: ffff8882561dd010
> > > > > <4> [36.901372] FS:  0000000000000000(0000) GS:ffff888278000000(0000) knlGS:0000000000000000
> > > > > <4> [36.901386] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > <4> [36.901396] CR2: 00007f06def54950 CR3: 0000000255844000 CR4: 0000000000340ef0
> > > > > <4> [36.901408] Call Trace:
> > > > > <4> [36.901418]  ? process_one_work+0x1de/0x600
> > > > > <4> [36.901494]  cleanup_page_dma+0x37/0x70 [i915]
> > > > > <4> [36.901573]  free_pd+0x9/0x20 [i915]
> > > > > <4> [36.901644]  gen8_ppgtt_cleanup+0x59/0xc0 [i915]
> > > > > <4> [36.901721]  __i915_vm_release+0x14/0x30 [i915]
> > > > > <4> [36.901733]  process_one_work+0x268/0x600
> > > > > <4> [36.901744]  ? __schedule+0x307/0x8d0
> > > > > <4> [36.901756]  worker_thread+0x37/0x380
> > > > > <4> [36.901766]  ? process_one_work+0x600/0x600
> > > > > <4> [36.901775]  kthread+0x140/0x160
> > > > > <4> [36.901783]  ? kthread_park+0x80/0x80
> > > > > <4> [36.901792]  ret_from_fork+0x24/0x50
> > > > > <4> [36.901804] Modules linked in: mei_hdcp i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ax88179_178a usbnet mii mei_me mei prime_numbers intel_lpss_pci
> > > > > <4> [36.901857] ---[ end trace 52d1b4d81f8d1ea7 ]---
> > > > > 
> > > > > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > > > > ---
> > > > >    drivers/gpu/drm/i915/gem/i915_gem_context.c | 38 +++++++++++++++++++++
> > > > >    drivers/gpu/drm/i915/gem/i915_gem_context.h |  1 +
> > > > >    drivers/gpu/drm/i915/i915_gem.c             |  2 ++
> > > > >    3 files changed, 41 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > index 900ea8b7fc8f..0096a69fbfd3 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > > @@ -927,6 +927,44 @@ void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
> > > > >     rcu_barrier(); /* and flush the left over RCU frees */
> > > > >    }
> > > > >    
> > > > > +void i915_gem_driver_remove__contexts(struct drm_i915_private *i915)
> > > > > +{
> > > > > +   struct i915_gem_context *ctx, *cn;
> > > > > +
> > > > > +   list_for_each_entry_safe(ctx, cn, &i915->gem.contexts.list, link) {
> > > > > +           struct drm_i915_file_private *file_priv = ctx->file_priv;
> > > > > +           struct i915_gem_context *entry;
> > > > > +           unsigned long int id;
> > > > > +
> > > > > +           if (i915_gem_context_is_closed(ctx) || IS_ERR(file_priv))
> > > > > +                   continue;
> > > > > +
> > > > > +           xa_for_each(&file_priv->context_xa, id, entry) {
> > > > 
> > > > ... how is driver unload possible with open drm file descriptors, or
> > > > active contexts?
> > > 
> > > ... but also PCI driver unbind or PCI device remove, with the module
> > > still loaded.  That may perfectly happen even if a device file
> > > descriptor is still kept open.
> > 
> > I see. What do we do, or plan to do, with those left open drm fds after 
> > the driver is unbound from the device? Is there a path connected to keep 
> > saying -ENODEV (Or is a different errno standard for this case, like 
> > ENXIO?) from that point onward for everything done with that fd. So 
> > userspace couldn't do anything more with it, attempt to create a new 
> > context etc. Is the DRM core handling this?
> 
> We mark the device as wedged, so any attempt to use the GPU results in
> -EIO. There's a loose edge in the faulthandler, but marking the ggtt as
> closed and SIGBUS on trying to fault in an non-existent page takes care
> of that.
> 
> There's a very expensive srcu in every drm_ioctl for checking unplugged
> that we don't need because we already do the -EIO.

Thanks for help with that explanation.

> 
> In particular this patch is worthless, or worse misdirection. The issue
> is not the contexts or page directories and this is not fixing the
> problem and only a few of the symptoms.

Please note this patch is only a part of a series.  My findings are
that it is needed for the solution the series proposes to be effective.
The idea is to release all DMA mappings still on driver remove.  I
would really appreciate your comments on that idea and maybe on other
patches of the series as well.

Thanks,
Janusz
 
> -Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-05-28 14:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-18 18:17 [Intel-gfx] [RFC PATCH 0/4] drm/i915: Resolve device hotunplug issues Janusz Krzysztofik
2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 1/4] drm/i915: Drop user contexts on driver remove Janusz Krzysztofik
2020-05-27 14:23   ` Michał Winiarski
2020-05-28 10:14   ` Tvrtko Ursulin
2020-05-28 12:10     ` Janusz Krzysztofik
2020-05-28 13:34       ` Tvrtko Ursulin
2020-05-28 13:41         ` Chris Wilson
2020-05-28 14:00           ` Janusz Krzysztofik
2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 2/4] drm/i915: Release GT resources " Janusz Krzysztofik
2020-05-27 18:35   ` Michał Winiarski
2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 3/4] drm/i915: Move GGTT cleanup from driver_release to _remove Janusz Krzysztofik
2020-05-27 19:12   ` Michał Winiarski
2020-05-28  9:12     ` Janusz Krzysztofik
2020-05-18 18:17 ` [Intel-gfx] [RFC PATCH 4/4] drm/i915: Move UC firmware " Janusz Krzysztofik
2020-05-27 19:15   ` Michał Winiarski
2020-05-28  9:13     ` Janusz Krzysztofik
2020-05-18 23:07 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Resolve device hotunplug issues Patchwork
2020-05-18 23:08 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-05-18 23:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-05-19  0:54 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.