From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 26394C6FA86 for ; Mon, 19 Sep 2022 16:08:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 95DFB10E692; Mon, 19 Sep 2022 16:08:46 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id A381F10E691; Mon, 19 Sep 2022 16:08:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663603722; x=1695139722; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=NPs0wAMiICa5jSbcC4YEuaTH4A5GViFkO+tb3iPLa2M=; b=UBfRZcrFMjiLEBVO8VurqZbgkNm8wCQJVGzeb3dGXSzSrXRxHq2eKvwG gTKIfLTnExtWMn7kmvHSgWQTsC/HhRJW/dUTXv3KYbaaGQtuRtnrk52pR BkBM8s3/yiBo2QGyUwl9CnKmAQliyzdmETFCINqTBHeHgWM4ydQxAI/9S Vbgz05NH/RjVEn/y996yAxsycnQKY9tZCEKA3TEEbc975dzHAxfHakwbN JOJ6pWOdAcuCx4CO6+J0WxPAv6YuboZs1uQWmRe5q/spCQo6rNkCmJBN3 bWgCh3V+9GtorZF66mmQ7KAgB/4SBotKVoSYOHwn2Qo4KIGtqzDtnqn6/ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10475"; a="325734535" X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="325734535" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2022 09:08:39 -0700 X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="569707417" Received: from biancabe-mobl1.ger.corp.intel.com (HELO intel.com) ([10.252.32.244]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2022 09:08:33 -0700 Date: Mon, 19 Sep 2022 18:08:26 +0200 From: Andi Shyti To: Janusz Krzysztofik Subject: Re: [PATCH v3 0/2] drm/i915/gem: Really move i915_gem_context.link under ref protection Message-ID: References: <20220916092403.201355-1-janusz.krzysztofik@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220916092403.201355-1-janusz.krzysztofik@linux.intel.com> X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Tvrtko Ursulin , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Chris Wilson , Andi Shyti Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Pushed, Thanks! Andi On Fri, Sep 16, 2022 at 11:24:01AM +0200, Janusz Krzysztofik wrote: > i915_perf assumes that it can use the i915_gem_context reference to > protect its i915->gem.contexts.list iteration. However, this requires > that we do not remove the context from the list until after we drop the > final reference and release the struct. If, as currently, we remove the > context from the list during context_close(), the link.next pointer may > be poisoned while we are holding the context reference and cause a GPF: > > [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x > 1fffff ctx_id_mask=0x1fffff > [ 4070.574881] general protection fault, probably for non-canonical address 0xdead00000000 > 0100: 0000 [#1] PREEMPT SMP > [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G E 5.17.9 > #180 > [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 > [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915] > [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff > [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202 > [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 > [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68 > [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc > [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860 > [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc > [ 4070.575016] FS: 00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000 > [ 4070.575021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0 > [ 4070.575029] Call Trace: > [ 4070.575033] > [ 4070.575037] lrc_configure_all_contexts+0x13e/0x150 [i915] > [ 4070.575103] gen8_enable_metric_set+0x4d/0x90 [i915] > [ 4070.575164] i915_perf_open_ioctl+0xbc0/0x1500 [i915] > [ 4070.575224] ? asm_common_interrupt+0x1e/0x40 > [ 4070.575232] ? i915_oa_init_reg_state+0x110/0x110 [i915] > [ 4070.575290] drm_ioctl_kernel+0x85/0x110 > [ 4070.575296] ? update_load_avg+0x5f/0x5e0 > [ 4070.575302] drm_ioctl+0x1d3/0x370 > [ 4070.575307] ? i915_oa_init_reg_state+0x110/0x110 [i915] > [ 4070.575382] ? gen8_gt_irq_handler+0x46/0x130 [i915] > [ 4070.575445] __x64_sys_ioctl+0x3c4/0x8d0 > [ 4070.575451] ? __do_softirq+0xaa/0x1d2 > [ 4070.575456] do_syscall_64+0x35/0x80 > [ 4070.575461] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 4070.575467] RIP: 0033:0x7f1ed5c10397 > [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48 > [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397 > [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006 > [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005 > [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a > [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0 > [ 4070.575505] > [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E) > [ 4070.575549] ---[ end trace 0000000000000000 ]--- > > However, there is a risk of triggering kernel warning on contexts list not > empty at driver release time if we deleagate that task to a worker for > i915_gem_context_release_work(), unless that work is flushed first. > Unfortunately, it is not flushed on driver release. Fix it. > > Chris Wilson (1): > drm/i915/gem: Really move i915_gem_context.link under ref protection > > Janusz Krzysztofik (1): > drm/i915/gem: Flush contexts on driver release > > drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 ++++---- > drivers/gpu/drm/i915/i915_gem.c | 3 ++- > 2 files changed, 6 insertions(+), 5 deletions(-) > > -- > 2.25.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D53B6C54EE9 for ; Mon, 19 Sep 2022 16:08:45 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0F8C010E691; Mon, 19 Sep 2022 16:08:45 +0000 (UTC) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id A381F10E691; Mon, 19 Sep 2022 16:08:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1663603722; x=1695139722; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=NPs0wAMiICa5jSbcC4YEuaTH4A5GViFkO+tb3iPLa2M=; b=UBfRZcrFMjiLEBVO8VurqZbgkNm8wCQJVGzeb3dGXSzSrXRxHq2eKvwG gTKIfLTnExtWMn7kmvHSgWQTsC/HhRJW/dUTXv3KYbaaGQtuRtnrk52pR BkBM8s3/yiBo2QGyUwl9CnKmAQliyzdmETFCINqTBHeHgWM4ydQxAI/9S Vbgz05NH/RjVEn/y996yAxsycnQKY9tZCEKA3TEEbc975dzHAxfHakwbN JOJ6pWOdAcuCx4CO6+J0WxPAv6YuboZs1uQWmRe5q/spCQo6rNkCmJBN3 bWgCh3V+9GtorZF66mmQ7KAgB/4SBotKVoSYOHwn2Qo4KIGtqzDtnqn6/ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10475"; a="325734535" X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="325734535" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2022 09:08:39 -0700 X-IronPort-AV: E=Sophos;i="5.93,328,1654585200"; d="scan'208";a="569707417" Received: from biancabe-mobl1.ger.corp.intel.com (HELO intel.com) ([10.252.32.244]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Sep 2022 09:08:33 -0700 Date: Mon, 19 Sep 2022 18:08:26 +0200 From: Andi Shyti To: Janusz Krzysztofik Message-ID: References: <20220916092403.201355-1-janusz.krzysztofik@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220916092403.201355-1-janusz.krzysztofik@linux.intel.com> Subject: Re: [Intel-gfx] [PATCH v3 0/2] drm/i915/gem: Really move i915_gem_context.link under ref protection X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Daniel Vetter , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Pushed, Thanks! Andi On Fri, Sep 16, 2022 at 11:24:01AM +0200, Janusz Krzysztofik wrote: > i915_perf assumes that it can use the i915_gem_context reference to > protect its i915->gem.contexts.list iteration. However, this requires > that we do not remove the context from the list until after we drop the > final reference and release the struct. If, as currently, we remove the > context from the list during context_close(), the link.next pointer may > be poisoned while we are holding the context reference and cause a GPF: > > [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x > 1fffff ctx_id_mask=0x1fffff > [ 4070.574881] general protection fault, probably for non-canonical address 0xdead00000000 > 0100: 0000 [#1] PREEMPT SMP > [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G E 5.17.9 > #180 > [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017 > [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915] > [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff > [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202 > [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000 > [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68 > [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc > [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860 > [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc > [ 4070.575016] FS: 00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000 > [ 4070.575021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0 > [ 4070.575029] Call Trace: > [ 4070.575033] > [ 4070.575037] lrc_configure_all_contexts+0x13e/0x150 [i915] > [ 4070.575103] gen8_enable_metric_set+0x4d/0x90 [i915] > [ 4070.575164] i915_perf_open_ioctl+0xbc0/0x1500 [i915] > [ 4070.575224] ? asm_common_interrupt+0x1e/0x40 > [ 4070.575232] ? i915_oa_init_reg_state+0x110/0x110 [i915] > [ 4070.575290] drm_ioctl_kernel+0x85/0x110 > [ 4070.575296] ? update_load_avg+0x5f/0x5e0 > [ 4070.575302] drm_ioctl+0x1d3/0x370 > [ 4070.575307] ? i915_oa_init_reg_state+0x110/0x110 [i915] > [ 4070.575382] ? gen8_gt_irq_handler+0x46/0x130 [i915] > [ 4070.575445] __x64_sys_ioctl+0x3c4/0x8d0 > [ 4070.575451] ? __do_softirq+0xaa/0x1d2 > [ 4070.575456] do_syscall_64+0x35/0x80 > [ 4070.575461] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 4070.575467] RIP: 0033:0x7f1ed5c10397 > [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48 > [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397 > [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006 > [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005 > [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a > [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0 > [ 4070.575505] > [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E) > [ 4070.575549] ---[ end trace 0000000000000000 ]--- > > However, there is a risk of triggering kernel warning on contexts list not > empty at driver release time if we deleagate that task to a worker for > i915_gem_context_release_work(), unless that work is flushed first. > Unfortunately, it is not flushed on driver release. Fix it. > > Chris Wilson (1): > drm/i915/gem: Really move i915_gem_context.link under ref protection > > Janusz Krzysztofik (1): > drm/i915/gem: Flush contexts on driver release > > drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 ++++---- > drivers/gpu/drm/i915/i915_gem.c | 3 ++- > 2 files changed, 6 insertions(+), 5 deletions(-) > > -- > 2.25.1