Date: Wed, 18 Jan 2023 10:29:45 +0200
From: Andy Shevchenko
To: John.C.Harrison@intel.com
Cc: Matthew Brost, Tvrtko Ursulin, Bruce Chang, Michael Cheng,
 Aravind Iddamsetty, Alan Previn, Umesh Nerlige Ramappa,
 Intel-GFX@lists.freedesktop.org, Lucas De Marchi, Chris Wilson,
 Daniele Ceraolo Spurio, DRI-Devel@lists.freedesktop.org, Andrzej Hajda,
 Rodrigo Vivi, Tejas Upadhyay, Matthew Auld
Subject: Re: [PATCH v2 1/5] drm/i915: Fix request locking during error
 capture & debugfs dump
References: <20230117213630.2897570-1-John.C.Harrison@Intel.com>
 <20230117213630.2897570-2-John.C.Harrison@Intel.com>
In-Reply-To: <20230117213630.2897570-2-John.C.Harrison@Intel.com>
Organization: Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
List-Id: Direct Rendering Infrastructure - Development

On Tue, Jan 17, 2023 at 01:36:26PM -0800, John.C.Harrison@Intel.com wrote:
> From: John Harrison
>
> When GuC support was added to error capture, the locking around the
> request object was broken. Fix it up.
>
> The context based search manages the spinlocking around the search
> internally. So it needs to grab the reference count internally as
> well. The execlist only request based search relies on external
> locking, so it needs an external reference count. So no change to that
> code itself but the context version does change.
>
> The only other caller is the code for dumping engine state to debugfs.
> That code wasn't previously getting an explicit reference at all as it
> does everything while holding the execlist specific spinlock.
> So that needs updating as well as that spinlock doesn't help when
> using GuC submission. Rather than trying to conditionally get/put
> depending on submission model, just change it to always do the get/put.
>
> In addition, intel_guc_find_hung_context() was not acquiring the
> correct spinlock before searching the request list. So fix that up too.

> Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset
> with GuC")

Must be one line.

> Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset")
> Cc: Matthew Brost
> Cc: John Harrison
> Cc: Jani Nikula
> Cc: Joonas Lahtinen
> Cc: Rodrigo Vivi
> Cc: Tvrtko Ursulin
> Cc: Daniele Ceraolo Spurio
> Cc: Andrzej Hajda
> Cc: Chris Wilson
> Cc: Matthew Auld
> Cc: Matt Roper
> Cc: Umesh Nerlige Ramappa
> Cc: Michael Cheng
> Cc: Lucas De Marchi
> Cc: Tejas Upadhyay
> Cc: Andy Shevchenko
> Cc: Aravind Iddamsetty
> Cc: Alan Previn
> Cc: Bruce Chang
> Cc: intel-gfx@lists.freedesktop.org

Is it possible to use the --to and --cc parameters of git send-email
instead of a noisy Cc list?

...

> +	if (hung_rq)
> +		i915_request_put(hung_rq);

In the Linux kernel, the idiom is that resource-freeing APIs should be
NULL-aware (or ERR_PTR-aware, or both). Does i915 follow that? If so, the
test should be inside i915_request_put() rather than in any of the callers.

...

> @@ -4847,6 +4857,7 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine)
>  			xa_lock(&guc->context_lookup);
>  			goto done;
>  		}
> +
>  next:
>  		intel_context_put(ce);
>  		xa_lock(&guc->context_lookup);

Stray change.
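
To illustrate the NULL-aware put idiom raised above (the same convention
that makes kfree(NULL) a no-op), here is a minimal userspace sketch. It is
only an assumption-laden illustration: struct demo_request and the demo_*()
helpers are hypothetical names invented for this example, and nothing here
reflects how i915_request_put() is actually implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; NOT the i915 request structure. */
struct demo_request {
	int refcount;
};

/* Counts released objects so the effect of the puts can be observed. */
static int demo_freed;

static struct demo_request *demo_request_create(void)
{
	struct demo_request *rq = calloc(1, sizeof(*rq));

	if (rq)
		rq->refcount = 1;	/* the creator holds the first reference */
	return rq;
}

/* NULL-aware get: passes NULL through untouched. */
static struct demo_request *demo_request_get(struct demo_request *rq)
{
	if (rq)
		rq->refcount++;
	return rq;
}

/* NULL-aware put: the NULL test lives here, not in every caller. */
static void demo_request_put(struct demo_request *rq)
{
	if (!rq)
		return;

	assert(rq->refcount > 0);
	if (--rq->refcount == 0) {
		free(rq);
		demo_freed++;
	}
}

/* Returns how many objects were freed during this call. */
int demo_run(void)
{
	int freed_before = demo_freed;
	struct demo_request *hung_rq = NULL;

	demo_request_put(hung_rq);	/* nothing found: harmless no-op */

	hung_rq = demo_request_create();
	if (!hung_rq)
		return -1;

	demo_request_get(hung_rq);	/* extra reference held by a user */
	demo_request_put(hung_rq);	/* user is done */
	demo_request_put(hung_rq);	/* creator is done, object is freed */

	return demo_freed - freed_before;
}
```

With a put helper shaped like this, the quoted hunk could call the put
unconditionally and drop the "if (hung_rq)" guard in the caller.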
-- 
With Best Regards,
Andy Shevchenko