To: Chris Wilson, intel-gfx@lists.freedesktop.org
References:
 <20200205121147.1834445-1-chris@chris-wilson.co.uk>
 <20200205121313.1834548-1-chris@chris-wilson.co.uk>
 <637ae604-f50d-7436-eb0b-e69d555e401f@linux.intel.com>
 <158092108409.5585.7308401904801560850@skylake-alporthouse-com>
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
Message-ID: <0eb12f13-eb2c-2c44-2a04-aa65deef2df8@linux.intel.com>
Date: Wed, 5 Feb 2020 18:33:57 +0000
In-Reply-To: <158092108409.5585.7308401904801560850@skylake-alporthouse-com>
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines


On 05/02/2020 16:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
>> On 05/02/2020 12:13, Chris Wilson wrote:
>>> If we have a set of active engines marked as being non-persistent, we
>>> lose track of those if the user replaces those engines with
>>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
>>> non-persistent requests are terminated if they are no longer being
>>> tracked by the user's context (in order to prevent a lost request
>>> causing an untracked and so unstoppable GPU hang), we need to apply the
>>> same context cancellation upon changing engines.
>>>
>>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
>>> Testcase: XXX
>>> Signed-off-by: Chris Wilson
>>> Cc: Tvrtko Ursulin
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 52a749691a8d..20f1d3e0221f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
>>>
>>>   replace:
>>>   	mutex_lock(&ctx->engines_mutex);
>>> +
>>> +	/* Flush stale requests off the old engines if required */
>>> +	if (!i915_gem_context_is_persistent(ctx) ||
>>> +	    !i915_modparams.enable_hangcheck)
>>> +		kill_context(ctx);
>>
>> Is the negative effect of this is legit contexts can't keep submitting
>> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
>> still. Might break legitimate userspace. Not that I offer solutions.. :(
>> Banning changing engines once context went non-persistent? That too can
>> break someone.
>
> It closes the hole we have. To do otherwise, we need to keep track of
> the old engines. Not an impossible task, certainly inconvenient.
>
> struct old_engines {
> 	struct i915_active active;
> 	struct list_head link;
> 	struct i915_gem_context *ctx;
> 	void *engines;
> 	int num_engines;
> };
>
> With a list+spinlock in the ctx that we can work in kill_context.
>
> The biggest catch there is actually worrying about attaching the active
> to already executing request, and making sure the coupling doesn't bug
> on a concurrent completion. Hmm, it's just a completion callback, but
> more convenient to use a ready made one.

What would you do with old engines? We don't have a mechanism to mark
intel_context closed. Hm, right, it would get unreachable by definition.
But how to terminate it if it doesn't play nicely?

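For reference, the bookkeeping described in the quoted proposal (a list
plus a spinlock in the ctx, walked from kill_context) can be modelled in
plain userspace C roughly as below. This is only a sketch under stated
assumptions: every name in it (ctx_model, old_engines_model,
ctx_model_set_engines, ctx_model_kill and so on) is hypothetical and is
not i915 code, the "spinlock" is a toy built on a C11 atomic_flag, and
"cancelling" is reduced to setting a flag.

```c
/* Userspace model only; every name below is hypothetical, not i915 API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* One retired engine map that may still have requests in flight. */
struct old_engines_model {
	struct old_engines_model *next;	/* newest first */
	int num_engines;
	bool cancelled;
};

/* Context model: the current map plus every map it has replaced. */
struct ctx_model {
	atomic_flag lock;		/* toy spinlock guarding 'stale' */
	struct old_engines_model *stale;
	int num_engines;
};

static void ctx_model_lock(struct ctx_model *ctx)
{
	while (atomic_flag_test_and_set_explicit(&ctx->lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void ctx_model_unlock(struct ctx_model *ctx)
{
	atomic_flag_clear_explicit(&ctx->lock, memory_order_release);
}

static void ctx_model_init(struct ctx_model *ctx, int num_engines)
{
	atomic_flag_clear(&ctx->lock);
	ctx->stale = NULL;
	ctx->num_engines = num_engines;
}

/* Replace the engine map but remember the old one rather than losing it. */
static int ctx_model_set_engines(struct ctx_model *ctx, int num_engines)
{
	struct old_engines_model *old = malloc(sizeof(*old));

	if (!old)
		return -1;
	old->cancelled = false;

	ctx_model_lock(ctx);
	old->num_engines = ctx->num_engines;
	old->next = ctx->stale;
	ctx->stale = old;
	ctx->num_engines = num_engines;
	ctx_model_unlock(ctx);
	return 0;
}

/* Visit every remembered map; returns how many were newly cancelled. */
static int ctx_model_kill(struct ctx_model *ctx)
{
	struct old_engines_model *old;
	int count = 0;

	ctx_model_lock(ctx);
	for (old = ctx->stale; old; old = old->next) {
		if (!old->cancelled) {
			old->cancelled = true;
			count++;
		}
	}
	ctx_model_unlock(ctx);
	return count;
}

/* Free the remembered maps once nothing can reference them any more. */
static void ctx_model_fini(struct ctx_model *ctx)
{
	while (ctx->stale) {
		struct old_engines_model *old = ctx->stale;

		ctx->stale = old->next;
		free(old);
	}
}
```

The model leaves out the part the proposal calls the biggest catch:
entries here stay on the list until ctx_model_fini(), whereas the quoted
struct carries an i915_active so that an entry could retire itself from
a completion callback, possibly concurrently with the walk. It also says
nothing about how an old engine would actually be terminated.
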
Regards,

Tvrtko