From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8A20C4338F for ; Wed, 11 Aug 2021 10:31:00 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2A1BA600CD for ; Wed, 11 Aug 2021 10:31:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 2A1BA600CD Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ffwll.ch Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DC7A06E117; Wed, 11 Aug 2021 10:30:57 +0000 (UTC) Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by gabe.freedesktop.org (Postfix) with ESMTPS id A45446E116 for ; Wed, 11 Aug 2021 10:30:56 +0000 (UTC) Received: by mail-wm1-x32e.google.com with SMTP id n32so764161wms.2 for ; Wed, 11 Aug 2021 03:30:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ffwll.ch; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=JzdAzPlFMIBxGnASivmCusmQmNZftWOM84fP9GYuvPY=; b=XcIWCz+1v1U0QbWSUaQjj78JzgHC3UAi0bB8ZWQsZbNMuK+trAY5Qj5BBwEMfiDY8+ irg/MBLWIkjddCJRU+QwXzK9cJ+KnbmB/ck/L7WQfJG32VsmLnmAetzFQ9LDFryk6ZOg JeS3/tbKETDd02JIRv6bpjx707zCZXAewa5Sg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=JzdAzPlFMIBxGnASivmCusmQmNZftWOM84fP9GYuvPY=; b=t2E2LzwqFo6RNzIfhI/nLctNAmrhVbSNAGAstR6BgPcjX5ZyGtJybPDSdcGlbVzvdF +bvnUbjoLg62wNW984YesTTDNieQlkNQNDPI/WsASd/yvuni5j41ZO3MxIyaRd5zyexd q4/0heuyPFyGTI6koZWEKlkDGPbvhuTWfrxKeUtqSySMrTVKQAppzkF4N4BkFWXlYBqa o+D/7a1v56PvbUzUkHEY5aYRCen5Z2A6gyw8shT9YCIwSPEdE0RRmbWuhg/DcCaQAnR5 J9xFvuAZWtErlmWX0/9LhL7EJTCba8yJZMvkB8fFY23z0nsDnMw9Qs48oUWH7y1btJsG v0Dw== X-Gm-Message-State: AOAM5331RCzncyoWrx5k6G9XTJeOfefDhO64wd82ndVsvoI5sTpyvxuJ 2IYQ/o+PK0cyxhFT0RJE1qsNzF6VBfweTQ== X-Google-Smtp-Source: ABdhPJzEqrpz3QDRZ6yRDEnjptoZrEmdRb3NM4EdZzoM/fwOjil2KqXeEWaw15AnD10ujHZwgIzoQA== X-Received: by 2002:a1c:1b8f:: with SMTP id b137mr19910wmb.85.1628677855117; Wed, 11 Aug 2021 03:30:55 -0700 (PDT) Received: from phenom.ffwll.local ([2a02:168:57f4:0:efd0:b9e5:5ae6:c2fa]) by smtp.gmail.com with ESMTPSA id z7sm1070807wmi.4.2021.08.11.03.30.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Aug 2021 03:30:54 -0700 (PDT) Date: Wed, 11 Aug 2021 12:30:52 +0200 From: Daniel Vetter To: Matthew Brost Cc: gfx-internal-devel@eclists.intel.com, dri-devel@lists.freedesktop.org Subject: Re: [PATCH 1/9] drm/i915/guc: Fix blocked context accounting Message-ID: References: <20210811011622.255784-1-matthew.brost@intel.com> <20210811011622.255784-2-matthew.brost@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210811011622.255784-2-matthew.brost@intel.com> X-Operating-System: Linux phenom 5.10.0-7-amd64 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" On Wed, Aug 11, 2021 at 01:16:14AM +0000, Matthew Brost wrote: > Prior to this patch the blocked context counter was cleared on > init_sched_state (used during registering a context & resets) which is > incorrect. This state needs to be persistent or the counter can read the > incorrect value resulting in scheduling never getting enabled again. > > Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Tiny bikeshed on that commit, but for SCHED_STATE_BLOCKED_MASK you want GENMASK. Also SCHED_STATE_BLOCKED is usually called SCHED_STATE_BLOCKED_BIAS or similar if you put a counter into that field. But also ... we can't afford a separate counter? Is all the bitshifting actually worth the space savings? With a separate counter your bugfix below would look a lot more reasonable too. > Signed-off-by: Matthew Brost > Cc: > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 87d8dc8f51b9..69faa39da178 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -152,7 +152,7 @@ static inline void init_sched_state(struct intel_context *ce) > { > /* Only should be called from guc_lrc_desc_pin() */ > atomic_set(&ce->guc_sched_state_no_lock, 0); atomic_t without barriers or anything like that. tsk, tsk. Do we really need this? Also I looked through the callchain of init_sched_state and I couldn't find any locking, nor any description about ownership that would explain why there's no locking. E.g. scrub_guc_desc_for_outstanding_g2h has an xa_for_each with no protection. No idea how that one works. I also couldn't figure out how anything else in there is protected (no spinlocks to be seen anywhere at least). And then there's stuff like this: /* Flush context */ spin_lock_irqsave(&ce->guc_state.lock, flags); spin_unlock_irqrestore(&ce->guc_state.lock, flags); This pattern seems very common, and it freaks me out. Finally none of the locking or consistency rules are explained in the kerneldoc (or even comments) in the relevant datastructures, which is not great. > - ce->guc_state.sched_state = 0; > + ce->guc_state.sched_state &= SCHED_STATE_BLOCKED_MASK; The patch itself matches the commit message and makes sense. But like I said, would be cleaner I think if it's just a separate counter. Reviewed-by: Daniel Vetter > } > > static inline bool > -- > 2.32.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch