From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C49FCC47094 for ; Thu, 10 Jun 2021 14:45:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A5DF560E08 for ; Thu, 10 Jun 2021 14:45:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230153AbhFJOrZ (ORCPT ); Thu, 10 Jun 2021 10:47:25 -0400 Received: from mail-ot1-f53.google.com ([209.85.210.53]:39692 "EHLO mail-ot1-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231208AbhFJOrX (ORCPT ); Thu, 10 Jun 2021 10:47:23 -0400 Received: by mail-ot1-f53.google.com with SMTP id 5-20020a9d01050000b02903c700c45721so26645889otu.6 for ; Thu, 10 Jun 2021 07:45:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5r1JUAnf5vqj0yEA5qbVlT9HSVi/kDaJI1P3a5pkfsU=; b=W70cv9kRs5vC91D+5krSiC2wgrRgfcRqDBAvHN6Jz57LjX2QVyZmsMEhHTP67RchNg qOLEMa6gQuxFjFntv/fypYATIKLR/Bdo11s57OcCPfIP5M4k2gXoq7tRYnSsADHUBL61 /6/U9NGTqfT0AfqKcIcb0zYJr1y/r0NI/dAfWCMYFiS2Wco5+JOwEof1nQMYLQHIKlHe NASNPYOQ63fs3H4YA94pAv/O8n6F80sT5CuspLWbFS3W9Y5aT2oWbHk9bUmVVtKCU5DI Xb95PoULP6bLz93XftyUEJRHt0V69MydJiwUUJPsCIZRsCVOCXSQowzCAHE22j++5wRa pc2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5r1JUAnf5vqj0yEA5qbVlT9HSVi/kDaJI1P3a5pkfsU=; b=LdIYZCo5Qyn+3A99zseYJ4OFa/YtSpBOjNHTLfgXavXx9HdrEojCNRSeWG4ST58nz4 smagp8vFOwYwZTsEGbYT3a+NvnxA3ESWMwGM8QhWs84uEodtRtdkdYly8271keCBrV6t PI93mEoyRW/K3HlhZAWMdVzxIAn5dpNIHEd2Efr9lPwudyg8qnPgdZdJLRnRelSc57hq lkflnRaK+fbnFaDUD0+fUVjctCduuxF1VmYpGGwSoqiqMDhVymU07rT8DK0K2uYr5GG9 5epc6NtZr0EcCPIqocm5uY9tx0Ne5+qqGoIGuNF5FA+DT0MBPIG37YJBjKaCi4jkAntY V88w== X-Gm-Message-State: AOAM530gRHio97pYq3DDR7ARddSq+dxez/+Y93ef2rZxESGV1/j/BIOD zLnoGpAEK4hVh7b/JXk56gCOdQ== X-Google-Smtp-Source: ABdhPJyU40SttyHVQTJlyA84enWC1soa+5ksDd5YxGOEO8l6OO67Iv+5BiAGIy1r5b60eDx25QCl9A== X-Received: by 2002:a9d:65:: with SMTP id 92mr2691191ota.239.1623336267233; Thu, 10 Jun 2021 07:44:27 -0700 (PDT) Received: from yoga (104-57-184-186.lightspeed.austtx.sbcglobal.net. [104.57.184.186]) by smtp.gmail.com with ESMTPSA id f2sm562397ooj.22.2021.06.10.07.44.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Jun 2021 07:44:26 -0700 (PDT) Date: Thu, 10 Jun 2021 09:44:24 -0500 From: Bjorn Andersson To: Dmitry Baryshkov Cc: Rob Clark , Sean Paul , David Airlie , Daniel Vetter , Abhinav Kumar , linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org, freedreno@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] drm/msm/dpu: Avoid ABBA deadlock between IRQ modules Message-ID: References: <20210609231507.3031904-1-bjorn.andersson@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org On Thu 10 Jun 06:46 CDT 2021, Dmitry Baryshkov wrote: > On 10/06/2021 02:15, Bjorn Andersson wrote: > > Handling of the interrupt callback lists is done in dpu_core_irq.c, > > under the "cb_lock" spinlock. When these operations results in the need > > for enableing or disabling the IRQ in the hardware the code jumps to > > dpu_hw_interrupts.c, which protects its operations with "irq_lock" > > spinlock. > > > > When an interrupt fires, dpu_hw_intr_dispatch_irq() inspects the > > hardware state while holding the "irq_lock" spinlock and jumps to > > dpu_core_irq_callback_handler() to invoke the registered handlers, which > > traverses the callback list under the "cb_lock" spinlock. > > > > As such, in the event that these happens concurrently we'll end up with > > a deadlock. > > > > Prior to '1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling")' > > the enable/disable of the hardware interrupt was done outside the > > "cb_lock" region, optimitically by using an atomic enable-counter for > > each interrupt and an warning print if someone changed the list between > > the atomic_read and the time the operation concluded. > > > > Rather than re-introducing the large array of atomics, serialize the > > register/unregister operations under a single mutex. > > > > Fixes: 1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling") > > Signed-off-by: Bjorn Andersson > > I have been thinking about this for quite some time. I'm still not confident > about the proposed scheme. > > What more intrusive, but more obvious change: > - Drop dpu_kms->irq_obj.cb_lock alltogether. Use hw_intr's irq_lock instead > in the register/unregister path and no locks in the callback itself. I did consider this as well, but while I certainly don't like the current design I think it would be easy to miss the fact that the two "different" modules share the lock. > - Do not take locks in the dpu_hw_intr_enable_irq/disable_irq (as we are > already locked outside). > This is correct. > The core_irq is the only user of the hw_intr framework. In fact I'd like to > squash them together at some point (if I get some time, I'll send patches > during this cycle). > As we've seen the two modules are tightly coupled, and it's full of slow function pointer calls, so I'm definitely in favor of us refactoring this into a single thing. Further more, rather than dealing with the callback lists in dpu_core_irq, I think we should explore just implementing this as another chained irq handler, and just use the common IRQ code directly - it has a known API and is known to work. That said, such redesign will take a little bit of time and we definitely need to squash the deadlock, as my laptop only survives 10-15 minutes with two displays active. So if we agree that we want to squash the two modules, then your suggestion of sharing the lock seems like a reasonable intermediate workaround. I'll respin and retest my patch accordingly. Thanks, Bjorn > > > --- > > drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c | 10 +++++++--- > > drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 ++ > > 2 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c > > index 4f110c428b60..62bbe35eff7b 100644 > > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c > > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c > > @@ -82,11 +82,13 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx, > > DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx); > > + mutex_lock(&dpu_kms->irq_obj.hw_enable_lock); > > spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags); > > trace_dpu_core_irq_register_callback(irq_idx, register_irq_cb); > > list_del_init(®ister_irq_cb->list); > > list_add_tail(®ister_irq_cb->list, > > &dpu_kms->irq_obj.irq_cb_tbl[irq_idx]); > > + spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags); > > if (list_is_first(®ister_irq_cb->list, > > &dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) { > > int ret = dpu_kms->hw_intr->ops.enable_irq( > > @@ -96,8 +98,7 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx, > > DPU_ERROR("Fail to enable IRQ for irq_idx:%d\n", > > irq_idx); > > } > > - > > - spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags); > > + mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock); > > return 0; > > } > > @@ -127,9 +128,11 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx, > > DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx); > > + mutex_lock(&dpu_kms->irq_obj.hw_enable_lock); > > spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags); > > trace_dpu_core_irq_unregister_callback(irq_idx, register_irq_cb); > > list_del_init(®ister_irq_cb->list); > > + spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags); > > /* empty callback list but interrupt is still enabled */ > > if (list_empty(&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) { > > int ret = dpu_kms->hw_intr->ops.disable_irq( > > @@ -140,7 +143,7 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx, > > irq_idx); > > DPU_DEBUG("irq_idx=%d ret=%d\n", irq_idx, ret); > > } > > - spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags); > > + mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock); > > return 0; > > } > > @@ -207,6 +210,7 @@ void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms) > > dpu_disable_all_irqs(dpu_kms); > > pm_runtime_put_sync(&dpu_kms->pdev->dev); > > + mutex_init(&dpu_kms->irq_obj.hw_enable_lock); > > spin_lock_init(&dpu_kms->irq_obj.cb_lock); > > /* Create irq callbacks for all possible irq_idx */ > > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h > > index f6840b1af6e4..5a162caea29d 100644 > > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h > > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h > > @@ -83,6 +83,7 @@ struct dpu_irq_callback { > > * @total_irq: total number of irq_idx obtained from HW interrupts mapping > > * @irq_cb_tbl: array of IRQ callbacks setting > > * @cb_lock: callback lock > > + * @hw_enable_lock: lock to synchronize callback register and unregister > > * @debugfs_file: debugfs file for irq statistics > > */ > > struct dpu_irq { > > @@ -90,6 +91,7 @@ struct dpu_irq { > > struct list_head *irq_cb_tbl; > > atomic_t *irq_counts; > > spinlock_t cb_lock; > > + struct mutex hw_enable_lock; > > }; > > struct dpu_kms { > > > > > -- > With best wishes > Dmitry