From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8A62C433F5 for ; Tue, 12 Apr 2022 04:47:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343778AbiDLEuI (ORCPT ); Tue, 12 Apr 2022 00:50:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235177AbiDLEuH (ORCPT ); Tue, 12 Apr 2022 00:50:07 -0400 Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 652012E6AB for ; Mon, 11 Apr 2022 21:47:50 -0700 (PDT) Received: by mail-oi1-x236.google.com with SMTP id e189so17895689oia.8 for ; Mon, 11 Apr 2022 21:47:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:in-reply-to:references:from:user-agent:date:message-id :subject:to:cc; bh=YYQEo5j2Ii2p8H5ZS23pXYIYx499iXk6axIZJFS2N9g=; b=XMxVGa8AckEYdPLkkCRkFLuRVmw86/dj5NmmRDa1LrlJAmpn2bPhnrXLqyi9ChrDlB UdUuyxhKHjNRmPSZFzJtTxTH5ep14tjJzZIpkJfcht7E9C70t3K1j3ACnHmF5GyXIfr4 IaxkhWdpJTFKYmTTuZyfyIl78+vPVYyDqBSeA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:in-reply-to:references:from :user-agent:date:message-id:subject:to:cc; bh=YYQEo5j2Ii2p8H5ZS23pXYIYx499iXk6axIZJFS2N9g=; b=i4EiDo/IvIDapliRtOMiDL6AonFTGqemQmg2OViE7dVfrWVPLcRyfKGoZUBiU0XeZM muYr7WXWupDTTDqbauaibcclnRPqrnedm2Rfua9grhJ0ZmD/ealSQ1M8rqigYPGhJsk+ DNB4RmCCK0iJduViY9sDL0rzEf7hlPJzB8TW170Ut6HlhwoFPs/x0cN8wwSe3j99JuIn HgrVJuiQ5tUZ4LbMhmW1R9JiBzpvRWq91Ig473QAtnrFA5K/duoLNIjXbZUXkN10Uklk RVOf98hFAR5OQeD+vIX5Sd6robQp9g6+O3sSKh69mmDgP3Ky6FhgEkY6kTwg1WJ1Oh0T W+Vw== X-Gm-Message-State: AOAM532FtWeJzR4a8tZQS1mm00rM8FKbphg/BNH+GJCNPUjP15qlFRxM 8r4qGbzqAObyLrxWq35CtQW0DoiCiT9dWGHtqYojkQ== X-Google-Smtp-Source: ABdhPJy3ENRm1IsCajLk61o+xCdj4yulCsVNsBguR1dqMjmKhpW3tu/Z+AUrgZ0reqgxkhlEO75blvagp+uFa41xz3I= X-Received: by 2002:aca:a9c8:0:b0:2da:45b6:b796 with SMTP id s191-20020acaa9c8000000b002da45b6b796mr946369oie.193.1649738869745; Mon, 11 Apr 2022 21:47:49 -0700 (PDT) Received: from 753933720722 named unknown by gmailapi.google.com with HTTPREST; Tue, 12 Apr 2022 00:47:49 -0400 MIME-Version: 1.0 In-Reply-To: References: <20211125174751.25317-1-djakov@kernel.org> From: Stephen Boyd User-Agent: alot/0.10 Date: Tue, 12 Apr 2022 00:47:49 -0400 Message-ID: Subject: Re: [PATCH v3] interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate To: Alex Elder , djakov@kernel.org, quic_mdtipton@quicinc.com Cc: linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, mka@chromium.org, dianders@chromium.org, Bjorn Andersson , Taniya Das , Sibi Sankar Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-arm-msm@vger.kernel.org Quoting Stephen Boyd (2022-04-11 13:20:27) > > I think it's some IPA unclocked access because it's an SError async > abort. I looked some at the IPA clk implementation and it is a little > concerning. I see two problems. The first problem is that clk-rpmh is > only voting on the active state for the IPA clk. Maybe RPMh will move > the sleep/wake state over from the active state automatically as long as > we never make a request in either of those states? I don't know, but it > looks concerning and either needs to be fixed or a big comment > indicating that the copy happens. This patch makes it set a frequency in > each state bucket. > This patch doesn't seem to matter. I think that's because RPMh copies over active state settings to sleep and wake state automatically if those states are never changed by that processor. Someone at qcom needs to confirm this theory. I'll send the patch and see if that spurs someone at qcom to respond. > > The second problem I see is that the IPA clk resource is "IP0" but it is > still defined in the interconnect/qcom/sc7180.c file. > > $ git grep \"IP0\" -- drivers/{interconnect,clk}/ > drivers/clk/qcom/clk-rpmh.c:DEFINE_CLK_RPMH_BCM(sdm845, ipa, "IP0"); > drivers/clk/qcom/clk-rpmh.c:DEFINE_CLK_RPMH_BCM(sdx55, ipa, "IP0"); > drivers/interconnect/qcom/sc7180.c:DEFINE_QBCM(bcm_ip0, "IP0", false, &ipa_core_slave); > drivers/interconnect/qcom/sc8180x.c:DEFINE_QBCM(bcm_ip0, "IP0", false, &slv_ipa_core_slave); > drivers/interconnect/qcom/sdx55.c:DEFINE_QBCM(bcm_ip0, "IP0", false, &ipa_core_slave); > drivers/interconnect/qcom/sm8150.c:DEFINE_QBCM(bcm_ip0, "IP0", false, &ipa_core_slave); > drivers/interconnect/qcom/sm8250.c:DEFINE_QBCM(bcm_ip0, "IP0", false, &ipa_core_slave); > > That sounds pretty bad because it means both interconnect and clk frameworks > are trying to control the same RPMh resource, trampling over each other. > Combine that with unused clks and sync_state support and I don't know > what the state of the IP0 resource really is anymore. Alright, some more debugging has confirmed this. I put a print in the rpmh driver to figure out when the IP0 resource, according to cmd-db, is being modified. Luckily, the interconnect driver uses the rpmh batch API while the clk driver uses the direct write API. I put a dump_stack() when the batch API is called on the IP0 address and that sometimes gets zeroed out (i.e. IPA clk is turned off) after the rpmh clk driver turned it ON. The rpmh driver in the kernel doesn't do any aggregation between kernel clients, so the problem is this sequence: IPA driver probes runtime PM get() IPA clk enabled -> IP0 resource is ON interconnect zeroes out the IP0 resource -> IP0 resource is off IPA driver tries to access a register and blows up The interconnect framework is zeroing it out now because there's a sync_state callback once all drivers have probed. This is why I saw it more easily when the IPA driver is builtin vs. a module. The IPA driver is much more likely to have turned on the resource before sync_state is called, but the use of autosuspend made it variable. If the IPA driver autosuspend callback is delayed just enough, then IPA will turn on the IP0 resource via the clk API, sync state will turn it off from the interconnect framework, and then the pm_runtime_get_sync() in the IPA driver will skip powering the clk on again because it never turned it off. The runtime PM state of the device is correct, but the clk is off. Oops! I think changing the IPA_AUTOSUSPEND_DELAY define to be multiple seconds makes it even more likely, because then the IPA device definitely won't be suspended by the time interconnect sync state happens. Anyway, this also explains why the IPA runtime PM patch made things better. Sometimes the IPA device will be runtime suspended, and thus it will power on the IP0 resource on runtime resume even after interconnect sync state happened. This is the situation on v5.10.y, where runtime PM isn't happening at all, but sync state is once commit b95b668eaaa2 ("interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate") is included. The IP0 resource is guaranteed to be turned off after sync state drops the request. So the fix is simple, completely remove IP0 from the interconnect driver so that sync state doesn't turn it off. Then the IPA driver without runtime PM will work because the clk resource is turned on and kept on until system suspend. When the runtime PM patch in IPA is applied, it will also work because runtime PM will power things on correctly, or keep them powered on because autosuspend is delayed. I'll send this patch to the list shortly. And split it up for the other drivers too.