Subject: Re: [PATCH v9 3/3] soc: qcom: rpmh: Invoke rpmh_flush() for dirty caches
From: Maulik Shah
To: Doug Anderson
Cc: Stephen Boyd, Matthias Kaehlcke, Evan Green, Bjorn Andersson, LKML,
 linux-arm-msm, Andy Gross, Rajendra Nayak, Lina Iyer, lsrao@codeaurora.org
References: <1582889903-12890-1-git-send-email-mkshah@codeaurora.org>
 <1582889903-12890-4-git-send-email-mkshah@codeaurora.org>
 <7704638e-b473-d0cf-73ab-2bdb67b636ba@codeaurora.org>
Date: Thu, 5 Mar 2020 15:11:27 +0530
List-ID: linux-arm-msm@vger.kernel.org

Hi,

On 3/4/2020 6:10 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Mar 2, 2020 at 9:47 PM Maulik Shah wrote:
>>
>> On 2/29/2020 5:15 AM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Fri, Feb 28, 2020 at 3:38 AM Maulik Shah wrote:
>>>> Add changes to invoke rpmh_flush() from within cache_lock when the data in
>>>> cache is dirty.
>>>>
>>>> This is done only if OSI is not supported in PSCI. If OSI is supported,
>>>> rpmh_flush() can get invoked when the last CPU goes into the deepest
>>>> low power mode.
>>>>
>>>> Also remove "depends on COMPILE_TEST" for Kconfig option QCOM_RPMH so the
>>>> driver is only compiled for arm64, which supports the
>>>> psci_has_osi_support() API.
>>>>
>>>> Signed-off-by: Maulik Shah
>>>> Reviewed-by: Srinivas Rao L
>>>> ---
>>>>  drivers/soc/qcom/Kconfig |  2 +-
>>>>  drivers/soc/qcom/rpmh.c  | 33 ++++++++++++++++++++++-----------
>>>>  2 files changed, 23 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
>>>> index d0a73e7..2e581bc 100644
>>>> --- a/drivers/soc/qcom/Kconfig
>>>> +++ b/drivers/soc/qcom/Kconfig
>>>> @@ -105,7 +105,7 @@ config QCOM_RMTFS_MEM
>>>>
>>>>  config QCOM_RPMH
>>>>  	bool "Qualcomm RPM-Hardened (RPMH) Communication"
>>>> -	depends on ARCH_QCOM && ARM64 || COMPILE_TEST
>>>> +	depends on ARCH_QCOM && ARM64
>>>>  	help
>>>>  	  Support for communication with the hardened-RPM blocks in
>>>>  	  Qualcomm Technologies Inc (QTI) SoCs. RPMH communication uses an
>>>> diff --git a/drivers/soc/qcom/rpmh.c b/drivers/soc/qcom/rpmh.c
>>>> index f28afe4..6a5a60c 100644
>>>> --- a/drivers/soc/qcom/rpmh.c
>>>> +++ b/drivers/soc/qcom/rpmh.c
>>>> @@ -12,6 +12,7 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>  #include
>>>>  #include
>>>>  #include
>>>> @@ -158,6 +159,13 @@ static struct cache_req *cache_rpm_request(struct rpmh_ctrlr *ctrlr,
>>>>  }
>>>>
>>>>  unlock:
>>>> +	if (ctrlr->dirty && !psci_has_osi_support()) {
>>>> +		if (rpmh_flush(ctrlr)) {
>>>> +			spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +			return ERR_PTR(-EINVAL);
>>>> +		}
>>>> +	}
>>> It's been a long time since I looked in depth at RPMH, but upon a
>>> first glance this seems like it's gonna be terrible for performance.
>>> You're going to send every entry again and again, aren't you?
>>> In other words, in pseudo-code:
>>>
>>> 1. rpmh_write(addr=0x10, data=0x99);
>>>    ==> writes on the bus (0x10, 0x99)
>>>
>>> 2. rpmh_write(addr=0x11, data=0xaa);
>>>    ==> writes on the bus (0x10, 0x99)
>>>    ==> writes on the bus (0x11, 0xaa)
>>>
>>> 3. rpmh_write(addr=0x10, data=0xbb);
>>>    ==> writes on the bus (0x10, 0xbb)
>>>    ==> writes on the bus (0x11, 0xaa)
>>>
>>> 4. rpmh_write(addr=0x12, data=0xcc);
>>>    ==> writes on the bus (0x10, 0xbb)
>>>    ==> writes on the bus (0x11, 0xaa)
>>>    ==> writes on the bus (0x12, 0xcc)
>>>
>>> That seems bad.
>> Hi Doug,
>>
>> No, this is NOT how data is sent to RPMh/AOSS.
>> rpmh_flush() fills up the DRV-2 (HLOS) TCSes and makes them ready. The HW
>> takes care of sending the data in the SLEEP TCSes for each EL/DRV when the
>> last CPU is going to the deepest low power mode, and the data in the WAKE
>> TCSes while the first CPU is waking up.
> Ah, I see.  So for sleep / wake commands we never directly wait for
> them to go out on the bus while the system is awake.  We just program
> them all to the RPMH hardware and they'll sit there and all get sent
> automatically when the last CPU goes into the deepest low power mode.
>
> ...so actually the whole point of OSI mode (from an RPMH perspective)
> is not to avoid transactions on the bus.  It's just avoiding
> programming RPMH over and over again.  Is that correct?
>
> ...and the reason we have all these data structures in the kernel is
> to keep track of auxiliary information about the things in the
> sleep/wake TCSs and make it easier to update bits of them?

Correct.

>
>
>>> Why can't you just send the new request itself and
>>> forget adding it to the cache?  In other words don't even call
>>> cache_rpm_request() in the non-OSI case and then in __rpmh_write()
>>> just send right away...
>> This won't work out. Let me explain why...
>>
>> We have 3 SLEEP and 3 WAKE TCSes from the below config:
>> qcom,tcs-config = <...>,
>>                   <...>,
>>                   <...>,
>>
>> Each TCS has a total of 16 commands, so there are 48 commands (16*3) for
>> each of the SLEEP and WAKE TCSes that can be filled up.
>>
>> Now let's take an example in pseudo-code of what could happen if we don't
>> cache and instead immediately fill up the TCS commands. The triggering
>> part doesn't happen, as explained above; this only fills up the TCSes and
>> makes them ready.
>>
>> Time-t0 (client_x invokes rpmh_write_batch() for the SLEEP SET, a batch
>> of 3 commands):
>>
>> rpmh_write_batch(
>>     addr=0x10, data=0x99,   -> fills up CMD0 in SLEEP TCS_0
>>     addr=0x11, data=0xaa,   -> fills up CMD1 in SLEEP TCS_0
>>     addr=0x10, data=0xbb);  -> fills up CMD2 in SLEEP TCS_0
>>
>> Time-t1 (client_y invokes rpmh_write(), a single command):
>>
>> rpmh_write(
>>     addr=0x12, data=0xcc);  -> fills up CMD3 in SLEEP TCS_0
>>
>> Time-t2 (client_x invokes rpmh_invalidate(), which invalidates all
>> previous *batch requests* only):
>>
>> At this point only CMD3 should remain in the TCS, while CMD0,1,2 need to
>> be freed up, since we expect a new batch request now.
>>
>> Since the driver didn't cache anything in the first place, it doesn't
>> know the details of the previous batch request: how many commands it had,
>> which commands were filled into the TCSes, and so on (basically all the
>> data required to free up only CMD0,1,2 and not disturb CMD3).
>>
>> What's more?
>>
>> The new batch request after invalidation could be of, let's say, 5
>> commands instead of the 3 commands in the previous batch. So it will not
>> fit in CMD0,1,2 and we might want to allocate from CMD4,5,6,7,8 now.
>>
>> This will leave a hole in the TCS CMDs (each TCS has 16 commands total)
>> unless we re-arrange everything. Also, we may want to fill up batch
>> requests first and then single requests; by not caching anything, the
>> driver doesn't know which one is a batch and which one is a single
>> request.
> OK, I got it now.
> I'll try to spend some time tomorrow looking over
> everything / testing with my new understanding.
>
>
>> There are other cases like the below which also get impacted if the
>> driver doesn't cache anything...
>>
>> For example, when we don't have a dedicated ACTIVE TCS (the below config
>> with ACTIVE TCS count 0):
>>
>> qcom,tcs-config = <...>,
>>                   <...>,
>>                   <...>,
>>
>> Now to send active data, the driver may re-use/re-purpose a few of the
>> SLEEP or WAKE TCSes as an ACTIVE TCS, and once the work is done they are
>> re-allocated to the SLEEP/WAKE TCS pool accordingly. If the driver
>> doesn't cache, all the SLEEP and WAKE data is lost when one of the TCSes
>> is repurposed for use as an ACTIVE TCS.
> Ah, interesting.  I'll read the code more, but are you expecting this
> type of situation to work today, or is it theoretical for the future?

Yes, we have targets which need to work with this type of situation.

>
>
>>> I tried to test this and my printouts didn't show anything actually
>>> happening in rpmh_flush().  Maybe I just don't have the right patches
>>> to exercise this properly...
>> It may be due to the missing interconnect patch series:
>> https://patchwork.kernel.org/project/linux-arm-msm/list/?series=247175
> I ended up pulling those in but I was still not seeing things work as
> I expected.  I'll debug more tomorrow to see if it was my expectations
> that were wrong or if there was a real issue.
>
>
>>>>  	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>>
>>>>  	return req;
>>>> @@ -285,26 +293,35 @@ int rpmh_write(const struct device *dev, enum rpmh_state state,
>>>>  }
>>>>  EXPORT_SYMBOL(rpmh_write);
>>>>
>>>> -static void cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>>> +static int cache_batch(struct rpmh_ctrlr *ctrlr, struct batch_cache_req *req)
>>>>  {
>>>>  	unsigned long flags;
>>>>
>>>>  	spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>>> +
>>>>  	list_add_tail(&req->list, &ctrlr->batch_cache);
>>>>  	ctrlr->dirty = true;
>>>> +
>>>> +	if (!psci_has_osi_support()) {
>>>> +		if (rpmh_flush(ctrlr)) {
>>>> +			spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +	}
>>>> +
>>>>  	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>> +
>>>> +	return 0;
>>>>  }
>>>>
>>>>  static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>>>  {
>>>>  	struct batch_cache_req *req;
>>>>  	const struct rpmh_request *rpm_msg;
>>>> -	unsigned long flags;
>>>>  	int ret = 0;
>>>>  	int i;
>>>>
>>>>  	/* Send Sleep/Wake requests to the controller, expect no response */
>>>> -	spin_lock_irqsave(&ctrlr->cache_lock, flags);
>>>>  	list_for_each_entry(req, &ctrlr->batch_cache, list) {
>>>>  		for (i = 0; i < req->count; i++) {
>>>>  			rpm_msg = req->rpm_msgs + i;
>>>> @@ -314,7 +331,6 @@ static int flush_batch(struct rpmh_ctrlr *ctrlr)
>>>>  			break;
>>>>  		}
>>>>  	}
>>>> -	spin_unlock_irqrestore(&ctrlr->cache_lock, flags);
>>>>
>>>>  	return ret;
>>>>  }
>>>> @@ -386,10 +402,8 @@ int rpmh_write_batch(const struct device *dev, enum rpmh_state state,
>>>>  		cmd += n[i];
>>>>  	}
>>>>
>>>> -	if (state != RPMH_ACTIVE_ONLY_STATE) {
>>>> -		cache_batch(ctrlr, req);
>>>> -		return 0;
>>>> -	}
>>>> +	if (state != RPMH_ACTIVE_ONLY_STATE)
>>>> +		return cache_batch(ctrlr, req);
>>> I'm curious: why not just do:
>>>
>>> if (state != RPMH_ACTIVE_ONLY_STATE && psci_has_osi_support()) {
>>>     cache_batch(ctrlr, req);
>>>     return 0;
>>> }
>>>
>>> ...AKA don't even cache it up if we're
>>> not in OSI mode.  IIUC this
>>> would be a huge deal because with your code you're doing the whole
>>> RPMH transfer under "spin_lock_irqsave", right?  And presumably RPMH
>>> transfers are somewhat slow, otherwise why did anyone come up with
>>> this whole caching / last-man-down scheme to start with?
>>>
>>> OK, it turned out to be at least slightly more complex because it
>>> appears that we're supposed to use rpmh_rsc_write_ctrl_data() for
>>> sleep/wake stuff and that they never do completions, but it really
>>> wasn't too hard.  I prototyped it at <...>.
>>> Feel free to hijack that change if it looks like a starting point and
>>> if it looks like I'm not too confused.
>> I looked at this change and thought of it earlier, but it won't work out
>> for the reasons in the above example.
>> I have thought of a few optimizations in rpmh_flush() to reduce its time,
>> if we *really* see any performance impact...
>>
>> Below is the high level idea:
>> When rpmh_write_batch() is invoked for SLEEP SETs, currently rpmh_flush()
>> will update both the SLEEP and WAKE TCS contents. However, we may change
>> it to update only the SLEEP TCS, and when rpmh_write_batch() is invoked
>> for WAKE SETs, update only the WAKE TCS contents. This way it may reduce
>> the time by roughly ~50%.
> OK, that's something to keep in mind.  Agree that it doesn't have to
> be part of the initial change.
>
> -Doug

Thanks,
Maulik

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation