From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A066C433F5 for ; Sun, 7 Nov 2021 18:22:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 11B2360FD8 for ; Sun, 7 Nov 2021 18:22:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236107AbhKGSZC (ORCPT ); Sun, 7 Nov 2021 13:25:02 -0500 Received: from foss.arm.com ([217.140.110.172]:44208 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235313AbhKGSY7 (ORCPT ); Sun, 7 Nov 2021 13:24:59 -0500 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 665772B; Sun, 7 Nov 2021 10:22:16 -0800 (PST) Received: from e120937-lin (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0F6B43F718; Sun, 7 Nov 2021 10:22:14 -0800 (PST) Date: Sun, 7 Nov 2021 18:22:12 +0000 From: Cristian Marussi To: rishabhb@codeaurora.org Cc: Sudeep Holla , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, avajid@codeaurora.org, adharmap@codeaurora.org Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails Message-ID: <20211107182212.GK6526@e120937-lin> References: <20210805105427.GU6592@e120937-lin> <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org> <20210831054835.GJ13160@e120937-lin> <20210901093558.GL13160@e120937-lin> <20211102113221.w7ivffssjb6jmggj@bogus> <9385b2ca9b688b00735cc0b7f626f008@codeaurora.org> <20211105094310.GI6526@e120937-lin> <20211107103407.GJ6526@e120937-lin> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20211107103407.GJ6526@e120937-lin> User-Agent: Mutt/1.9.4 (2018-02-28) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Nov 07, 2021 at 10:34:07AM +0000, Cristian Marussi wrote: > On Fri, Nov 05, 2021 at 10:40:59AM -0700, rishabhb@codeaurora.org wrote: > > On 2021-11-05 02:43, Cristian Marussi wrote: > > > On Thu, Nov 04, 2021 at 04:40:03PM -0700, rishabhb@codeaurora.org wrote: > > > > On 2021-11-02 04:32, Sudeep Holla wrote: > > > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, rishabhb@codeaurora.org wrote: > > > > > > On 2021-09-01 02:35, Cristian Marussi wrote: > > > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote: > > > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org > > > > > > > > wrote: > > > > > > > > > Hi Christian > > > > > > > > > > > > > > > > Hi Rishabh, > > > > > > > > > > > > > > Hi Rishabh, > > > > > Hi Rishabh, > Hi Rishabhb, > > > apologies for the delay in coming back to you. > > > A few comments below. > > > > > > > > > > > thanks for looking into this kind of bad interactions. > > > > > > > > > > > > > > > > > There seems to be another issue here. The response from agent can be delayed > > > > > > > > > causing a timeout during base protocol acquire, > > > > > > > > > which leads to the probe failure. What I have observed is sometimes the > > > > > > > > > failure of probe and rx_callback (due to a delayed message) > > > > > > > > > happens at the same time on different cpus. > > > > > > > > > Because of this race, the device memory may be cleared while the > > > > > > > > > interrupt(rx_callback) is executing on another cpu. > > > > > > > > > > > > > > > > You are right that concurrency was not handled properly in this kind > > > > > > > > of > > > > > > > > context and moreover, if you think about it, even the case of out of > > > > > > > > order reception of responses and delayed_responses (type2 SCMI > > > > > > > > messages) > > > > > > > > for asynchronous SCMI commands was not handled properly. > > > > > > > > > > > > > > > > > How do you propose we solve this? Do you think it is better to take the > > > > > > > > > setting up of base and other protocols out of probe and > > > > > > > > > in some delayed work? That would imply the device memory is not released > > > > > > > > > until remove is called. Or should we add locking to > > > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid > > > > > > > > > the race? > > > > > > > > > > > > > > > > > > > > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in > > > > > > > > the series where I introduced scmi-virtio: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/ > > > > > > > > > > > > > > > > (which is now queued for v5.15 ... now on -next I think...finger > > > > > > > > crossed) > > > > > > > > > > > > > > > > I took the chance to rectify a couple of other things in the SCMI core > > > > > > > > in the initial commits. > > > > > > > > As an example, in the above series > > > > > > > > > > > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and > > > > > > > > out-of-order messages > > > > > > > > > > > > > > > > cares to add a refcount to xfers and some locking on xfers between TX > > > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx > > > > > > > > path > > > > > > > > is concurrently working on it (as you said); moreover I handle the > > > > > > > > condition (rare if not unplausible anyway) in which a transport > > > > > > > > delivers > > > > > > > > out of order responses and delayed responses. > > > > > > > > > > > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport > > > > > > > > where I could play any sort of mess and tricks to stress this limit > > > > > > > > conditions, but you're more than welcome to verify if the race you are > > > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by > > > > > > > > this > > > > > > > > series of mine. > > > > > > > > > > > > > > > > Let me know, any feedback is welcome. > > > > > > > > > > > > > > > > Btw, in the series above there are also other minor changes, but there > > > > > > > > is also another more radical change needed to ensure correctness and > > > > > > > > protection against stale old messages which maybe could interest you > > > > > > > > in general if you are looking into SCMI: > > > > > > > > > > > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically > > > > > > > > increasing tokens > > > > > > > > > > > > > > > > Let me know if yo have other concerns. > > > > > > > > > > > > > > > > > > > > > > Hi Rishabhb, > > > > > > > > > > > > > > just a quick remark, thinking again about your fail @probe scenario > > > > > > > above > > > > > > > I realized that while the concurrency patch I mentioned above could help > > > > > > > on > > > > > > > races against vanishing xfers when late timed-out responses are > > > > > > > delivered, > > > > > > > here we really are then also shutting down everything on failure, so > > > > > > > there > > > > > > > could be further issues between a very late invokation of > > > > > > > scmi_rx_callback > > > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. > > > > > > > structs > > > > > > > used by scmi-rx-callback itself (maybe this was already what you meant > > > > > > > and > > > > > > > I didn't get it,...sorry) > > > > > > > > > > > > > > On the other side, I don't feel that delaying Base init to a deferred > > > > > > > worker is a viable solution since we need Base protocol init to be > > > > > > > initialized and we need to just give up if we cannot communicate with > > > > > > > the SCMI platform fw in such early stages. (Base protocol is really the > > > > > > > only mandatory proto is I remember correctly the spec) > > > > > > > > > > > > > > Currenly I'm off and only glancing at mails but I'll have a thought > > > > > > > about > > > > > > > these issues once back in a few weeks time. > > > > > > > > > > > > > > Thanks, > > > > > > > Cristian > > > > > > > > > > > > > Hi Cristian > > > > > > I hope you enjoyed your vacation. Did you get a chance to look at > > > > > > the issue > > > > > > stated above and have some idea as to how to solve this? > > > > > > > > > > Do you still see the issue with v5.15 ? Can you please check if haven't > > > > > already done that ? > > > > > > > > > > Also 30ms delay we have is huge IMO and we typically expect the > > > > > communication > > > > > with remote processor or any entity that implements SCMI to happen in > > > > > terms > > > > > of one or few ms tops. > > > > > > > > > > If there is a race, we need to fix that but I am interested in knowing > > > > > why the default time of 30ms not sufficient ? Did increasing that helps > > > > > and is this timeout happening only for the initial commands(guessing the > > > > > SCMI firmware is not yet ready) or does it happen even during run-time ? > > > > > > > > Hi Sudeep > > > > I haven't checked on 5.15 but after glancing at the code I believe > > > > we should > > > > see the same issue. > > > > I agree 30ms is a big enough value and should be something that remote > > > > firmware should resolve. But > > > > if remote firmware goes into a bad state and not functioning > > > > properly at > > > > least kernel should not panic. > > > > > > > > The issue we see here happens during scmi probe. The response from the > > > > remote agent can be delayed > > > > causing a timeout during base protocol acquire, which leads to the > > > > probe > > > > failure. > > > > What I have observed is sometimes the failure of probe and > > > > rx_callback (due > > > > to a delayed message) > > > > happens around the same time on different cpus. Because of this > > > > race, the > > > > device memory may be cleared > > > > while the interrupt(rx_callback) is executing on another cpu. > > > > > > So I was looking at the failure path you mentioned: a late concurrent > > > reply on Base protocol from the fw, during the probe, leads to an > > > invocation > > > of scmi_rx_callback() on a different CPU while core data structs like > > > cinfo are being freed by the SCMI core on the probe failure path. > > > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for > > > races regarding xfer but not for the cinfo stuff in this case ...) > > > > > > We cannot defer Base proto init since we just wanna fail early while > > > probing if not even the Base protocol can work fine, and also because > > > Base protocol information are indeed needed for initial setup, so we > > > cannot juts proceed if we did not even got a Base reply on the number of > > > protos. (already said) > > > > > > In my opinion, the proper way to address this kind of races at probe > > > failure should be to ensure that the transport you are using is properly > > > shut down completely before cleanup starts (same applies for a clean > > > remove), i.e. scmi_rx_callback should not even be possibly registered to > > > be called when the the final cleanup by the core is started (devm_ frees > > > I mean after scmi_probe exit failing...) > > > > > > BUT indeed looking back at transport layers like mailbox and virtio, > > > this > > > should be happening already, because the flow is like > > > > > > scmi_probe() > > > { > > > ... > > > > > > clean_tx_rx_setup: > > > scmi_cleanup_txrx_channels() > > > .... > > > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > > > - > > > return ret; > > > } > > > > > > .... only after this scmi_probe returns the core devm layer starts > > > freeing devm_ > > > allocated stuff like cinfo, AND the above per-transport specific > > > .chan_free seems > > > to take care to 'deactivate/dregister' the scmi_rx_callback at the > > > transport layer: > > > > > > > > > e.g. MBOX transport > > > ------------------------- > > > static int mailbox_chan_free(int id, void *p, void *data) > > > { > > > struct scmi_chan_info *cinfo = p; > > > struct scmi_mailbox *smbox = cinfo->transport_info; > > > > > > if (smbox && !IS_ERR(smbox->chan)) { > > > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE > > > cinfo->transport_info = NULL; > > > > > > > > > e.g. VIRTIO Transport > > > ----------------------------- > > > static int virtio_chan_free(int id, void *p, void *data) > > > { > > > unsigned long flags; > > > struct scmi_chan_info *cinfo = p; > > > struct scmi_vio_channel *vioch = cinfo->transport_info; > > > > > > spin_lock_irqsave(&vioch->ready_lock, flags); > > > vioch->ready = false; <<<< THIS VIRTIO FLAG > > > DEACTIVATE VIRTIO CBS INVOKCATION > > > spin_unlock_irqrestore(&vioch->ready_lock, flags); > > > > > > > > > ... AND both of the above call are indeed also spinlocked heavily, so > > > that > > > the 'deactivation' of the scmi_rx_callback should be visible properly; > > > in > > > other words I would expect that after the above .chan_free() have > > > completed the scmi_rx_callback() cannot be called anymore, because the > > > transport itself will properly drop any so-late fw reply. > > > > > > So I am now wondering, which transport are you using in your tests ? > > > since at least for the above 2 example it seems to me that your > > > race-on-probe failure condition should be already addressed by the > > > transport layer itself....or am I getting wrong the nature of the race ? > > > > > > Thanks > > > Cristian > > > > Hi Cristian > > You caught the scenario perfectly. But there is still a possibility of a > > race. To be clear we use > > the mbox transport. Let me explain in more detail. > > Lets assume that the last command (base protocol acquire) kernel sent to > > remote agent timed out. > > This will lead to final cleanup before exiting probe like you mentioned. > > Once cleanup is done(mailbox_chan_free) > > no more responses from remote agent will acknowledged but if the response > > comes in between the cleanup in probe > > and the last command timing out we will see a race since the response can > > come asynchronously. In this scenario cleanup > > and scmi_rx_callback race with each other. > > I believe to solve this we need to synchronize cleanup with > > scmi_rx_callback. we can serialize these two paths > > and exit early in rx_callback if cleanup has been completed. > > > > Yes indeed, but my concern is also not to introduce to much contention > on the RX path (with irqsave spinlocking & friends), given that this racy > scenario has surely to be handled but it is also highly unlikely, so I don't > want to slow down all the rx path all the time. > > So I tried something along this lines: > > ----8<------ > diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h > index dea1bfbe1052..036f8ccff450 100644 > --- a/drivers/firmware/arm_scmi/common.h > +++ b/drivers/firmware/arm_scmi/common.h > @@ -340,11 +340,13 @@ void scmi_protocol_release(const struct scmi_handle *handle, u8 protocol_id); > * channel > * @handle: Pointer to SCMI entity handle > * @transport_info: Transport layer related information > + * @users: A refcount to track active users of this channel > */ > struct scmi_chan_info { > struct device *dev; > struct scmi_handle *handle; > void *transport_info; > + refcount_t users; > }; > > /** > diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c > index b406b3f78f46..5814ed3f444e 100644 > --- a/drivers/firmware/arm_scmi/driver.c > +++ b/drivers/firmware/arm_scmi/driver.c > @@ -678,6 +678,16 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo, > scmi_xfer_command_release(info, xfer); > } > > +static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo) > +{ > + return refcount_inc_not_zero(&cinfo->users); > +} > + > +static inline void scmi_release_channel(struct scmi_chan_info *cinfo) > +{ > + return refcount_dec(&cinfo->users); > +} > + > /** > * scmi_rx_callback() - callback for receiving messages > * > @@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv) > { > u8 msg_type = MSG_XTRACT_TYPE(msg_hdr); > > + /* Bail out if channel freed already */ > + if (!scmi_acquire_channel(cinfo)) > + return; > + > switch (msg_type) { > case MSG_TYPE_NOTIFICATION: > scmi_handle_notification(cinfo, msg_hdr, priv); > @@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv) > WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type); > break; > } > + > + scmi_release_channel(cinfo); > } > > /** > @@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info *info, struct device *dev, > return ret; > } > > + refcount_set(&cinfo->users, 1); > cinfo->handle = &info->handle; > return 0; > } > > +static int scmi_chan_free(int id, void *p, void *data) > +{ > + struct scmi_chan_info *cinfo = p; > + struct scmi_info *info = handle_to_scmi_info(cinfo->handle); > + > + if (refcount_dec_and_test(&cinfo->users)) { > + info->desc->ops->chan_free(id, cinfo, data); > + } else { > + /* Stall till the ongoing rx_callback completes */ > + spin_until_cond(refcount_read(&cinfo->users) == 0); > + info->desc->ops->chan_free(id, cinfo, data); > + } > + > + return 0; > +} > + > static inline int > scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id) > { > @@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct scmi_info *info) > int ret; > struct idr *idr = &info->tx_idr; > > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > + ret = idr_for_each(idr, scmi_chan_free, idr); > idr_destroy(&info->tx_idr); > > idr = &info->rx_idr; > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > + ret = idr_for_each(idr, scmi_chan_free, idr); > idr_destroy(&info->rx_idr); > > return ret; > > ------8<----- > > Can you give it a go on your setup ? > > Beware it is not really tested on the racy error path (:P) and I could have > still missed something regarding synchro (and I expect an undesired refcount > warn on the scmi_release_channel too when the race is hit....but just to > experiment a bit for now and see if something like this could be enough while > avoiding further locking) > o Looking back at this patch of mine, even though it could work for the racy issue at hand, it is currently clearly completely broken on the regular unload/free flow since cinfo structs can be re-used multiple times. Sorry, please ignore this attempt, I'll rework in a more sensible way. Thanks, Cristian From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8988C433EF for ; Sun, 7 Nov 2021 18:24:25 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 706A66139E for ; Sun, 7 Nov 2021 18:24:25 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 706A66139E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=aLINbZtoXeuJ6IevgfdrtiAh83CzeMgO2IfZjXz6OJk=; b=WlL+PfodvOeKdx UlopFQBQHO84AdnxlU9tX4dA2feQGYCJRxjJbQPqC7DptGtpjaJpvpNHRMyKDOFg9WljAt0DozmXO /OOLN7x6E8idDGL5n2Ot8P8WTqz0VMEZVp6bAaNWA8mlgyuYGOnp8pkUi7G9+G0faJgnMJGIywDVw nAT4i1Rp0Z+SXTBO8M0PeUEI6ScSn+goaOtxe+t6HrhPs4bj6lrd5m7G6rfw86nNtJdpaOV8cwRZN Bn1VdVg0jNicjiq+HzMi3ymIhXvzrvAXRNYn5MqhPR3qB8s3+FjP62Epdlbv9Fbv8f/++tVga+QwZ Zh5Ffe7qtwjUG/fUbU/g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mjmoL-00EgzR-GG; Sun, 07 Nov 2021 18:22:33 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mjmo7-00Egxl-QP for linux-arm-kernel@lists.infradead.org; Sun, 07 Nov 2021 18:22:22 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 665772B; Sun, 7 Nov 2021 10:22:16 -0800 (PST) Received: from e120937-lin (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0F6B43F718; Sun, 7 Nov 2021 10:22:14 -0800 (PST) Date: Sun, 7 Nov 2021 18:22:12 +0000 From: Cristian Marussi To: rishabhb@codeaurora.org Cc: Sudeep Holla , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, avajid@codeaurora.org, adharmap@codeaurora.org Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails Message-ID: <20211107182212.GK6526@e120937-lin> References: <20210805105427.GU6592@e120937-lin> <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org> <20210831054835.GJ13160@e120937-lin> <20210901093558.GL13160@e120937-lin> <20211102113221.w7ivffssjb6jmggj@bogus> <9385b2ca9b688b00735cc0b7f626f008@codeaurora.org> <20211105094310.GI6526@e120937-lin> <20211107103407.GJ6526@e120937-lin> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20211107103407.GJ6526@e120937-lin> User-Agent: Mutt/1.9.4 (2018-02-28) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211107_102219_997764_A937E766 X-CRM114-Status: GOOD ( 86.98 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Sun, Nov 07, 2021 at 10:34:07AM +0000, Cristian Marussi wrote: > On Fri, Nov 05, 2021 at 10:40:59AM -0700, rishabhb@codeaurora.org wrote: > > On 2021-11-05 02:43, Cristian Marussi wrote: > > > On Thu, Nov 04, 2021 at 04:40:03PM -0700, rishabhb@codeaurora.org wrote: > > > > On 2021-11-02 04:32, Sudeep Holla wrote: > > > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, rishabhb@codeaurora.org wrote: > > > > > > On 2021-09-01 02:35, Cristian Marussi wrote: > > > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote: > > > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org > > > > > > > > wrote: > > > > > > > > > Hi Christian > > > > > > > > > > > > > > > > Hi Rishabh, > > > > > > > > > > > > > > Hi Rishabh, > > > > > Hi Rishabh, > Hi Rishabhb, > > > apologies for the delay in coming back to you. > > > A few comments below. > > > > > > > > > > > thanks for looking into this kind of bad interactions. > > > > > > > > > > > > > > > > > There seems to be another issue here. The response from agent can be delayed > > > > > > > > > causing a timeout during base protocol acquire, > > > > > > > > > which leads to the probe failure. What I have observed is sometimes the > > > > > > > > > failure of probe and rx_callback (due to a delayed message) > > > > > > > > > happens at the same time on different cpus. > > > > > > > > > Because of this race, the device memory may be cleared while the > > > > > > > > > interrupt(rx_callback) is executing on another cpu. > > > > > > > > > > > > > > > > You are right that concurrency was not handled properly in this kind > > > > > > > > of > > > > > > > > context and moreover, if you think about it, even the case of out of > > > > > > > > order reception of responses and delayed_responses (type2 SCMI > > > > > > > > messages) > > > > > > > > for asynchronous SCMI commands was not handled properly. > > > > > > > > > > > > > > > > > How do you propose we solve this? Do you think it is better to take the > > > > > > > > > setting up of base and other protocols out of probe and > > > > > > > > > in some delayed work? That would imply the device memory is not released > > > > > > > > > until remove is called. Or should we add locking to > > > > > > > > > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid > > > > > > > > > the race? > > > > > > > > > > > > > > > > > > > > > > > > > These issues were more easily exposed by SCMI Virtio transport, so in > > > > > > > > the series where I introduced scmi-virtio: > > > > > > > > > > > > > > > > https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/ > > > > > > > > > > > > > > > > (which is now queued for v5.15 ... now on -next I think...finger > > > > > > > > crossed) > > > > > > > > > > > > > > > > I took the chance to rectify a couple of other things in the SCMI core > > > > > > > > in the initial commits. > > > > > > > > As an example, in the above series > > > > > > > > > > > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and > > > > > > > > out-of-order messages > > > > > > > > > > > > > > > > cares to add a refcount to xfers and some locking on xfers between TX > > > > > > > > and RX path to avoid that a timed out xfer can vanish while the rx > > > > > > > > path > > > > > > > > is concurrently working on it (as you said); moreover I handle the > > > > > > > > condition (rare if not unplausible anyway) in which a transport > > > > > > > > delivers > > > > > > > > out of order responses and delayed responses. > > > > > > > > > > > > > > > > I tested this scenarios on some fake emulated SCMI Virtio transport > > > > > > > > where I could play any sort of mess and tricks to stress this limit > > > > > > > > conditions, but you're more than welcome to verify if the race you are > > > > > > > > seeing on Base protocol time out is solved (as I would hope :D) by > > > > > > > > this > > > > > > > > series of mine. > > > > > > > > > > > > > > > > Let me know, any feedback is welcome. > > > > > > > > > > > > > > > > Btw, in the series above there are also other minor changes, but there > > > > > > > > is also another more radical change needed to ensure correctness and > > > > > > > > protection against stale old messages which maybe could interest you > > > > > > > > in general if you are looking into SCMI: > > > > > > > > > > > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically > > > > > > > > increasing tokens > > > > > > > > > > > > > > > > Let me know if yo have other concerns. > > > > > > > > > > > > > > > > > > > > > > Hi Rishabhb, > > > > > > > > > > > > > > just a quick remark, thinking again about your fail @probe scenario > > > > > > > above > > > > > > > I realized that while the concurrency patch I mentioned above could help > > > > > > > on > > > > > > > races against vanishing xfers when late timed-out responses are > > > > > > > delivered, > > > > > > > here we really are then also shutting down everything on failure, so > > > > > > > there > > > > > > > could be further issues between a very late invokation of > > > > > > > scmi_rx_callback > > > > > > > and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. > > > > > > > structs > > > > > > > used by scmi-rx-callback itself (maybe this was already what you meant > > > > > > > and > > > > > > > I didn't get it,...sorry) > > > > > > > > > > > > > > On the other side, I don't feel that delaying Base init to a deferred > > > > > > > worker is a viable solution since we need Base protocol init to be > > > > > > > initialized and we need to just give up if we cannot communicate with > > > > > > > the SCMI platform fw in such early stages. (Base protocol is really the > > > > > > > only mandatory proto is I remember correctly the spec) > > > > > > > > > > > > > > Currenly I'm off and only glancing at mails but I'll have a thought > > > > > > > about > > > > > > > these issues once back in a few weeks time. > > > > > > > > > > > > > > Thanks, > > > > > > > Cristian > > > > > > > > > > > > > Hi Cristian > > > > > > I hope you enjoyed your vacation. Did you get a chance to look at > > > > > > the issue > > > > > > stated above and have some idea as to how to solve this? > > > > > > > > > > Do you still see the issue with v5.15 ? Can you please check if haven't > > > > > already done that ? > > > > > > > > > > Also 30ms delay we have is huge IMO and we typically expect the > > > > > communication > > > > > with remote processor or any entity that implements SCMI to happen in > > > > > terms > > > > > of one or few ms tops. > > > > > > > > > > If there is a race, we need to fix that but I am interested in knowing > > > > > why the default time of 30ms not sufficient ? Did increasing that helps > > > > > and is this timeout happening only for the initial commands(guessing the > > > > > SCMI firmware is not yet ready) or does it happen even during run-time ? > > > > > > > > Hi Sudeep > > > > I haven't checked on 5.15 but after glancing at the code I believe > > > > we should > > > > see the same issue. > > > > I agree 30ms is a big enough value and should be something that remote > > > > firmware should resolve. But > > > > if remote firmware goes into a bad state and not functioning > > > > properly at > > > > least kernel should not panic. > > > > > > > > The issue we see here happens during scmi probe. The response from the > > > > remote agent can be delayed > > > > causing a timeout during base protocol acquire, which leads to the > > > > probe > > > > failure. > > > > What I have observed is sometimes the failure of probe and > > > > rx_callback (due > > > > to a delayed message) > > > > happens around the same time on different cpus. Because of this > > > > race, the > > > > device memory may be cleared > > > > while the interrupt(rx_callback) is executing on another cpu. > > > > > > So I was looking at the failure path you mentioned: a late concurrent > > > reply on Base protocol from the fw, during the probe, leads to an > > > invocation > > > of scmi_rx_callback() on a different CPU while core data structs like > > > cinfo are being freed by the SCMI core on the probe failure path. > > > (v5.15-added SCMI concurrrency handling stuff I mentiond shuld help for > > > races regarding xfer but not for the cinfo stuff in this case ...) > > > > > > We cannot defer Base proto init since we just wanna fail early while > > > probing if not even the Base protocol can work fine, and also because > > > Base protocol information are indeed needed for initial setup, so we > > > cannot juts proceed if we did not even got a Base reply on the number of > > > protos. (already said) > > > > > > In my opinion, the proper way to address this kind of races at probe > > > failure should be to ensure that the transport you are using is properly > > > shut down completely before cleanup starts (same applies for a clean > > > remove), i.e. scmi_rx_callback should not even be possibly registered to > > > be called when the the final cleanup by the core is started (devm_ frees > > > I mean after scmi_probe exit failing...) > > > > > > BUT indeed looking back at transport layers like mailbox and virtio, > > > this > > > should be happening already, because the flow is like > > > > > > scmi_probe() > > > { > > > ... > > > > > > clean_tx_rx_setup: > > > scmi_cleanup_txrx_channels() > > > .... > > > --->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > > > - > > > return ret; > > > } > > > > > > .... only after this scmi_probe returns the core devm layer starts > > > freeing devm_ > > > allocated stuff like cinfo, AND the above per-transport specific > > > .chan_free seems > > > to take care to 'deactivate/dregister' the scmi_rx_callback at the > > > transport layer: > > > > > > > > > e.g. MBOX transport > > > ------------------------- > > > static int mailbox_chan_free(int id, void *p, void *data) > > > { > > > struct scmi_chan_info *cinfo = p; > > > struct scmi_mailbox *smbox = cinfo->transport_info; > > > > > > if (smbox && !IS_ERR(smbox->chan)) { > > > mbox_free_channel(smbox->chan); <<< THIS MBOX CORE CALL DEACTIVATE > > > cinfo->transport_info = NULL; > > > > > > > > > e.g. VIRTIO Transport > > > ----------------------------- > > > static int virtio_chan_free(int id, void *p, void *data) > > > { > > > unsigned long flags; > > > struct scmi_chan_info *cinfo = p; > > > struct scmi_vio_channel *vioch = cinfo->transport_info; > > > > > > spin_lock_irqsave(&vioch->ready_lock, flags); > > > vioch->ready = false; <<<< THIS VIRTIO FLAG > > > DEACTIVATE VIRTIO CBS INVOKCATION > > > spin_unlock_irqrestore(&vioch->ready_lock, flags); > > > > > > > > > ... AND both of the above call are indeed also spinlocked heavily, so > > > that > > > the 'deactivation' of the scmi_rx_callback should be visible properly; > > > in > > > other words I would expect that after the above .chan_free() have > > > completed the scmi_rx_callback() cannot be called anymore, because the > > > transport itself will properly drop any so-late fw reply. > > > > > > So I am now wondering, which transport are you using in your tests ? > > > since at least for the above 2 example it seems to me that your > > > race-on-probe failure condition should be already addressed by the > > > transport layer itself....or am I getting wrong the nature of the race ? > > > > > > Thanks > > > Cristian > > > > Hi Cristian > > You caught the scenario perfectly. But there is still a possibility of a > > race. To be clear we use > > the mbox transport. Let me explain in more detail. > > Lets assume that the last command (base protocol acquire) kernel sent to > > remote agent timed out. > > This will lead to final cleanup before exiting probe like you mentioned. > > Once cleanup is done(mailbox_chan_free) > > no more responses from remote agent will acknowledged but if the response > > comes in between the cleanup in probe > > and the last command timing out we will see a race since the response can > > come asynchronously. In this scenario cleanup > > and scmi_rx_callback race with each other. > > I believe to solve this we need to synchronize cleanup with > > scmi_rx_callback. we can serialize these two paths > > and exit early in rx_callback if cleanup has been completed. > > > > Yes indeed, but my concern is also not to introduce to much contention > on the RX path (with irqsave spinlocking & friends), given that this racy > scenario has surely to be handled but it is also highly unlikely, so I don't > want to slow down all the rx path all the time. > > So I tried something along this lines: > > ----8<------ > diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h > index dea1bfbe1052..036f8ccff450 100644 > --- a/drivers/firmware/arm_scmi/common.h > +++ b/drivers/firmware/arm_scmi/common.h > @@ -340,11 +340,13 @@ void scmi_protocol_release(const struct scmi_handle *handle, u8 protocol_id); > * channel > * @handle: Pointer to SCMI entity handle > * @transport_info: Transport layer related information > + * @users: A refcount to track active users of this channel > */ > struct scmi_chan_info { > struct device *dev; > struct scmi_handle *handle; > void *transport_info; > + refcount_t users; > }; > > /** > diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c > index b406b3f78f46..5814ed3f444e 100644 > --- a/drivers/firmware/arm_scmi/driver.c > +++ b/drivers/firmware/arm_scmi/driver.c > @@ -678,6 +678,16 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo, > scmi_xfer_command_release(info, xfer); > } > > +static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo) > +{ > + return refcount_inc_not_zero(&cinfo->users); > +} > + > +static inline void scmi_release_channel(struct scmi_chan_info *cinfo) > +{ > + return refcount_dec(&cinfo->users); > +} > + > /** > * scmi_rx_callback() - callback for receiving messages > * > @@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv) > { > u8 msg_type = MSG_XTRACT_TYPE(msg_hdr); > > + /* Bail out if channel freed already */ > + if (!scmi_acquire_channel(cinfo)) > + return; > + > switch (msg_type) { > case MSG_TYPE_NOTIFICATION: > scmi_handle_notification(cinfo, msg_hdr, priv); > @@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv) > WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type); > break; > } > + > + scmi_release_channel(cinfo); > } > > /** > @@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info *info, struct device *dev, > return ret; > } > > + refcount_set(&cinfo->users, 1); > cinfo->handle = &info->handle; > return 0; > } > > +static int scmi_chan_free(int id, void *p, void *data) > +{ > + struct scmi_chan_info *cinfo = p; > + struct scmi_info *info = handle_to_scmi_info(cinfo->handle); > + > + if (refcount_dec_and_test(&cinfo->users)) { > + info->desc->ops->chan_free(id, cinfo, data); > + } else { > + /* Stall till the ongoing rx_callback completes */ > + spin_until_cond(refcount_read(&cinfo->users) == 0); > + info->desc->ops->chan_free(id, cinfo, data); > + } > + > + return 0; > +} > + > static inline int > scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id) > { > @@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct scmi_info *info) > int ret; > struct idr *idr = &info->tx_idr; > > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > + ret = idr_for_each(idr, scmi_chan_free, idr); > idr_destroy(&info->tx_idr); > > idr = &info->rx_idr; > - ret = idr_for_each(idr, info->desc->ops->chan_free, idr); > + ret = idr_for_each(idr, scmi_chan_free, idr); > idr_destroy(&info->rx_idr); > > return ret; > > ------8<----- > > Can you give it a go on your setup ? > > Beware it is not really tested on the racy error path (:P) and I could have > still missed something regarding synchro (and I expect an undesired refcount > warn on the scmi_release_channel too when the race is hit....but just to > experiment a bit for now and see if something like this could be enough while > avoiding further locking) > o Looking back at this patch of mine, even though it could work for the racy issue at hand, it is currently clearly completely broken on the regular unload/free flow since cinfo structs can be re-used multiple times. Sorry, please ignore this attempt, I'll rework in a more sensible way. Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel