From: rishabhb@codeaurora.org To: Cristian Marussi <cristian.marussi@arm.com> Cc: sudeep.holla@arm.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, avajid@codeaurora.org, adharmap@codeaurora.org Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails Date: Mon, 01 Nov 2021 09:35:42 -0700 [thread overview] Message-ID: <aab71610e11c2dd293159576cc53e277@codeaurora.org> (raw) In-Reply-To: <20210901093558.GL13160@e120937-lin> On 2021-09-01 02:35, Cristian Marussi wrote: > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote: >> On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org >> wrote: >> > Hi Christian >> >> Hi Rishabh, >> >> thanks for looking into this kind of bad interactions. >> >> > There seems to be another issue here. The response from agent can be delayed >> > causing a timeout during base protocol acquire, >> > which leads to the probe failure. What I have observed is sometimes the >> > failure of probe and rx_callback (due to a delayed message) >> > happens at the same time on different cpus. >> > Because of this race, the device memory may be cleared while the >> > interrupt(rx_callback) is executing on another cpu. >> >> You are right that concurrency was not handled properly in this kind >> of >> context and moreover, if you think about it, even the case of out of >> order reception of responses and delayed_responses (type2 SCMI >> messages) >> for asynchronous SCMI commands was not handled properly. >> >> > How do you propose we solve this? Do you think it is better to take the >> > setting up of base and other protocols out of probe and >> > in some delayed work? That would imply the device memory is not released >> > until remove is called. Or should we add locking to >> > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid >> > the race? >> > >> >> These issues were more easily exposed by SCMI Virtio transport, so in >> the series where I introduced scmi-virtio: >> >> https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/ >> >> (which is now queued for v5.15 ... now on -next I think...finger >> crossed) >> >> I took the chance to rectify a couple of other things in the SCMI core >> in the initial commits. >> As an example, in the above series >> >> [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and >> out-of-order messages >> >> cares to add a refcount to xfers and some locking on xfers between TX >> and RX path to avoid that a timed out xfer can vanish while the rx >> path >> is concurrently working on it (as you said); moreover I handle the >> condition (rare if not unplausible anyway) in which a transport >> delivers >> out of order responses and delayed responses. >> >> I tested this scenarios on some fake emulated SCMI Virtio transport >> where I could play any sort of mess and tricks to stress this limit >> conditions, but you're more than welcome to verify if the race you are >> seeing on Base protocol time out is solved (as I would hope :D) by >> this >> series of mine. >> >> Let me know, any feedback is welcome. >> >> Btw, in the series above there are also other minor changes, but there >> is also another more radical change needed to ensure correctness and >> protection against stale old messages which maybe could interest you >> in general if you are looking into SCMI: >> >> [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically >> increasing tokens >> >> Let me know if yo have other concerns. >> > > Hi Rishabhb, > > just a quick remark, thinking again about your fail @probe scenario > above > I realized that while the concurrency patch I mentioned above could > help on > races against vanishing xfers when late timed-out responses are > delivered, > here we really are then also shutting down everything on failure, so > there > could be further issues between a very late invokation of > scmi_rx_callback > and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. > structs > used by scmi-rx-callback itself (maybe this was already what you meant > and > I didn't get it,...sorry) > > On the other side, I don't feel that delaying Base init to a deferred > worker is a viable solution since we need Base protocol init to be > initialized and we need to just give up if we cannot communicate with > the SCMI platform fw in such early stages. (Base protocol is really the > only mandatory proto is I remember correctly the spec) > > Currenly I'm off and only glancing at mails but I'll have a thought > about > these issues once back in a few weeks time. > > Thanks, > Cristian > Hi Cristian I hope you enjoyed your vacation. Did you get a chance to look at the issue stated above and have some idea as to how to solve this? > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-11-01 16:37 UTC|newest] Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-08-04 21:19 Rishabh Bhatnagar 2021-08-05 10:54 ` Cristian Marussi 2021-08-05 10:54 ` Cristian Marussi 2021-08-30 21:09 ` rishabhb 2021-08-31 5:48 ` Cristian Marussi 2021-08-31 5:48 ` Cristian Marussi 2021-09-01 9:35 ` Cristian Marussi 2021-09-01 9:35 ` Cristian Marussi 2021-11-01 16:35 ` rishabhb [this message] 2021-11-02 11:32 ` Sudeep Holla 2021-11-02 11:32 ` Sudeep Holla 2021-11-04 23:40 ` rishabhb 2021-11-05 9:43 ` Cristian Marussi 2021-11-05 9:43 ` Cristian Marussi 2021-11-05 17:40 ` rishabhb 2021-11-07 10:34 ` Cristian Marussi 2021-11-07 10:34 ` Cristian Marussi 2021-11-07 18:22 ` Cristian Marussi 2021-11-07 18:22 ` Cristian Marussi 2021-11-08 17:58 ` rishabhb 2021-11-09 14:38 ` Cristian Marussi 2021-11-09 14:38 ` Cristian Marussi 2021-08-09 5:00 ` Sudeep Holla 2021-08-09 5:00 ` Sudeep Holla
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=aab71610e11c2dd293159576cc53e277@codeaurora.org \ --to=rishabhb@codeaurora.org \ --cc=adharmap@codeaurora.org \ --cc=avajid@codeaurora.org \ --cc=cristian.marussi@arm.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=sudeep.holla@arm.com \ --subject='Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.