* Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" @ 2022-09-22 5:01 Shivnandan Kumar 2022-09-22 14:54 ` Cristian Marussi 0 siblings, 1 reply; 7+ messages in thread From: Shivnandan Kumar @ 2022-09-22 5:01 UTC (permalink / raw) To: cristian.marussi, sudeep.holla Cc: linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid Hi Christian, Do you have any update or suggestion regarding thread https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019. We are still getting this issue and wanted to check if there is any fix that I can try. Thanks, Shivnandan _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-09-22 5:01 Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" Shivnandan Kumar @ 2022-09-22 14:54 ` Cristian Marussi 2022-09-22 15:28 ` Cristian Marussi 0 siblings, 1 reply; 7+ messages in thread From: Cristian Marussi @ 2022-09-22 14:54 UTC (permalink / raw) To: Shivnandan Kumar Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote: > Hi Christian, > Hi Shivnandan, > > Do you have any update or suggestion regarding thread https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019. > > We are still getting this issue and wanted to check if there is any fix > that I can try. > Sorry this issue fell to the bottom of my list in these past months... ... but it stil on TODO :D So today I tried to get my head around this issue again (i.e. mainly re-reading the above thread to remind me what was the status and wth I had written... :P) In summary the racy thing seemed to be caused by the a delayed late SCMI Base reply happily served on one core by scmi_rx_callback operating on some well-defined SCMI channel, while on another core we are effectively shutting down the system and destroying such channels: now this should be clearly NOT be possible and it is what we have to synchronize. Looking at the transport layer that you use, mailbox, I see that while setup/free helpers are synchronized on an internal chan->lock, the RX path inside the mailbox core is not, so I tried this: diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 4229b9b5da98..bb6173c0ad54 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) */ void mbox_chan_received_data(struct mbox_chan *chan, void *mssg) { + unsigned long flags; + + spin_lock_irqsave(&chan->lock, flags); /* No buffering the received data */ if (chan->cl->rx_callback) chan->cl->rx_callback(chan->cl, mssg); + spin_unlock_irqrestore(&chan->lock, flags); } EXPORT_SYMBOL_GPL(mbox_chan_received_data); ... can you try on your setup ? I dont have a way to easily reproduce your race as of now... NOTE THAT, I am not still convinced that the above fix, if it works, will constitute the final solution to this issue, I could maybe move this same kind of sync up into the SCMI transport layer to avoid to impact all other users of the above mailbox interface (since, as of today, nobody has reported any issue like ours due to the missing spinlock..)..but it could be helpful to test the above to verify that this is really where the root issue is. Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-09-22 14:54 ` Cristian Marussi @ 2022-09-22 15:28 ` Cristian Marussi 2022-09-30 12:59 ` Shivnandan Kumar 0 siblings, 1 reply; 7+ messages in thread From: Cristian Marussi @ 2022-09-22 15:28 UTC (permalink / raw) To: Shivnandan Kumar Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid On Thu, Sep 22, 2022 at 03:54:31PM +0100, Cristian Marussi wrote: > On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote: > > Hi Christian, > > > Hi Shivnandan, [snip] > Looking at the transport layer that you use, mailbox, I see that while > setup/free helpers are synchronized on an internal chan->lock, the RX > path inside the mailbox core is not, so I tried this: > > > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > index 4229b9b5da98..bb6173c0ad54 100644 > --- a/drivers/mailbox/mailbox.c > +++ b/drivers/mailbox/mailbox.c > @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) > */ > void mbox_chan_received_data(struct mbox_chan *chan, void *mssg) > { > + unsigned long flags; > + > + spin_lock_irqsave(&chan->lock, flags); > /* No buffering the received data */ > if (chan->cl->rx_callback) > chan->cl->rx_callback(chan->cl, mssg); > + spin_unlock_irqrestore(&chan->lock, flags); > } > EXPORT_SYMBOL_GPL(mbox_chan_received_data); > ...sorry... a small change on the tentative above fix... ----8<------ diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 4229b9b5da98..6fbe183acdae 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) */ void mbox_chan_received_data(struct mbox_chan *chan, void *mssg) { + unsigned long flags; + + spin_lock_irqsave(&chan->lock, flags); /* No buffering the received data */ - if (chan->cl->rx_callback) + if (chan->cl && chan->cl->rx_callback) chan->cl->rx_callback(chan->cl, mssg); + spin_unlock_irqrestore(&chan->lock, flags); } EXPORT_SYMBOL_GPL(mbox_chan_received_data); ------8<----- Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-09-22 15:28 ` Cristian Marussi @ 2022-09-30 12:59 ` Shivnandan Kumar 2022-10-03 13:22 ` Cristian Marussi 0 siblings, 1 reply; 7+ messages in thread From: Shivnandan Kumar @ 2022-09-30 12:59 UTC (permalink / raw) To: Cristian Marussi Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid hi Cristian, Thanks for your support in providing the patch to try. I found one race condition in our downstream mbox controller driver while accessing con_priv, when I serialized access to this, issue is not seen on 3 days of testing. As you rightly mentioned that your provided patch will impact all the other users. Also if we take your provided patch, same race still exists while accessing con_priv in our downstream mbox controller so this issue will still be there. So, we are planning to merge the patch( serialized access to con_priv) in our downstream mbox controller now. Thanks, Shivnandan On 9/22/2022 8:58 PM, Cristian Marussi wrote: > On Thu, Sep 22, 2022 at 03:54:31PM +0100, Cristian Marussi wrote: >> On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote: >>> Hi Cristian, >>> > Hi Shivnandan, > > [snip] > >> Looking at the transport layer that you use, mailbox, I see that while >> setup/free helpers are synchronized on an internal chan->lock, the RX >> path inside the mailbox core is not, so I tried this: >> >> >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c >> index 4229b9b5da98..bb6173c0ad54 100644 >> --- a/drivers/mailbox/mailbox.c >> +++ b/drivers/mailbox/mailbox.c >> @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) >> */ >> void mbox_chan_received_data(struct mbox_chan *chan, void *mssg) >> { >> + unsigned long flags; >> + >> + spin_lock_irqsave(&chan->lock, flags); >> /* No buffering the received data */ >> if (chan->cl->rx_callback) >> chan->cl->rx_callback(chan->cl, mssg); >> + spin_unlock_irqrestore(&chan->lock, flags); >> } >> EXPORT_SYMBOL_GPL(mbox_chan_received_data); >> > ...sorry... a small change on the tentative above fix... > > ----8<------ > > diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c > index 4229b9b5da98..6fbe183acdae 100644 > --- a/drivers/mailbox/mailbox.c > +++ b/drivers/mailbox/mailbox.c > @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer) > */ > void mbox_chan_received_data(struct mbox_chan *chan, void *mssg) > { > + unsigned long flags; > + > + spin_lock_irqsave(&chan->lock, flags); > /* No buffering the received data */ > - if (chan->cl->rx_callback) > + if (chan->cl && chan->cl->rx_callback) > chan->cl->rx_callback(chan->cl, mssg); > + spin_unlock_irqrestore(&chan->lock, flags); > } > EXPORT_SYMBOL_GPL(mbox_chan_received_data); > > ------8<----- > > Thanks, > Cristian > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-09-30 12:59 ` Shivnandan Kumar @ 2022-10-03 13:22 ` Cristian Marussi 2022-10-11 10:04 ` Shivnandan Kumar 0 siblings, 1 reply; 7+ messages in thread From: Cristian Marussi @ 2022-10-03 13:22 UTC (permalink / raw) To: Shivnandan Kumar Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid On Fri, Sep 30, 2022 at 06:29:02PM +0530, Shivnandan Kumar wrote: > hi Cristian, Hi Shivnandan, > > Thanks for your support in providing the patch to try. > > I found one race condition in our downstream mbox controller driver while > accessing con_priv, when I serialized access to this, issue is not seen on 3 > days of testing. Good to hear that you find the issue. > > As you rightly mentioned that your provided patch will impact all the other > users. > > Also if we take your provided patch, same race still exists while accessing > con_priv in our downstream mbox controller so this issue will still be > there. > Yes indeed, even though I think that race in the mailbox core between RX path and chan_free could still be theoretically possible it does not seem to me appropriate to try to fix it now that you cannot reproduce it anymore and no other mailbox user has ever raised this concern (even though, as said, the proper solution to that race wont probably be directly in the mailbox-core as in my experimental two liners..) > So, we are planning to merge the patch( serialized access to con_priv) in > our downstream mbox controller now. > Ok, just out of curiosity, once done, can you point me at your downstream public sources so I can see the issue and the fix that you are applying to your trees ? Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-10-03 13:22 ` Cristian Marussi @ 2022-10-11 10:04 ` Shivnandan Kumar 2022-10-11 10:40 ` Cristian Marussi 0 siblings, 1 reply; 7+ messages in thread From: Shivnandan Kumar @ 2022-10-11 10:04 UTC (permalink / raw) To: Cristian Marussi Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu Hi Cristian, >>Ok, just out of curiosity, once done, can you point me at your downstream public sources so I can see the issue and the fix that you are applying to your trees ? https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0 I have added lock while accessing con_priv inside irq handler and shutdown function. I have one input regarding timeout from firmware, can we enable BUG on response time out in function do_xfer based on some debug config flag,this will help to debug firmware timeout issue faster. We will only enable that config flag during internal testing. Thanks, Shivnandan On 10/3/2022 6:52 PM, Cristian Marussi wrote: > On Fri, Sep 30, 2022 at 06:29:02PM +0530, Shivnandan Kumar wrote: >> hi Cristian, > Hi Shivnandan, >> Thanks for your support in providing the patch to try. >> >> I found one race condition in our downstream mbox controller driver while >> accessing con_priv, when I serialized access to this, issue is not seen on 3 >> days of testing. > Good to hear that you find the issue. > >> As you rightly mentioned that your provided patch will impact all the other >> users. >> >> Also if we take your provided patch, same race still exists while accessing >> con_priv in our downstream mbox controller so this issue will still be >> there. >> > Yes indeed, even though I think that race in the mailbox core between RX path > and chan_free could still be theoretically possible it does not seem to me > appropriate to try to fix it now that you cannot reproduce it anymore and > no other mailbox user has ever raised this concern (even though, as said, the > proper solution to that race wont probably be directly in the mailbox-core as > in my experimental two liners..) > >> So, we are planning to merge the patch( serialized access to con_priv) in >> our downstream mbox controller now. >> > Ok, just out of curiosity, once done, can you point me at your downstream public > sources so I can see the issue and the fix that you are applying to your trees ? > > Thanks, > Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" 2022-10-11 10:04 ` Shivnandan Kumar @ 2022-10-11 10:40 ` Cristian Marussi 0 siblings, 0 replies; 7+ messages in thread From: Cristian Marussi @ 2022-10-11 10:40 UTC (permalink / raw) To: Shivnandan Kumar Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu On Tue, Oct 11, 2022 at 03:34:45PM +0530, Shivnandan Kumar wrote: > > Hi Cristian, > Hi Shivnandan, > >>Ok, just out of curiosity, once done, can you point me at your downstream > public sources so I can see the issue and the fix that you are applying to > your trees ? > > https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0 > > I have added lock while accessing con_priv inside irq handler and shutdown > function. > Thanks ! > > I have one input regarding timeout from firmware, can we enable BUG on > response time out in function do_xfer based on some debug config flag,this > will help to debug firmware timeout issue faster. > > We will only enable that config flag during internal testing. > I understand a sort of 'Panic-on-timeout' would be useful to just freeze the system as it is and debug, but it seems to me pretty much invasive (and generally frowned upon) to BUG_ON timeouts, given on some SCMI platforms/transports a few timeouts can happen really not so infrequently due to transient conditions during moments of peak SCMI traffic. Even though you mention to make it conditional to Kconfig, I'm not sure this could fly, especially if you want to enable only for internal testing...I'll ping Sudeep about this to see what he thinks. As an alternative, what if I try to improve SCMI tracing/debug, let's say dumping more info in dmesg about the offending (timed-out) message instead of hanging the system as a whole ? I'd have also some still-brewing-and-not-published patches to add some SCMI stats somewhere in sysfs to be able to read current SCMI errors/timeouts and transport anomalies, would that be of interest ? ...maybe, we could combine some of these stats and some sort of BUG_ON/WARN_ON (if it will fly eventually..) into some kind SCMI_DEBUG mode ...any input on your needs about which kind of SCMI info you'll like to see exposed by the stack would be welcome. Last but not least, since we are talking about SCMI Server/FW testing, have you (or your team) seen this work-in-progress of mine: https://lore.kernel.org/linux-arm-kernel/20220903183042.3913053-1-cristian.marussi@arm.com/ about a new unified userspace interface to inject/snoop SCMI messages to test/fuzz/stress the SCMI server wherever it is placed ? Any feedback on the API proprosed in the cover-letter would be highly welcome; I'll post a new V4 next week possibly, and the changes to the existing ARM SCMI Compliance suite (mentioned in the cover) to support this new SCMI Raw mode are in their final stage too. Thanks, Cristian _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-10-11 10:41 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-09-22 5:01 Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" Shivnandan Kumar 2022-09-22 14:54 ` Cristian Marussi 2022-09-22 15:28 ` Cristian Marussi 2022-09-30 12:59 ` Shivnandan Kumar 2022-10-03 13:22 ` Cristian Marussi 2022-10-11 10:04 ` Shivnandan Kumar 2022-10-11 10:40 ` Cristian Marussi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).