linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
@ 2022-09-22  5:01 Shivnandan Kumar
  2022-09-22 14:54 ` Cristian Marussi
  0 siblings, 1 reply; 7+ messages in thread
From: Shivnandan Kumar @ 2022-09-22  5:01 UTC (permalink / raw)
  To: cristian.marussi, sudeep.holla
  Cc: linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid

Hi Christian,


Do you have any update or suggestion regarding thread 
https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019.

We are still getting this issue and wanted to check if  there is any fix 
that I can try.


Thanks,

Shivnandan


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-09-22  5:01 Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" Shivnandan Kumar
@ 2022-09-22 14:54 ` Cristian Marussi
  2022-09-22 15:28   ` Cristian Marussi
  0 siblings, 1 reply; 7+ messages in thread
From: Cristian Marussi @ 2022-09-22 14:54 UTC (permalink / raw)
  To: Shivnandan Kumar
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid

On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote:
> Hi Christian,
> 

Hi Shivnandan,

> 
> Do you have any update or suggestion regarding thread https://lore.kernel.org/lkml/20211105094310.GI6526@e120937-lin/T/#m07993053f6f238864acad4e9bad9f08d85aeb019.
> 
> We are still getting this issue and wanted to check if  there is any fix
> that I can try.
> 

Sorry this issue fell to the bottom of my list in these past months...
... but it stil on TODO :D

So today I tried to get my head around this issue again (i.e. mainly
re-reading the above thread to remind me what was the status and wth I
had written... :P)

In summary the racy thing seemed to be caused by the a delayed late
SCMI Base reply happily served on one core by scmi_rx_callback operating
on some well-defined SCMI channel, while on another core we are effectively
shutting down the system and destroying such channels: now this should
be clearly NOT be possible and it is what we have to synchronize.

Looking at the transport layer that you use, mailbox, I see that while
setup/free helpers are synchronized on an internal chan->lock, the RX
path inside the mailbox core is not, so I tried this:


diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4229b9b5da98..bb6173c0ad54 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
  */
 void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
 {
+       unsigned long flags;
+
+       spin_lock_irqsave(&chan->lock, flags);
        /* No buffering the received data */
        if (chan->cl->rx_callback)
                chan->cl->rx_callback(chan->cl, mssg);
+       spin_unlock_irqrestore(&chan->lock, flags);
 }
 EXPORT_SYMBOL_GPL(mbox_chan_received_data);
 

... can you try on your setup ? I dont have a way to easily reproduce
your race as of now...

NOTE THAT, I am not still convinced that the above fix, if it works, will
constitute the final solution to this issue, I could maybe move this same
kind of sync up into the SCMI transport layer to avoid to impact all other
users of the above mailbox interface (since, as of today, nobody has
reported any issue like ours due to the missing spinlock..)..but it
could be helpful to test the above to verify that this is really where
the root issue is.

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-09-22 14:54 ` Cristian Marussi
@ 2022-09-22 15:28   ` Cristian Marussi
  2022-09-30 12:59     ` Shivnandan Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Cristian Marussi @ 2022-09-22 15:28 UTC (permalink / raw)
  To: Shivnandan Kumar
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid

On Thu, Sep 22, 2022 at 03:54:31PM +0100, Cristian Marussi wrote:
> On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote:
> > Hi Christian,
> > 
> 

Hi Shivnandan,
 
[snip]

> Looking at the transport layer that you use, mailbox, I see that while
> setup/free helpers are synchronized on an internal chan->lock, the RX
> path inside the mailbox core is not, so I tried this:
> 
> 
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 4229b9b5da98..bb6173c0ad54 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>   */
>  void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
>  {
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&chan->lock, flags);
>         /* No buffering the received data */
>         if (chan->cl->rx_callback)
>                 chan->cl->rx_callback(chan->cl, mssg);
> +       spin_unlock_irqrestore(&chan->lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(mbox_chan_received_data);
>  

...sorry... a small change on the tentative above fix...

----8<------

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4229b9b5da98..6fbe183acdae 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
  */
 void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
 {
+       unsigned long flags;
+
+       spin_lock_irqsave(&chan->lock, flags);
        /* No buffering the received data */
-       if (chan->cl->rx_callback)
+       if (chan->cl && chan->cl->rx_callback)
                chan->cl->rx_callback(chan->cl, mssg);
+       spin_unlock_irqrestore(&chan->lock, flags);
 }
 EXPORT_SYMBOL_GPL(mbox_chan_received_data);

------8<-----

Thanks,
Cristian


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-09-22 15:28   ` Cristian Marussi
@ 2022-09-30 12:59     ` Shivnandan Kumar
  2022-10-03 13:22       ` Cristian Marussi
  0 siblings, 1 reply; 7+ messages in thread
From: Shivnandan Kumar @ 2022-09-30 12:59 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid

hi Cristian,

Thanks for your support in providing the patch to try.

I found one race condition in our downstream mbox controller driver 
while accessing con_priv, when I serialized access to this, issue is not 
seen on 3 days of testing.

As you rightly mentioned that your provided patch will impact all the 
other users.

Also if  we take your provided patch, same race still exists while 
accessing con_priv in our downstream mbox controller so this issue will 
still be there.

So, we are planning to merge the patch( serialized access to con_priv) 
in our downstream mbox controller now.


Thanks,

Shivnandan


On 9/22/2022 8:58 PM, Cristian Marussi wrote:
> On Thu, Sep 22, 2022 at 03:54:31PM +0100, Cristian Marussi wrote:
>> On Thu, Sep 22, 2022 at 10:31:47AM +0530, Shivnandan Kumar wrote:
>>> Hi Cristian,
>>>
> Hi Shivnandan,
>   
> [snip]
>
>> Looking at the transport layer that you use, mailbox, I see that while
>> setup/free helpers are synchronized on an internal chan->lock, the RX
>> path inside the mailbox core is not, so I tried this:
>>
>>
>> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
>> index 4229b9b5da98..bb6173c0ad54 100644
>> --- a/drivers/mailbox/mailbox.c
>> +++ b/drivers/mailbox/mailbox.c
>> @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>>    */
>>   void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
>>   {
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&chan->lock, flags);
>>          /* No buffering the received data */
>>          if (chan->cl->rx_callback)
>>                  chan->cl->rx_callback(chan->cl, mssg);
>> +       spin_unlock_irqrestore(&chan->lock, flags);
>>   }
>>   EXPORT_SYMBOL_GPL(mbox_chan_received_data);
>>   
> ...sorry... a small change on the tentative above fix...
>
> ----8<------
>
> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> index 4229b9b5da98..6fbe183acdae 100644
> --- a/drivers/mailbox/mailbox.c
> +++ b/drivers/mailbox/mailbox.c
> @@ -157,9 +157,13 @@ static enum hrtimer_restart txdone_hrtimer(struct hrtimer *hrtimer)
>    */
>   void mbox_chan_received_data(struct mbox_chan *chan, void *mssg)
>   {
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&chan->lock, flags);
>          /* No buffering the received data */
> -       if (chan->cl->rx_callback)
> +       if (chan->cl && chan->cl->rx_callback)
>                  chan->cl->rx_callback(chan->cl, mssg);
> +       spin_unlock_irqrestore(&chan->lock, flags);
>   }
>   EXPORT_SYMBOL_GPL(mbox_chan_received_data);
>
> ------8<-----
>
> Thanks,
> Cristian
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-09-30 12:59     ` Shivnandan Kumar
@ 2022-10-03 13:22       ` Cristian Marussi
  2022-10-11 10:04         ` Shivnandan Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Cristian Marussi @ 2022-10-03 13:22 UTC (permalink / raw)
  To: Shivnandan Kumar
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu, quic_avajid

On Fri, Sep 30, 2022 at 06:29:02PM +0530, Shivnandan Kumar wrote:
> hi Cristian,

Hi Shivnandan,
> 
> Thanks for your support in providing the patch to try.
> 
> I found one race condition in our downstream mbox controller driver while
> accessing con_priv, when I serialized access to this, issue is not seen on 3
> days of testing.

Good to hear that you find the issue.

> 
> As you rightly mentioned that your provided patch will impact all the other
> users.
> 
> Also if  we take your provided patch, same race still exists while accessing
> con_priv in our downstream mbox controller so this issue will still be
> there.
> 

Yes indeed, even though I think that race in the mailbox core between RX path
and chan_free could still be theoretically possible it does not seem to me
appropriate to try to fix it now that you cannot reproduce it anymore and
no other mailbox user has ever raised this concern (even though, as said, the
proper solution to that race wont probably be directly in the mailbox-core as
in my experimental two liners..)

> So, we are planning to merge the patch( serialized access to con_priv) in
> our downstream mbox controller now.
> 

Ok, just out of curiosity, once done, can you point me at your downstream public
sources so I can see the issue and the fix that you are applying to your trees ?

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-10-03 13:22       ` Cristian Marussi
@ 2022-10-11 10:04         ` Shivnandan Kumar
  2022-10-11 10:40           ` Cristian Marussi
  0 siblings, 1 reply; 7+ messages in thread
From: Shivnandan Kumar @ 2022-10-11 10:04 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu


Hi Cristian,

 >>Ok, just out of curiosity, once done, can you point me at your 
downstream public sources so I can see the issue and the fix that you 
are applying to your trees ?

https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0

I have added lock while accessing con_priv inside irq handler and 
shutdown function.


I have one input regarding timeout from firmware, can we enable BUG on 
response  time out in function do_xfer based on some debug config 
flag,this will help to debug firmware timeout issue faster.

We will only enable that config flag during internal testing.


Thanks,

Shivnandan

On 10/3/2022 6:52 PM, Cristian Marussi wrote:
> On Fri, Sep 30, 2022 at 06:29:02PM +0530, Shivnandan Kumar wrote:
>> hi Cristian,
> Hi Shivnandan,
>> Thanks for your support in providing the patch to try.
>>
>> I found one race condition in our downstream mbox controller driver while
>> accessing con_priv, when I serialized access to this, issue is not seen on 3
>> days of testing.
> Good to hear that you find the issue.
>
>> As you rightly mentioned that your provided patch will impact all the other
>> users.
>>
>> Also if  we take your provided patch, same race still exists while accessing
>> con_priv in our downstream mbox controller so this issue will still be
>> there.
>>
> Yes indeed, even though I think that race in the mailbox core between RX path
> and chan_free could still be theoretically possible it does not seem to me
> appropriate to try to fix it now that you cannot reproduce it anymore and
> no other mailbox user has ever raised this concern (even though, as said, the
> proper solution to that race wont probably be directly in the mailbox-core as
> in my experimental two liners..)
>
>> So, we are planning to merge the patch( serialized access to con_priv) in
>> our downstream mbox controller now.
>>
> Ok, just out of curiosity, once done, can you point me at your downstream public
> sources so I can see the issue and the fix that you are applying to your trees ?
>
> Thanks,
> Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails"
  2022-10-11 10:04         ` Shivnandan Kumar
@ 2022-10-11 10:40           ` Cristian Marussi
  0 siblings, 0 replies; 7+ messages in thread
From: Cristian Marussi @ 2022-10-11 10:40 UTC (permalink / raw)
  To: Shivnandan Kumar
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, quic_rgottimu

On Tue, Oct 11, 2022 at 03:34:45PM +0530, Shivnandan Kumar wrote:
> 
> Hi Cristian,
> 

Hi Shivnandan,

> >>Ok, just out of curiosity, once done, can you point me at your downstream
> public sources so I can see the issue and the fix that you are applying to
> your trees ?
> 
> https://source.codeaurora.org/quic/la/kernel/msm-5.10/tree/drivers/soc/qcom/qcom_rimps.c?h=KERNEL.PLATFORM.1.0.r1-07800-kernel.0
> 
> I have added lock while accessing con_priv inside irq handler and shutdown
> function.
> 

Thanks !

> 
> I have one input regarding timeout from firmware, can we enable BUG on
> response  time out in function do_xfer based on some debug config flag,this
> will help to debug firmware timeout issue faster.
> 
> We will only enable that config flag during internal testing.
> 

I understand a sort of 'Panic-on-timeout' would be useful to just freeze
the system as it is and debug, but it seems to me pretty much invasive
(and generally frowned upon) to BUG_ON timeouts, given on some SCMI
platforms/transports a few timeouts can happen really not so infrequently
due to transient conditions during moments of peak SCMI traffic.

Even though you mention to make it conditional to Kconfig, I'm not sure
this could fly, especially if you want to enable only for internal
testing...I'll ping Sudeep about this to see what he thinks.

As an alternative, what if I try to improve SCMI tracing/debug, let's say
dumping more info in dmesg about the offending (timed-out) message instead
of hanging the system as a whole ?

I'd have also some still-brewing-and-not-published patches to add some
SCMI stats somewhere in sysfs to be able to read current SCMI errors/timeouts
and transport anomalies, would that be of interest ?

...maybe, we could combine some of these stats and some sort of
BUG_ON/WARN_ON (if it will fly eventually..) into some kind SCMI_DEBUG mode
...any input on your needs about which kind of SCMI info you'll like to see
exposed by the stack would be welcome.

Last but not least, since we are talking about SCMI Server/FW testing,
have you (or your team) seen this work-in-progress of mine:

https://lore.kernel.org/linux-arm-kernel/20220903183042.3913053-1-cristian.marussi@arm.com/

about a new unified userspace interface to inject/snoop SCMI messages to
test/fuzz/stress the SCMI server wherever it is placed ?

Any feedback on the API proprosed in the cover-letter would be highly welcome;
I'll post a new V4 next week possibly, and the changes to the existing ARM SCMI
Compliance suite (mentioned in the cover) to support this new SCMI Raw
mode are in their final stage too.

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-11 10:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22  5:01 Query regarding "firmware: arm_scmi: Free mailbox channels if probe fails" Shivnandan Kumar
2022-09-22 14:54 ` Cristian Marussi
2022-09-22 15:28   ` Cristian Marussi
2022-09-30 12:59     ` Shivnandan Kumar
2022-10-03 13:22       ` Cristian Marussi
2022-10-11 10:04         ` Shivnandan Kumar
2022-10-11 10:40           ` Cristian Marussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).