linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
@ 2022-09-30  2:14 Yaxiong Tian
  2022-10-03 11:57 ` Cristian Marussi
  0 siblings, 1 reply; 8+ messages in thread
From: Yaxiong Tian @ 2022-09-30  2:14 UTC (permalink / raw)
  To: sudeep.holla, cristian.marussi, linux-arm-kernel, linux-kernel
  Cc: Yaxiong Tian, stable, Jim Quinlan

From: Yaxiong Tian <tianyaxiong@kylinos.cn>

In shmem_tx_prepare() ,it use spin_until_cond() to wait the chanel to be
free. Though it can avoid risk about overwriting the old command.But
when the platform has some problem in setting the chanel to free(such as
Improper initialization ),it can lead to the chanel never to be free.So
the os is sticked in it.In addition when shmem_tx_prepare() called,this
indicates that the previous transfer has commpleted or timed out.It
unsuitable for unconditional waiting the chanel to be free.

So for system stablility,we can add timeout condition in waiting the
chanel to be free.

Fixes: 9dc34d635c67 ("firmware: arm_scmi: Check if platform has released shmem before using")
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Cc: stable@vger.kernel.org
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Jim Quinlan <james.quinlan@broadcom.com>
---
 drivers/firmware/arm_scmi/shmem.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
index 0e3eaea5d852..ae6110a81855 100644
--- a/drivers/firmware/arm_scmi/shmem.c
+++ b/drivers/firmware/arm_scmi/shmem.c
@@ -8,6 +8,7 @@
 #include <linux/io.h>
 #include <linux/processor.h>
 #include <linux/types.h>
+#include <linux/ktime.h>
 
 #include "common.h"
 
@@ -29,17 +30,27 @@ struct scmi_shared_mem {
 	u8 msg_payload[];
 };
 
+#define SCMI_MAX_TX_PREPARE_TIMEOUT_MS 30
+
 void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
 		      struct scmi_xfer *xfer)
 {
+	ktime_t stop = ktime_add_ms(ktime_get(), SCMI_MAX_TX_PREPARE_TIMEOUT_MS);
 	/*
 	 * Ideally channel must be free by now unless OS timeout last
 	 * request and platform continued to process the same, wait
 	 * until it releases the shared memory, otherwise we may endup
 	 * overwriting its response with new message payload or vice-versa
 	 */
-	spin_until_cond(ioread32(&shmem->channel_status) &
-			SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
+	spin_until_cond((ioread32(&shmem->channel_status) &
+			SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
+			ktime_after(ktime_get(), stop));
+
+	if (unlikely(ktime_after(ktime_get(), stop))) {
+		pr_err("timed out in shmem_tx_prepare(caller: %pS).\n",
+					(void *)_RET_IP_);
+	}
+
 	/* Mark channel busy + clear error */
 	iowrite32(0x0, &shmem->channel_status);
 	iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-09-30  2:14 [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare() Yaxiong Tian
@ 2022-10-03 11:57 ` Cristian Marussi
       [not found]   ` <tencent_38B168A1B211AC1C3F5BAACEB5DF9992E107@qq.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Cristian Marussi @ 2022-10-03 11:57 UTC (permalink / raw)
  To: Yaxiong Tian
  Cc: sudeep.holla, linux-arm-kernel, linux-kernel, Yaxiong Tian,
	stable, Jim Quinlan

On Fri, Sep 30, 2022 at 10:14:50AM +0800, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong@kylinos.cn>
> 

Hi Tian,

> In shmem_tx_prepare() ,it use spin_until_cond() to wait the chanel to be
> free. Though it can avoid risk about overwriting the old command.But
> when the platform has some problem in setting the chanel to free(such as
> Improper initialization ),it can lead to the chanel never to be free.So
> the os is sticked in it.

On a transport based on shared memory area indeed the busy/free bitflag is
used, on the TX channel, to 'pass' ownership of the channel from the agent
(Linux) to the platform SCMI server: as stated in the inline comment block,
once the Linux agent write his command into the shmem area and set the
channel busy, it HAS TO wait for the channel to be set free again by the
platform to avoid possible corruptions due to a very late command being
deliverd by the platform so late that it could override the next new command
freshly written into the shared memory by the Linux agent.

In other words you CANNOT forcibly grab back the channel ownership on
timeout like you are doing here, because you cannot assume anything on
the status of the SCMI server entity: a misbehaving SCMI server could
deliver a very late reply at any time (since it still thinks to have
ownership) and the possible subsequent corruption in the next queued
TX message would probably generate issues very difficult to spot and
debug.

Inded, in the SCMI Kernel stack we never forcibly free a TX transport
channel on timeout, scmi_clear_channel() is called only on the RX channel
for notifications and delayed response, because the ownership relation
is just the opposite, we are indeed releasing the channel to the platform
after we have processed their messages.

Same we do on a different transport like virtio, in which a virtio buffer,
related a message without a reply, is just lost if the SCMI server never
sent back anything: the only difference is that virtio has usually more
numerous free buffers so we can carry on anyway using some of the other
remaning free buffers.

Moreover, when the SCMI server or a transport ended up in such a broken
state that it does not even release the channel, it means the SCMI
server stack  is in serious trouble, and we (Kernel) do not really want
to do any businness with a backend in such a misbehaving state, the SCMI
server should be fixed; instead, this kind of approach you propose could
even hide some class of transient server-side problems by just ignoring
this condition and grabbing back the channel forcibly.

> In addition when shmem_tx_prepare() called,this
> indicates that the previous transfer has commpleted or timed out.It
> unsuitable for unconditional waiting the chanel to be free.

Note that there could be also the case in which there are many other
threads of executions on different cores trying to send a bunch of SCMI
commands (there are indeed many SCMI drivers currently in the stack) so
all these requests HAVE TO be queued while waiting for the shared
memory area to be free again, so I don't think you can assume in general
that shmem_tx_prepare() is called only after a successfull command completion
or timeout: it is probably true when an SCMI Mailbox transport is
used but it is not neccessarly the case when other shmem based transports
like optee/smc are used. You could have N threads trying to send an SCMI
command, queued, spinning on that loop, while waiting for the channel to
be free: in such a case note that you TX has not even complete, you are
still waiting for the channel the SCMI upper layer TX timeout has NOT
even being started since your the transmission itself is pending (i.e.
send_message has still not returned...)

> 
> So for system stablility,we can add timeout condition in waiting the
> chanel to be free.

In the end, I could agree that it is unfortunate that, if the SCMI Server
gets stuck also the SCMI Linux agent gets stuck while waiting for a free
channel, so that it could be sensible to timeout as you proposed BUT after
the timeout you should NOT carry on, BUT FAIL the whole transmission;
in this scenario, though, it would be tricky to choose a proper timeout
value, because, as said above, the allowed timeout for the spin would
depend on the number of existing queued in-flight transactions: as an
example, if you had N previous pending transmissions and transport with
X ms timeout, I would spin for at least N * X ms before timing out and
failing. (additionally the current tx_prepare() helpers return voids so
you'll need to figure out a way to report an error and stop the transaction
by 'tunnelling' this errors back to .send_message())

Not sure that all of this kind of work would be worth to address some,
maybe transient, error conditions due to a broken SCMI server, BUT in any
case, any kind of timeout you want to introduce in the spin loop MUST
result in a failed transmission until the FREE bitflag is cleared by the
SCMI server; i.e. if that flag won't be cleared EVER by the server, you
have to end up with a sequence of timed-out spinloops and transmission
failures, you definetely cannot recover forcibly like this.

Fix the SCMI server instead.

Thanks,
Cristian

> 
> Fixes: 9dc34d635c67 ("firmware: arm_scmi: Check if platform has released shmem before using")
> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
> Cc: stable@vger.kernel.org
> Cc: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Jim Quinlan <james.quinlan@broadcom.com>
> ---
>  drivers/firmware/arm_scmi/shmem.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
> index 0e3eaea5d852..ae6110a81855 100644
> --- a/drivers/firmware/arm_scmi/shmem.c
> +++ b/drivers/firmware/arm_scmi/shmem.c
> @@ -8,6 +8,7 @@
>  #include <linux/io.h>
>  #include <linux/processor.h>
>  #include <linux/types.h>
> +#include <linux/ktime.h>
>  
>  #include "common.h"
>  
> @@ -29,17 +30,27 @@ struct scmi_shared_mem {
>  	u8 msg_payload[];
>  };
>  
> +#define SCMI_MAX_TX_PREPARE_TIMEOUT_MS 30
> +
>  void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
>  		      struct scmi_xfer *xfer)
>  {
> +	ktime_t stop = ktime_add_ms(ktime_get(), SCMI_MAX_TX_PREPARE_TIMEOUT_MS);
>  	/*
>  	 * Ideally channel must be free by now unless OS timeout last
>  	 * request and platform continued to process the same, wait
>  	 * until it releases the shared memory, otherwise we may endup
>  	 * overwriting its response with new message payload or vice-versa
>  	 */
> -	spin_until_cond(ioread32(&shmem->channel_status) &
> -			SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
> +	spin_until_cond((ioread32(&shmem->channel_status) &
> +			SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
> +			ktime_after(ktime_get(), stop));
> +
> +	if (unlikely(ktime_after(ktime_get(), stop))) {
> +		pr_err("timed out in shmem_tx_prepare(caller: %pS).\n",
> +					(void *)_RET_IP_);
> +	}
> +
>  	/* Mark channel busy + clear error */
>  	iowrite32(0x0, &shmem->channel_status);
>  	iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
> -- 
> 2.25.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
       [not found]   ` <tencent_38B168A1B211AC1C3F5BAACEB5DF9992E107@qq.com>
@ 2022-10-11 14:14     ` Cristian Marussi
  2022-10-13  7:05       ` YaxiongTian
  0 siblings, 1 reply; 8+ messages in thread
From: Cristian Marussi @ 2022-10-11 14:14 UTC (permalink / raw)
  To: Yaxiong Tian
  Cc: james.quinlan, linux-arm-kernel, linux-kernel, stable,
	sudeep.holla, tianyaxiong

On Tue, Oct 11, 2022 at 03:46:15PM +0800, Yaxiong Tian wrote:
> >Note that there could be also the case in which there are many other
> >threads of executions on different cores trying to send a bunch of SCMI
> >commands (there are indeed many SCMI drivers currently in the stack) so
> >all these requests HAVE TO be queued while waiting for the shared
> >memory area to be free again, so I don't think you can assume in general
> >that shmem_tx_prepare() is called only after a successfull command completion
> >or timeout: it is probably true when an SCMI Mailbox transport is
> >used but it is not neccessarly the case when other shmem based transports
> >like optee/smc are used. You could have N threads trying to send an SCMI
> >command, queued, spinning on that loop, while waiting for the channel to
> >be free: in such a case note that you TX has not even complete, you are
> >still waiting for the channel the SCMI upper layer TX timeout has NOT
> >even being started since your the transmission itself is pending (i.e.
> >send_message has still not returned...)
> 
> I read the code in optee/smc, scmi_optee_send_message()/smc_send_message()
> have channel lock before shmem_tx_prepare(). The lock was released when 
> transports was completed 、timeout or err. So although they have N threads 
> trying to send an SCMI command, queued, but only one could take up the channel.
> Also only one thread call shmem_tx_prepare() in one channel and spin() in it.
> 
> It is also true in mailboxs or other shmem based transports,because SCMI 
> protocol say:"agent must exclusive the channel."This is very reasonable through 
> the channel lock rather than shared memory.
> 
> So it is simple for selecting timeout period. Because there is only one thread 
> spin() in one channel, so the timeout period only depending on the firmware. 
> In other words this time can be a constant. 
> 

Yes, my bad, I forgot that any shared-mem based transport has some kind
of mechanism to access exclusively the channel for the whole transaction
to avoid some thread can issue a tx_prepare before the previous
transaction has fully completed (i.e. the result in the reply, if any,
was fetched before being overwritten by the next)

> >Not sure that all of this kind of work would be worth to address some,
> >maybe transient, error conditions due to a broken SCMI server, BUT in any
> >case, any kind of timeout you want to introduce in the spin loop MUST
> >result in a failed transmission until the FREE bitflag is cleared by the
> >SCMI server; i.e. if that flag won't be cleared EVER by the server, you
> >have to end up with a sequence of timed-out spinloops and transmission
> >failures, you definetely cannot recover forcibly like this.
> 
> I totally agree above.In such broken SCMI server,users cannot get any Any 
> hints.So I think it at least pr_warn(). We can set prepare_tx_timout parameter 
> in scmi_desc,or just set options for users to check error.
>

Problem is anyway, as you said, you'll have to pick this timeout from the
related transport scmi_desc (even if as of now the max_rx_timeout for
all existent shared mem transport is the same..) and this means anyway
adding more complexity to the chain of calls to just to print a warn of
some kind in a rare error-situation from which you cannot recover anyway.

Due to other unrelated discussions, I was starting to think about
exposing some debug-only (Kconfig dependent) SCMI stats like timeouts, errors,
unpexpected/OoO/late_replies in order to ease the debug and monitoring
of the health of a running SCMI stack: maybe this could be a place where
to flag this FW issues without changing the spinloop above (or
to add the kind of timeout you mentioned but only when some sort of
CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.

Any thoughts ?

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-10-11 14:14     ` Cristian Marussi
@ 2022-10-13  7:05       ` YaxiongTian
  2022-10-13 16:02         ` Cristian Marussi
  0 siblings, 1 reply; 8+ messages in thread
From: YaxiongTian @ 2022-10-13  7:05 UTC (permalink / raw)
  To: cristian.marussi
  Cc: iambestgod, james.quinlan, linux-arm-kernel, linux-kernel,
	stable, sudeep.holla, tianyaxiong

Hi Cristian

    There may be a problem with my qq email client,   I don't see my 
mail in the

communityI had to switch outlook email.Forgive me if you've received 
multiple emails.

 >Problem is anyway, as you said, you'll have to pick this timeout from the
 >related transport scmi_desc (even if as of now the max_rx_timeout for
 >all existent shared mem transport is the same..) and this means anyway
 >adding more complexity to the chain of calls to just to print a warn of
 >some kind in a rare error-situation from which you cannot recover anyway.

   Yes,it has add more complexity about Monitorring this time.For system
stability,the safest thing to do is to abort the transmission.But this will
lose performance due to more complexity in such unusual situation.

 >Due to other unrelated discussions, I was starting to think about
 >exposing some debug-only (Kconfig dependent) SCMI stats like timeouts, 
errors,
 >unpexpected/OoO/late_replies in order to ease the debug and monitoring
 >of the health of a running SCMI stack: maybe this could be a place where
 >to flag this FW issues without changing the spinloop above (or
 >to add the kind of timeout you mentioned but only when some sort of
 >CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, 
though.

   I think it should active report warn or err rather than user queries the
information manually.(i.e fs_debug way).Becasue in system startup\S1\S3\S4,
user can not queries this flag in Fw,they need get stuck message 
immediately.


Thanks,
Tian


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-10-13  7:05       ` YaxiongTian
@ 2022-10-13 16:02         ` Cristian Marussi
  2022-10-14 11:56           ` Sudeep Holla
  0 siblings, 1 reply; 8+ messages in thread
From: Cristian Marussi @ 2022-10-13 16:02 UTC (permalink / raw)
  To: YaxiongTian
  Cc: iambestgod, james.quinlan, linux-arm-kernel, linux-kernel,
	stable, sudeep.holla, tianyaxiong

On Thu, Oct 13, 2022 at 03:05:43PM +0800, YaxiongTian wrote:
> Hi Cristian
> 
>    There may be a problem with my qq email client,   I don't see my mail in
> the
> 
> communityI had to switch outlook email.Forgive me if you've received
> multiple emails.
> 
No worries.

> >Problem is anyway, as you said, you'll have to pick this timeout from the
> >related transport scmi_desc (even if as of now the max_rx_timeout for
> >all existent shared mem transport is the same..) and this means anyway
> >adding more complexity to the chain of calls to just to print a warn of
> >some kind in a rare error-situation from which you cannot recover anyway.
> 
>   Yes,it has add more complexity about Monitorring this time.For system
> stability,the safest thing to do is to abort the transmission.But this will
> lose performance due to more complexity in such unusual situation.
> 
> >Due to other unrelated discussions, I was starting to think about
> >exposing some debug-only (Kconfig dependent) SCMI stats like timeouts,
> errors,
> >unpexpected/OoO/late_replies in order to ease the debug and monitoring
> >of the health of a running SCMI stack: maybe this could be a place where
> >to flag this FW issues without changing the spinloop above (or
> >to add the kind of timeout you mentioned but only when some sort of
> >CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.
> 
>   I think it should active report warn or err rather than user queries the
> information manually.(i.e fs_debug way).Becasue in system startup\S1\S3\S4,
> user can not queries this flag in Fw,they need get stuck message
> immediately.
> 

Looking more closely at this, I experimented a bit with an SCMI stack based on
mailbox transport in which I had forcefully set the spin_until_cond() to
spin forever.

Even though on a normal SCMI system when the SCMI stack fails at boot
the system is supposed to boot anyway (maybe slower), this particular
failure in TX path led indeed to a system that does not boot at all and
spits out an infinite sequence of:

[ 2924.499486] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2924.505596] rcu:     2-...0: (0 ticks this GP) idle=1be4/1/0x4000000000000000 softirq=50/50 fqs=364757
[ 2924.514672]  (detected by 4, t=730678 jiffies, g=-1119, q=134 ncpus=6)
[ 2924.521215] Task dump for CPU 2:
[ 2924.524445] task:kworker/u12:0   state:R  running task     stack:    0 pid:    9 ppid:     2 flags:0x0000000a
[ 2924.534391] Workqueue: events_unbound deferred_probe_work_func
[ 2924.540244] Call trace:
[ 2924.542691]  __switch_to+0xe4/0x1b8
[ 2924.546189]  deferred_probe_work_func+0xa4/0xf8
[ 2924.550731]  process_one_work+0x208/0x480
[ 2924.554754]  worker_thread+0x230/0x428
[ 2924.558514]  kthread+0x114/0x120
[ 2924.561752]  ret_from_fork+0x10/0x20

I imagine this is the annoying thing you want to avoid.

So experimenting a bit with a patch similar to yours (ignoring the timeout
config issues and using the static cnt to temporarily stuck and revive the SCMI
transport)

------>8-----
diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
index 0e3eaea5d852..6dde669abd03 100644
--- a/drivers/firmware/arm_scmi/shmem.c
+++ b/drivers/firmware/arm_scmi/shmem.c
@@ -8,6 +8,7 @@
 #include <linux/io.h>
 #include <linux/processor.h>
 #include <linux/types.h>
 
 #include "common.h"
 
@@ -29,17 +30,28 @@ struct scmi_shared_mem {
        u8 msg_payload[];
 };
 
+static int cnt = 50;
 void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
                      struct scmi_xfer *xfer)
 {
+       ktime_t stop;
+
        /*
         * Ideally channel must be free by now unless OS timeout last
         * request and platform continued to process the same, wait
         * until it releases the shared memory, otherwise we may endup
         * overwriting its response with new message payload or vice-versa
         */
-       spin_until_cond(ioread32(&shmem->channel_status) &
-                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
+       stop = ktime_add_ms(ktime_get(), 35);
+       spin_until_cond(((--cnt > 0) && ioread32(&shmem->channel_status) &
+                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
+                       ktime_after(ktime_get(), stop));
+       if (ktime_after(ktime_get(), stop)) {
+               pr_warn_once("TX Timeout !\n");
+               cnt = 10;
+               return;
+       }
+
        /* Mark channel busy + clear error */
        iowrite32(0x0, &shmem->channel_status);
        iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
----8<-------------

With the above I had in fact a system that could boot even with a
failing/stuck SCMI transport, but, as expected the SCMI stack
functionality was totally compromised after the first timeout with no
possibility to recover.

Moreover I was thinking at what could happen if later on after boot the
SCMI server should end in some funny/hogged condition so that it is,
only temporarily, a bit slower to answer and release the channel: with
the current implemenation the Kernel agent will spin just a little bit
more waiting for the channel to be freed and then everything carries
without much hassle, while with this possible new timing-out solution
we could end up dropping that transmission and compromising the whole
transport fucntionality for all the subsequent transmissions.

So, again, I'm not sure it is worth making such a change even for debug
purposes, given that in the worst scenario above you end up with a
system stuck at boot but for which the SCMI stack is anyway compromised
and where the only solution is fixing the server FW really.

I'll ask Sudeep is thoughts about the possible hang.

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-10-13 16:02         ` Cristian Marussi
@ 2022-10-14 11:56           ` Sudeep Holla
  2022-10-14 12:23             ` Cristian Marussi
  0 siblings, 1 reply; 8+ messages in thread
From: Sudeep Holla @ 2022-10-14 11:56 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: YaxiongTian, iambestgod, james.quinlan, linux-arm-kernel,
	linux-kernel, stable, tianyaxiong

On Thu, Oct 13, 2022 at 05:02:15PM +0100, Cristian Marussi wrote:
> On Thu, Oct 13, 2022 at 03:05:43PM +0800, YaxiongTian wrote:
> > Hi Cristian
> > 
> > �� There may be a problem with my qq email client, � I don't see my mail in
> > the
> > 
> > communityI had to switch outlook email.Forgive me if you've received
> > multiple emails.
> > 
> No worries.
> 
> > >Problem is anyway, as you said, you'll have to pick this timeout from the
> > >related transport scmi_desc (even if as of now the max_rx_timeout for
> > >all existent shared mem transport is the same..) and this means anyway
> > >adding more complexity to the chain of calls to just to print a warn of
> > >some kind in a rare error-situation from which you cannot recover anyway.
> > 
> > � Yes,it has add more complexity about Monitorring this time.For system
> > stability,the safest thing to do is to abort the transmission.But this will
> > lose performance due to more complexity in such unusual situation.
> > 
> > >Due to other unrelated discussions, I was starting to think about
> > >exposing some debug-only (Kconfig dependent) SCMI stats like timeouts,
> > errors,
> > >unpexpected/OoO/late_replies in order to ease the debug and monitoring
> > >of the health of a running SCMI stack: maybe this could be a place where
> > >to flag this FW issues without changing the spinloop above (or
> > >to add the kind of timeout you mentioned but only when some sort of
> > >CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.
> > 
> > � I think it should active report warn or err rather than user queries the
> > information manually.(i.e fs_debug way).Becasue in system startup\S1\S3\S4,
> > user can not queries this flag in Fw,they need get stuck message
> > immediately.
> > 
> 
> Looking more closely at this, I experimented a bit with an SCMI stack based on
> mailbox transport in which I had forcefully set the spin_until_cond() to
> spin forever.
> 
> Even though on a normal SCMI system when the SCMI stack fails at boot
> the system is supposed to boot anyway (maybe slower), this particular
> failure in TX path led indeed to a system that does not boot at all and
> spits out an infinite sequence of:
> 
> [ 2924.499486] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2924.505596] rcu:     2-...0: (0 ticks this GP) idle=1be4/1/0x4000000000000000 softirq=50/50 fqs=364757
> [ 2924.514672]  (detected by 4, t=730678 jiffies, g=-1119, q=134 ncpus=6)
> [ 2924.521215] Task dump for CPU 2:
> [ 2924.524445] task:kworker/u12:0   state:R  running task     stack:    0 pid:    9 ppid:     2 flags:0x0000000a
> [ 2924.534391] Workqueue: events_unbound deferred_probe_work_func
> [ 2924.540244] Call trace:
> [ 2924.542691]  __switch_to+0xe4/0x1b8
> [ 2924.546189]  deferred_probe_work_func+0xa4/0xf8
> [ 2924.550731]  process_one_work+0x208/0x480
> [ 2924.554754]  worker_thread+0x230/0x428
> [ 2924.558514]  kthread+0x114/0x120
> [ 2924.561752]  ret_from_fork+0x10/0x20
> 
> I imagine this is the annoying thing you want to avoid.
> 
> So experimenting a bit with a patch similar to yours (ignoring the timeout
> config issues and using the static cnt to temporarily stuck and revive the SCMI
> transport)
> 
> ------>8-----
> diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
> index 0e3eaea5d852..6dde669abd03 100644
> --- a/drivers/firmware/arm_scmi/shmem.c
> +++ b/drivers/firmware/arm_scmi/shmem.c
> @@ -8,6 +8,7 @@
>  #include <linux/io.h>
>  #include <linux/processor.h>
>  #include <linux/types.h>
>  
>  #include "common.h"
>  
> @@ -29,17 +30,28 @@ struct scmi_shared_mem {
>         u8 msg_payload[];
>  };
>  
> +static int cnt = 50;
>  void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
>                       struct scmi_xfer *xfer)
>  {
> +       ktime_t stop;
> +
>         /*
>          * Ideally channel must be free by now unless OS timeout last
>          * request and platform continued to process the same, wait
>          * until it releases the shared memory, otherwise we may endup
>          * overwriting its response with new message payload or vice-versa
>          */
> -       spin_until_cond(ioread32(&shmem->channel_status) &
> -                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
> +       stop = ktime_add_ms(ktime_get(), 35);
> +       spin_until_cond(((--cnt > 0) && ioread32(&shmem->channel_status) &
> +                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
> +                       ktime_after(ktime_get(), stop));
> +       if (ktime_after(ktime_get(), stop)) {
> +               pr_warn_once("TX Timeout !\n");
> +               cnt = 10;
> +               return;
> +       }
> +
>         /* Mark channel busy + clear error */
>         iowrite32(0x0, &shmem->channel_status);
>         iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
> ----8<-------------
> 
> With the above I had in fact a system that could boot even with a
> failing/stuck SCMI transport, but, as expected the SCMI stack
> functionality was totally compromised after the first timeout with no
> possibility to recover.
> 
> Moreover I was thinking at what could happen if later on after boot the
> SCMI server should end in some funny/hogged condition so that it is,
> only temporarily, a bit slower to answer and release the channel: with
> the current implemenation the Kernel agent will spin just a little bit
> more waiting for the channel to be freed and then everything carries
> without much hassle, while with this possible new timing-out solution
> we could end up dropping that transmission and compromising the whole
> transport fucntionality for all the subsequent transmissions.
> 
> So, again, I'm not sure it is worth making such a change even for debug
> purposes, given that in the worst scenario above you end up with a
> system stuck at boot but for which the SCMI stack is anyway compromised
> and where the only solution is fixing the server FW really.
> 
> I'll ask Sudeep is thoughts about the possible hang.
>

I am fine with the patch as it provides more info on what is going wrong
in the system. Please post the patch separately with all the info/background.

-- 
Regards,
Sudeep

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-10-14 11:56           ` Sudeep Holla
@ 2022-10-14 12:23             ` Cristian Marussi
  2022-10-28 14:14               ` Cristian Marussi
  0 siblings, 1 reply; 8+ messages in thread
From: Cristian Marussi @ 2022-10-14 12:23 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: YaxiongTian, iambestgod, james.quinlan, linux-arm-kernel,
	linux-kernel, stable, tianyaxiong

On Fri, Oct 14, 2022 at 12:56:39PM +0100, Sudeep Holla wrote:
> On Thu, Oct 13, 2022 at 05:02:15PM +0100, Cristian Marussi wrote:
> > On Thu, Oct 13, 2022 at 03:05:43PM +0800, YaxiongTian wrote:
> > > Hi Cristian
> > > 
> > > �� There may be a problem with my qq email client, � I don't see my mail in
> > > the
> > > 
> > > communityI had to switch outlook email.Forgive me if you've received
> > > multiple emails.
> > > 
> > No worries.
> > 
> > > >Problem is anyway, as you said, you'll have to pick this timeout from the
> > > >related transport scmi_desc (even if as of now the max_rx_timeout for
> > > >all existent shared mem transport is the same..) and this means anyway
> > > >adding more complexity to the chain of calls to just to print a warn of
> > > >some kind in a rare error-situation from which you cannot recover anyway.
> > > 
> > > � Yes,it has add more complexity about Monitorring this time.For system
> > > stability,the safest thing to do is to abort the transmission.But this will
> > > lose performance due to more complexity in such unusual situation.
> > > 
> > > >Due to other unrelated discussions, I was starting to think about
> > > >exposing some debug-only (Kconfig dependent) SCMI stats like timeouts,
> > > errors,
> > > >unpexpected/OoO/late_replies in order to ease the debug and monitoring
> > > >of the health of a running SCMI stack: maybe this could be a place where
> > > >to flag this FW issues without changing the spinloop above (or
> > > >to add the kind of timeout you mentioned but only when some sort of
> > > >CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.
> > > 
> > > � I think it should active report warn or err rather than user queries the
> > > information manually.(i.e fs_debug way).Becasue in system startup\S1\S3\S4,
> > > user can not queries this flag in Fw,they need get stuck message
> > > immediately.
> > > 
> > 
> > Looking more closely at this, I experimented a bit with an SCMI stack based on
> > mailbox transport in which I had forcefully set the spin_until_cond() to
> > spin forever.
> > 
> > Even though on a normal SCMI system when the SCMI stack fails at boot
> > the system is supposed to boot anyway (maybe slower), this particular
> > failure in TX path led indeed to a system that does not boot at all and
> > spits out an infinite sequence of:
> > 
> > [ 2924.499486] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 2924.505596] rcu:     2-...0: (0 ticks this GP) idle=1be4/1/0x4000000000000000 softirq=50/50 fqs=364757
> > [ 2924.514672]  (detected by 4, t=730678 jiffies, g=-1119, q=134 ncpus=6)
> > [ 2924.521215] Task dump for CPU 2:
> > [ 2924.524445] task:kworker/u12:0   state:R  running task     stack:    0 pid:    9 ppid:     2 flags:0x0000000a
> > [ 2924.534391] Workqueue: events_unbound deferred_probe_work_func
> > [ 2924.540244] Call trace:
> > [ 2924.542691]  __switch_to+0xe4/0x1b8
> > [ 2924.546189]  deferred_probe_work_func+0xa4/0xf8
> > [ 2924.550731]  process_one_work+0x208/0x480
> > [ 2924.554754]  worker_thread+0x230/0x428
> > [ 2924.558514]  kthread+0x114/0x120
> > [ 2924.561752]  ret_from_fork+0x10/0x20
> > 
> > I imagine this is the annoying thing you want to avoid.
> > 
> > So experimenting a bit with a patch similar to yours (ignoring the timeout
> > config issues and using the static cnt to temporarily stuck and revive the SCMI
> > transport)
> > 
> > ------>8-----
> > diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
> > index 0e3eaea5d852..6dde669abd03 100644
> > --- a/drivers/firmware/arm_scmi/shmem.c
> > +++ b/drivers/firmware/arm_scmi/shmem.c
> > @@ -8,6 +8,7 @@
> >  #include <linux/io.h>
> >  #include <linux/processor.h>
> >  #include <linux/types.h>
> >  
> >  #include "common.h"
> >  
> > @@ -29,17 +30,28 @@ struct scmi_shared_mem {
> >         u8 msg_payload[];
> >  };
> >  
> > +static int cnt = 50;
> >  void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
> >                       struct scmi_xfer *xfer)
> >  {
> > +       ktime_t stop;
> > +
> >         /*
> >          * Ideally channel must be free by now unless OS timeout last
> >          * request and platform continued to process the same, wait
> >          * until it releases the shared memory, otherwise we may endup
> >          * overwriting its response with new message payload or vice-versa
> >          */
> > -       spin_until_cond(ioread32(&shmem->channel_status) &
> > -                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
> > +       stop = ktime_add_ms(ktime_get(), 35);
> > +       spin_until_cond(((--cnt > 0) && ioread32(&shmem->channel_status) &
> > +                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
> > +                       ktime_after(ktime_get(), stop));
> > +       if (ktime_after(ktime_get(), stop)) {
> > +               pr_warn_once("TX Timeout !\n");
> > +               cnt = 10;
> > +               return;
> > +       }
> > +
> >         /* Mark channel busy + clear error */
> >         iowrite32(0x0, &shmem->channel_status);
> >         iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
> > ----8<-------------
> > 
> > With the above I had in fact a system that could boot even with a
> > failing/stuck SCMI transport, but, as expected the SCMI stack
> > functionality was totally compromised after the first timeout with no
> > possibility to recover.
> > 
> > Moreover I was thinking at what could happen if later on after boot the
> > SCMI server should end in some funny/hogged condition so that it is,
> > only temporarily, a bit slower to answer and release the channel: with
> > the current implemenation the Kernel agent will spin just a little bit
> > more waiting for the channel to be freed and then everything carries
> > without much hassle, while with this possible new timing-out solution
> > we could end up dropping that transmission and compromising the whole
> > transport fucntionality for all the subsequent transmissions.
> > 
> > So, again, I'm not sure it is worth making such a change even for debug
> > purposes, given that in the worst scenario above you end up with a
> > system stuck at boot but for which the SCMI stack is anyway compromised
> > and where the only solution is fixing the server FW really.
> > 
> > I'll ask Sudeep is thoughts about the possible hang.
> >
> 
> I am fine with the patch as it provides more info on what is going wrong
> in the system. Please post the patch separately with all the info/background.
> 

Ok, I'll cleanup and post adding Reported/Suggested-by: YaxiongTian

I'm inclined to set the timeout comfortably more than the transport RX
timeout. (2xrx_timeout ?) to account for overhead and avoiding to bail
out on some transient delays.

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
  2022-10-14 12:23             ` Cristian Marussi
@ 2022-10-28 14:14               ` Cristian Marussi
  0 siblings, 0 replies; 8+ messages in thread
From: Cristian Marussi @ 2022-10-28 14:14 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: YaxiongTian, iambestgod, james.quinlan, linux-arm-kernel,
	linux-kernel, stable, tianyaxiong

On Fri, Oct 14, 2022 at 01:23:11PM +0100, Cristian Marussi wrote:
> On Fri, Oct 14, 2022 at 12:56:39PM +0100, Sudeep Holla wrote:
> > On Thu, Oct 13, 2022 at 05:02:15PM +0100, Cristian Marussi wrote:
> > > On Thu, Oct 13, 2022 at 03:05:43PM +0800, YaxiongTian wrote:
> > > > Hi Cristian
> > > > 
> > > > �� There may be a problem with my qq email client, � I don't see my mail in
> > > > the
> > > > 
> > > > communityI had to switch outlook email.Forgive me if you've received
> > > > multiple emails.
> > > > 
> > > No worries.

Hi,


> > > 
> > > > >Problem is anyway, as you said, you'll have to pick this timeout from the
> > > > >related transport scmi_desc (even if as of now the max_rx_timeout for
> > > > >all existent shared mem transport is the same..) and this means anyway
> > > > >adding more complexity to the chain of calls to just to print a warn of
> > > > >some kind in a rare error-situation from which you cannot recover anyway.
> > > > 
> > > > � Yes,it has add more complexity about Monitorring this time.For system
> > > > stability,the safest thing to do is to abort the transmission.But this will
> > > > lose performance due to more complexity in such unusual situation.
> > > > 
> > > > >Due to other unrelated discussions, I was starting to think about
> > > > >exposing some debug-only (Kconfig dependent) SCMI stats like timeouts,
> > > > errors,
> > > > >unpexpected/OoO/late_replies in order to ease the debug and monitoring
> > > > >of the health of a running SCMI stack: maybe this could be a place where
> > > > >to flag this FW issues without changing the spinloop above (or
> > > > >to add the kind of timeout you mentioned but only when some sort of
> > > > >CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.
> > > > 
> > > > � I think it should active report warn or err rather than user queries the
> > > > information manually.(i.e fs_debug way).Becasue in system startup\S1\S3\S4,
> > > > user can not queries this flag in Fw,they need get stuck message
> > > > immediately.
> > > > 
> > > 
> > > Looking more closely at this, I experimented a bit with an SCMI stack based on
> > > mailbox transport in which I had forcefully set the spin_until_cond() to
> > > spin forever.
> > > 
> > > Even though on a normal SCMI system when the SCMI stack fails at boot
> > > the system is supposed to boot anyway (maybe slower), this particular
> > > failure in TX path led indeed to a system that does not boot at all and
> > > spits out an infinite sequence of:
> > > 
> > > [ 2924.499486] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > [ 2924.505596] rcu:     2-...0: (0 ticks this GP) idle=1be4/1/0x4000000000000000 softirq=50/50 fqs=364757
> > > [ 2924.514672]  (detected by 4, t=730678 jiffies, g=-1119, q=134 ncpus=6)
> > > [ 2924.521215] Task dump for CPU 2:
> > > [ 2924.524445] task:kworker/u12:0   state:R  running task     stack:    0 pid:    9 ppid:     2 flags:0x0000000a
> > > [ 2924.534391] Workqueue: events_unbound deferred_probe_work_func
> > > [ 2924.540244] Call trace:
> > > [ 2924.542691]  __switch_to+0xe4/0x1b8
> > > [ 2924.546189]  deferred_probe_work_func+0xa4/0xf8
> > > [ 2924.550731]  process_one_work+0x208/0x480
> > > [ 2924.554754]  worker_thread+0x230/0x428
> > > [ 2924.558514]  kthread+0x114/0x120
> > > [ 2924.561752]  ret_from_fork+0x10/0x20
> > > 
> > > I imagine this is the annoying thing you want to avoid.
> > > 
> > > So experimenting a bit with a patch similar to yours (ignoring the timeout
> > > config issues and using the static cnt to temporarily stuck and revive the SCMI
> > > transport)
> > > 
> > > ------>8-----
> > > diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
> > > index 0e3eaea5d852..6dde669abd03 100644
> > > --- a/drivers/firmware/arm_scmi/shmem.c
> > > +++ b/drivers/firmware/arm_scmi/shmem.c
> > > @@ -8,6 +8,7 @@
> > >  #include <linux/io.h>
> > >  #include <linux/processor.h>
> > >  #include <linux/types.h>
> > >  
> > >  #include "common.h"
> > >  
> > > @@ -29,17 +30,28 @@ struct scmi_shared_mem {
> > >         u8 msg_payload[];
> > >  };
> > >  
> > > +static int cnt = 50;
> > >  void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
> > >                       struct scmi_xfer *xfer)
> > >  {
> > > +       ktime_t stop;
> > > +
> > >         /*
> > >          * Ideally channel must be free by now unless OS timeout last
> > >          * request and platform continued to process the same, wait
> > >          * until it releases the shared memory, otherwise we may endup
> > >          * overwriting its response with new message payload or vice-versa
> > >          */
> > > -       spin_until_cond(ioread32(&shmem->channel_status) &
> > > -                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
> > > +       stop = ktime_add_ms(ktime_get(), 35);
> > > +       spin_until_cond(((--cnt > 0) && ioread32(&shmem->channel_status) &
> > > +                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
> > > +                       ktime_after(ktime_get(), stop));
> > > +       if (ktime_after(ktime_get(), stop)) {
> > > +               pr_warn_once("TX Timeout !\n");
> > > +               cnt = 10;
> > > +               return;
> > > +       }
> > > +
> > >         /* Mark channel busy + clear error */
> > >         iowrite32(0x0, &shmem->channel_status);
> > >         iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
> > > ----8<-------------
> > > 
> > > With the above I had in fact a system that could boot even with a
> > > failing/stuck SCMI transport, but, as expected the SCMI stack
> > > functionality was totally compromised after the first timeout with no
> > > possibility to recover.
> > > 
> > > Moreover I was thinking at what could happen if later on after boot the
> > > SCMI server should end in some funny/hogged condition so that it is,
> > > only temporarily, a bit slower to answer and release the channel: with
> > > the current implemenation the Kernel agent will spin just a little bit
> > > more waiting for the channel to be freed and then everything carries
> > > without much hassle, while with this possible new timing-out solution
> > > we could end up dropping that transmission and compromising the whole
> > > transport fucntionality for all the subsequent transmissions.
> > > 
> > > So, again, I'm not sure it is worth making such a change even for debug
> > > purposes, given that in the worst scenario above you end up with a
> > > system stuck at boot but for which the SCMI stack is anyway compromised
> > > and where the only solution is fixing the server FW really.
> > > 
> > > I'll ask Sudeep is thoughts about the possible hang.
> > >
> > 
> > I am fine with the patch as it provides more info on what is going wrong
> > in the system. Please post the patch separately with all the info/background.
> > 
> 
> Ok, I'll cleanup and post adding Reported/Suggested-by: YaxiongTian
> 
> I'm inclined to set the timeout comfortably more than the transport RX
> timeout. (2xrx_timeout ?) to account for overhead and avoiding to bail
> out on some transient delays.

Just posted a fix for this like agreed.

https://lore.kernel.org/linux-arm-kernel/20221028140833.280091-3-cristian.marussi@arm.com/T/#u

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-10-28 14:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-30  2:14 [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare() Yaxiong Tian
2022-10-03 11:57 ` Cristian Marussi
     [not found]   ` <tencent_38B168A1B211AC1C3F5BAACEB5DF9992E107@qq.com>
2022-10-11 14:14     ` Cristian Marussi
2022-10-13  7:05       ` YaxiongTian
2022-10-13 16:02         ` Cristian Marussi
2022-10-14 11:56           ` Sudeep Holla
2022-10-14 12:23             ` Cristian Marussi
2022-10-28 14:14               ` Cristian Marussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).