linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Cristian Marussi <cristian.marussi@arm.com>
To: Yaxiong Tian <iambestgod@qq.com>
Cc: james.quinlan@broadcom.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	sudeep.holla@arm.com, tianyaxiong@kylinos.cn
Subject: Re: [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare()
Date: Tue, 11 Oct 2022 15:14:27 +0100	[thread overview]
Message-ID: <Y0V6Q7ZJ3GjBwWub@e120937-lin> (raw)
In-Reply-To: <tencent_38B168A1B211AC1C3F5BAACEB5DF9992E107@qq.com>

On Tue, Oct 11, 2022 at 03:46:15PM +0800, Yaxiong Tian wrote:
> >Note that there could be also the case in which there are many other
> >threads of executions on different cores trying to send a bunch of SCMI
> >commands (there are indeed many SCMI drivers currently in the stack) so
> >all these requests HAVE TO be queued while waiting for the shared
> >memory area to be free again, so I don't think you can assume in general
> >that shmem_tx_prepare() is called only after a successfull command completion
> >or timeout: it is probably true when an SCMI Mailbox transport is
> >used but it is not neccessarly the case when other shmem based transports
> >like optee/smc are used. You could have N threads trying to send an SCMI
> >command, queued, spinning on that loop, while waiting for the channel to
> >be free: in such a case note that you TX has not even complete, you are
> >still waiting for the channel the SCMI upper layer TX timeout has NOT
> >even being started since your the transmission itself is pending (i.e.
> >send_message has still not returned...)
> 
> I read the code in optee/smc, scmi_optee_send_message()/smc_send_message()
> have channel lock before shmem_tx_prepare(). The lock was released when 
> transports was completed 、timeout or err. So although they have N threads 
> trying to send an SCMI command, queued, but only one could take up the channel.
> Also only one thread call shmem_tx_prepare() in one channel and spin() in it.
> 
> It is also true in mailboxs or other shmem based transports,because SCMI 
> protocol say:"agent must exclusive the channel."This is very reasonable through 
> the channel lock rather than shared memory.
> 
> So it is simple for selecting timeout period. Because there is only one thread 
> spin() in one channel, so the timeout period only depending on the firmware. 
> In other words this time can be a constant. 
> 

Yes, my bad, I forgot that any shared-mem based transport has some kind
of mechanism to access exclusively the channel for the whole transaction
to avoid some thread can issue a tx_prepare before the previous
transaction has fully completed (i.e. the result in the reply, if any,
was fetched before being overwritten by the next)

> >Not sure that all of this kind of work would be worth to address some,
> >maybe transient, error conditions due to a broken SCMI server, BUT in any
> >case, any kind of timeout you want to introduce in the spin loop MUST
> >result in a failed transmission until the FREE bitflag is cleared by the
> >SCMI server; i.e. if that flag won't be cleared EVER by the server, you
> >have to end up with a sequence of timed-out spinloops and transmission
> >failures, you definetely cannot recover forcibly like this.
> 
> I totally agree above.In such broken SCMI server,users cannot get any Any 
> hints.So I think it at least pr_warn(). We can set prepare_tx_timout parameter 
> in scmi_desc,or just set options for users to check error.
>

Problem is anyway, as you said, you'll have to pick this timeout from the
related transport scmi_desc (even if as of now the max_rx_timeout for
all existent shared mem transport is the same..) and this means anyway
adding more complexity to the chain of calls to just to print a warn of
some kind in a rare error-situation from which you cannot recover anyway.

Due to other unrelated discussions, I was starting to think about
exposing some debug-only (Kconfig dependent) SCMI stats like timeouts, errors,
unpexpected/OoO/late_replies in order to ease the debug and monitoring
of the health of a running SCMI stack: maybe this could be a place where
to flag this FW issues without changing the spinloop above (or
to add the kind of timeout you mentioned but only when some sort of
CONFIG_SCMI_DEBUG is enabled...)...still to fully think it through, though.

Any thoughts ?

Thanks,
Cristian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2022-10-11 14:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-30  2:14 [PATCH -next 1/1] firmware: arm_scmi: Fix possible deadlock in shmem_tx_prepare() Yaxiong Tian
2022-10-03 11:57 ` Cristian Marussi
     [not found]   ` <tencent_38B168A1B211AC1C3F5BAACEB5DF9992E107@qq.com>
2022-10-11 14:14     ` Cristian Marussi [this message]
2022-10-13  7:05       ` YaxiongTian
2022-10-13 16:02         ` Cristian Marussi
2022-10-14 11:56           ` Sudeep Holla
2022-10-14 12:23             ` Cristian Marussi
2022-10-28 14:14               ` Cristian Marussi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0V6Q7ZJ3GjBwWub@e120937-lin \
    --to=cristian.marussi@arm.com \
    --cc=iambestgod@qq.com \
    --cc=james.quinlan@broadcom.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=sudeep.holla@arm.com \
    --cc=tianyaxiong@kylinos.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).