All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cristian Marussi <cristian.marussi@arm.com>
To: rishabhb@codeaurora.org
Cc: sudeep.holla@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, avajid@codeaurora.org,
	adharmap@codeaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Date: Tue, 31 Aug 2021 06:48:35 +0100	[thread overview]
Message-ID: <20210831054835.GJ13160@e120937-lin> (raw)
In-Reply-To: <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org>

On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org wrote:
> Hi Christian

Hi Rishabh,

thanks for looking into this kind of bad interactions.

> There seems to be another issue here. The response from agent can be delayed
> causing a timeout during base protocol acquire,
> which leads to the probe failure. What I have observed is sometimes the
> failure of probe and rx_callback (due to a delayed message)
> happens at the same time on different cpus.
> Because of this race, the device memory may be cleared while the
> interrupt(rx_callback) is executing on another cpu.

You are right that concurrency was not handled properly in this kind of
context and moreover, if you think about it, even the case of out of
order reception of responses and delayed_responses (type2 SCMI messages)
for asynchronous SCMI commands was not handled properly.

> How do you propose we solve this? Do you think it is better to take the
> setting up of base and other protocols out of probe and
> in some delayed work? That would imply the device memory is not released
> until remove is called. Or should we add locking to
> the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> the race?
> 

These issues were more easily exposed by SCMI Virtio transport, so in
the series where I introduced scmi-virtio:

https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/

(which is now queued for v5.15 ...  now on -next I think...finger crossed)

I took the chance to rectify a couple of other things in the SCMI core
in the initial commits.
As an example, in the above series

 [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and out-of-order messages

cares to add a refcount to xfers and some locking on xfers between TX
and RX path to avoid that a timed out xfer can vanish while the rx path
is concurrently working on it (as you said); moreover I handle the
condition (rare if not unplausible anyway) in which a transport delivers
out of order responses and delayed responses.

I tested this scenarios on some fake emulated SCMI Virtio transport
where I could play any sort of mess and tricks to stress this limit
conditions, but you're more than welcome to verify if the race you are
seeing on Base protocol time out is solved (as I would hope :D) by this
series of mine.

Let me know, any feedback is welcome.

Btw, in the series above there are also other minor changes, but there
is also another more radical change needed to ensure correctness and
protection against stale old messages which maybe could interest you
in general if you are looking into SCMI:

[PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically increasing tokens 

Let me know if yo have other concerns.

Thanks
Cristian


WARNING: multiple messages have this Message-ID (diff)
From: Cristian Marussi <cristian.marussi@arm.com>
To: rishabhb@codeaurora.org
Cc: sudeep.holla@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, avajid@codeaurora.org,
	adharmap@codeaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Date: Tue, 31 Aug 2021 06:48:35 +0100	[thread overview]
Message-ID: <20210831054835.GJ13160@e120937-lin> (raw)
In-Reply-To: <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org>

On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org wrote:
> Hi Christian

Hi Rishabh,

thanks for looking into this kind of bad interactions.

> There seems to be another issue here. The response from agent can be delayed
> causing a timeout during base protocol acquire,
> which leads to the probe failure. What I have observed is sometimes the
> failure of probe and rx_callback (due to a delayed message)
> happens at the same time on different cpus.
> Because of this race, the device memory may be cleared while the
> interrupt(rx_callback) is executing on another cpu.

You are right that concurrency was not handled properly in this kind of
context and moreover, if you think about it, even the case of out of
order reception of responses and delayed_responses (type2 SCMI messages)
for asynchronous SCMI commands was not handled properly.

> How do you propose we solve this? Do you think it is better to take the
> setting up of base and other protocols out of probe and
> in some delayed work? That would imply the device memory is not released
> until remove is called. Or should we add locking to
> the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
> the race?
> 

These issues were more easily exposed by SCMI Virtio transport, so in
the series where I introduced scmi-virtio:

https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/

(which is now queued for v5.15 ...  now on -next I think...finger crossed)

I took the chance to rectify a couple of other things in the SCMI core
in the initial commits.
As an example, in the above series

 [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and out-of-order messages

cares to add a refcount to xfers and some locking on xfers between TX
and RX path to avoid that a timed out xfer can vanish while the rx path
is concurrently working on it (as you said); moreover I handle the
condition (rare if not unplausible anyway) in which a transport delivers
out of order responses and delayed responses.

I tested this scenarios on some fake emulated SCMI Virtio transport
where I could play any sort of mess and tricks to stress this limit
conditions, but you're more than welcome to verify if the race you are
seeing on Base protocol time out is solved (as I would hope :D) by this
series of mine.

Let me know, any feedback is welcome.

Btw, in the series above there are also other minor changes, but there
is also another more radical change needed to ensure correctness and
protection against stale old messages which maybe could interest you
in general if you are looking into SCMI:

[PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically increasing tokens 

Let me know if yo have other concerns.

Thanks
Cristian


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-08-31  5:48 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-04 21:19 [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails Rishabh Bhatnagar
2021-08-05 10:54 ` Cristian Marussi
2021-08-05 10:54   ` Cristian Marussi
2021-08-30 21:09   ` rishabhb
2021-08-31  5:48     ` Cristian Marussi [this message]
2021-08-31  5:48       ` Cristian Marussi
2021-09-01  9:35       ` Cristian Marussi
2021-09-01  9:35         ` Cristian Marussi
2021-11-01 16:35         ` rishabhb
2021-11-02 11:32           ` Sudeep Holla
2021-11-02 11:32             ` Sudeep Holla
2021-11-04 23:40             ` rishabhb
2021-11-05  9:43               ` Cristian Marussi
2021-11-05  9:43                 ` Cristian Marussi
2021-11-05 17:40                 ` rishabhb
2021-11-07 10:34                   ` Cristian Marussi
2021-11-07 10:34                     ` Cristian Marussi
2021-11-07 18:22                     ` Cristian Marussi
2021-11-07 18:22                       ` Cristian Marussi
2021-11-08 17:58                       ` rishabhb
2021-11-09 14:38                         ` Cristian Marussi
2021-11-09 14:38                           ` Cristian Marussi
2021-08-09  5:00 ` Sudeep Holla
2021-08-09  5:00   ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210831054835.GJ13160@e120937-lin \
    --to=cristian.marussi@arm.com \
    --cc=adharmap@codeaurora.org \
    --cc=avajid@codeaurora.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rishabhb@codeaurora.org \
    --cc=sudeep.holla@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.