All of lore.kernel.org
 help / color / mirror / Atom feed
From: rishabhb@codeaurora.org
To: Cristian Marussi <cristian.marussi@arm.com>
Cc: sudeep.holla@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, avajid@codeaurora.org,
	adharmap@codeaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Date: Mon, 01 Nov 2021 09:35:42 -0700	[thread overview]
Message-ID: <aab71610e11c2dd293159576cc53e277@codeaurora.org> (raw)
In-Reply-To: <20210901093558.GL13160@e120937-lin>

On 2021-09-01 02:35, Cristian Marussi wrote:
> On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org 
>> wrote:
>> > Hi Christian
>> 
>> Hi Rishabh,
>> 
>> thanks for looking into this kind of bad interactions.
>> 
>> > There seems to be another issue here. The response from agent can be delayed
>> > causing a timeout during base protocol acquire,
>> > which leads to the probe failure. What I have observed is sometimes the
>> > failure of probe and rx_callback (due to a delayed message)
>> > happens at the same time on different cpus.
>> > Because of this race, the device memory may be cleared while the
>> > interrupt(rx_callback) is executing on another cpu.
>> 
>> You are right that concurrency was not handled properly in this kind 
>> of
>> context and moreover, if you think about it, even the case of out of
>> order reception of responses and delayed_responses (type2 SCMI 
>> messages)
>> for asynchronous SCMI commands was not handled properly.
>> 
>> > How do you propose we solve this? Do you think it is better to take the
>> > setting up of base and other protocols out of probe and
>> > in some delayed work? That would imply the device memory is not released
>> > until remove is called. Or should we add locking to
>> > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > the race?
>> >
>> 
>> These issues were more easily exposed by SCMI Virtio transport, so in
>> the series where I introduced scmi-virtio:
>> 
>> https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/
>> 
>> (which is now queued for v5.15 ...  now on -next I think...finger 
>> crossed)
>> 
>> I took the chance to rectify a couple of other things in the SCMI core
>> in the initial commits.
>> As an example, in the above series
>> 
>>  [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and 
>> out-of-order messages
>> 
>> cares to add a refcount to xfers and some locking on xfers between TX
>> and RX path to avoid that a timed out xfer can vanish while the rx 
>> path
>> is concurrently working on it (as you said); moreover I handle the
>> condition (rare if not unplausible anyway) in which a transport 
>> delivers
>> out of order responses and delayed responses.
>> 
>> I tested this scenarios on some fake emulated SCMI Virtio transport
>> where I could play any sort of mess and tricks to stress this limit
>> conditions, but you're more than welcome to verify if the race you are
>> seeing on Base protocol time out is solved (as I would hope :D) by 
>> this
>> series of mine.
>> 
>> Let me know, any feedback is welcome.
>> 
>> Btw, in the series above there are also other minor changes, but there
>> is also another more radical change needed to ensure correctness and
>> protection against stale old messages which maybe could interest you
>> in general if you are looking into SCMI:
>> 
>> [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically 
>> increasing tokens
>> 
>> Let me know if yo have other concerns.
>> 
> 
> Hi Rishabhb,
> 
> just a quick remark, thinking again about your fail @probe scenario 
> above
> I realized that while the concurrency patch I mentioned above could 
> help on
> races against vanishing xfers when late timed-out responses are 
> delivered,
> here we really are then also shutting down everything on failure, so 
> there
> could be further issues between a very late invokation of 
> scmi_rx_callback
> and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. 
> structs
> used by scmi-rx-callback itself (maybe this was already what you meant 
> and
> I didn't get it,...sorry)
> 
> On the other side, I don't feel that delaying Base init to a deferred
> worker is a viable solution since we need Base protocol init to be
> initialized and we need to just give up if we cannot communicate with
> the SCMI platform fw in such early stages. (Base protocol is really the
> only mandatory proto is I remember correctly the spec)
> 
> Currenly I'm off and only glancing at mails but I'll have a thought 
> about
> these issues once back in a few weeks time.
> 
> Thanks,
> Cristian
> 
Hi Cristian
I hope you enjoyed your vacation. Did you get a chance to look at the 
issue stated above
and have some idea as to how to solve this?
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-11-01 16:37 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-04 21:19 Rishabh Bhatnagar
2021-08-05 10:54 ` Cristian Marussi
2021-08-05 10:54   ` Cristian Marussi
2021-08-30 21:09   ` rishabhb
2021-08-31  5:48     ` Cristian Marussi
2021-08-31  5:48       ` Cristian Marussi
2021-09-01  9:35       ` Cristian Marussi
2021-09-01  9:35         ` Cristian Marussi
2021-11-01 16:35         ` rishabhb [this message]
2021-11-02 11:32           ` Sudeep Holla
2021-11-02 11:32             ` Sudeep Holla
2021-11-04 23:40             ` rishabhb
2021-11-05  9:43               ` Cristian Marussi
2021-11-05  9:43                 ` Cristian Marussi
2021-11-05 17:40                 ` rishabhb
2021-11-07 10:34                   ` Cristian Marussi
2021-11-07 10:34                     ` Cristian Marussi
2021-11-07 18:22                     ` Cristian Marussi
2021-11-07 18:22                       ` Cristian Marussi
2021-11-08 17:58                       ` rishabhb
2021-11-09 14:38                         ` Cristian Marussi
2021-11-09 14:38                           ` Cristian Marussi
2021-08-09  5:00 ` Sudeep Holla
2021-08-09  5:00   ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aab71610e11c2dd293159576cc53e277@codeaurora.org \
    --to=rishabhb@codeaurora.org \
    --cc=adharmap@codeaurora.org \
    --cc=avajid@codeaurora.org \
    --cc=cristian.marussi@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sudeep.holla@arm.com \
    --subject='Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.