All of lore.kernel.org
 help / color / mirror / Atom feed
From: rishabhb@codeaurora.org
To: Cristian Marussi <cristian.marussi@arm.com>
Cc: sudeep.holla@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, avajid@codeaurora.org,
	adharmap@codeaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Date: Mon, 30 Aug 2021 14:09:37 -0700	[thread overview]
Message-ID: <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org> (raw)
In-Reply-To: <20210805105427.GU6592@e120937-lin>

Hi Christian
There seems to be another issue here. The response from agent can be 
delayed causing a timeout during base protocol acquire,
which leads to the probe failure. What I have observed is sometimes the 
failure of probe and rx_callback (due to a delayed message)
happens at the same time on different cpus.
Because of this race, the device memory may be cleared while the 
interrupt(rx_callback) is executing on another cpu.
How do you propose we solve this? Do you think it is better to take the 
setting up of base and other protocols out of probe and
in some delayed work? That would imply the device memory is not released 
until remove is called. Or should we add locking to
the interrupt handler(scmi_rx_callback) and the cleanup in probe to 
avoid the race?

On 2021-08-05 03:54, Cristian Marussi wrote:
> On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
>> Mailbox channels for the base protocol are setup during probe.
>> There can be a scenario where probe fails to acquire the base
>> protocol due to a timeout leading to cleaning up of all device
>> managed memory including the scmi_mailbox structure setup during
>> mailbox_chan_setup function.
>> [   12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: 
>> version_get+0x84/0x140)
>> [   12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
>> [   12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>> 
>> Now when a message arrives at cpu slightly after the timeout, the 
>> mailbox
>> controller will try to call the rx_callback of the client and might 
>> end
>> up accessing freed memory.
>> [   12.758363][    C0] Call trace:
>> [   12.758367][    C0]  rx_callback+0x24/0x160
>> [   12.758372][    C0]  mbox_chan_received_data+0x44/0x94
>> [   12.758386][    C0]  __handle_irq_event_percpu+0xd4/0x240
>> This patch frees the mailbox channels setup during probe and adds some 
>> more
>> error handling in case the probe fails.
>> 
>> Signed-off-by: Rishabh Bhatnagar <rishabhb@codeaurora.org>
>> ---
>>  drivers/firmware/arm_scmi/driver.c | 35 
>> ++++++++++++++++++++++++-----------
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>> 
>> diff --git a/drivers/firmware/arm_scmi/driver.c 
>> b/drivers/firmware/arm_scmi/driver.c
>> index 9b2e8d4..ead3bd3 100644
>> --- a/drivers/firmware/arm_scmi/driver.c
>> +++ b/drivers/firmware/arm_scmi/driver.c
>> @@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const 
>> struct scmi_device_id *id_table)
>>  	mutex_unlock(&scmi_requested_devices_mtx);
>>  }
>> 
> 
> Hi,
> 
>> +static int scmi_cleanup_txrx_channels(struct scmi_info *info)
>> +{
>> +	int ret;
>> +	struct idr *idr = &info->tx_idr;
>> +
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->tx_idr);
>> +
>> +	idr = &info->rx_idr;
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->rx_idr);
>> +
>> +	return ret;
>> +}
>> +
>>  static int scmi_probe(struct platform_device *pdev)
>>  {
>>  	int ret;
>> @@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>> 
>>  	ret = scmi_xfer_info_init(info);
>>  	if (ret)
>> -		return ret;
>> +		goto clear_txrx_setup;
>> 
>>  	if (scmi_notification_init(handle))
>>  		dev_err(dev, "SCMI Notifications NOT available.\n");
>> @@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>>  	ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
>>  	if (ret) {
>>  		dev_err(dev, "unable to communicate with SCMI\n");
>> -		return ret;
>> +		goto notification_exit;
>>  	}
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>>  	}
>> 
>>  	return 0;
>> +
>> +notification_exit:
>> +	scmi_notification_exit(&info->handle);
>> +clear_txrx_setup:
>> +	scmi_cleanup_txrx_channels(info);
>> +	return ret;
>>  }
>> 
>>  void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, 
>> int id)
>> @@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device 
>> *pdev)
>>  {
>>  	int ret = 0, id;
>>  	struct scmi_info *info = platform_get_drvdata(pdev);
>> -	struct idr *idr = &info->tx_idr;
>>  	struct device_node *child;
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device 
>> *pdev)
>>  	idr_destroy(&info->active_protocols);
>> 
>>  	/* Safe to free channels since no more users */
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->tx_idr);
>> -
>> -	idr = &info->rx_idr;
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->rx_idr);
>> -
>> -	return ret;
>> +	return scmi_cleanup_txrx_channels(info);
>>  }
>> 
> 
> Looks good to me.
> 
> Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
> Tested-by: Cristian Marussi <cristian.marussi@arm.com>
> (Re-tested on for-next/scmi on top of virtio-scmi series)
> 
> Thanks,
> Cristian

  reply	other threads:[~2021-08-30 21:09 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-04 21:19 Rishabh Bhatnagar
2021-08-05 10:54 ` Cristian Marussi
2021-08-05 10:54   ` Cristian Marussi
2021-08-30 21:09   ` rishabhb [this message]
2021-08-31  5:48     ` Cristian Marussi
2021-08-31  5:48       ` Cristian Marussi
2021-09-01  9:35       ` Cristian Marussi
2021-09-01  9:35         ` Cristian Marussi
2021-11-01 16:35         ` rishabhb
2021-11-02 11:32           ` Sudeep Holla
2021-11-02 11:32             ` Sudeep Holla
2021-11-04 23:40             ` rishabhb
2021-11-05  9:43               ` Cristian Marussi
2021-11-05  9:43                 ` Cristian Marussi
2021-11-05 17:40                 ` rishabhb
2021-11-07 10:34                   ` Cristian Marussi
2021-11-07 10:34                     ` Cristian Marussi
2021-11-07 18:22                     ` Cristian Marussi
2021-11-07 18:22                       ` Cristian Marussi
2021-11-08 17:58                       ` rishabhb
2021-11-09 14:38                         ` Cristian Marussi
2021-11-09 14:38                           ` Cristian Marussi
2021-08-09  5:00 ` Sudeep Holla
2021-08-09  5:00   ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51782599a01a6a22409d01e5fc1f8a50@codeaurora.org \
    --to=rishabhb@codeaurora.org \
    --cc=adharmap@codeaurora.org \
    --cc=avajid@codeaurora.org \
    --cc=cristian.marussi@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sudeep.holla@arm.com \
    --subject='Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.