All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] mailbox: Fix error handling in mbox_send_message()
@ 2022-09-15 16:47 Evgenii Shatokhin
  2022-09-15 16:47 ` [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message() Evgenii Shatokhin
  2022-09-15 16:47 ` [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message Evgenii Shatokhin
  0 siblings, 2 replies; 7+ messages in thread
From: Evgenii Shatokhin @ 2022-09-15 16:47 UTC (permalink / raw)
  To: Jassi Brar; +Cc: linux-kernel, Ilya Kuznetsov, linux, Evgenii Shatokhin

This series fixes a couple issues I found while experimenting with mailbox device drivers:

1. If one tries to send a message via the mailbox using mbox_send_message(), and the controller driver for the mailbox device returns any error, mbox_send_message() returns -ETIME when in blocking mode. This is confusing at least.

2. If the mbox controller driver fails to send a message, mbox_send_message() needlessly waits for the message to be sent. That ends when the waiting timeout has been expired. However, the pointer to the message remains in the queue of the mbox framework (chan->msg_data) and the framework will try to process it over and over again, when one tries to send other messages. If the driver rejects the message each time because it is invalid (some drivers do that), this will prevent anyone from using that mailbox channel.

Evgenii Shatokhin (2):
  mailbox: Propagate errors from .send_data() callback to
    mbox_send_message()
  mailbox: Error out early if the mbox driver has failed to submit the
    message

 drivers/mailbox/mailbox.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message()
  2022-09-15 16:47 [PATCH 0/2] mailbox: Fix error handling in mbox_send_message() Evgenii Shatokhin
@ 2022-09-15 16:47 ` Evgenii Shatokhin
  2022-09-16 15:57   ` Jassi Brar
  2022-09-15 16:47 ` [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message Evgenii Shatokhin
  1 sibling, 1 reply; 7+ messages in thread
From: Evgenii Shatokhin @ 2022-09-15 16:47 UTC (permalink / raw)
  To: Jassi Brar; +Cc: linux-kernel, Ilya Kuznetsov, linux, Evgenii Shatokhin

msg_submit() calls .send_data() function from the mailbox controller driver
to place the first of the queued messages to the mailbox. Depending on the
actual driver used, this operation could fail.

In this case, if mbox_send_message() is called in blocking mode, it will
always return -ETIME rather than the actual error reported by the
underlying driver. This could be confusing, so let us propagate the
error from msg_submit() to mbox_send_message().

The errors from msg_submit() called in tx_tick() should be handled in a
subsequent patch.

Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
---
 drivers/mailbox/mailbox.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4229b9b5da98..04db5ef58f93 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -50,17 +50,24 @@ static int add_to_rbuf(struct mbox_chan *chan, void *mssg)
 	return idx;
 }
 
-static void msg_submit(struct mbox_chan *chan)
+static int msg_submit(struct mbox_chan *chan)
 {
 	unsigned count, idx;
 	unsigned long flags;
 	void *data;
-	int err = -EBUSY;
+	int err = 0;
 
 	spin_lock_irqsave(&chan->lock, flags);
 
-	if (!chan->msg_count || chan->active_req)
+	if (!chan->msg_count) {
+		spin_unlock_irqrestore(&chan->lock, flags);
+		return 0;
+	}
+
+	if (chan->active_req) {
+		err = -EBUSY;
 		goto exit;
+	}
 
 	count = chan->msg_count;
 	idx = chan->msg_free;
@@ -88,6 +95,7 @@ static void msg_submit(struct mbox_chan *chan)
 		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
 		spin_unlock_irqrestore(&chan->mbox->poll_hrt_lock, flags);
 	}
+	return err;
 }
 
 static void tx_tick(struct mbox_chan *chan, int r)
@@ -256,6 +264,7 @@ EXPORT_SYMBOL_GPL(mbox_client_peek_data);
 int mbox_send_message(struct mbox_chan *chan, void *mssg)
 {
 	int t;
+	int err;
 
 	if (!chan || !chan->cl)
 		return -EINVAL;
@@ -266,7 +275,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 		return t;
 	}
 
-	msg_submit(chan);
+	err = msg_submit(chan);
 
 	if (chan->cl->tx_block) {
 		unsigned long wait;
@@ -284,7 +293,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 		}
 	}
 
-	return t;
+	return (err < 0) ? err : t;
 }
 EXPORT_SYMBOL_GPL(mbox_send_message);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message
  2022-09-15 16:47 [PATCH 0/2] mailbox: Fix error handling in mbox_send_message() Evgenii Shatokhin
  2022-09-15 16:47 ` [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message() Evgenii Shatokhin
@ 2022-09-15 16:47 ` Evgenii Shatokhin
  2022-09-16 17:04   ` Jassi Brar
  1 sibling, 1 reply; 7+ messages in thread
From: Evgenii Shatokhin @ 2022-09-15 16:47 UTC (permalink / raw)
  To: Jassi Brar; +Cc: linux-kernel, Ilya Kuznetsov, linux, Evgenii Shatokhin

mbox_send_message() places the pointer to the message to the queue
(add_to_rbuf) then calls msg_submit(chan) to submit the first of the
queued messaged to the mailbox. Some of mailbox drivers can return
errors from their .send_data() callbacks, e.g., if the message is
invalid or there is something wrong with the mailbox device.

In this case:
* hrtimer is not started by msg_submit();
* the pointer to that message remains in the queue;
* if mbox_send_message() is called in blocking mode, it will needlessly
wait for tx_tout ms (or for an hour if tx_out is not set), then it will
call tx_tick(chan, -ETIME).

tx_tick() will then try to submit the first message in the queue - the
same message as before. The underlying driver might reject it again.

The problematic message could then remain in the queue forever, which would
prevent the system from sending other, maybe unrelated messages via the
same mailbox channel. Moreover, the caller would be unable to free or reuse
the message structure.

Let us remove the message from the queue and error out from
mbox_send_message() early if sending of the message fails.

As for tx_tick() - not sure, if chan->cl->tx_done() should still be
called if msg_submit(chan) reports an error. For now, tx_tick() will
exit early too.

Signed-off-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
---
 drivers/mailbox/mailbox.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 04db5ef58f93..32d9ba05427e 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -82,10 +82,9 @@ static int msg_submit(struct mbox_chan *chan)
 		chan->cl->tx_prepare(chan->cl, data);
 	/* Try to submit a message to the MBOX controller */
 	err = chan->mbox->ops->send_data(chan, data);
-	if (!err) {
+	chan->msg_count--;
+	if (!err)
 		chan->active_req = data;
-		chan->msg_count--;
-	}
 exit:
 	spin_unlock_irqrestore(&chan->lock, flags);
 
@@ -102,6 +101,7 @@ static void tx_tick(struct mbox_chan *chan, int r)
 {
 	unsigned long flags;
 	void *mssg;
+	int err;
 
 	spin_lock_irqsave(&chan->lock, flags);
 	mssg = chan->active_req;
@@ -109,9 +109,8 @@ static void tx_tick(struct mbox_chan *chan, int r)
 	spin_unlock_irqrestore(&chan->lock, flags);
 
 	/* Submit next message */
-	msg_submit(chan);
-
-	if (!mssg)
+	err = msg_submit(chan);
+	if (!mssg || err)
 		return;
 
 	/* Notify the client */
@@ -276,6 +275,8 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 	}
 
 	err = msg_submit(chan);
+	if (err)
+		return err;
 
 	if (chan->cl->tx_block) {
 		unsigned long wait;
@@ -293,7 +294,7 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 		}
 	}
 
-	return (err < 0) ? err : t;
+	return t;
 }
 EXPORT_SYMBOL_GPL(mbox_send_message);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message()
  2022-09-15 16:47 ` [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message() Evgenii Shatokhin
@ 2022-09-16 15:57   ` Jassi Brar
  0 siblings, 0 replies; 7+ messages in thread
From: Jassi Brar @ 2022-09-16 15:57 UTC (permalink / raw)
  To: Evgenii Shatokhin; +Cc: linux-kernel, Ilya Kuznetsov, linux

On Thu, Sep 15, 2022 at 11:49 AM Evgenii Shatokhin
<e.shatokhin@yadro.com> wrote:
>
> msg_submit() calls .send_data() function from the mailbox controller driver
> to place the first of the queued messages to the mailbox. Depending on the
> actual driver used, this operation could fail.
>
> In this case, if mbox_send_message() is called in blocking mode, it will
> always return -ETIME rather than the actual error reported by the
> underlying driver. This could be confusing, so let us propagate the
> error from msg_submit() to mbox_send_message().
>
In blocking mode, the client gets -ETIME because the api waited long
enough for the transfer to be successful.
It is the job of the underlying controller driver to be in a
consistent state and recover from errors. It may print its own error
but the client shouldn't have to differentiate between failure causes.

thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message
  2022-09-15 16:47 ` [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message Evgenii Shatokhin
@ 2022-09-16 17:04   ` Jassi Brar
  2022-09-18 16:32     ` Evgeny Shatokhin
  0 siblings, 1 reply; 7+ messages in thread
From: Jassi Brar @ 2022-09-16 17:04 UTC (permalink / raw)
  To: Evgenii Shatokhin; +Cc: linux-kernel, Ilya Kuznetsov, linux

On Thu, Sep 15, 2022 at 11:50 AM Evgenii Shatokhin
<e.shatokhin@yadro.com> wrote:
>
> mbox_send_message() places the pointer to the message to the queue
> (add_to_rbuf) then calls msg_submit(chan) to submit the first of the
> queued messaged to the mailbox. Some of mailbox drivers can return
> errors from their .send_data() callbacks, e.g., if the message is
> invalid or there is something wrong with the mailbox device.
>
The message can't be invalid because the client code is written for a
particular provider.

Though it is possible for the mailbox controller to break down for
some reason. In that case, the blocking api will keep retrying until
successful. But ideally the client, upon getting -ETIME, should free()
and request() the channel reset it (because controller drivers usually
don't contain the logic to automatically reset upon some error).

thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message
  2022-09-16 17:04   ` Jassi Brar
@ 2022-09-18 16:32     ` Evgeny Shatokhin
  2022-10-05 12:11       ` Evgeny Shatokhin
  0 siblings, 1 reply; 7+ messages in thread
From: Evgeny Shatokhin @ 2022-09-18 16:32 UTC (permalink / raw)
  To: Jassi Brar; +Cc: linux-kernel, Ilya Kuznetsov, linux

Thank you for a quick reply!

On 16.09.2022 20:04, Jassi Brar wrote:
> On Thu, Sep 15, 2022 at 11:50 AM Evgenii Shatokhin
> <e.shatokhin@yadro.com> wrote:
>>
>> mbox_send_message() places the pointer to the message to the queue
>> (add_to_rbuf) then calls msg_submit(chan) to submit the first of the
>> queued messaged to the mailbox. Some of mailbox drivers can return
>> errors from their .send_data() callbacks, e.g., if the message is
>> invalid or there is something wrong with the mailbox device.
>>
> The message can't be invalid because the client code is written for a
> particular provider.

As of mainline kernel v6.0-rc5, there are mailbox controller drivers 
which check if the messages are valid in their .send_data() callbacks. 
Example:

drivers/mailbox/rockchip-mailbox.c, rockchip_mbox_send_data():
	if (msg->rx_size > mb->buf_size) {
		dev_err(mb->mbox.dev, "Transmit size over buf size(%d)\n",
			mb->buf_size);
		return -EINVAL;
	}

Other examples are zynqmp_ipi_send_data() from 
drivers/mailbox/zynqmp-ipi-mailbox.c, ti_msgmgr_send_data() from 
drivers/mailbox/ti-msgmgr.c, etc.

If this is incorrect and the controller drivers should not do such 
things, I'd suggest to clearly state it in the docs, because it is far 
from obvious from Documentation/driver-api/mailbox.rst at the moment.

That is, one could state that checking if the messages to be transmitted 
are valid is a responsibility of the callers of mailbox API rather than 
of the controller driver.

I could prepare such patch for the docs. Objections?

> 
> Though it is possible for the mailbox controller to break down for
> some reason. In that case, the blocking api will keep retrying until
> successful. 

As far as I can see from the code, the behaviour seems to be different.

mbox_send_message() calls msg_submit() to send the message the first 
time. If that fails, hrtimer is not armed, so there will be no attempts 
to send the message again till tx_out ms pass:

	err = chan->mbox->ops->send_data(chan, data);
	if (!err) {
		chan->active_req = data;
		chan->msg_count--;
	}
exit:
	spin_unlock_irqrestore(&chan->lock, flags);

	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
		/* kick start the timer immediately to avoid delays */
		spin_lock_irqsave(&chan->mbox->poll_hrt_lock, flags);
		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
		spin_unlock_irqrestore(&chan->mbox->poll_hrt_lock, flags);
	}

This is from msg_submit(). Thus, the hrtimer will not fire, tx_tick() 
will not be called until tx_out ms have passed, and no attempts to send 
the message again will be made here.

In addition, complete(&chan->tx_complete) will not be called, so, 
mbox_send_message() will have to needlessly wait whole tx_out ms.
Only after that, it will call tx_tick(chan, -ETIME), which will, in 
turn, call msg_submit() to try to send the message again. If the mbox 
has not recovered, sending will fail again.

Then, mbox_send_message() will exit with -ETIME. The pointer to the 
message will remain in chan->msg_data[], but the framework will not 
attempt to send it again until the client calls mbox_send_message() for 
another, possibly unrelated message.

In this case, to sum up, mbox_send_message():
* needlessly waits for tx_out ms;
* only tries to send the message twice rather than makes retries until 
successful;
* does not inform the client about the actual error happened, just 
returns -ETIME;
* keeps the pointer to the message in chan->msg_data[], which is too 
easy to overlook on the client side. Too easy for the client to, say, 
reuse the structure and cause trouble.

What I suggest is to leave it to the client (or some other 
provider-specific code using the client) what to do with the failures.

If the error is reported by the controller driver, don't wait in 
mbox_send_message(), just pass the error to the client and exit. If the 
client decides to ignore the error - OK, its problem. Or - it may kick 
the mbox device somehow in a provider-specific way to make it work, or - 
reset the channel, or - do anything else to make things work again.

The behaviour of mbox_send_message() would then become more consistent: 
either it has sent the message successfully or it failed and returned an 
error, without side-effects (like the pointer to that message kept in 
the internal buffer).

I do not think this change would break the existing controller drivers 
and client drivers.

What do you think?

>But ideally the client, upon getting -ETIME, should free()
> and request() the channel reset it (because controller drivers usually
> don't contain the logic to automatically reset upon some error).
> 
> thanks.

Regards,
Evgenii


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message
  2022-09-18 16:32     ` Evgeny Shatokhin
@ 2022-10-05 12:11       ` Evgeny Shatokhin
  0 siblings, 0 replies; 7+ messages in thread
From: Evgeny Shatokhin @ 2022-10-05 12:11 UTC (permalink / raw)
  To: Jassi Brar; +Cc: linux-kernel, Ilya Kuznetsov, linux

Hi,

On 18.09.2022 19:32, Evgeny Shatokhin wrote:
> Thank you for a quick reply!
> 
> On 16.09.2022 20:04, Jassi Brar wrote:
>> On Thu, Sep 15, 2022 at 11:50 AM Evgenii Shatokhin
>> <e.shatokhin@yadro.com> wrote:
>>>
>>> mbox_send_message() places the pointer to the message to the queue
>>> (add_to_rbuf) then calls msg_submit(chan) to submit the first of the
>>> queued messaged to the mailbox. Some of mailbox drivers can return
>>> errors from their .send_data() callbacks, e.g., if the message is
>>> invalid or there is something wrong with the mailbox device.
>>>
>> The message can't be invalid because the client code is written for a
>> particular provider.
> 
> As of mainline kernel v6.0-rc5, there are mailbox controller drivers 
> which check if the messages are valid in their .send_data() callbacks. 
> Example:
> 
> drivers/mailbox/rockchip-mailbox.c, rockchip_mbox_send_data():
>      if (msg->rx_size > mb->buf_size) {
>          dev_err(mb->mbox.dev, "Transmit size over buf size(%d)\n",
>              mb->buf_size);
>          return -EINVAL;
>      }
> 
> Other examples are zynqmp_ipi_send_data() from 
> drivers/mailbox/zynqmp-ipi-mailbox.c, ti_msgmgr_send_data() from 
> drivers/mailbox/ti-msgmgr.c, etc.
> 
> If this is incorrect and the controller drivers should not do such 
> things, I'd suggest to clearly state it in the docs, because it is far 
> from obvious from Documentation/driver-api/mailbox.rst at the moment.
> 
> That is, one could state that checking if the messages to be transmitted 
> are valid is a responsibility of the callers of mailbox API rather than 
> of the controller driver.
> 
> I could prepare such patch for the docs. Objections?
> 
>>
>> Though it is possible for the mailbox controller to break down for
>> some reason. In that case, the blocking api will keep retrying until
>> successful. 
> 
> As far as I can see from the code, the behaviour seems to be different.
> 
> mbox_send_message() calls msg_submit() to send the message the first 
> time. If that fails, hrtimer is not armed, so there will be no attempts 
> to send the message again till tx_out ms pass:
> 
>      err = chan->mbox->ops->send_data(chan, data);
>      if (!err) {
>          chan->active_req = data;
>          chan->msg_count--;
>      }
> exit:
>      spin_unlock_irqrestore(&chan->lock, flags);
> 
>      if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
>          /* kick start the timer immediately to avoid delays */
>          spin_lock_irqsave(&chan->mbox->poll_hrt_lock, flags);
>          hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
>          spin_unlock_irqrestore(&chan->mbox->poll_hrt_lock, flags);
>      }
> 
> This is from msg_submit(). Thus, the hrtimer will not fire, tx_tick() 
> will not be called until tx_out ms have passed, and no attempts to send 
> the message again will be made here.
> 
> In addition, complete(&chan->tx_complete) will not be called, so, 
> mbox_send_message() will have to needlessly wait whole tx_out ms.
> Only after that, it will call tx_tick(chan, -ETIME), which will, in 
> turn, call msg_submit() to try to send the message again. If the mbox 
> has not recovered, sending will fail again.
> 
> Then, mbox_send_message() will exit with -ETIME. The pointer to the 
> message will remain in chan->msg_data[], but the framework will not 
> attempt to send it again until the client calls mbox_send_message() for 
> another, possibly unrelated message.
> 
> In this case, to sum up, mbox_send_message():
> * needlessly waits for tx_out ms;
> * only tries to send the message twice rather than makes retries until 
> successful;
> * does not inform the client about the actual error happened, just 
> returns -ETIME;
> * keeps the pointer to the message in chan->msg_data[], which is too 
> easy to overlook on the client side. Too easy for the client to, say, 
> reuse the structure and cause trouble.
> 
> What I suggest is to leave it to the client (or some other 
> provider-specific code using the client) what to do with the failures.
> 
> If the error is reported by the controller driver, don't wait in 
> mbox_send_message(), just pass the error to the client and exit. If the 
> client decides to ignore the error - OK, its problem. Or - it may kick 
> the mbox device somehow in a provider-specific way to make it work, or - 
> reset the channel, or - do anything else to make things work again.
> 
> The behaviour of mbox_send_message() would then become more consistent: 
> either it has sent the message successfully or it failed and returned an 
> error, without side-effects (like the pointer to that message kept in 
> the internal buffer).
> 
> I do not think this change would break the existing controller drivers 
> and client drivers.
> 
> What do you think?
> 
>> But ideally the client, upon getting -ETIME, should free()
>> and request() the channel reset it (because controller drivers usually
>> don't contain the logic to automatically reset upon some error).
>>
>> thanks.

Any updates on this?
Looking forward to your comments.

Regards,
Evgenii

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-05 12:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 16:47 [PATCH 0/2] mailbox: Fix error handling in mbox_send_message() Evgenii Shatokhin
2022-09-15 16:47 ` [PATCH 1/2] mailbox: Propagate errors from .send_data() callback to mbox_send_message() Evgenii Shatokhin
2022-09-16 15:57   ` Jassi Brar
2022-09-15 16:47 ` [PATCH 2/2] mailbox: Error out early if the mbox driver has failed to submit the message Evgenii Shatokhin
2022-09-16 17:04   ` Jassi Brar
2022-09-18 16:32     ` Evgeny Shatokhin
2022-10-05 12:11       ` Evgeny Shatokhin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.