* [PATCH/RFC v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout
@ 2020-07-20 11:58 Yoshihiro Shimoda
2020-07-20 17:14 ` Sergei Shtylyov
From: Yoshihiro Shimoda @ 2020-07-20 11:58 UTC (permalink / raw)
To: sergei.shtylyov, davem, kuba
Cc: dirk.behme, Shashikant.Suguni, netdev, linux-renesas-soc,
Yoshihiro Shimoda
According to the report in [1], this driver can cause
the following error in ravb_tx_timeout_work().
ravb e6800000.ethernet ethernet: failed to switch device to config mode
This error means that the hardware could not switch from "Operation"
to "Configuration" mode while some TX and/or RX queues were still
operating. After that, ravb_config() in ravb_dmac_init() fails as well,
so the descriptors are not allocated again and a NULL pointer
dereference occurs later in ravb_start_xmit().
To fix the issue, ravb_tx_timeout_work() should check the return value
of ravb_stop_dma() to determine whether the hardware can be
re-initialized. If ravb_stop_dma() fails, ravb_tx_timeout_work()
re-enables TX and RX and skips the re-initialization.
[1]
https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
Changes from RFC v1:
- Check the return value of ravb_stop_dma() and exit if the hardware
cannot be re-initialized in the tx timeout.
- Update the commit subject and description.
- Fix some typos.
https://patchwork.kernel.org/patch/11570217/
Unfortunately, I have not been able to reproduce the issue yet, so
this patch is still marked as RFC.
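For readers who want to see the intended control flow in isolation, here is a minimal user-space sketch. It is not driver code: struct mock_dev and every mock_* function are hypothetical stand-ins, and the model covers only the branch on the ravb_stop_dma() result, not the PTP handling that follows the "out" label.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real private struct. */
struct mock_dev {
	bool dma_busy;      /* true: the DMA engine refuses to stop */
	bool rings_valid;   /* descriptor rings are currently allocated */
	bool rx_tx_enabled; /* E-MAC RX/TX are enabled */
	int reinit_count;   /* how many times the rings were rebuilt */
};

/* Models ravb_stop_dma(): nonzero means the hardware is still operating. */
static int mock_stop_dma(struct mock_dev *dev)
{
	if (dev->dma_busy)
		return -1;
	dev->rx_tx_enabled = false;
	return 0;
}

/* Models the DMAC/E-MAC re-initialization that rebuilds the rings. */
static void mock_reinit(struct mock_dev *dev)
{
	dev->rings_valid = true;
	dev->rx_tx_enabled = true;
	dev->reinit_count++;
}

/* Models the patched timeout handler: skip re-init if DMA cannot stop. */
static void mock_tx_timeout_work(struct mock_dev *dev)
{
	if (mock_stop_dma(dev)) {
		/* Hardware still busy: keep the existing rings untouched. */
		dev->rx_tx_enabled = true;
		return;
	}
	dev->rings_valid = false; /* rings are freed here... */
	mock_reinit(dev);         /* ...and rebuilt here */
}
```

The point of the early return is that the rings are never freed unless they can also be rebuilt, so the transmit path should not see missing descriptors.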
drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index a442bcf6..500f5c1 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 		ravb_ptp_stop(ndev);
 
 	/* Wait for DMA stopping */
-	ravb_stop_dma(ndev);
+	if (ravb_stop_dma(ndev)) {
+		/* If ravb_stop_dma() fails, the hardware is still in-progress
+		 * as "Operation" mode for TX and/or RX. So, this should not
+		 * call the following functions because ravb_dmac_init() is
+		 * possible to fail too. Also, this should not retry
+		 * ravb_stop_dma() again and again here because it's possible
+		 * to wait forever. So, this just re-enables the TX and RX and
+		 * skip the following re-initialization procedure.
+		 */
+		ravb_rcv_snd_enable(ndev);
+		goto out;
+	}
 
 	ravb_ring_free(ndev, RAVB_BE);
 	ravb_ring_free(ndev, RAVB_NC);
@@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
 	ravb_dmac_init(ndev);
 	ravb_emac_init(ndev);
 
+out:
 	/* Initialise PTP Clock driver */
 	if (priv->chip_id == RCAR_GEN2)
 		ravb_ptp_init(ndev, priv->pdev);
--
2.7.4
* Re: [PATCH/RFC v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout
2020-07-20 11:58 [PATCH/RFC v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout Yoshihiro Shimoda
@ 2020-07-20 17:14 ` Sergei Shtylyov
2020-07-21 1:39 ` Yoshihiro Shimoda
From: Sergei Shtylyov @ 2020-07-20 17:14 UTC (permalink / raw)
To: Yoshihiro Shimoda, davem, kuba
Cc: dirk.behme, Shashikant.Suguni, netdev, linux-renesas-soc
Hello!
On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:
> According to the report of [1], this driver is possible to cause
> the following error in ravb_tx_timeout_work().
>
> ravb e6800000.ethernet ethernet: failed to switch device to config mode
Hmm, maybe we need a larger timeout there? The current one amounts to only
~100 ms for all cases (maybe we should parametrize the timeout?)...
> This error means that the hardware could not change the state
> from "Operation" to "Configuration" while some tx and/or rx queue
> are operating. After that, ravb_config() in ravb_dmac_init() will fail,
Are we seeing double messages from ravb_config()? I think we aren't...
> and then any descriptors will be not allocated anymore so that NULL
> pointer dereference happens after that on ravb_start_xmit().
>
> To fix the issue, the ravb_tx_timeout_work() should check
> the return value of ravb_stop_dma() whether this hardware can be
> re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> re-enables TX and RX and just exits.
>
> [1]
> https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
>
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Assuming the comment below is fixed:
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
> ---
> Changes from RFC v1:
> - Check the return value of ravb_stop_dma() and exit if the hardware
> condition can not be initialized in the tx timeout.
> - Update the commit subject and description.
> - Fix some typo.
> https://patchwork.kernel.org/patch/11570217/
>
> Unfortunately, I still didn't reproduce the issue yet. So, I still
> marked RFC on this patch.
I think the Bosch people should test this patch, as they reported the kernel oops...
>
> drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> index a442bcf6..500f5c1 100644
> --- a/drivers/net/ethernet/renesas/ravb_main.c
> +++ b/drivers/net/ethernet/renesas/ravb_main.c
> @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> ravb_ptp_stop(ndev);
>
> /* Wait for DMA stopping */
> - ravb_stop_dma(ndev);
> + if (ravb_stop_dma(ndev)) {
> + /* If ravb_stop_dma() fails, the hardware is still in-progress
> + * as "Operation" mode for TX and/or RX. So, this should not
s/in-progress as "Operation" mode/operating/.
> + * call the following functions because ravb_dmac_init() is
> + * possible to fail too. Also, this should not retry
> + * ravb_stop_dma() again and again here because it's possible
> + * to wait forever. So, this just re-enables the TX and RX and
> + * skip the following re-initialization procedure.
> + */
> + ravb_rcv_snd_enable(ndev);
> + goto out;
> + }
>
> ravb_ring_free(ndev, RAVB_BE);
> ravb_ring_free(ndev, RAVB_NC);
> @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> ravb_dmac_init(ndev);
BTW, that one also may fail...
> ravb_emac_init(ndev);
>
> +out:
> /* Initialise PTP Clock driver */
> if (priv->chip_id == RCAR_GEN2)
> ravb_ptp_init(ndev, priv->pdev);
>
* RE: [PATCH/RFC v2] net: ethernet: ravb: exit if hardware is in-progress in tx timeout
2020-07-20 17:14 ` Sergei Shtylyov
@ 2020-07-21 1:39 ` Yoshihiro Shimoda
From: Yoshihiro Shimoda @ 2020-07-21 1:39 UTC (permalink / raw)
To: Sergei Shtylyov, davem, kuba
Cc: REE dirk.behme@de.bosch.com, Shashikant.Suguni, netdev,
linux-renesas-soc
Hello!
Thank you for your review!
> From: Sergei Shtylyov, Sent: Tuesday, July 21, 2020 2:15 AM
>
> Hello!
>
> On 7/20/20 2:58 PM, Yoshihiro Shimoda wrote:
>
> > According to the report of [1], this driver is possible to cause
> > the following error in ravb_tx_timeout_work().
> >
> > ravb e6800000.ethernet ethernet: failed to switch device to config mode
>
> Hmm, maybe we need a larger timeout there? The current one amounts to only
> ~100 ms for all cases (maybe we should parametrize the timeout?)...
I don't think so, because we cannot predict when RX will finish.
For example, if another device floods this hardware with "ping -f",
the hardware keeps operating on RX for as long as the ping is running.
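To illustrate the argument, here is a small user-space model of a bounded polling wait. It is only a sketch under stated assumptions: struct mock_rx and the mock_* helpers are hypothetical, one poll stands in for one delay step, and the real register polling in the driver is not reproduced.

```c
#include <stdbool.h>

/* Hypothetical traffic model: RX stays busy while the peer keeps sending. */
struct mock_rx {
	unsigned long frames_pending; /* frames the peer will still send */
};

/* One poll of the "RX in progress" status; each poll consumes one frame. */
static bool mock_rx_busy(struct mock_rx *rx)
{
	if (rx->frames_pending == 0)
		return false;
	rx->frames_pending--;
	return true;
}

/* Bounded wait: 0 once RX is idle, -1 when max_polls is exhausted. */
static int mock_wait_rx_idle(struct mock_rx *rx, unsigned long max_polls)
{
	for (unsigned long i = 0; i < max_polls; i++) {
		if (!mock_rx_busy(rx))
			return 0;
	}
	return -1;
}
```

In this model a short burst drains within the bound, but a sender that outlasts the bound makes the wait fail whether max_polls is 100 or 10000. Raising the limit only moves the threshold; it does not remove the failure case.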
> > This error means that the hardware could not change the state
> > from "Operation" to "Configuration" while some tx and/or rx queue
> > are operating. After that, ravb_config() in ravb_dmac_init() will fail,
>
> Are we seeing double messages from ravb_config()? I think we aren't...
No, we are not seeing double messages from ravb_config(), because
ravb_stop_dma() can fail before ravb_config() is called if TCCR or
CSR is in a specific condition.
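A rough user-space model of why only one message appears is sketched here. All names are hypothetical and the two integer fields merely stand in for the TCCR and CSR checks. The assumption, taken from the discussion above, is that the message is printed only by the config step and that the stop path can return silently before reaching it.

```c
#include <stdio.h>

/* Hypothetical snapshot of the conditions checked while stopping DMA. */
struct mock_hw {
	int tx_requests_pending; /* stands in for the TCCR check */
	int tx_rx_in_progress;   /* stands in for the CSR check */
	int config_msgs;         /* times the config-mode error was printed */
};

/* Models ravb_config(): the only step that prints the message. */
static int mock_config(struct mock_hw *hw)
{
	if (hw->tx_rx_in_progress) {
		hw->config_msgs++;
		fprintf(stderr, "failed to switch device to config mode\n");
		return -1;
	}
	return 0;
}

/* Models ravb_stop_dma(): the wait steps fail silently before config. */
static int mock_stop_dma(struct mock_hw *hw)
{
	if (hw->tx_requests_pending || hw->tx_rx_in_progress)
		return -1; /* early exit, no message printed */
	return mock_config(hw);
}

/* Models ravb_dmac_init(): it starts with the config step. */
static int mock_dmac_init(struct mock_hw *hw)
{
	return mock_config(hw);
}
```

With both conditions set, mock_stop_dma() fails without printing, and the single message comes from mock_dmac_init().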
> > and then any descriptors will be not allocated anymore so that NULL
> > pointer dereference happens after that on ravb_start_xmit().
> >
> > To fix the issue, the ravb_tx_timeout_work() should check
> > the return value of ravb_stop_dma() whether this hardware can be
> > re-initialized or not. If ravb_stop_dma() fails, ravb_tx_timeout_work()
> > re-enables TX and RX and just exits.
> >
> > [1]
> > https://lore.kernel.org/linux-renesas-soc/20200518045452.2390-1-dirk.behme@de.bosch.com/
> >
> > Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> > Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
>
> Assuming the comment below is fixed:
>
> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Thanks!
> > ---
> > Changes from RFC v1:
> > - Check the return value of ravb_stop_dma() and exit if the hardware
> > condition can not be initialized in the tx timeout.
> > - Update the commit subject and description.
> > - Fix some typo.
> > https://patchwork.kernel.org/patch/11570217/
> >
> > Unfortunately, I still didn't reproduce the issue yet. So, I still
> > marked RFC on this patch.
>
> I think the Bosch people should test this patch, as they reported the kernel oops...
>
> >
> > drivers/net/ethernet/renesas/ravb_main.c | 14 +++++++++++++-
> > 1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> > index a442bcf6..500f5c1 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> > @@ -1458,7 +1458,18 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> > ravb_ptp_stop(ndev);
> >
> > /* Wait for DMA stopping */
> > - ravb_stop_dma(ndev);
> > + if (ravb_stop_dma(ndev)) {
> > + /* If ravb_stop_dma() fails, the hardware is still in-progress
> > + * as "Operation" mode for TX and/or RX. So, this should not
>
> s/in-progress as "Operation" mode/operating/.
I'll fix it.
> > + * call the following functions because ravb_dmac_init() is
> > + * possible to fail too. Also, this should not retry
> > + * ravb_stop_dma() again and again here because it's possible
> > + * to wait forever. So, this just re-enables the TX and RX and
> > + * skip the following re-initialization procedure.
> > + */
> > + ravb_rcv_snd_enable(ndev);
> > + goto out;
> > + }
> >
> > ravb_ring_free(ndev, RAVB_BE);
> > ravb_ring_free(ndev, RAVB_NC);
> > @@ -1467,6 +1478,7 @@ static void ravb_tx_timeout_work(struct work_struct *work)
> > ravb_dmac_init(ndev);
>
> BTW, that one also may fail...
Yes, that's true... In that case, I think the driver should print an
error message and stop TX and RX to avoid unexpected behavior such as
a kernel panic. So, I'll add such code.
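As a rough illustration of that plan, and not the code for the next revision, here is a user-space sketch. struct mock_dev, mock_dmac_init() and mock_reinit_after_timeout() are hypothetical names, and the error text is invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device model; not the real driver state. */
struct mock_dev {
	bool init_fails;    /* true: DMAC re-initialization fails */
	bool rx_tx_enabled; /* E-MAC RX/TX are enabled */
};

/* Models ravb_dmac_init() returning an error code. */
static int mock_dmac_init(struct mock_dev *dev)
{
	return dev->init_fails ? -1 : 0;
}

/* On failure: report it and leave TX/RX stopped instead of continuing. */
static int mock_reinit_after_timeout(struct mock_dev *dev)
{
	int error = mock_dmac_init(dev);

	if (error) {
		fprintf(stderr, "mock: DMAC re-initialization failed (%d)\n",
			error);
		dev->rx_tx_enabled = false;
		return error;
	}
	dev->rx_tx_enabled = true;
	return 0;
}
```

Leaving TX and RX disabled after a failed re-initialization means no traffic is processed against a half-initialized device.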
Best regards,
Yoshihiro Shimoda
> > ravb_emac_init(ndev);
> >
> > +out:
> > /* Initialise PTP Clock driver */
> > if (priv->chip_id == RCAR_GEN2)
> > ravb_ptp_init(ndev, priv->pdev);
> >