netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] net: macb: restart tx after tx used bit read
@ 2018-12-17 10:02 Claudiu.Beznea
  2018-12-17 10:14 ` Nicolas.Ferre
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Claudiu.Beznea @ 2018-12-17 10:02 UTC (permalink / raw)
  To: Nicolas.Ferre, davem; +Cc: netdev, linux-kernel, Claudiu.Beznea

From: Claudiu Beznea <claudiu.beznea@microchip.com>

On some platforms (currently detected only on SAMA5D4) TX might stuck
even the pachets are still present in DMA memories and TX start was
issued for them. This happens due to race condition between MACB driver
updating next TX buffer descriptor to be used and IP reading the same
descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
is asserted TX must be restarted. Restart TX if used bit is read and
packets are present in software TX queue. Packets are removed from software
TX queue if TX was successful for them (see macb_tx_interrupt()).

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
---

Changes in v3:
- remove "inline" keyword

Changes in v2:
- use "static inline" instead of "inline static" for macb_tx_restart()

 drivers/net/ethernet/cadence/macb_main.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 1d86b4d5645a..f920230386ee 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -61,7 +61,8 @@
 #define MACB_TX_ERR_FLAGS	(MACB_BIT(ISR_TUND)			\
 					| MACB_BIT(ISR_RLE)		\
 					| MACB_BIT(TXERR))
-#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
+#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)	\
+					| MACB_BIT(TXUBR))
 
 /* Max length of transmit frame must be a multiple of 8 bytes */
 #define MACB_TX_LEN_ALIGN	8
@@ -1312,6 +1313,21 @@ static void macb_hresp_error_task(unsigned long data)
 	netif_tx_start_all_queues(dev);
 }
 
+static void macb_tx_restart(struct macb_queue *queue)
+{
+	unsigned int head = queue->tx_head;
+	unsigned int tail = queue->tx_tail;
+	struct macb *bp = queue->bp;
+
+	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
+		queue_writel(queue, ISR, MACB_BIT(TXUBR));
+
+	if (head == tail)
+		return;
+
+	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+}
+
 static irqreturn_t macb_interrupt(int irq, void *dev_id)
 {
 	struct macb_queue *queue = dev_id;
@@ -1369,6 +1385,9 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 		if (status & MACB_BIT(TCOMP))
 			macb_tx_interrupt(queue);
 
+		if (status & MACB_BIT(TXUBR))
+			macb_tx_restart(queue);
+
 		/* Link change detection isn't possible with RMII, so we'll
 		 * add that if/when we get our hands on a full-blown MII PHY.
 		 */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2018-12-17 10:02 [PATCH v3] net: macb: restart tx after tx used bit read Claudiu.Beznea
@ 2018-12-17 10:14 ` Nicolas.Ferre
  2018-12-18 23:58 ` David Miller
  2022-03-23  8:08 ` Tomas Melin
  2 siblings, 0 replies; 22+ messages in thread
From: Nicolas.Ferre @ 2018-12-17 10:14 UTC (permalink / raw)
  To: Claudiu.Beznea, davem; +Cc: netdev, linux-kernel

On 17/12/2018 at 11:02, Claudiu Beznea - M18063 wrote:
> From: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> On some platforms (currently detected only on SAMA5D4) TX might stuck
> even the pachets are still present in DMA memories and TX start was
> issued for them. This happens due to race condition between MACB driver
> updating next TX buffer descriptor to be used and IP reading the same
> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> is asserted TX must be restarted. Restart TX if used bit is read and
> packets are present in software TX queue. Packets are removed from software
> TX queue if TX was successful for them (see macb_tx_interrupt()).
> 
> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>

Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>

Thanks. Best regards,
   Nicolas

> ---
> 
> Changes in v3:
> - remove "inline" keyword
> 
> Changes in v2:
> - use "static inline" instead of "inline static" for macb_tx_restart()
> 
>   drivers/net/ethernet/cadence/macb_main.c | 21 ++++++++++++++++++++-
>   1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 1d86b4d5645a..f920230386ee 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -61,7 +61,8 @@
>   #define MACB_TX_ERR_FLAGS	(MACB_BIT(ISR_TUND)			\
>   					| MACB_BIT(ISR_RLE)		\
>   					| MACB_BIT(TXERR))
> -#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
> +#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)	\
> +					| MACB_BIT(TXUBR))
>   
>   /* Max length of transmit frame must be a multiple of 8 bytes */
>   #define MACB_TX_LEN_ALIGN	8
> @@ -1312,6 +1313,21 @@ static void macb_hresp_error_task(unsigned long data)
>   	netif_tx_start_all_queues(dev);
>   }
>   
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +
> +	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> +		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> +
> +	if (head == tail)
> +		return;
> +
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +}
> +
>   static irqreturn_t macb_interrupt(int irq, void *dev_id)
>   {
>   	struct macb_queue *queue = dev_id;
> @@ -1369,6 +1385,9 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
>   		if (status & MACB_BIT(TCOMP))
>   			macb_tx_interrupt(queue);
>   
> +		if (status & MACB_BIT(TXUBR))
> +			macb_tx_restart(queue);
> +
>   		/* Link change detection isn't possible with RMII, so we'll
>   		 * add that if/when we get our hands on a full-blown MII PHY.
>   		 */
> 


-- 
Nicolas Ferre

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2018-12-17 10:02 [PATCH v3] net: macb: restart tx after tx used bit read Claudiu.Beznea
  2018-12-17 10:14 ` Nicolas.Ferre
@ 2018-12-18 23:58 ` David Miller
  2022-03-23  8:08 ` Tomas Melin
  2 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2018-12-18 23:58 UTC (permalink / raw)
  To: Claudiu.Beznea; +Cc: Nicolas.Ferre, netdev, linux-kernel

From: <Claudiu.Beznea@microchip.com>
Date: Mon, 17 Dec 2018 10:02:42 +0000

> From: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> On some platforms (currently detected only on SAMA5D4) TX might stuck
> even the pachets are still present in DMA memories and TX start was
> issued for them. This happens due to race condition between MACB driver
> updating next TX buffer descriptor to be used and IP reading the same
> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> is asserted TX must be restarted. Restart TX if used bit is read and
> packets are present in software TX queue. Packets are removed from software
> TX queue if TX was successful for them (see macb_tx_interrupt()).
> 
> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2018-12-17 10:02 [PATCH v3] net: macb: restart tx after tx used bit read Claudiu.Beznea
  2018-12-17 10:14 ` Nicolas.Ferre
  2018-12-18 23:58 ` David Miller
@ 2022-03-23  8:08 ` Tomas Melin
  2022-03-23 15:43   ` Jakub Kicinski
  2 siblings, 1 reply; 22+ messages in thread
From: Tomas Melin @ 2022-03-23  8:08 UTC (permalink / raw)
  To: claudiu.beznea; +Cc: Nicolas.Ferre, davem, linux-kernel, netdev

Hi,

> From: <Claudiu.Beznea@microchip.com>
> To: <Nicolas.Ferre@microchip.com>, <davem@davemloft.net>
> Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
> 	<Claudiu.Beznea@microchip.com>
> Subject: [PATCH v3] net: macb: restart tx after tx used bit read
> Date: Mon, 17 Dec 2018 10:02:42 +0000	[thread overview]
> Message-ID: <1545040937-6583-1-git-send-email-claudiu.beznea@microchip.com> (raw)
> 
> From: Claudiu Beznea <claudiu.beznea@microchip.com>
> 
> On some platforms (currently detected only on SAMA5D4) TX might stuck
> even the pachets are still present in DMA memories and TX start was
> issued for them. This happens due to race condition between MACB driver
> updating next TX buffer descriptor to be used and IP reading the same
> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> is asserted TX must be restarted. Restart TX if used bit is read and
> packets are present in software TX queue. Packets are removed from software
> TX queue if TX was successful for them (see macb_tx_interrupt()).
> 
> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>

On Xilinx Zynq the above change can cause infinite interrupt loop leading 
to CPU stall. Seems timing/load needs to be appropriate for this to happen, and currently
with 1G ethernet this can be triggered normally within minutes when running stress tests
on the network interface.

The events leading up to the interrupt looping are similar as the issue described in the
commit message. However in our case, restarting TX does not help at all. Instead
the controller is stuck on the queue end descriptor generating endless TX_USED           
interrupts, never breaking out of interrupt routine.

Any chance you remember more details about in which situation restarting TX helped for
your use case? was tx_qbar at the end of frame or stopped in middle of frame?

thanks,
Tomas Melin


> ---
> 
> Changes in v3:
> - remove "inline" keyword
> 
> Changes in v2:
> - use "static inline" instead of "inline static" for macb_tx_restart()
> 
>  drivers/net/ethernet/cadence/macb_main.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 1d86b4d5645a..f920230386ee 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -61,7 +61,8 @@
>  #define MACB_TX_ERR_FLAGS	(MACB_BIT(ISR_TUND)			\
>  					| MACB_BIT(ISR_RLE)		\
>  					| MACB_BIT(TXERR))
> -#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
> +#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)	\
> +					| MACB_BIT(TXUBR))
>  
>  /* Max length of transmit frame must be a multiple of 8 bytes */
>  #define MACB_TX_LEN_ALIGN	8
> @@ -1312,6 +1313,21 @@ static void macb_hresp_error_task(unsigned long data)
>  	netif_tx_start_all_queues(dev);
>  }
>  
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +
> +	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> +		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> +
> +	if (head == tail)
> +		return;
> +
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +}
> +
>  static irqreturn_t macb_interrupt(int irq, void *dev_id)
>  {
>  	struct macb_queue *queue = dev_id;
> @@ -1369,6 +1385,9 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
>  		if (status & MACB_BIT(TCOMP))
>  			macb_tx_interrupt(queue);
>  
> +		if (status & MACB_BIT(TXUBR))
> +			macb_tx_restart(queue);
> +
>  		/* Link change detection isn't possible with RMII, so we'll
>  		 * add that if/when we get our hands on a full-blown MII PHY.
>  		 */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2022-03-23  8:08 ` Tomas Melin
@ 2022-03-23 15:43   ` Jakub Kicinski
  2022-03-23 16:42     ` Robert Hancock
  0 siblings, 1 reply; 22+ messages in thread
From: Jakub Kicinski @ 2022-03-23 15:43 UTC (permalink / raw)
  To: Tomas Melin, Robert Hancock; +Cc: claudiu.beznea, Nicolas.Ferre, davem, netdev

On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > From: <Claudiu.Beznea@microchip.com>
> > To: <Nicolas.Ferre@microchip.com>, <davem@davemloft.net>
> > Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
> > 	<Claudiu.Beznea@microchip.com>
> > Subject: [PATCH v3] net: macb: restart tx after tx used bit read
> > Date: Mon, 17 Dec 2018 10:02:42 +0000	[thread overview]
> > Message-ID: <1545040937-6583-1-git-send-email-claudiu.beznea@microchip.com> (raw)
> > 
> > From: Claudiu Beznea <claudiu.beznea@microchip.com>
> > 
> > On some platforms (currently detected only on SAMA5D4) TX might stuck
> > even the pachets are still present in DMA memories and TX start was
> > issued for them. This happens due to race condition between MACB driver
> > updating next TX buffer descriptor to be used and IP reading the same
> > descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> > GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> > is asserted TX must be restarted. Restart TX if used bit is read and
> > packets are present in software TX queue. Packets are removed from software
> > TX queue if TX was successful for them (see macb_tx_interrupt()).
> > 
> > Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>  
> 
> On Xilinx Zynq the above change can cause infinite interrupt loop leading 
> to CPU stall. Seems timing/load needs to be appropriate for this to happen, and currently
> with 1G ethernet this can be triggered normally within minutes when running stress tests
> on the network interface.
> 
> The events leading up to the interrupt looping are similar as the issue described in the
> commit message. However in our case, restarting TX does not help at all. Instead
> the controller is stuck on the queue end descriptor generating endless TX_USED           
> interrupts, never breaking out of interrupt routine.
> 
> Any chance you remember more details about in which situation restarting TX helped for
> your use case? was tx_qbar at the end of frame or stopped in middle of frame?

Which kernel version are you using? Robert has been working on macb +
Zynq recently, adding him to CC.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2022-03-23 15:43   ` Jakub Kicinski
@ 2022-03-23 16:42     ` Robert Hancock
  2022-03-25  7:10       ` Tomas Melin
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Robert Hancock @ 2022-03-23 16:42 UTC (permalink / raw)
  To: kuba, tomas.melin; +Cc: Nicolas.Ferre, davem, claudiu.beznea, netdev

On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > > From: <Claudiu.Beznea@microchip.com>
> > > To: <Nicolas.Ferre@microchip.com>, <davem@davemloft.net>
> > > Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
> > > 	<Claudiu.Beznea@microchip.com>
> > > Subject: [PATCH v3] net: macb: restart tx after tx used bit read
> > > Date: Mon, 17 Dec 2018 10:02:42 +0000	[thread overview]
> > > Message-ID: <
> > > 1545040937-6583-1-git-send-email-claudiu.beznea@microchip.com> (raw)
> > > 
> > > From: Claudiu Beznea <claudiu.beznea@microchip.com>
> > > 
> > > On some platforms (currently detected only on SAMA5D4) TX might stuck
> > > even the pachets are still present in DMA memories and TX start was
> > > issued for them. This happens due to race condition between MACB driver
> > > updating next TX buffer descriptor to be used and IP reading the same
> > > descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
> > > GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
> > > is asserted TX must be restarted. Restart TX if used bit is read and
> > > packets are present in software TX queue. Packets are removed from
> > > software
> > > TX queue if TX was successful for them (see macb_tx_interrupt()).
> > > 
> > > Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>  
> > 
> > On Xilinx Zynq the above change can cause infinite interrupt loop leading 
> > to CPU stall. Seems timing/load needs to be appropriate for this to happen,
> > and currently
> > with 1G ethernet this can be triggered normally within minutes when running
> > stress tests
> > on the network interface.
> > 
> > The events leading up to the interrupt looping are similar as the issue
> > described in the
> > commit message. However in our case, restarting TX does not help at all.
> > Instead
> > the controller is stuck on the queue end descriptor generating endless
> > TX_USED           
> > interrupts, never breaking out of interrupt routine.
> > 
> > Any chance you remember more details about in which situation restarting TX
> > helped for
> > your use case? was tx_qbar at the end of frame or stopped in middle of
> > frame?
> 
> Which kernel version are you using? Robert has been working on macb +
> Zynq recently, adding him to CC.

We have been working with ZynqMP and haven't seen such isses in the past, but
I'm not sure we've tried the same type of stress test on those interfaces. If
by Zynq, Tomas means the Zynq-7000 series, that might be a different
version/revision of the IP core than we have as well.

I haven't looked at the TX ring descriptor and register setup on this core in
that much detail, but the fact the controller gets into this "TX used bit read"
state in the first place seems unusual. I'm wondering if something is being
done in the wrong order or if we are missing a memory barrier etc?

-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2022-03-23 16:42     ` Robert Hancock
@ 2022-03-25  7:10       ` Tomas Melin
  2022-03-25  8:13       ` Claudiu.Beznea
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
  2 siblings, 0 replies; 22+ messages in thread
From: Tomas Melin @ 2022-03-25  7:10 UTC (permalink / raw)
  To: Robert Hancock, kuba; +Cc: Nicolas.Ferre, davem, claudiu.beznea, netdev

Hi,

On 23/03/2022 18:42, Robert Hancock wrote:
> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>> From: <Claudiu.Beznea@microchip.com>
>>>> To: <Nicolas.Ferre@microchip.com>, <davem@davemloft.net>
>>>> Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
>>>> 	<Claudiu.Beznea@microchip.com>
>>>> Subject: [PATCH v3] net: macb: restart tx after tx used bit read
>>>> Date: Mon, 17 Dec 2018 10:02:42 +0000	[thread overview]
>>>> Message-ID: <
>>>> 1545040937-6583-1-git-send-email-claudiu.beznea@microchip.com> (raw)
>>>>
>>>> From: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>>
>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>> even the pachets are still present in DMA memories and TX start was
>>>> issued for them. This happens due to race condition between MACB driver
>>>> updating next TX buffer descriptor to be used and IP reading the same
>>>> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
>>>> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
>>>> is asserted TX must be restarted. Restart TX if used bit is read and
>>>> packets are present in software TX queue. Packets are removed from
>>>> software
>>>> TX queue if TX was successful for them (see macb_tx_interrupt()).
>>>>
>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>
>>> On Xilinx Zynq the above change can cause infinite interrupt loop leading
>>> to CPU stall. Seems timing/load needs to be appropriate for this to happen,
>>> and currently
>>> with 1G ethernet this can be triggered normally within minutes when running
>>> stress tests
>>> on the network interface.
>>>
>>> The events leading up to the interrupt looping are similar as the issue
>>> described in the
>>> commit message. However in our case, restarting TX does not help at all.
>>> Instead
>>> the controller is stuck on the queue end descriptor generating endless
>>> TX_USED
>>> interrupts, never breaking out of interrupt routine.
>>>
>>> Any chance you remember more details about in which situation restarting TX
>>> helped for
>>> your use case? was tx_qbar at the end of frame or stopped in middle of
>>> frame?
>>
>> Which kernel version are you using? Robert has been working on macb +
>> Zynq recently, adding him to CC.

This was originally seen on 4.19.x series kernel, but the same issue is 
also present with 5.10.x series kernels. I have tried looking, but did 
not see any changes which would seem related to this particular issues 
neither in newer mainline kernels or xilinx tree.

These stall issues have surfaced as CPU load and timing has changed, 
even with the same kernel version. So it seems rather likely that it 
needs the correct timing to get triggered.

> 
> We have been working with ZynqMP and haven't seen such isses in the past, but
> I'm not sure we've tried the same type of stress test on those interfaces. If
> by Zynq, Tomas means the Zynq-7000 series, that might be a different
> version/revision of the IP core than we have as well.
Indeed, this is the Zynq-7000 series.

> 
> I haven't looked at the TX ring descriptor and register setup on this core in
> that much detail, but the fact the controller gets into this "TX used bit read"
> state in the first place seems unusual. I'm wondering if something is being
> done in the wrong order or if we are missing a memory barrier etc?
> 
I agree that it could be something like that, or then the controller 
gets into some unknown state that makes the transmission halt until more 
data gets pushed into the buffer.

I have a proposal for improving on the original tx restart approach, and 
recently posted it to the list. With that patch applied, have not been 
able to cause any stall sitations during stress testing anymore.

Thanks,
Tomas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2022-03-23 16:42     ` Robert Hancock
  2022-03-25  7:10       ` Tomas Melin
@ 2022-03-25  8:13       ` Claudiu.Beznea
  2022-03-25  9:33         ` Tomas Melin
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
  2 siblings, 1 reply; 22+ messages in thread
From: Claudiu.Beznea @ 2022-03-25  8:13 UTC (permalink / raw)
  To: robert.hancock, kuba, tomas.melin; +Cc: Nicolas.Ferre, davem, netdev

Hi,

On 23.03.2022 18:42, Robert Hancock wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>> From: <Claudiu.Beznea@microchip.com>
>>>> To: <Nicolas.Ferre@microchip.com>, <davem@davemloft.net>
>>>> Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
>>>>   <Claudiu.Beznea@microchip.com>
>>>> Subject: [PATCH v3] net: macb: restart tx after tx used bit read
>>>> Date: Mon, 17 Dec 2018 10:02:42 +0000     [thread overview]
>>>> Message-ID: <
>>>> 1545040937-6583-1-git-send-email-claudiu.beznea@microchip.com> (raw)
>>>>
>>>> From: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>>
>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>> even the pachets are still present in DMA memories and TX start was
>>>> issued for them. This happens due to race condition between MACB driver
>>>> updating next TX buffer descriptor to be used and IP reading the same
>>>> descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
>>>> GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
>>>> is asserted TX must be restarted. Restart TX if used bit is read and
>>>> packets are present in software TX queue. Packets are removed from
>>>> software
>>>> TX queue if TX was successful for them (see macb_tx_interrupt()).
>>>>
>>>> Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>
>>> On Xilinx Zynq the above change can cause infinite interrupt loop leading
>>> to CPU stall. Seems timing/load needs to be appropriate for this to happen,
>>> and currently
>>> with 1G ethernet this can be triggered normally within minutes when running
>>> stress tests
>>> on the network interface.
>>>
>>> The events leading up to the interrupt looping are similar as the issue
>>> described in the
>>> commit message. However in our case, restarting TX does not help at all.
>>> Instead
>>> the controller is stuck on the queue end descriptor generating endless
>>> TX_USED
>>> interrupts, never breaking out of interrupt routine.
>>>
>>> Any chance you remember more details about in which situation restarting TX
>>> helped for
>>> your use case? was tx_qbar at the end of frame or stopped in middle of
>>> frame?

I look though my emails for this particular issue, didn't find all that I
need with regards to the issue that leads to this fix, but what can I tell
from my mind and some emails still in my inbox is that this issue had been
reproduced at that time only with a particular we server running on SAMA5D4
and at some point a packet stopped being transmitted although TX_START had
been issued for it. In that case the controller fired TX Used bit read
interrupt.

The GEM datasheet specifies this "Transmit is halted when a buffer
descriptor with its used bit set is read, a transmit error occurs, or by
writing to the transmit halt bit of the network control register"

Also, at that point had a support case open on Cadence and they confirm
that having TX restarted is the good way.

At the time of investigating the issue I only found it reproducible only on
one SoC (SAMA5D4) out of 4 (SAMA5D2, SAMA5D3 and one ARM926 based SoC). All
these are probably less faster than ZynqMP.

Though this IP is today present also on SAMA7G5 who's CPU can run @1GHz and
MAC IP being clocked @200MHz. Even in this last setup I haven't saw the
behavior with used bit read being fired too often.

By any chance on your setup do you have small packets inserted in MACB
queues at high rate?

>>
>> Which kernel version are you using? Robert has been working on macb +
>> Zynq recently, adding him to CC.
> 
> We have been working with ZynqMP and haven't seen such isses in the past, but
> I'm not sure we've tried the same type of stress test on those interfaces. If
> by Zynq, Tomas means the Zynq-7000 series, that might be a different
> version/revision of the IP core than we have as well.
> 
> I haven't looked at the TX ring descriptor and register setup on this core in
> that much detail, but the fact the controller gets into this "TX used bit read"
> state in the first place seems unusual. I'm wondering if something is being
> done in the wrong order or if we are missing a memory barrier etc?

That might possible especially on descriptors update path.

> 
> --
> Robert Hancock
> Senior Hardware Designer, Calian Advanced Technologies
> www.calian.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] net: macb: restart tx after tx used bit read
  2022-03-25  8:13       ` Claudiu.Beznea
@ 2022-03-25  9:33         ` Tomas Melin
  0 siblings, 0 replies; 22+ messages in thread
From: Tomas Melin @ 2022-03-25  9:33 UTC (permalink / raw)
  To: Claudiu.Beznea, robert.hancock, kuba; +Cc: Nicolas.Ferre, davem, netdev

Hi,

On 25/03/2022 10:13, Claudiu.Beznea@microchip.com wrote:
> Hi,
> 

>>>>
>>>> Any chance you remember more details about in which situation restarting TX
>>>> helped for
>>>> your use case? was tx_qbar at the end of frame or stopped in middle of
>>>> frame?
> 
> I look though my emails for this particular issue, didn't find all that I
> need with regards to the issue that leads to this fix, but what can I tell
> from my mind and some emails still in my inbox is that this issue had been
> reproduced at that time only with a particular we server running on SAMA5D4
> and at some point a packet stopped being transmitted although TX_START had
> been issued for it. In that case the controller fired TX Used bit read
> interrupt.
> 
> The GEM datasheet specifies this "Transmit is halted when a buffer
> descriptor with its used bit set is read, a transmit error occurs, or by
> writing to the transmit halt bit of the network control register"
> 
> Also, at that point had a support case open on Cadence and they confirm
> that having TX restarted is the good way.
> 
> At the time of investigating the issue I only found it reproducible only on
> one SoC (SAMA5D4) out of 4 (SAMA5D2, SAMA5D3 and one ARM926 based SoC). All
> these are probably less faster than ZynqMP.
> 
> Though this IP is today present also on SAMA7G5 who's CPU can run @1GHz and
> MAC IP being clocked @200MHz. Even in this last setup I haven't saw the
> behavior with used bit read being fired too often.
> 
> By any chance on your setup do you have small packets inserted in MACB
> queues at high rate?

Thanks for elaborating on your case.

Frames are two descriptors long (header + one) and packets ~1450 bytes.
with iperf it's relatively easy to trigger this situation, but this is 
also seen with "normal" traffic. It just take longer to trigger.


thanks,
Tomas


> 
>>>
>>> Which kernel version are you using? Robert has been working on macb +
>>> Zynq recently, adding him to CC.
>>
>> We have been working with ZynqMP and haven't seen such isses in the past, but
>> I'm not sure we've tried the same type of stress test on those interfaces. If
>> by Zynq, Tomas means the Zynq-7000 series, that might be a different
>> version/revision of the IP core than we have as well.
>>
>> I haven't looked at the TX ring descriptor and register setup on this core in
>> that much detail, but the fact the controller gets into this "TX used bit read"
>> state in the first place seems unusual. I'm wondering if something is being
>> done in the wrong order or if we are missing a memory barrier etc?
> 
> That might possible especially on descriptors update path.
> 
>>
>> --
>> Robert Hancock
>> Senior Hardware Designer, Calian Advanced Technologies
>> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.calian.com%2F&amp;data=04%7C01%7Ctomas.melin%40vaisala.com%7C1aaf00e981f14742b92108da0e3754c1%7C6d7393e041f54c2e9b124c2be5da5c57%7C0%7C0%7C637837928068621708%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=230PlqUdZGct3lBAd39A0Zm8hxIgh5jovRRrQJ%2BXVyc%3D&amp;reserved=0
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/1] Alternative, restart tx after tx used bit read
  2022-03-23 16:42     ` Robert Hancock
  2022-03-25  7:10       ` Tomas Melin
  2022-03-25  8:13       ` Claudiu.Beznea
@ 2023-04-07 21:33       ` Ingo Rohloff
  2023-04-07 21:33         ` [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring Ingo Rohloff
                           ` (4 more replies)
  2 siblings, 5 replies; 22+ messages in thread
From: Ingo Rohloff @ 2023-04-07 21:33 UTC (permalink / raw)
  To: robert.hancock
  Cc: Nicolas.Ferre, claudiu.beznea, davem, kuba, netdev, tomas.melin,
	Ingo Rohloff

I am sorry; this is a long E-Mail.

I am referring to this problem:

Robert Hancock wrote:
> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
> > On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > > > From: Claudiu Beznea <claudiu.beznea@microchip.com>
> > > > 
> > > > On some platforms (currently detected only on SAMA5D4) TX might stuck
> > > > even the pachets are still present in DMA memories and TX start was
> > > > issued for them.
> > > > ...
> > > On Xilinx Zynq the above change can cause infinite interrupt loop
> > > leading to CPU stall.  Seems timing/load needs to be appropriate for
> > > this to happen, and currently with 1G ethernet this can be triggered
> > > normally within minutes when running stress tests on the network
> > > interface.
> > > ...
> > Which kernel version are you using?  Robert has been working on macb +
> > Zynq recently, adding him to CC.
> ...
> I haven't looked at the TX ring descriptor and register setup on this core
> in that much detail, but the fact the controller gets into this "TX used
> bit read" state in the first place seems unusual.  I'm wondering if
> something is being done in the wrong order or if we are missing a memory
> barrier etc?

I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
I have seen the same issue before commit 4298388574dae6168 ("net: macb:
restart tx after tx used bit read")

The scenario which sometimes triggers it for me:

I have an application running on the PC.
The application sends a short command (via TCP) to the ZynqMP.
The ZynqMP answers with a long stream of bytes via TCP
(around 230KiB).
The PC knows the amount of data and waits to receive the data completely.
The PC gets stuck, because the last TCP segment of the transfer gets
stuck in the ZynqMP and is not transmitted.
You can re-trigger the TX Ring by pinging the ZynqMP:
The Ping answer will re-trigger the TX ring, which in turn will also
then send the stuck IP/TCP packet.

Unfortunately triggering this problem seems to be hard; at least I am 
not able to reproduce it easily.

So: If anyone has a more reliable way to trigger the problem, 
please tell me.
This is to check if my proposed alternative works under all circumstances.

I have an alternate implementation, which does not require to turn on
the "TX USED BIT READ" (TUBR) interrupt.
The reason why I think this alternative might be better is, because I
believe the TUBR interrupt happens at the wrong time; so I am not sure
that the current implementation works reliably.

Analysis:
Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
prefetch") mentions:

    GEM version in ZynqMP and most versions greater than r1p07 supports
    TX and RX BD prefetch. The number of BDs that can be prefetched is a
    HW configurable parameter. For ZynqMP, this parameter is 4.

I think what happens is this:
Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
1) SW has written TX descriptors 0..7
2) HW is currently transmitting TX descriptor 6.
   HW has already prefetched TX descriptors 6,7,8,9.
3) SW writes TX descriptor 8 (clearing TX_USED)
4) SW writes the TSTART bit.
   HW ignores this, because it is still transmitting.
5) HW transmits TX descriptor 7.
6) HW reaches descriptor 8; because this descriptor
   has already been prefetched, HW sees a non-active
   descriptor (TX_USED set) and stops transmitting.

From debugging the code it seems that the TUBR interrupt happens, when
a descriptor is prefetched, which has a TX_USED bit set, which is before
it is processed by the rest of the hardware:
When looking at the end of a transfer it seems I get a TUBR interrupt,
followed by some more TX COMPLETE interrupts.

Additionally that means at the time the TUBR interrupt happens, it
is too early to write the TSTART bit again, because the hardware is
still actively transmitting.

The alternative I implemented is to check in macb_tx_complete() if

1) The TX Queue is non-empty (there are pending TX descriptors)
2) The hardware indicates that it is not transmitting any more

If this situation is detected, the TSTART bit will be written to
restart the TX ring.

I know for sure, that I hit the code path, which restarts the 
transmission in macb_tx_complete(); that's why I believe the
"Example Scenario" I described above is correct.

I am still not sure if what I implemented is enough:
macb_tx_complete() should at least see all completed TX descriptors.
I still believe there is a (very short) time window in which there
might be a race:
1) HW completes TX descriptor 7 and sets the TX_USED bit
   in TX descriptor 7.
   TX descriptor 8 was prefetched with a set TX_USED bit.
2) SW sees that TX descriptor 7 is completed
   (TX_USED bit now is set).
3) SW sees that there still is a pending TX descriptor 8.
4) SW checks if the TGO bit is still set, which it is.
   So the SW does nothing at this point.
5) HW processes the prefetched,set TX_USED bit in
   TX descriptor 8 and stops transmission (clearing the TGO bit).

I am not sure if it is guaranteed that 5) cannot happen after 4).  If 5)
happens after 4) as described above, then the controller still gets stuck.
The only idea I can come up with, is to re-check the TGO bit
a second time a little bit later, but I am not sure how to
implement this.

Is there anyone who has access to hardware documentation, which
sheds some light onto the way the descriptor prefetching works?

so long
  Ingo


Ingo Rohloff (1):
  net: macb: A different way to restart a stuck TX descriptor ring.

 drivers/net/ethernet/cadence/macb.h      |  1 -
 drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++---------------
 2 files changed, 24 insertions(+), 44 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring.
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
@ 2023-04-07 21:33         ` Ingo Rohloff
  2023-04-11 15:25           ` Jesse Brandeburg
  2023-04-10 17:05         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Robert Hancock
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Ingo Rohloff @ 2023-04-07 21:33 UTC (permalink / raw)
  To: robert.hancock
  Cc: Nicolas.Ferre, claudiu.beznea, davem, kuba, netdev, tomas.melin,
	Ingo Rohloff

This implements a different approach than Commit 4298388574dae6 ("net:
macb: restart tx after tx used bit read"):

When reaping TX descriptors in macb_tx_complete(), if there are still
active descriptors pending (queue is not empty) and the controller
additionally signals that it is not any longer working on the TX ring,
then something has to be wrong. Reasoning:
Each time a descriptor is added to the TX ring (via macb_start_xmit()) the
controller is triggered to start transmitting (via setting the TSTART
bit).
At this point in time, there are two cases:
1) The controller already has read an inactive descriptor
   (with a set TX_USED bit).
2) The controller has not yet read an inactive descriptor
   and is still actively transmitting.

In case 1) setting the TSTART bit, should restart transmission.
In case 2) the controller should continue transmitting and at some point
reach the freshly added descriptors and then process them too.

This patch checks in macb_tx_complete() if the TX queue is non-empty and
additionally if the controller indicates that it is not transmitting any
longer. If this condition is detected, the TSTART bit is set again to
restart transmission.

Signed-off-by: Ingo Rohloff <ingo.rohloff@lauterbach.com>
---
 drivers/net/ethernet/cadence/macb.h      |  1 -
 drivers/net/ethernet/cadence/macb_main.c | 66 +++++++++---------------
 2 files changed, 23 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 14dfec4db8f9..b749fa2c0342 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1205,7 +1205,6 @@ struct macb_queue {
 	struct macb_tx_skb	*tx_skb;
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
-	bool			txubr_pending;
 	struct napi_struct	napi_tx;
 
 	dma_addr_t		rx_ring_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 66e30561569e..077024ad9ecc 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -70,8 +70,7 @@ struct sifive_fu540_macb_mgmt {
 #define MACB_TX_ERR_FLAGS	(MACB_BIT(ISR_TUND)			\
 					| MACB_BIT(ISR_RLE)		\
 					| MACB_BIT(TXERR))
-#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)	\
-					| MACB_BIT(TXUBR))
+#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
 
 /* Max length of transmit frame must be a multiple of 8 bytes */
 #define MACB_TX_LEN_ALIGN	8
@@ -1272,6 +1271,26 @@ static int macb_tx_complete(struct macb_queue *queue, int budget)
 	}
 
 	queue->tx_tail = tail;
+
+	if (tail != head) {
+		unsigned long flags;
+		u32 status;
+
+		spin_lock_irqsave(&bp->lock, flags);
+		status = macb_readl(bp, TSR);
+		if (!(status & MACB_BIT(TGO))) {
+			/* We have frames to be transmitted pending,
+			 * but controller is not transmitting any more.
+			 * Restart transmit engine
+			 */
+			u32 ncr;
+
+			ncr = macb_readl(bp, NCR) | MACB_BIT(TSTART);
+			macb_writel(bp, NCR, ncr);
+		}
+		spin_unlock_irqrestore(&bp->lock, flags);
+	}
+
 	if (__netif_subqueue_stopped(bp->dev, queue_index) &&
 	    CIRC_CNT(queue->tx_head, queue->tx_tail,
 		     bp->tx_ring_size) <= MACB_TX_WAKEUP_THRESH(bp))
@@ -1688,31 +1707,6 @@ static int macb_rx_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
-static void macb_tx_restart(struct macb_queue *queue)
-{
-	struct macb *bp = queue->bp;
-	unsigned int head_idx, tbqp;
-
-	spin_lock(&queue->tx_ptr_lock);
-
-	if (queue->tx_head == queue->tx_tail)
-		goto out_tx_ptr_unlock;
-
-	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
-	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
-	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, queue->tx_head));
-
-	if (tbqp == head_idx)
-		goto out_tx_ptr_unlock;
-
-	spin_lock_irq(&bp->lock);
-	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
-	spin_unlock_irq(&bp->lock);
-
-out_tx_ptr_unlock:
-	spin_unlock(&queue->tx_ptr_lock);
-}
-
 static bool macb_tx_complete_pending(struct macb_queue *queue)
 {
 	bool retval = false;
@@ -1737,13 +1731,6 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 
 	work_done = macb_tx_complete(queue, budget);
 
-	rmb(); // ensure txubr_pending is up to date
-	if (queue->txubr_pending) {
-		queue->txubr_pending = false;
-		netdev_vdbg(bp->dev, "poll: tx restart\n");
-		macb_tx_restart(queue);
-	}
-
 	netdev_vdbg(bp->dev, "TX poll: queue = %u, work_done = %d, budget = %d\n",
 		    (unsigned int)(queue - bp->queues), work_done, budget);
 
@@ -1913,17 +1900,10 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 			}
 		}
 
-		if (status & (MACB_BIT(TCOMP) |
-			      MACB_BIT(TXUBR))) {
+		if (status & MACB_BIT(TCOMP)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-				queue_writel(queue, ISR, MACB_BIT(TCOMP) |
-							 MACB_BIT(TXUBR));
-
-			if (status & MACB_BIT(TXUBR)) {
-				queue->txubr_pending = true;
-				wmb(); // ensure softirq can see update
-			}
+				queue_writel(queue, ISR, MACB_BIT(TCOMP));
 
 			if (napi_schedule_prep(&queue->napi_tx)) {
 				netdev_vdbg(bp->dev, "scheduling TX softirq\n");
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
  2023-04-07 21:33         ` [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring Ingo Rohloff
@ 2023-04-10 17:05         ` Robert Hancock
  2023-04-12  2:07         ` Jakub Kicinski
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Robert Hancock @ 2023-04-10 17:05 UTC (permalink / raw)
  To: ingo.rohloff
  Cc: Nicolas.Ferre, claudiu.beznea, tomas.melin, davem, kuba,
	harini.katakam, netdev

On Fri, 2023-04-07 at 23:33 +0200, Ingo Rohloff wrote:
> CAUTION: This email originated from outside of the organization. Do
> not click links or open attachments unless you recognize the sender
> and know the content is safe.
> 
> I am sorry; this is a long E-Mail.
> 
> I am referring to this problem:
> 
> Robert Hancock wrote:
> > On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
> > > On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > > > > From: Claudiu Beznea <claudiu.beznea@microchip.com>
> > > > > 
> > > > > On some platforms (currently detected only on SAMA5D4) TX
> > > > > might stuck
> > > > > even the pachets are still present in DMA memories and TX
> > > > > start was
> > > > > issued for them.
> > > > > ...
> > > > On Xilinx Zynq the above change can cause infinite interrupt
> > > > loop
> > > > leading to CPU stall.  Seems timing/load needs to be
> > > > appropriate for
> > > > this to happen, and currently with 1G ethernet this can be
> > > > triggered
> > > > normally within minutes when running stress tests on the
> > > > network
> > > > interface.
> > > > ...
> > > Which kernel version are you using?  Robert has been working on
> > > macb +
> > > Zynq recently, adding him to CC.
> > ...
> > I haven't looked at the TX ring descriptor and register setup on
> > this core
> > in that much detail, but the fact the controller gets into this "TX
> > used
> > bit read" state in the first place seems unusual.  I'm wondering if
> > something is being done in the wrong order or if we are missing a
> > memory
> > barrier etc?
> 
> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
> I have seen the same issue before commit 4298388574dae6168 ("net:
> macb:
> restart tx after tx used bit read")
> 
> The scenario which sometimes triggers it for me:
> 
> I have an application running on the PC.
> The application sends a short command (via TCP) to the ZynqMP.
> The ZynqMP answers with a long stream of bytes via TCP
> (around 230KiB).
> The PC knows the amount of data and waits to receive the data
> completely.
> The PC gets stuck, because the last TCP segment of the transfer gets
> stuck in the ZynqMP and is not transmitted.
> You can re-trigger the TX Ring by pinging the ZynqMP:
> The Ping answer will re-trigger the TX ring, which in turn will also
> then send the stuck IP/TCP packet.
> 
> Unfortunately triggering this problem seems to be hard; at least I am
> not able to reproduce it easily.
> 
> So: If anyone has a more reliable way to trigger the problem,
> please tell me.
> This is to check if my proposed alternative works under all
> circumstances.
> 
> I have an alternate implementation, which does not require to turn on
> the "TX USED BIT READ" (TUBR) interrupt.
> The reason why I think this alternative might be better is, because I
> believe the TUBR interrupt happens at the wrong time; so I am not
> sure
> that the current implementation works reliably.
> 
> Analysis:
> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and
> RX BD
> prefetch") mentions:
> 
>     GEM version in ZynqMP and most versions greater than r1p07
> supports
>     TX and RX BD prefetch. The number of BDs that can be prefetched
> is a
>     HW configurable parameter. For ZynqMP, this parameter is 4.
> 
> I think what happens is this:
> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> 1) SW has written TX descriptors 0..7
> 2) HW is currently transmitting TX descriptor 6.
>    HW has already prefetched TX descriptors 6,7,8,9.
> 3) SW writes TX descriptor 8 (clearing TX_USED)
> 4) SW writes the TSTART bit.
>    HW ignores this, because it is still transmitting.
> 5) HW transmits TX descriptor 7.
> 6) HW reaches descriptor 8; because this descriptor
>    has already been prefetched, HW sees a non-active
>    descriptor (TX_USED set) and stops transmitting.
> 
> From debugging the code it seems that the TUBR interrupt happens,
> when
> a descriptor is prefetched, which has a TX_USED bit set, which is
> before
> it is processed by the rest of the hardware:
> When looking at the end of a transfer it seems I get a TUBR
> interrupt,
> followed by some more TX COMPLETE interrupts.
> 
> Additionally that means at the time the TUBR interrupt happens, it
> is too early to write the TSTART bit again, because the hardware is
> still actively transmitting.
> 
> The alternative I implemented is to check in macb_tx_complete() if
> 
> 1) The TX Queue is non-empty (there are pending TX descriptors)
> 2) The hardware indicates that it is not transmitting any more
> 
> If this situation is detected, the TSTART bit will be written to
> restart the TX ring.
> 
> I know for sure, that I hit the code path, which restarts the
> transmission in macb_tx_complete(); that's why I believe the
> "Example Scenario" I described above is correct.
> 
> I am still not sure if what I implemented is enough:
> macb_tx_complete() should at least see all completed TX descriptors.
> I still believe there is a (very short) time window in which there
> might be a race:
> 1) HW completes TX descriptor 7 and sets the TX_USED bit
>    in TX descriptor 7.
>    TX descriptor 8 was prefetched with a set TX_USED bit.
> 2) SW sees that TX descriptor 7 is completed
>    (TX_USED bit now is set).
> 3) SW sees that there still is a pending TX descriptor 8.
> 4) SW checks if the TGO bit is still set, which it is.
>    So the SW does nothing at this point.
> 5) HW processes the prefetched,set TX_USED bit in
>    TX descriptor 8 and stops transmission (clearing the TGO bit).
> 
> I am not sure if it is guaranteed that 5) cannot happen after 4).  If
> 5)
> happens after 4) as described above, then the controller still gets
> stuck.
> The only idea I can come up with, is to re-check the TGO bit
> a second time a little bit later, but I am not sure how to
> implement this.

I would have a similar concern that a race condition like that could
happen. I suspect in order to fix this properly we would need to know
more about how this prefetch mechanism works and how software was
supposed to cope with it. If it works as simplistically as you
described, it seems like it would inevitably cause a bunch of hard to
handle race conditions and it may be preferable to disable it in the
core if possible.

> 
> Is there anyone who has access to hardware documentation, which
> sheds some light onto the way the descriptor prefetching works?
> 
> so long
>   Ingo
> 
> 
> Ingo Rohloff (1):
>   net: macb: A different way to restart a stuck TX descriptor ring.
> 
>  drivers/net/ethernet/cadence/macb.h      |  1 -
>  drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++-------------
> --
>  2 files changed, 24 insertions(+), 44 deletions(-)
> 
> --
> 2.17.1
> 

-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring.
  2023-04-07 21:33         ` [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring Ingo Rohloff
@ 2023-04-11 15:25           ` Jesse Brandeburg
  0 siblings, 0 replies; 22+ messages in thread
From: Jesse Brandeburg @ 2023-04-11 15:25 UTC (permalink / raw)
  To: Ingo Rohloff, robert.hancock
  Cc: Nicolas.Ferre, claudiu.beznea, davem, kuba, netdev, tomas.melin

On 4/7/2023 2:33 PM, Ingo Rohloff wrote:
> This implements a different approach than Commit 4298388574dae6 ("net:
> macb: restart tx after tx used bit read"):
> 
> When reaping TX descriptors in macb_tx_complete(), if there are still
> active descriptors pending (queue is not empty) and the controller
> additionally signals that it is not any longer working on the TX ring,
> then something has to be wrong. Reasoning:
> Each time a descriptor is added to the TX ring (via macb_start_xmit()) the
> controller is triggered to start transmitting (via setting the TSTART
> bit).
> At this point in time, there are two cases:
> 1) The controller already has read an inactive descriptor
>    (with a set TX_USED bit).
> 2) The controller has not yet read an inactive descriptor
>    and is still actively transmitting.
> 
> In case 1) setting the TSTART bit, should restart transmission.
> In case 2) the controller should continue transmitting and at some point
> reach the freshly added descriptors and then process them too.
> 
> This patch checks in macb_tx_complete() if the TX queue is non-empty and
> additionally if the controller indicates that it is not transmitting any
> longer. If this condition is detected, the TSTART bit is set again to
> restart transmission.
> 
> Signed-off-by: Ingo Rohloff <ingo.rohloff@lauterbach.com>


I see this series is still under discussion. Next time you send please
use correct subject line:
[PATCH net v1] macb: ...

Also please be sure to cc the correct maintainers.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
  2023-04-07 21:33         ` [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring Ingo Rohloff
  2023-04-10 17:05         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Robert Hancock
@ 2023-04-12  2:07         ` Jakub Kicinski
  2023-04-12  3:17           ` Lars-Peter Clausen
                             ` (3 more replies)
  2023-04-12  6:27         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Tomas Melin
  2023-04-24  8:54         ` Claudiu.Beznea
  4 siblings, 4 replies; 22+ messages in thread
From: Jakub Kicinski @ 2023-04-12  2:07 UTC (permalink / raw)
  To: Ingo Rohloff, Roman Gushchin, Lars-Peter Clausen
  Cc: robert.hancock, Nicolas.Ferre, claudiu.beznea, davem, netdev,
	tomas.melin

On Fri,  7 Apr 2023 23:33:48 +0200 Ingo Rohloff wrote:
> Analysis:
> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> prefetch") mentions:
> 
>     GEM version in ZynqMP and most versions greater than r1p07 supports
>     TX and RX BD prefetch. The number of BDs that can be prefetched is a
>     HW configurable parameter. For ZynqMP, this parameter is 4.
> 
> I think what happens is this:
> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> 1) SW has written TX descriptors 0..7
> 2) HW is currently transmitting TX descriptor 6.
>    HW has already prefetched TX descriptors 6,7,8,9.
> 3) SW writes TX descriptor 8 (clearing TX_USED)
> 4) SW writes the TSTART bit.
>    HW ignores this, because it is still transmitting.
> 5) HW transmits TX descriptor 7.
> 6) HW reaches descriptor 8; because this descriptor
>    has already been prefetched, HW sees a non-active
>    descriptor (TX_USED set) and stops transmitting.

This sounds broken, any idea if this is how the IP is supposed to work
or it may be an integration issue in Zynq?  The other side of this
question is how expensive the workaround is - a spin lock and two extra
register reads on completion seems like a lot.

Roman, Lars, have you seen Tx stalls on your macb setups?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-12  2:07         ` Jakub Kicinski
@ 2023-04-12  3:17           ` Lars-Peter Clausen
  2023-04-12  3:43           ` Roman Gushchin
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Lars-Peter Clausen @ 2023-04-12  3:17 UTC (permalink / raw)
  To: Jakub Kicinski, Ingo Rohloff, Roman Gushchin
  Cc: robert.hancock, Nicolas.Ferre, claudiu.beznea, davem, netdev,
	tomas.melin

On 4/11/23 19:07, Jakub Kicinski wrote:
> On Fri,  7 Apr 2023 23:33:48 +0200 Ingo Rohloff wrote:
>> Analysis:
>> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
>> prefetch") mentions:
>>
>>      GEM version in ZynqMP and most versions greater than r1p07 supports
>>      TX and RX BD prefetch. The number of BDs that can be prefetched is a
>>      HW configurable parameter. For ZynqMP, this parameter is 4.
>>
>> I think what happens is this:
>> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
>> 1) SW has written TX descriptors 0..7
>> 2) HW is currently transmitting TX descriptor 6.
>>     HW has already prefetched TX descriptors 6,7,8,9.
>> 3) SW writes TX descriptor 8 (clearing TX_USED)
>> 4) SW writes the TSTART bit.
>>     HW ignores this, because it is still transmitting.
>> 5) HW transmits TX descriptor 7.
>> 6) HW reaches descriptor 8; because this descriptor
>>     has already been prefetched, HW sees a non-active
>>     descriptor (TX_USED set) and stops transmitting.
> This sounds broken, any idea if this is how the IP is supposed to work
> or it may be an integration issue in Zynq?  The other side of this
> question is how expensive the workaround is - a spin lock and two extra
> register reads on completion seems like a lot.
>
> Roman, Lars, have you seen Tx stalls on your macb setups?

We haven't seen that one yet. We are also using ZynqMP and have done 
iperf3 stress testing in the past, but maybe just got lucky.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-12  2:07         ` Jakub Kicinski
  2023-04-12  3:17           ` Lars-Peter Clausen
@ 2023-04-12  3:43           ` Roman Gushchin
  2023-04-21 13:00           ` [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring Ingo Rohloff
  2023-04-21 13:00           ` [PATCH v2 1/1] " Ingo Rohloff
  3 siblings, 0 replies; 22+ messages in thread
From: Roman Gushchin @ 2023-04-12  3:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ingo Rohloff, Lars-Peter Clausen, robert.hancock, Nicolas.Ferre,
	claudiu.beznea, davem, netdev, tomas.melin

On Tue, Apr 11, 2023 at 07:07:15PM -0700, Jakub Kicinski wrote:
> On Fri,  7 Apr 2023 23:33:48 +0200 Ingo Rohloff wrote:
> > Analysis:
> > Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> > prefetch") mentions:
> > 
> >     GEM version in ZynqMP and most versions greater than r1p07 supports
> >     TX and RX BD prefetch. The number of BDs that can be prefetched is a
> >     HW configurable parameter. For ZynqMP, this parameter is 4.
> > 
> > I think what happens is this:
> > Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> > 1) SW has written TX descriptors 0..7
> > 2) HW is currently transmitting TX descriptor 6.
> >    HW has already prefetched TX descriptors 6,7,8,9.
> > 3) SW writes TX descriptor 8 (clearing TX_USED)
> > 4) SW writes the TSTART bit.
> >    HW ignores this, because it is still transmitting.
> > 5) HW transmits TX descriptor 7.
> > 6) HW reaches descriptor 8; because this descriptor
> >    has already been prefetched, HW sees a non-active
> >    descriptor (TX_USED set) and stops transmitting.
> 
> This sounds broken, any idea if this is how the IP is supposed to work
> or it may be an integration issue in Zynq?  The other side of this
> question is how expensive the workaround is - a spin lock and two extra
> register reads on completion seems like a lot.
> 
> Roman, Lars, have you seen Tx stalls on your macb setups?

Not yet, but also we have a custom patch that reduces the number of tx queues
to 1, which "fixed" some lockup we've seen in the past.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
                           ` (2 preceding siblings ...)
  2023-04-12  2:07         ` Jakub Kicinski
@ 2023-04-12  6:27         ` Tomas Melin
  2023-04-12 16:17           ` Robert Hancock
  2023-04-24  8:54         ` Claudiu.Beznea
  4 siblings, 1 reply; 22+ messages in thread
From: Tomas Melin @ 2023-04-12  6:27 UTC (permalink / raw)
  To: Ingo Rohloff, robert.hancock
  Cc: Nicolas.Ferre, claudiu.beznea, davem, kuba, netdev

Hi,
On 08/04/2023 00:33, Ingo Rohloff wrote:
> [You don't often get email from ingo.rohloff@lauterbach.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> I am sorry; this is a long E-Mail.
> 
> I am referring to this problem:
> 
> Robert Hancock wrote:
>> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>>> From: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>>>
>>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>>> even the pachets are still present in DMA memories and TX start was
>>>>> issued for them.
>>>>> ...
>>>> On Xilinx Zynq the above change can cause infinite interrupt loop
>>>> leading to CPU stall.  Seems timing/load needs to be appropriate for
>>>> this to happen, and currently with 1G ethernet this can be triggered
>>>> normally within minutes when running stress tests on the network
>>>> interface.
>>>> ...
>>> Which kernel version are you using?  Robert has been working on macb +
>>> Zynq recently, adding him to CC.
>> ...
>> I haven't looked at the TX ring descriptor and register setup on this core
>> in that much detail, but the fact the controller gets into this "TX used
>> bit read" state in the first place seems unusual.  I'm wondering if
>> something is being done in the wrong order or if we are missing a memory
>> barrier etc?
> 
> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
> I have seen the same issue before commit 4298388574dae6168 ("net: macb:
> restart tx after tx used bit read")

Since you mention before that commit this triggered, have you been able
to reproduce the problem with that commit applied?

> 
> The scenario which sometimes triggers it for me:
> 
> I have an application running on the PC.
> The application sends a short command (via TCP) to the ZynqMP.
> The ZynqMP answers with a long stream of bytes via TCP
> (around 230KiB).
> The PC knows the amount of data and waits to receive the data completely.
> The PC gets stuck, because the last TCP segment of the transfer gets
> stuck in the ZynqMP and is not transmitted.
> You can re-trigger the TX Ring by pinging the ZynqMP:
> The Ping answer will re-trigger the TX ring, which in turn will also
> then send the stuck IP/TCP packet.
> 
> Unfortunately triggering this problem seems to be hard; at least I am
> not able to reproduce it easily.
> 
> So: If anyone has a more reliable way to trigger the problem,
> please tell me.
> This is to check if my proposed alternative works under all circumstances.
> 
> I have an alternate implementation, which does not require to turn on
> the "TX USED BIT READ" (TUBR) interrupt.
> The reason why I think this alternative might be better is, because I
> believe the TUBR interrupt happens at the wrong time; so I am not sure
> that the current implementation works reliably.
> 
> Analysis:
> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> prefetch") mentions:
> 
>     GEM version in ZynqMP and most versions greater than r1p07 supports
>     TX and RX BD prefetch. The number of BDs that can be prefetched is a
>     HW configurable parameter. For ZynqMP, this parameter is 4.
> 
> I think what happens is this:
> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> 1) SW has written TX descriptors 0..7
> 2) HW is currently transmitting TX descriptor 6.
>    HW has already prefetched TX descriptors 6,7,8,9.
> 3) SW writes TX descriptor 8 (clearing TX_USED)
> 4) SW writes the TSTART bit.
>    HW ignores this, because it is still transmitting.
> 5) HW transmits TX descriptor 7.
> 6) HW reaches descriptor 8; because this descriptor
>    has already been prefetched, HW sees a non-active
>    descriptor (TX_USED set) and stops transmitting.
> 
> From debugging the code it seems that the TUBR interrupt happens, when
> a descriptor is prefetched, which has a TX_USED bit set, which is before
> it is processed by the rest of the hardware:
> When looking at the end of a transfer it seems I get a TUBR interrupt,
> followed by some more TX COMPLETE interrupts.
> 
I recall that the documentation for the TUBR is rather sparse, so to be
sure about the semantics how this is supposed to work, internal
documentation would indeed be valuable.

Thanks,
Tomas


> Additionally that means at the time the TUBR interrupt happens, it
> is too early to write the TSTART bit again, because the hardware is
> still actively transmitting.
> 
> The alternative I implemented is to check in macb_tx_complete() if
> 
> 1) The TX Queue is non-empty (there are pending TX descriptors)
> 2) The hardware indicates that it is not transmitting any more
> 
> If this situation is detected, the TSTART bit will be written to
> restart the TX ring.
> 
> I know for sure, that I hit the code path, which restarts the
> transmission in macb_tx_complete(); that's why I believe the
> "Example Scenario" I described above is correct.
> 
> I am still not sure if what I implemented is enough:
> macb_tx_complete() should at least see all completed TX descriptors.
> I still believe there is a (very short) time window in which there
> might be a race:
> 1) HW completes TX descriptor 7 and sets the TX_USED bit
>    in TX descriptor 7.
>    TX descriptor 8 was prefetched with a set TX_USED bit.
> 2) SW sees that TX descriptor 7 is completed
>    (TX_USED bit now is set).
> 3) SW sees that there still is a pending TX descriptor 8.
> 4) SW checks if the TGO bit is still set, which it is.
>    So the SW does nothing at this point.
> 5) HW processes the prefetched,set TX_USED bit in
>    TX descriptor 8 and stops transmission (clearing the TGO bit).
> 
> I am not sure if it is guaranteed that 5) cannot happen after 4).  If 5)
> happens after 4) as described above, then the controller still gets stuck.
> The only idea I can come up with, is to re-check the TGO bit
> a second time a little bit later, but I am not sure how to
> implement this.
> 
> Is there anyone who has access to hardware documentation, which
> sheds some light onto the way the descriptor prefetching works?
> 
> so long
>   Ingo
> 
> 
> Ingo Rohloff (1):
>   net: macb: A different way to restart a stuck TX descriptor ring.
> 
>  drivers/net/ethernet/cadence/macb.h      |  1 -
>  drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++---------------
>  2 files changed, 24 insertions(+), 44 deletions(-)
> 
> --
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-12  6:27         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Tomas Melin
@ 2023-04-12 16:17           ` Robert Hancock
  0 siblings, 0 replies; 22+ messages in thread
From: Robert Hancock @ 2023-04-12 16:17 UTC (permalink / raw)
  To: ingo.rohloff, tomas.melin
  Cc: Nicolas.Ferre, claudiu.beznea, davem, kuba, harini.katakam, netdev

On Wed, 2023-04-12 at 09:27 +0300, Tomas Melin wrote:
> Hi,
> On 08/04/2023 00:33, Ingo Rohloff wrote:
> > [You don't often get email from ingo.rohloff@lauterbach.com. Learn
> > why this is important at
> > https://urldefense.com/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!IOGos0k!iFvgajE8vnJRZmpTVg9e4dOxJyPnniaWpfcD4DPEWlcvL2TH5ATnrbPJEzOmuYtxPdatijpc4zTDtStxJ-VSTcAhXw$
> >   ]
> > 
> > I am sorry; this is a long E-Mail.
> > 
> > I am referring to this problem:
> > 
> > Robert Hancock wrote:
> > > On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
> > > > On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
> > > > > > From: Claudiu Beznea <claudiu.beznea@microchip.com>
> > > > > > 
> > > > > > On some platforms (currently detected only on SAMA5D4) TX
> > > > > > might stuck
> > > > > > even the pachets are still present in DMA memories and TX
> > > > > > start was
> > > > > > issued for them.
> > > > > > ...
> > > > > On Xilinx Zynq the above change can cause infinite interrupt
> > > > > loop
> > > > > leading to CPU stall.  Seems timing/load needs to be
> > > > > appropriate for
> > > > > this to happen, and currently with 1G ethernet this can be
> > > > > triggered
> > > > > normally within minutes when running stress tests on the
> > > > > network
> > > > > interface.
> > > > > ...
> > > > Which kernel version are you using?  Robert has been working on
> > > > macb +
> > > > Zynq recently, adding him to CC.
> > > ...
> > > I haven't looked at the TX ring descriptor and register setup on
> > > this core
> > > in that much detail, but the fact the controller gets into this
> > > "TX used
> > > bit read" state in the first place seems unusual.  I'm wondering
> > > if
> > > something is being done in the wrong order or if we are missing a
> > > memory
> > > barrier etc?
> > 
> > I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
> > I have seen the same issue before commit 4298388574dae6168 ("net:
> > macb:
> > restart tx after tx used bit read")
> 
> Since you mention before that commit this triggered, have you been
> able
> to reproduce the problem with that commit applied?
> 
> > 
> > The scenario which sometimes triggers it for me:
> > 
> > I have an application running on the PC.
> > The application sends a short command (via TCP) to the ZynqMP.
> > The ZynqMP answers with a long stream of bytes via TCP
> > (around 230KiB).
> > The PC knows the amount of data and waits to receive the data
> > completely.
> > The PC gets stuck, because the last TCP segment of the transfer
> > gets
> > stuck in the ZynqMP and is not transmitted.
> > You can re-trigger the TX Ring by pinging the ZynqMP:
> > The Ping answer will re-trigger the TX ring, which in turn will
> > also
> > then send the stuck IP/TCP packet.
> > 
> > Unfortunately triggering this problem seems to be hard; at least I
> > am
> > not able to reproduce it easily.
> > 
> > So: If anyone has a more reliable way to trigger the problem,
> > please tell me.
> > This is to check if my proposed alternative works under all
> > circumstances.
> > 
> > I have an alternate implementation, which does not require to turn
> > on
> > the "TX USED BIT READ" (TUBR) interrupt.
> > The reason why I think this alternative might be better is, because
> > I
> > believe the TUBR interrupt happens at the wrong time; so I am not
> > sure
> > that the current implementation works reliably.
> > 
> > Analysis:
> > Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX
> > and RX BD
> > prefetch") mentions:
> > 
> >     GEM version in ZynqMP and most versions greater than r1p07
> > supports
> >     TX and RX BD prefetch. The number of BDs that can be prefetched
> > is a
> >     HW configurable parameter. For ZynqMP, this parameter is 4.
> > 
> > I think what happens is this:
> > Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> > 1) SW has written TX descriptors 0..7
> > 2) HW is currently transmitting TX descriptor 6.
> >    HW has already prefetched TX descriptors 6,7,8,9.
> > 3) SW writes TX descriptor 8 (clearing TX_USED)
> > 4) SW writes the TSTART bit.
> >    HW ignores this, because it is still transmitting.
> > 5) HW transmits TX descriptor 7.
> > 6) HW reaches descriptor 8; because this descriptor
> >    has already been prefetched, HW sees a non-active
> >    descriptor (TX_USED set) and stops transmitting.
> > 
> > From debugging the code it seems that the TUBR interrupt happens,
> > when
> > a descriptor is prefetched, which has a TX_USED bit set, which is
> > before
> > it is processed by the rest of the hardware:
> > When looking at the end of a transfer it seems I get a TUBR
> > interrupt,
> > followed by some more TX COMPLETE interrupts.
> > 
> I recall that the documentation for the TUBR is rather sparse, so to
> be
> sure about the semantics how this is supposed to work, internal
> documentation would indeed be valuable.

It would be nice if Cadence would just make the documentation for this
core fully public.

CCing Harini from Xilinx who added some previous workarounds related to
descriptor prefetch. Do you have some more visibility into how this
mechanism is supposed to work? Given that the driver is potentially
modifying descriptors to add new entries right in front of the
hardware's current position in the ring, I'm not sure how it can be
avoided that it ends up processing stale descriptor values and ending
up in this TXUBR state, which it seems may sometimes not be handled
properly. Maybe if there was a way to invalidate the prefetched entries
after modifying the TX ring, but that might still be racy. If the
performance impact is not that much, it might be easiest to just
disable TX descriptor prefetch if there was some way to do so.


> 
> Thanks,
> Tomas
> 
> 
> > Additionally that means at the time the TUBR interrupt happens, it
> > is too early to write the TSTART bit again, because the hardware is
> > still actively transmitting.
> > 
> > The alternative I implemented is to check in macb_tx_complete() if
> > 
> > 1) The TX Queue is non-empty (there are pending TX descriptors)
> > 2) The hardware indicates that it is not transmitting any more
> > 
> > If this situation is detected, the TSTART bit will be written to
> > restart the TX ring.
> > 
> > I know for sure, that I hit the code path, which restarts the
> > transmission in macb_tx_complete(); that's why I believe the
> > "Example Scenario" I described above is correct.
> > 
> > I am still not sure if what I implemented is enough:
> > macb_tx_complete() should at least see all completed TX
> > descriptors.
> > I still believe there is a (very short) time window in which there
> > might be a race:
> > 1) HW completes TX descriptor 7 and sets the TX_USED bit
> >    in TX descriptor 7.
> >    TX descriptor 8 was prefetched with a set TX_USED bit.
> > 2) SW sees that TX descriptor 7 is completed
> >    (TX_USED bit now is set).
> > 3) SW sees that there still is a pending TX descriptor 8.
> > 4) SW checks if the TGO bit is still set, which it is.
> >    So the SW does nothing at this point.
> > 5) HW processes the prefetched,set TX_USED bit in
> >    TX descriptor 8 and stops transmission (clearing the TGO bit).
> > 
> > I am not sure if it is guaranteed that 5) cannot happen after 4). 
> > If 5)
> > happens after 4) as described above, then the controller still gets
> > stuck.
> > The only idea I can come up with, is to re-check the TGO bit
> > a second time a little bit later, but I am not sure how to
> > implement this.
> > 
> > Is there anyone who has access to hardware documentation, which
> > sheds some light onto the way the descriptor prefetching works?
> > 
> > so long
> >   Ingo
> > 
> > 
> > Ingo Rohloff (1):
> >   net: macb: A different way to restart a stuck TX descriptor ring.
> > 
> >  drivers/net/ethernet/cadence/macb.h      |  1 -
> >  drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++-----------
> > ----
> >  2 files changed, 24 insertions(+), 44 deletions(-)
> > 
> > --
> > 2.17.1
> > 

-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring.
  2023-04-12  2:07         ` Jakub Kicinski
  2023-04-12  3:17           ` Lars-Peter Clausen
  2023-04-12  3:43           ` Roman Gushchin
@ 2023-04-21 13:00           ` Ingo Rohloff
  2023-04-21 13:00           ` [PATCH v2 1/1] " Ingo Rohloff
  3 siblings, 0 replies; 22+ messages in thread
From: Ingo Rohloff @ 2023-04-21 13:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Roman Gushchin, Lars-Peter Clausen, robert.hancock,
	Nicolas.Ferre, claudiu.beznea, davem, netdev, tomas.melin,
	Ingo Rohloff

On Tue, 11 Apr 2023 19:07:15 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri,  7 Apr 2023 23:33:48 +0200 Ingo Rohloff wrote:
> > Analysis:
> > Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> > prefetch") mentions:
> >
> >     GEM version in ZynqMP and most versions greater than r1p07 supports
> >     TX and RX BD prefetch. The number of BDs that can be prefetched is a
> >     HW configurable parameter. For ZynqMP, this parameter is 4.
> >
> > I think what happens is this:
> > Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> > 1) SW has written TX descriptors 0..7
> > 2) HW is currently transmitting TX descriptor 6.
> >    HW has already prefetched TX descriptors 6,7,8,9.
> > 3) SW writes TX descriptor 8 (clearing TX_USED)
> > 4) SW writes the TSTART bit.
> >    HW ignores this, because it is still transmitting.
> > 5) HW transmits TX descriptor 7.
> > 6) HW reaches descriptor 8; because this descriptor
> >    has already been prefetched, HW sees a non-active
> >    descriptor (TX_USED set) and stops transmitting.
>
> This sounds broken, any idea if this is how the IP is supposed to work
> or it may be an integration issue in Zynq?  The other side of this
> question is how expensive the workaround is - a spin lock and two extra
> register reads on completion seems like a lot.
>
> Roman, Lars, have you seen Tx stalls on your macb setups?

I think I finally was able to hunt down the exact conditions.
It seems to be simpler than I thought.
This also results in a completely modified patch and work-around.

As far as I can tell there is a race condition between HW and SW.
(TX descriptor 4 is just used as an example here):

1) HW has just read TX descriptor 4 (with set TX_USED bit).
   "Just read" means: The read data of the read transfer is
   in flight to the HW module.
   SW clears TX_USED bit of TX descriptor 4.

2) SW writes TSTART bit in NCR register.
   HW is still processing TX descriptor 4 (TGO bit in TSR still set)
   and thus ignores the TSTART trigger.

3) HW stops TX processing (TGO bit in TSR is cleared), because it
   sees the set TX_USED bit of the (previous) TX descriptor 4.

We now have got an active pending TX descriptor 4, but HW will not
restart transmission, because it ignored the corresponding
TSTART trigger.

I encountered the stuck TX descriptor state, when before:
1) TBQP register indicated HW reached the TX descriptor,
   where the TX_USED bit is just cleared by SW.
2) HW had a set TGO bit in the TSR register,
   which indicates HW is still processing the TX ring.

The patch I propose tries to ensure that the TSTART trigger
is not ignored.
The idea is if we have this condition

* The TGO bit is set in TSR
* The TBQP register points to the TX descriptor,
  where the TX_USED bit was just cleared.

then I assume we are in an unknown state:

* HW could either see the cleared TX_USED bit and continue, or
* HW could ignore the TSTART trigger and stop.

The patch tries to detect this condition in the new function
macb_fix_tstart_race().
If this unknown state is detected, the HW is rechecked until either
* The TBQP register points to a different descriptor (meaning
  the hardware is still actively processing the TX ring)
* The hardware indicates it has stopped processing via a
  cleared TGO bit in the TSR register.
  In this case the TSTART bit is written again.


Note: Cadence might be able to clear up under which circumstances
this race condition might happen.
On the Xilinx ZynqMP Ultrascale+ device I have here,
I for sure reach the line

    /* Controller stopped... write TSTART again.

in the patch; which indicates the code of the patch at least covers
a condition which happens in reality on this particular SoC.


regards
  Ingo


Ingo Rohloff (1):
  net: macb: Avoid erroneously stopped TX ring.

 drivers/net/ethernet/cadence/macb.h      |  1 -
 drivers/net/ethernet/cadence/macb_main.c | 94 +++++++++++++-----------
 2 files changed, 50 insertions(+), 45 deletions(-)

--
2.17.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/1] net: macb: Avoid erroneously stopped TX ring.
  2023-04-12  2:07         ` Jakub Kicinski
                             ` (2 preceding siblings ...)
  2023-04-21 13:00           ` [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring Ingo Rohloff
@ 2023-04-21 13:00           ` Ingo Rohloff
  2023-04-22  2:57             ` Jakub Kicinski
  3 siblings, 1 reply; 22+ messages in thread
From: Ingo Rohloff @ 2023-04-21 13:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Roman Gushchin, Lars-Peter Clausen, robert.hancock,
	Nicolas.Ferre, claudiu.beznea, davem, netdev, tomas.melin,
	Ingo Rohloff

The SW puts a frame to be transmitted into the TX descriptor ring with
macb_start_xmit().
The last step of this operation is, that the SW clears the TX_USED bit
of the first descriptor it wrote.
The HW already reached and read this descriptor with a set TX_USED bit.
The SW sets the TSTART bit in the NCR register.
This is a race condition:
1) Either the HW already has processed the descriptor and has stopped the
   transmission, so the TGO bit in the TSR register is cleared.
2) The HW has read, but not yet processed the descriptor.
In case 2) the HW ignores the TSTART trigger and stops the
transmission a little bit later.

You now have got a TX descriptor in the TX ring which is ready (TX_USED
bit cleared), but the hardware does not process the fresh descriptor,
because it ignored the corresponding TSTART trigger.

This patch checks if the hardware is processing the same descriptor, where
the TX_USED bit was just cleared.
If this is the case this patch ensures that the TSTART trigger is repeated
if needed.

Signed-off-by: Ingo Rohloff <ingo.rohloff@lauterbach.com>
---
 drivers/net/ethernet/cadence/macb.h      |  1 -
 drivers/net/ethernet/cadence/macb_main.c | 94 +++++++++++++-----------
 2 files changed, 50 insertions(+), 45 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 14dfec4db8f9..b749fa2c0342 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1205,7 +1205,6 @@ struct macb_queue {
 	struct macb_tx_skb	*tx_skb;
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
-	bool			txubr_pending;
 	struct napi_struct	napi_tx;
 
 	dma_addr_t		rx_ring_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index e43d99ec50ba..0fc9a345c1f0 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -70,8 +70,7 @@ struct sifive_fu540_macb_mgmt {
 #define MACB_TX_ERR_FLAGS	(MACB_BIT(ISR_TUND)			\
 					| MACB_BIT(ISR_RLE)		\
 					| MACB_BIT(TXERR))
-#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP)	\
-					| MACB_BIT(TXUBR))
+#define MACB_TX_INT_FLAGS	(MACB_TX_ERR_FLAGS | MACB_BIT(TCOMP))
 
 /* Max length of transmit frame must be a multiple of 8 bytes */
 #define MACB_TX_LEN_ALIGN	8
@@ -1692,31 +1691,6 @@ static int macb_rx_poll(struct napi_struct *napi, int budget)
 	return work_done;
 }
 
-static void macb_tx_restart(struct macb_queue *queue)
-{
-	struct macb *bp = queue->bp;
-	unsigned int head_idx, tbqp;
-
-	spin_lock(&queue->tx_ptr_lock);
-
-	if (queue->tx_head == queue->tx_tail)
-		goto out_tx_ptr_unlock;
-
-	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
-	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
-	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, queue->tx_head));
-
-	if (tbqp == head_idx)
-		goto out_tx_ptr_unlock;
-
-	spin_lock_irq(&bp->lock);
-	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
-	spin_unlock_irq(&bp->lock);
-
-out_tx_ptr_unlock:
-	spin_unlock(&queue->tx_ptr_lock);
-}
-
 static bool macb_tx_complete_pending(struct macb_queue *queue)
 {
 	bool retval = false;
@@ -1741,13 +1715,6 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 
 	work_done = macb_tx_complete(queue, budget);
 
-	rmb(); // ensure txubr_pending is up to date
-	if (queue->txubr_pending) {
-		queue->txubr_pending = false;
-		netdev_vdbg(bp->dev, "poll: tx restart\n");
-		macb_tx_restart(queue);
-	}
-
 	netdev_vdbg(bp->dev, "TX poll: queue = %u, work_done = %d, budget = %d\n",
 		    (unsigned int)(queue - bp->queues), work_done, budget);
 
@@ -1917,17 +1884,10 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 			}
 		}
 
-		if (status & (MACB_BIT(TCOMP) |
-			      MACB_BIT(TXUBR))) {
+		if (status & MACB_BIT(TCOMP)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-				queue_writel(queue, ISR, MACB_BIT(TCOMP) |
-							 MACB_BIT(TXUBR));
-
-			if (status & MACB_BIT(TXUBR)) {
-				queue->txubr_pending = true;
-				wmb(); // ensure softirq can see update
-			}
+				queue_writel(queue, ISR, MACB_BIT(TCOMP));
 
 			if (napi_schedule_prep(&queue->napi_tx)) {
 				netdev_vdbg(bp->dev, "scheduling TX softirq\n");
@@ -2288,14 +2248,54 @@ static int macb_pad_and_fcs(struct sk_buff **skb, struct net_device *ndev)
 	return 0;
 }
 
+static void macb_fix_tstart_race(unsigned int tx_head,
+				 struct macb *bp, struct macb_queue *queue)
+{
+	u32 macb_tsr, macb_tbqp, macb_ncr;
+
+	/* Controller was (probably) active when we wrote TSTART.
+	 * This might be a race condition.
+	 * Ensure TSTART is not ignored.
+	 */
+	for (;;) {
+		macb_tbqp = queue_readl(queue, TBQP);
+		macb_tbqp = macb_tbqp - lower_32_bits(queue->tx_ring_dma);
+		macb_tbqp = macb_tbqp / macb_dma_desc_get_size(bp);
+		macb_tbqp = macb_tx_ring_wrap(bp, macb_tbqp);
+		if (tx_head != macb_tbqp) {
+			/* Controller is working on different descriptor.
+			 * There should be no problem.
+			 */
+			break;
+		}
+
+		/* Controller works on the descriptor we just wrote.
+		 * TSTART might not have worked. Check for TGO again.
+		 */
+		macb_tsr = macb_readl(bp, TSR);
+		if (!(macb_tsr & MACB_BIT(TGO))) {
+			/* Controller stopped... write TSTART again.
+			 */
+			macb_ncr = macb_readl(bp, NCR);
+			macb_ncr = macb_ncr | MACB_BIT(TSTART);
+			macb_writel(bp, NCR, macb_ncr);
+			break;
+		}
+		/* Controller might stop or process our descriptor.
+		 * Check again.
+		 */
+	}
+}
+
 static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	u16 queue_index = skb_get_queue_mapping(skb);
 	struct macb *bp = netdev_priv(dev);
 	struct macb_queue *queue = &bp->queues[queue_index];
-	unsigned int desc_cnt, nr_frags, frag_size, f;
+	unsigned int desc_cnt, nr_frags, frag_size, f, tx_head;
 	unsigned int hdrlen;
 	bool is_lso;
+	u32 macb_tsr;
 	netdev_tx_t ret = NETDEV_TX_OK;
 
 	if (macb_clear_csum(skb)) {
@@ -2367,6 +2367,9 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto unlock;
 	}
 
+	/* remember first descriptor we are going to modify */
+	tx_head = macb_tx_ring_wrap(bp, queue->tx_head);
+
 	/* Map socket buffer for DMA transfer */
 	if (!macb_tx_map(bp, queue, skb, hdrlen)) {
 		dev_kfree_skb_any(skb);
@@ -2378,7 +2381,10 @@ static netdev_tx_t macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_tx_timestamp(skb);
 
 	spin_lock_irq(&bp->lock);
+	macb_tsr = macb_readl(bp, TSR);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	if (macb_tsr & MACB_BIT(TGO))
+		macb_fix_tstart_race(tx_head, bp, queue);
 	spin_unlock_irq(&bp->lock);
 
 	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/1] net: macb: Avoid erroneously stopped TX ring.
  2023-04-21 13:00           ` [PATCH v2 1/1] " Ingo Rohloff
@ 2023-04-22  2:57             ` Jakub Kicinski
  0 siblings, 0 replies; 22+ messages in thread
From: Jakub Kicinski @ 2023-04-22  2:57 UTC (permalink / raw)
  To: Ingo Rohloff
  Cc: Roman Gushchin, Lars-Peter Clausen, robert.hancock,
	Nicolas.Ferre, claudiu.beznea, davem, netdev, tomas.melin

On Fri, 21 Apr 2023 15:00:35 +0200 Ingo Rohloff wrote:
> The SW puts a frame to be transmitted into the TX descriptor ring with
> macb_start_xmit().
> The last step of this operation is, that the SW clears the TX_USED bit
> of the first descriptor it wrote.
> The HW already reached and read this descriptor with a set TX_USED bit.
> The SW sets the TSTART bit in the NCR register.
> This is a race condition:
> 1) Either the HW already has processed the descriptor and has stopped the
>    transmission, so the TGO bit in the TSR register is cleared.
> 2) The HW has read, but not yet processed the descriptor.
> In case 2) the HW ignores the TSTART trigger and stops the
> transmission a little bit later.
> 
> You now have got a TX descriptor in the TX ring which is ready (TX_USED
> bit cleared), but the hardware does not process the fresh descriptor,
> because it ignored the corresponding TSTART trigger.
> 
> This patch checks if the hardware is processing the same descriptor, where
> the TX_USED bit was just cleared.
> If this is the case this patch ensures that the TSTART trigger is repeated
> if needed.

Were you able to measure the performance impact of the workaround?
Doesn't name a difference?

> +static void macb_fix_tstart_race(unsigned int tx_head,
> +				 struct macb *bp, struct macb_queue *queue)
> +{
> +	u32 macb_tsr, macb_tbqp, macb_ncr;
> +
> +	/* Controller was (probably) active when we wrote TSTART.
> +	 * This might be a race condition.
> +	 * Ensure TSTART is not ignored.
> +	 */
> +	for (;;) {
> +		macb_tbqp = queue_readl(queue, TBQP);
> +		macb_tbqp = macb_tbqp - lower_32_bits(queue->tx_ring_dma);
> +		macb_tbqp = macb_tbqp / macb_dma_desc_get_size(bp);
> +		macb_tbqp = macb_tx_ring_wrap(bp, macb_tbqp);
> +		if (tx_head != macb_tbqp) {
> +			/* Controller is working on different descriptor.
> +			 * There should be no problem.
> +			 */
> +			break;
> +		}
> +
> +		/* Controller works on the descriptor we just wrote.
> +		 * TSTART might not have worked. Check for TGO again.
> +		 */
> +		macb_tsr = macb_readl(bp, TSR);
> +		if (!(macb_tsr & MACB_BIT(TGO))) {
> +			/* Controller stopped... write TSTART again.
> +			 */
> +			macb_ncr = macb_readl(bp, NCR);
> +			macb_ncr = macb_ncr | MACB_BIT(TSTART);
> +			macb_writel(bp, NCR, macb_ncr);
> +			break;
> +		}
> +		/* Controller might stop or process our descriptor.
> +		 * Check again.
> +		 */

We should add (1) cpu_relax(), (2) a statistic which would count that
the condition was encountered, and (3) some form of limit on the loop,
so that we don't hang the host with irqs off if the NIC goes bad.

> +	}
> +}

>  	spin_lock_irq(&bp->lock);
> +	macb_tsr = macb_readl(bp, TSR);

-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/1] Alternative, restart tx after tx used bit read
  2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
                           ` (3 preceding siblings ...)
  2023-04-12  6:27         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Tomas Melin
@ 2023-04-24  8:54         ` Claudiu.Beznea
  4 siblings, 0 replies; 22+ messages in thread
From: Claudiu.Beznea @ 2023-04-24  8:54 UTC (permalink / raw)
  To: ingo.rohloff, robert.hancock
  Cc: Nicolas.Ferre, davem, kuba, netdev, tomas.melin

On 08.04.2023 00:33, Ingo Rohloff wrote:
> [You don't often get email from ingo.rohloff@lauterbach.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> I am sorry; this is a long E-Mail.
> 
> I am referring to this problem:
> 
> Robert Hancock wrote:
>> On Wed, 2022-03-23 at 08:43 -0700, Jakub Kicinski wrote:
>>> On Wed, 23 Mar 2022 10:08:20 +0200 Tomas Melin wrote:
>>>>> From: Claudiu Beznea <claudiu.beznea@microchip.com>
>>>>>
>>>>> On some platforms (currently detected only on SAMA5D4) TX might stuck
>>>>> even the pachets are still present in DMA memories and TX start was
>>>>> issued for them.
>>>>> ...
>>>> On Xilinx Zynq the above change can cause infinite interrupt loop
>>>> leading to CPU stall.  Seems timing/load needs to be appropriate for
>>>> this to happen, and currently with 1G ethernet this can be triggered
>>>> normally within minutes when running stress tests on the network
>>>> interface.
>>>> ...
>>> Which kernel version are you using?  Robert has been working on macb +
>>> Zynq recently, adding him to CC.
>> ...
>> I haven't looked at the TX ring descriptor and register setup on this core
>> in that much detail, but the fact the controller gets into this "TX used
>> bit read" state in the first place seems unusual.  I'm wondering if
>> something is being done in the wrong order or if we are missing a memory
>> barrier etc?
> 
> I am developing on a ZynqMP (Ultrascale+) SoC from AMD/Xilinx.
> I have seen the same issue before commit 4298388574dae6168 ("net: macb:
> restart tx after tx used bit read")
> 
> The scenario which sometimes triggers it for me:
> 
> I have an application running on the PC.
> The application sends a short command (via TCP) to the ZynqMP.
> The ZynqMP answers with a long stream of bytes via TCP
> (around 230KiB).
> The PC knows the amount of data and waits to receive the data completely.
> The PC gets stuck, because the last TCP segment of the transfer gets
> stuck in the ZynqMP and is not transmitted.
> You can re-trigger the TX Ring by pinging the ZynqMP:
> The Ping answer will re-trigger the TX ring, which in turn will also
> then send the stuck IP/TCP packet.

From the moment I've been working on this I remember ping was also
restarting the TX for me (which is expected as it enqueues more descriptors
and issue TSTART). At the time I worked on this the hardware prefetch was
not implemented in software (and I think the hardware I had the issue with
didn't support it).

> 
> Unfortunately triggering this problem seems to be hard; at least I am
> not able to reproduce it easily.

It was the same on my side.

> 
> So: If anyone has a more reliable way to trigger the problem,
> please tell me.
> This is to check if my proposed alternative works under all circumstances.
> 
> I have an alternate implementation, which does not require to turn on
> the "TX USED BIT READ" (TUBR) interrupt.
> The reason why I think this alternative might be better is, because I
> believe the TUBR interrupt happens at the wrong time; so I am not sure
> that the current implementation works reliably.
> 
> Analysis:
> Commit 404cd086f29e867f ("net: macb: Allocate valid memory for TX and RX BD
> prefetch") mentions:
> 
>     GEM version in ZynqMP and most versions greater than r1p07 supports
>     TX and RX BD prefetch. The number of BDs that can be prefetched is a
>     HW configurable parameter. For ZynqMP, this parameter is 4.
> 
> I think what happens is this:
> Example Scenario (SW == linux kernel, HW == cadence ethernet IP).
> 1) SW has written TX descriptors 0..7
> 2) HW is currently transmitting TX descriptor 6.
>    HW has already prefetched TX descriptors 6,7,8,9.
> 3) SW writes TX descriptor 8 (clearing TX_USED)
> 4) SW writes the TSTART bit.
>    HW ignores this, because it is still transmitting.
> 5) HW transmits TX descriptor 7.
> 6) HW reaches descriptor 8; because this descriptor
>    has already been prefetched, HW sees a non-active
>    descriptor (TX_USED set) and stops transmitting.

I think you can check TBQP register to see if hardware position in TX queue
is ahead of software to validate against hardware this prefetch scenario.

> 
>>From debugging the code it seems that the TUBR interrupt happens, when
> a descriptor is prefetched, which has a TX_USED bit set, which is before

From what I know, yes, TUBR is raised when HW TX logic reaches a descriptor
with TX_USED set.

> it is processed by the rest of the hardware:
> When looking at the end of a transfer it seems I get a TUBR interrupt,
> followed by some more TX COMPLETE interrupts.
> 
> Additionally that means at the time the TUBR interrupt happens, it
> is too early to write the TSTART bit again, because the hardware is
> still actively transmitting.
> 
> The alternative I implemented is to check in macb_tx_complete() if
> 
> 1) The TX Queue is non-empty (there are pending TX descriptors)
> 2) The hardware indicates that it is not transmitting any more
> 
> If this situation is detected, the TSTART bit will be written to
> restart the TX ring.
> 
> I know for sure, that I hit the code path, which restarts the
> transmission in macb_tx_complete(); that's why I believe the
> "Example Scenario" I described above is correct.
> 
> I am still not sure if what I implemented is enough:
> macb_tx_complete() should at least see all completed TX descriptors.
> I still believe there is a (very short) time window in which there
> might be a race:
> 1) HW completes TX descriptor 7 and sets the TX_USED bit
>    in TX descriptor 7.
>    TX descriptor 8 was prefetched with a set TX_USED bit.
> 2) SW sees that TX descriptor 7 is completed
>    (TX_USED bit now is set).
> 3) SW sees that there still is a pending TX descriptor 8.
> 4) SW checks if the TGO bit is still set, which it is.
>    So the SW does nothing at this point.
> 5) HW processes the prefetched,set TX_USED bit in
>    TX descriptor 8 and stops transmission (clearing the TGO bit).
> 
> I am not sure if it is guaranteed that 5) cannot happen after 4).  If 5)
> happens after 4) as described above, then the controller still gets stuck.
> The only idea I can come up with, is to re-check the TGO bit
> a second time a little bit later, but I am not sure how to
> implement this.
> 
> Is there anyone who has access to hardware documentation, which
> sheds some light onto the way the descriptor prefetching works?

Some part of it could be found here:
https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ProductDocuments/DataSheets/SAMA7G5-Series-Data-Sheet-DS60001765A.pdf#_OPENTOPIC_TOC_PROCESSING_d17023e778582

Thank you,
Claudiu

> 
> so long
>   Ingo
> 
> 
> Ingo Rohloff (1):
>   net: macb: A different way to restart a stuck TX descriptor ring.
> 
>  drivers/net/ethernet/cadence/macb.h      |  1 -
>  drivers/net/ethernet/cadence/macb_main.c | 67 +++++++++---------------
>  2 files changed, 24 insertions(+), 44 deletions(-)
> 
> --
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-04-24  8:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-17 10:02 [PATCH v3] net: macb: restart tx after tx used bit read Claudiu.Beznea
2018-12-17 10:14 ` Nicolas.Ferre
2018-12-18 23:58 ` David Miller
2022-03-23  8:08 ` Tomas Melin
2022-03-23 15:43   ` Jakub Kicinski
2022-03-23 16:42     ` Robert Hancock
2022-03-25  7:10       ` Tomas Melin
2022-03-25  8:13       ` Claudiu.Beznea
2022-03-25  9:33         ` Tomas Melin
2023-04-07 21:33       ` [PATCH 0/1] Alternative, " Ingo Rohloff
2023-04-07 21:33         ` [PATCH 1/1] net: macb: A different way to restart a stuck TX descriptor ring Ingo Rohloff
2023-04-11 15:25           ` Jesse Brandeburg
2023-04-10 17:05         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Robert Hancock
2023-04-12  2:07         ` Jakub Kicinski
2023-04-12  3:17           ` Lars-Peter Clausen
2023-04-12  3:43           ` Roman Gushchin
2023-04-21 13:00           ` [PATCH v2 0/1] net: macb: Avoid erroneously stopped TX ring Ingo Rohloff
2023-04-21 13:00           ` [PATCH v2 1/1] " Ingo Rohloff
2023-04-22  2:57             ` Jakub Kicinski
2023-04-12  6:27         ` [PATCH 0/1] Alternative, restart tx after tx used bit read Tomas Melin
2023-04-12 16:17           ` Robert Hancock
2023-04-24  8:54         ` Claudiu.Beznea

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).