* [PATCH net-next 0/2] MACB NAPI improvements
@ 2022-04-29 22:31 Robert Hancock
  2022-04-29 22:31 ` [PATCH net-next 1/2] net: macb: simplify/cleanup NAPI reschedule checking Robert Hancock
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Robert Hancock @ 2022-04-29 22:31 UTC (permalink / raw)
  To: netdev
  Cc: nicolas.ferre, claudiu.beznea, davem, edumazet, kuba, pabeni,
	Robert Hancock

Simplify the logic in the Cadence MACB/GEM driver for determining
when to reschedule NAPI processing, and update it to use NAPI for the
TX path as well as the RX path.

Robert Hancock (2):
  net: macb: simplify/cleanup NAPI reschedule checking
  net: macb: use NAPI for TX completion path

 drivers/net/ethernet/cadence/macb.h      |   1 +
 drivers/net/ethernet/cadence/macb_main.c | 193 ++++++++++++-----------
 2 files changed, 106 insertions(+), 88 deletions(-)

-- 
2.31.1


* [PATCH net-next 1/2] net: macb: simplify/cleanup NAPI reschedule checking
  2022-04-29 22:31 [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
@ 2022-04-29 22:31 ` Robert Hancock
  2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
  2022-05-05 17:56 ` [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
  2 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-04-29 22:31 UTC (permalink / raw)
  To: netdev
  Cc: nicolas.ferre, claudiu.beznea, davem, edumazet, kuba, pabeni,
	Robert Hancock

Previously the macb_poll method was checking the RSR register after
completing its RX receive work to see if additional packets had been
received since IRQs were disabled, because this controller does not
maintain the pending IRQ status across IRQ disable. It also had to
double-check the register after re-enabling IRQs to detect packets
received after the first check but before IRQs were re-enabled.

Using the RSR register for this purpose is problematic since it reflects
the global device state rather than the per-queue state, so if packets
are being received on multiple queues it may end up retriggering receive
on a queue where the packets did not actually arrive and not on the one
where they did arrive. This will also cause problems with an upcoming
change to use NAPI for the TX path where use of multiple queues is more
likely.

Add a macb_rx_pending function to check the RX ring to see if more
packets have arrived in the queue, and use that, rather than the RSR
register, to check whether NAPI should be rescheduled. By doing this, we
can ignore the global RSR register entirely, and thus save some extra
device register accesses at the same time.

This also makes the previous first check for pending packets redundant,
since it would be checking the same RX ring state that the receive work
function just examined. Therefore we can get rid of it and just check,
after re-enabling interrupts, whether packets are already pending.
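
In outline, the resulting poll path is then (a simplified sketch of the
change below, not the literal code):

	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
	if (work_done < budget && napi_complete_done(napi, work_done)) {
		/* re-enable RX interrupts first ... */
		queue_writel(queue, IER, bp->rx_intr_mask);
		/* ... then re-check the ring itself, not RSR */
		if (macb_rx_pending(queue)) {
			queue_writel(queue, IDR, bp->rx_intr_mask);
			napi_schedule(napi);
		}
	}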

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 65 +++++++++++-------------
 1 file changed, 31 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 6434e74c04f1..160dc5ad84ae 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1554,54 +1554,51 @@ static int macb_rx(struct macb_queue *queue, struct napi_struct *napi,
 	return received;
 }
 
+static bool macb_rx_pending(struct macb_queue *queue)
+{
+	struct macb *bp = queue->bp;
+	unsigned int		entry;
+	struct macb_dma_desc	*desc;
+
+	entry = macb_rx_ring_wrap(bp, queue->rx_tail);
+	desc = macb_rx_desc(queue, entry);
+
+	/* Make hw descriptor updates visible to CPU */
+	rmb();
+
+	return (desc->addr & MACB_BIT(RX_USED)) != 0;
+}
+
 static int macb_poll(struct napi_struct *napi, int budget)
 {
 	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
 	struct macb *bp = queue->bp;
 	int work_done;
-	u32 status;
 
-	status = macb_readl(bp, RSR);
-	macb_writel(bp, RSR, status);
+	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
 
-	netdev_vdbg(bp->dev, "poll: status = %08lx, budget = %d\n",
-		    (unsigned long)status, budget);
+	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
+		    (unsigned int)(queue - bp->queues), work_done, budget);
 
-	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
-	if (work_done < budget) {
-		napi_complete_done(napi, work_done);
+	if (work_done < budget && napi_complete_done(napi, work_done)) {
+		queue_writel(queue, IER, bp->rx_intr_mask);
 
-		/* RSR bits only seem to propagate to raise interrupts when
-		 * interrupts are enabled at the time, so if bits are already
-		 * set due to packets received while interrupts were disabled,
+		/* Packet completions only seem to propagate to raise
+		 * interrupts when interrupts are enabled at the time, so if
+		 * packets were received while interrupts were disabled,
 		 * they will not cause another interrupt to be generated when
 		 * interrupts are re-enabled.
-		 * Check for this case here. This has been seen to happen
-		 * around 30% of the time under heavy network load.
+		 * Check for this case here to avoid losing a wakeup. This can
+		 * potentially race with the interrupt handler doing the same
+		 * actions if an interrupt is raised just after enabling them,
+		 * but this should be harmless.
 		 */
-		status = macb_readl(bp, RSR);
-		if (status) {
+		if (macb_rx_pending(queue)) {
+			queue_writel(queue, IDR, bp->rx_intr_mask);
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 				queue_writel(queue, ISR, MACB_BIT(RCOMP));
-			napi_reschedule(napi);
-		} else {
-			queue_writel(queue, IER, bp->rx_intr_mask);
-
-			/* In rare cases, packets could have been received in
-			 * the window between the check above and re-enabling
-			 * interrupts. Therefore, a double-check is required
-			 * to avoid losing a wakeup. This can potentially race
-			 * with the interrupt handler doing the same actions
-			 * if an interrupt is raised just after enabling them,
-			 * but this should be harmless.
-			 */
-			status = macb_readl(bp, RSR);
-			if (unlikely(status)) {
-				queue_writel(queue, IDR, bp->rx_intr_mask);
-				if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-					queue_writel(queue, ISR, MACB_BIT(RCOMP));
-				napi_schedule(napi);
-			}
+			netdev_vdbg(bp->dev, "poll: packets pending, reschedule\n");
+			napi_schedule(napi);
 		}
 	}
 
-- 
2.31.1


* [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 22:31 [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
  2022-04-29 22:31 ` [PATCH net-next 1/2] net: macb: simplify/cleanup NAPI reschedule checking Robert Hancock
@ 2022-04-29 22:31 ` Robert Hancock
  2022-04-29 23:09   ` Robert Hancock
                     ` (2 more replies)
  2022-05-05 17:56 ` [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
  2 siblings, 3 replies; 16+ messages in thread
From: Robert Hancock @ 2022-04-29 22:31 UTC (permalink / raw)
  To: netdev
  Cc: nicolas.ferre, claudiu.beznea, davem, edumazet, kuba, pabeni,
	Robert Hancock

This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.

Switch to using the NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact.

The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
into the NAPI poll handler to maintain the proper order of operations. A
flag is used to notify the poll handler that a UBR condition needs to be
handled. The macb_tx_restart handler has had some locking added for global
register access, since this could now potentially happen concurrently on
different queues.
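
The resulting order of operations in the poll handler is roughly
(a simplified sketch of the change below):

	macb_tx_complete(queue, budget);	/* reap TX completions */
	if (queue->txubr_pending) {		/* flag set in hard IRQ */
		queue->txubr_pending = false;
		macb_tx_restart(queue);		/* handle the UBR condition */
	}
	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
	/* then complete NAPI, re-enable IRQs and re-check as before */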

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
---
 drivers/net/ethernet/cadence/macb.h      |   1 +
 drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
 2 files changed, 80 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index f0a7d8396a4a..5355cef95a9b 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1209,6 +1209,7 @@ struct macb_queue {
 	struct macb_tx_skb	*tx_skb;
 	dma_addr_t		tx_ring_dma;
 	struct work_struct	tx_error_task;
+	bool			txubr_pending;
 
 	dma_addr_t		rx_ring_dma;
 	dma_addr_t		rx_buffers_dma;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 160dc5ad84ae..1cb8afb8ef0d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
 	return -ETIMEDOUT;
 }
 
-static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
+static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
 {
 	if (tx_skb->mapping) {
 		if (tx_skb->mapped_as_page)
@@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
 	}
 
 	if (tx_skb->skb) {
-		dev_kfree_skb_any(tx_skb->skb);
+		napi_consume_skb(tx_skb->skb, budget);
 		tx_skb->skb = NULL;
 	}
 }
@@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct *work)
 		    (unsigned int)(queue - bp->queues),
 		    queue->tx_tail, queue->tx_head);
 
-	/* Prevent the queue IRQ handlers from running: each of them may call
-	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
+	/* Prevent the queue NAPI poll from running, as it calls
+	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
 	 * As explained below, we have to halt the transmission before updating
 	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
 	 * network engine about the macb/gem being halted.
 	 */
+	napi_disable(&queue->napi);
 	spin_lock_irqsave(&bp->lock, flags);
 
 	/* Make sure nobody is trying to queue up new packets */
@@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct *work)
 		if (ctrl & MACB_BIT(TX_USED)) {
 			/* skb is set for the last buffer of the frame */
 			while (!skb) {
-				macb_tx_unmap(bp, tx_skb);
+				macb_tx_unmap(bp, tx_skb, 0);
 				tail++;
 				tx_skb = macb_tx_skb(queue, tail);
 				skb = tx_skb->skb;
@@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct *work)
 			desc->ctrl = ctrl | MACB_BIT(TX_USED);
 		}
 
-		macb_tx_unmap(bp, tx_skb);
+		macb_tx_unmap(bp, tx_skb, 0);
 	}
 
 	/* Set end of TX queue */
@@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct *work)
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 
 	spin_unlock_irqrestore(&bp->lock, flags);
+	napi_enable(&queue->napi);
 }
 
-static void macb_tx_interrupt(struct macb_queue *queue)
+static bool macb_tx_complete_pending(struct macb_queue *queue)
+{
+	if (queue->tx_head != queue->tx_tail) {
+		/* Make hw descriptor updates visible to CPU */
+		rmb();
+
+		if (macb_tx_desc(queue, queue->tx_tail)->ctrl & MACB_BIT(TX_USED))
+			return true;
+	}
+	return false;
+}
+
+static void macb_tx_complete(struct macb_queue *queue, int budget)
 {
 	unsigned int tail;
 	unsigned int head;
-	u32 status;
 	struct macb *bp = queue->bp;
 	u16 queue_index = queue - bp->queues;
 
-	status = macb_readl(bp, TSR);
-	macb_writel(bp, TSR, status);
-
-	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-		queue_writel(queue, ISR, MACB_BIT(TCOMP));
-
-	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
-		    (unsigned long)status);
-
 	head = queue->tx_head;
 	for (tail = queue->tx_tail; tail != head; tail++) {
 		struct macb_tx_skb	*tx_skb;
@@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
 			}
 
 			/* Now we can safely release resources */
-			macb_tx_unmap(bp, tx_skb);
+			macb_tx_unmap(bp, tx_skb, budget);
 
 			/* skb is set only for the last buffer of the frame.
 			 * WARNING: at this point skb has been freed by
@@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
 	return (desc->addr & MACB_BIT(RX_USED)) != 0;
 }
 
+static void macb_tx_restart(struct macb_queue *queue)
+{
+	unsigned int head = queue->tx_head;
+	unsigned int tail = queue->tx_tail;
+	struct macb *bp = queue->bp;
+	unsigned int head_idx, tbqp;
+
+	if (head == tail)
+		return;
+
+	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
+	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
+	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
+
+	if (tbqp == head_idx)
+		return;
+
+	spin_lock_irq(&bp->lock);
+	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	spin_unlock_irq(&bp->lock);
+}
+
 static int macb_poll(struct napi_struct *napi, int budget)
 {
 	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
 	struct macb *bp = queue->bp;
 	int work_done;
 
+	macb_tx_complete(queue, budget);
+
+	rmb(); // ensure txubr_pending is up to date
+	if (queue->txubr_pending) {
+		queue->txubr_pending = false;
+		netdev_vdbg(bp->dev, "poll: tx restart\n");
+		macb_tx_restart(queue);
+	}
+
 	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
 
 	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
 		    (unsigned int)(queue - bp->queues), work_done, budget);
 
 	if (work_done < budget && napi_complete_done(napi, work_done)) {
-		queue_writel(queue, IER, bp->rx_intr_mask);
+		queue_writel(queue, IER, bp->rx_intr_mask |
+					 MACB_BIT(TCOMP));
 
 		/* Packet completions only seem to propagate to raise
 		 * interrupts when interrupts are enabled at the time, so if
-		 * packets were received while interrupts were disabled,
+		 * packets were sent/received while interrupts were disabled,
 		 * they will not cause another interrupt to be generated when
 		 * interrupts are re-enabled.
 		 * Check for this case here to avoid losing a wakeup. This can
@@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int budget)
 		 * actions if an interrupt is raised just after enabling them,
 		 * but this should be harmless.
 		 */
-		if (macb_rx_pending(queue)) {
-			queue_writel(queue, IDR, bp->rx_intr_mask);
+		if (macb_rx_pending(queue) ||
+		    macb_tx_complete_pending(queue)) {
+			queue_writel(queue, IDR, bp->rx_intr_mask |
+						 MACB_BIT(TCOMP));
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-				queue_writel(queue, ISR, MACB_BIT(RCOMP));
+				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
+							 MACB_BIT(TCOMP));
 			netdev_vdbg(bp->dev, "poll: packets pending, reschedule\n");
 			napi_schedule(napi);
 		}
@@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct tasklet_struct *t)
 	netif_tx_start_all_queues(dev);
 }
 
-static void macb_tx_restart(struct macb_queue *queue)
-{
-	unsigned int head = queue->tx_head;
-	unsigned int tail = queue->tx_tail;
-	struct macb *bp = queue->bp;
-	unsigned int head_idx, tbqp;
-
-	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-		queue_writel(queue, ISR, MACB_BIT(TXUBR));
-
-	if (head == tail)
-		return;
-
-	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
-	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
-	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
-
-	if (tbqp == head_idx)
-		return;
-
-	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
-}
-
 static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
 {
 	struct macb_queue *queue = dev_id;
@@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 			    (unsigned int)(queue - bp->queues),
 			    (unsigned long)status);
 
-		if (status & bp->rx_intr_mask) {
-			/* There's no point taking any more interrupts
-			 * until we have processed the buffers. The
+		if (status & (bp->rx_intr_mask |
+			      MACB_BIT(TCOMP) |
+			      MACB_BIT(TXUBR))) {
+			/* There's no point taking any more RX/TX completion
+			 * interrupts until we have processed the buffers. The
 			 * scheduling call may fail if the poll routine
 			 * is already scheduled, so disable interrupts
 			 * now.
 			 */
-			queue_writel(queue, IDR, bp->rx_intr_mask);
+			queue_writel(queue, IDR, bp->rx_intr_mask |
+						 MACB_BIT(TCOMP));
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-				queue_writel(queue, ISR, MACB_BIT(RCOMP));
+				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
+							 MACB_BIT(TCOMP) |
+							 MACB_BIT(TXUBR));
+
+			if (status & MACB_BIT(TXUBR)) {
+				queue->txubr_pending = true;
+				wmb(); // ensure softirq can see update
+			}
 
 			if (napi_schedule_prep(&queue->napi)) {
-				netdev_vdbg(bp->dev, "scheduling RX softirq\n");
+				netdev_vdbg(bp->dev, "scheduling NAPI softirq\n");
 				__napi_schedule(&queue->napi);
 			}
 		}
@@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 			break;
 		}
 
-		if (status & MACB_BIT(TCOMP))
-			macb_tx_interrupt(queue);
-
-		if (status & MACB_BIT(TXUBR))
-			macb_tx_restart(queue);
-
 		/* Link change detection isn't possible with RMII, so we'll
 		 * add that if/when we get our hands on a full-blown MII PHY.
 		 */
@@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
 	for (i = queue->tx_head; i != tx_head; i++) {
 		tx_skb = macb_tx_skb(queue, i);
 
-		macb_tx_unmap(bp, tx_skb);
+		macb_tx_unmap(bp, tx_skb, 0);
 	}
 
 	return 0;
-- 
2.31.1


* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
@ 2022-04-29 23:09   ` Robert Hancock
  2022-05-03  7:22     ` Paolo Abeni
  2022-05-03  9:34     ` Tomas Melin
  2022-05-03  9:57   ` Tomas Melin
  2022-05-06 19:04   ` Robert Hancock
  2 siblings, 2 replies; 16+ messages in thread
From: Robert Hancock @ 2022-04-29 23:09 UTC (permalink / raw)
  To: netdev
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet,
	tomas.melin

On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> This driver was using the TX IRQ handler to perform all TX completion
> tasks. Under heavy TX network load, this can cause significant irqs-off
> latencies (found to be in the hundreds of microseconds using ftrace).
> This can cause other issues, such as overrunning serial UART FIFOs when
> using high baud rates with limited UART FIFO sizes.
> 
> Switch to using the NAPI poll handler to perform the TX completion work
> to get this out of hard IRQ context and avoid the IRQ latency impact.
> 
> The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> into the NAPI poll handler to maintain the proper order of operations. A
> flag is used to notify the poll handler that a UBR condition needs to be
> handled. The macb_tx_restart handler has had some locking added for global
> register access, since this could now potentially happen concurrently on
> different queues.
> 
> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> ---
>  drivers/net/ethernet/cadence/macb.h      |   1 +
>  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
>  2 files changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.h
> b/drivers/net/ethernet/cadence/macb.h
> index f0a7d8396a4a..5355cef95a9b 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1209,6 +1209,7 @@ struct macb_queue {
>  	struct macb_tx_skb	*tx_skb;
>  	dma_addr_t		tx_ring_dma;
>  	struct work_struct	tx_error_task;
> +	bool			txubr_pending;
>  
>  	dma_addr_t		rx_ring_dma;
>  	dma_addr_t		rx_buffers_dma;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 160dc5ad84ae..1cb8afb8ef0d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
>  	return -ETIMEDOUT;
>  }
>  
> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
> budget)
>  {
>  	if (tx_skb->mapping) {
>  		if (tx_skb->mapped_as_page)
> @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> macb_tx_skb *tx_skb)
>  	}
>  
>  	if (tx_skb->skb) {
> -		dev_kfree_skb_any(tx_skb->skb);
> +		napi_consume_skb(tx_skb->skb, budget);
>  		tx_skb->skb = NULL;
>  	}
>  }
> @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  		    (unsigned int)(queue - bp->queues),
>  		    queue->tx_tail, queue->tx_head);
>  
> -	/* Prevent the queue IRQ handlers from running: each of them may call
> -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> +	/* Prevent the queue NAPI poll from running, as it calls
> +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
>  	 * As explained below, we have to halt the transmission before updating
>  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
>  	 * network engine about the macb/gem being halted.
>  	 */
> +	napi_disable(&queue->napi);
>  	spin_lock_irqsave(&bp->lock, flags);
>  
>  	/* Make sure nobody is trying to queue up new packets */
> @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  		if (ctrl & MACB_BIT(TX_USED)) {
>  			/* skb is set for the last buffer of the frame */
>  			while (!skb) {
> -				macb_tx_unmap(bp, tx_skb);
> +				macb_tx_unmap(bp, tx_skb, 0);
>  				tail++;
>  				tx_skb = macb_tx_skb(queue, tail);
>  				skb = tx_skb->skb;
> @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>  		}
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	/* Set end of TX queue */
> @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>  
>  	spin_unlock_irqrestore(&bp->lock, flags);
> +	napi_enable(&queue->napi);
>  }
>  
> -static void macb_tx_interrupt(struct macb_queue *queue)
> +static bool macb_tx_complete_pending(struct macb_queue *queue)
> +{
> +	if (queue->tx_head != queue->tx_tail) {
> +		/* Make hw descriptor updates visible to CPU */
> +		rmb();
> +
> +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> MACB_BIT(TX_USED))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static void macb_tx_complete(struct macb_queue *queue, int budget)
>  {
>  	unsigned int tail;
>  	unsigned int head;
> -	u32 status;
>  	struct macb *bp = queue->bp;
>  	u16 queue_index = queue - bp->queues;
>  
> -	status = macb_readl(bp, TSR);
> -	macb_writel(bp, TSR, status);
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> -
> -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> -		    (unsigned long)status);
> -
>  	head = queue->tx_head;
>  	for (tail = queue->tx_tail; tail != head; tail++) {
>  		struct macb_tx_skb	*tx_skb;
> @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  			}
>  
>  			/* Now we can safely release resources */
> -			macb_tx_unmap(bp, tx_skb);
> +			macb_tx_unmap(bp, tx_skb, budget);
>  
>  			/* skb is set only for the last buffer of the frame.
>  			 * WARNING: at this point skb has been freed by
> @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
>  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
>  }
>  
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +	unsigned int head_idx, tbqp;
> +
> +	if (head == tail)
> +		return;
> +
> +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> +
> +	if (tbqp == head_idx)
> +		return;
> +
> +	spin_lock_irq(&bp->lock);
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +	spin_unlock_irq(&bp->lock);
> +}
> +
>  static int macb_poll(struct napi_struct *napi, int budget)
>  {
>  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
>  	struct macb *bp = queue->bp;
>  	int work_done;
>  
> +	macb_tx_complete(queue, budget);
> +
> +	rmb(); // ensure txubr_pending is up to date
> +	if (queue->txubr_pending) {
> +		queue->txubr_pending = false;
> +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> +		macb_tx_restart(queue);
> +	}
> +

Thinking about this a bit more, I wonder if we could just do this tx_restart
call unconditionally here? Then we wouldn't need this txubr_pending flag at
all. CCing Tomas Melin who worked on this tx_restart code recently.
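
That is, something like this hypothetical, untested variant at the top
of macb_poll():

	macb_tx_complete(queue, budget);
	/* macb_tx_restart() already returns early when tx_head == tx_tail,
	 * so calling it unconditionally should be cheap when there is
	 * nothing to restart.
	 */
	macb_tx_restart(queue);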

>  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
>  
>  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
>  		    (unsigned int)(queue - bp->queues), work_done, budget);
>  
>  	if (work_done < budget && napi_complete_done(napi, work_done)) {
> -		queue_writel(queue, IER, bp->rx_intr_mask);
> +		queue_writel(queue, IER, bp->rx_intr_mask |
> +					 MACB_BIT(TCOMP));
>  
>  		/* Packet completions only seem to propagate to raise
>  		 * interrupts when interrupts are enabled at the time, so if
> -		 * packets were received while interrupts were disabled,
> +		 * packets were sent/received while interrupts were disabled,
>  		 * they will not cause another interrupt to be generated when
>  		 * interrupts are re-enabled.
>  		 * Check for this case here to avoid losing a wakeup. This can
> @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int
> budget)
>  		 * actions if an interrupt is raised just after enabling them,
>  		 * but this should be harmless.
>  		 */
> -		if (macb_rx_pending(queue)) {
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +		if (macb_rx_pending(queue) ||
> +		    macb_tx_complete_pending(queue)) {
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP));
>  			netdev_vdbg(bp->dev, "poll: packets pending,
> reschedule\n");
>  			napi_schedule(napi);
>  		}
> @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
> tasklet_struct *t)
>  	netif_tx_start_all_queues(dev);
>  }
>  
> -static void macb_tx_restart(struct macb_queue *queue)
> -{
> -	unsigned int head = queue->tx_head;
> -	unsigned int tail = queue->tx_tail;
> -	struct macb *bp = queue->bp;
> -	unsigned int head_idx, tbqp;
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> -
> -	if (head == tail)
> -		return;
> -
> -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> -
> -	if (tbqp == head_idx)
> -		return;
> -
> -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> -}
> -
>  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
>  {
>  	struct macb_queue *queue = dev_id;
> @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
> *dev_id)
>  			    (unsigned int)(queue - bp->queues),
>  			    (unsigned long)status);
>  
> -		if (status & bp->rx_intr_mask) {
> -			/* There's no point taking any more interrupts
> -			 * until we have processed the buffers. The
> +		if (status & (bp->rx_intr_mask |
> +			      MACB_BIT(TCOMP) |
> +			      MACB_BIT(TXUBR))) {
> +			/* There's no point taking any more RX/TX completion
> +			 * interrupts until we have processed the buffers. The
>  			 * scheduling call may fail if the poll routine
>  			 * is already scheduled, so disable interrupts
>  			 * now.
>  			 */
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP) |
> +							 MACB_BIT(TXUBR));
> +
> +			if (status & MACB_BIT(TXUBR)) {
> +				queue->txubr_pending = true;
> +				wmb(); // ensure softirq can see update
> +			}
>  
>  			if (napi_schedule_prep(&queue->napi)) {
> -				netdev_vdbg(bp->dev, "scheduling RX
> softirq\n");
> +				netdev_vdbg(bp->dev, "scheduling NAPI
> softirq\n");
>  				__napi_schedule(&queue->napi);
>  			}
>  		}
> @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
> *dev_id)
>  			break;
>  		}
>  
> -		if (status & MACB_BIT(TCOMP))
> -			macb_tx_interrupt(queue);
> -
> -		if (status & MACB_BIT(TXUBR))
> -			macb_tx_restart(queue);
> -
>  		/* Link change detection isn't possible with RMII, so we'll
>  		 * add that if/when we get our hands on a full-blown MII PHY.
>  		 */
> @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>  	for (i = queue->tx_head; i != tx_head; i++) {
>  		tx_skb = macb_tx_skb(queue, i);
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	return 0;
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 23:09   ` Robert Hancock
@ 2022-05-03  7:22     ` Paolo Abeni
  2022-05-03 18:34       ` Robert Hancock
  2022-05-03  9:34     ` Tomas Melin
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Abeni @ 2022-05-03  7:22 UTC (permalink / raw)
  To: Robert Hancock, netdev
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, edumazet, tomas.melin

On Fri, 2022-04-29 at 23:09 +0000, Robert Hancock wrote:
> On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> > This driver was using the TX IRQ handler to perform all TX completion
> > tasks. Under heavy TX network load, this can cause significant irqs-off
> > latencies (found to be in the hundreds of microseconds using ftrace).
> > This can cause other issues, such as overrunning serial UART FIFOs when
> > using high baud rates with limited UART FIFO sizes.
> > 
> > Switch to using the NAPI poll handler to perform the TX completion work
> > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > 
> > The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> > into the NAPI poll handler to maintain the proper order of operations. A
> > flag is used to notify the poll handler that a UBR condition needs to be
> > handled. The macb_tx_restart handler has had some locking added for global
> > register access, since this could now potentially happen concurrently on
> > different queues.
> > 
> > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > ---
> >  drivers/net/ethernet/cadence/macb.h      |   1 +
> >  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
> >  2 files changed, 80 insertions(+), 59 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb.h
> > b/drivers/net/ethernet/cadence/macb.h
> > index f0a7d8396a4a..5355cef95a9b 100644
> > --- a/drivers/net/ethernet/cadence/macb.h
> > +++ b/drivers/net/ethernet/cadence/macb.h
> > @@ -1209,6 +1209,7 @@ struct macb_queue {
> >  	struct macb_tx_skb	*tx_skb;
> >  	dma_addr_t		tx_ring_dma;
> >  	struct work_struct	tx_error_task;
> > +	bool			txubr_pending;
> >  
> >  	dma_addr_t		rx_ring_dma;
> >  	dma_addr_t		rx_buffers_dma;
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 160dc5ad84ae..1cb8afb8ef0d 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
> >  	return -ETIMEDOUT;
> >  }
> >  
> > -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> > +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
> > budget)
> >  {
> >  	if (tx_skb->mapping) {
> >  		if (tx_skb->mapped_as_page)
> > @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> > macb_tx_skb *tx_skb)
> >  	}
> >  
> >  	if (tx_skb->skb) {
> > -		dev_kfree_skb_any(tx_skb->skb);
> > +		napi_consume_skb(tx_skb->skb, budget);
> >  		tx_skb->skb = NULL;
> >  	}
> >  }
> > @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  		    (unsigned int)(queue - bp->queues),
> >  		    queue->tx_tail, queue->tx_head);
> >  
> > -	/* Prevent the queue IRQ handlers from running: each of them may call
> > -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> > +	/* Prevent the queue NAPI poll from running, as it calls
> > +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
> >  	 * As explained below, we have to halt the transmission before updating
> >  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
> >  	 * network engine about the macb/gem being halted.
> >  	 */
> > +	napi_disable(&queue->napi);
> >  	spin_lock_irqsave(&bp->lock, flags);
> >  
> >  	/* Make sure nobody is trying to queue up new packets */
> > @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  		if (ctrl & MACB_BIT(TX_USED)) {
> >  			/* skb is set for the last buffer of the frame */
> >  			while (!skb) {
> > -				macb_tx_unmap(bp, tx_skb);
> > +				macb_tx_unmap(bp, tx_skb, 0);
> >  				tail++;
> >  				tx_skb = macb_tx_skb(queue, tail);
> >  				skb = tx_skb->skb;
> > @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
> >  		}
> >  
> > -		macb_tx_unmap(bp, tx_skb);
> > +		macb_tx_unmap(bp, tx_skb, 0);
> >  	}
> >  
> >  	/* Set end of TX queue */
> > @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> >  
> >  	spin_unlock_irqrestore(&bp->lock, flags);
> > +	napi_enable(&queue->napi);
> >  }
> >  
> > -static void macb_tx_interrupt(struct macb_queue *queue)
> > +static bool macb_tx_complete_pending(struct macb_queue *queue)
> > +{
> > +	if (queue->tx_head != queue->tx_tail) {
> > +		/* Make hw descriptor updates visible to CPU */
> > +		rmb();
> > +
> > +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> > MACB_BIT(TX_USED))
> > +			return true;
> > +	}
> > +	return false;
> > +}
> > +
> > +static void macb_tx_complete(struct macb_queue *queue, int budget)
> >  {
> >  	unsigned int tail;
> >  	unsigned int head;
> > -	u32 status;
> >  	struct macb *bp = queue->bp;
> >  	u16 queue_index = queue - bp->queues;
> >  
> > -	status = macb_readl(bp, TSR);
> > -	macb_writel(bp, TSR, status);
> > -
> > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> > -
> > -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> > -		    (unsigned long)status);
> > -
> >  	head = queue->tx_head;
> >  	for (tail = queue->tx_tail; tail != head; tail++) {
> >  		struct macb_tx_skb	*tx_skb;
> > @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
> >  			}
> >  
> >  			/* Now we can safely release resources */
> > -			macb_tx_unmap(bp, tx_skb);
> > +			macb_tx_unmap(bp, tx_skb, budget);
> >  
> >  			/* skb is set only for the last buffer of the frame.
> >  			 * WARNING: at this point skb has been freed by
> > @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
> >  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
> >  }
> >  
> > +static void macb_tx_restart(struct macb_queue *queue)
> > +{
> > +	unsigned int head = queue->tx_head;
> > +	unsigned int tail = queue->tx_tail;
> > +	struct macb *bp = queue->bp;
> > +	unsigned int head_idx, tbqp;
> > +
> > +	if (head == tail)
> > +		return;
> > +
> > +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > +
> > +	if (tbqp == head_idx)
> > +		return;
> > +
> > +	spin_lock_irq(&bp->lock);
> > +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > +	spin_unlock_irq(&bp->lock);
> > +}
> > +
> >  static int macb_poll(struct napi_struct *napi, int budget)
> >  {
> >  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
> >  	struct macb *bp = queue->bp;
> >  	int work_done;
> >  
> > +	macb_tx_complete(queue, budget);
> > +
> > +	rmb(); // ensure txubr_pending is up to date
> > +	if (queue->txubr_pending) {
> > +		queue->txubr_pending = false;
> > +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> > +		macb_tx_restart(queue);
> > +	}
> > +
> 
> Thinking about this a bit more, I wonder if we could just do this tx_restart
> call unconditionally here? Then we wouldn't need this txubr_pending flag at
> all. CCing Tomas Melin who worked on this tx_restart code recently.

Looking only at the code, and lacking the NIC specs, the two
alternatives look functionally equivalent.

Performance-wise it could matter. It depends on the relative cost of
the ISR plus memory barriers vs. restarting TX: removing txubr_pending
trades the former for the latter.

I guess the easiest way to check is to do performance comparisons of
the two options. I hope you have the relevant H/W handy ;)

Thanks,

Paolo


* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 23:09   ` Robert Hancock
  2022-05-03  7:22     ` Paolo Abeni
@ 2022-05-03  9:34     ` Tomas Melin
  2022-05-03 17:01       ` Robert Hancock
  1 sibling, 1 reply; 16+ messages in thread
From: Tomas Melin @ 2022-05-03  9:34 UTC (permalink / raw)
  To: Robert Hancock, netdev
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

Hi,

On 30/04/2022 02:09, Robert Hancock wrote:
> On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
>> This driver was using the TX IRQ handler to perform all TX completion
>> tasks. Under heavy TX network load, this can cause significant irqs-off
>> latencies (found to be in the hundreds of microseconds using ftrace).
>> This can cause other issues, such as overrunning serial UART FIFOs when
>> using high baud rates with limited UART FIFO sizes.
>>
>> Switch to using the NAPI poll handler to perform the TX completion work
>> to get this out of hard IRQ context and avoid the IRQ latency impact.
>>
>> The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
>> into the NAPI poll handler to maintain the proper order of operations. A
>> flag is used to notify the poll handler that a UBR condition needs to be
>> handled. The macb_tx_restart handler has had some locking added for global
>> register access, since this could now potentially happen concurrently on
>> different queues.
>>
>> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
>> ---
>>  drivers/net/ethernet/cadence/macb.h      |   1 +
>>  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
>>  2 files changed, 80 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.h
>> b/drivers/net/ethernet/cadence/macb.h
>> index f0a7d8396a4a..5355cef95a9b 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -1209,6 +1209,7 @@ struct macb_queue {
>>  	struct macb_tx_skb	*tx_skb;
>>  	dma_addr_t		tx_ring_dma;
>>  	struct work_struct	tx_error_task;
>> +	bool			txubr_pending;
>>  
>>  	dma_addr_t		rx_ring_dma;
>>  	dma_addr_t		rx_buffers_dma;
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 160dc5ad84ae..1cb8afb8ef0d 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
>>  	return -ETIMEDOUT;
>>  }
>>  
>> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
>> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
>> budget)
>>  {
>>  	if (tx_skb->mapping) {
>>  		if (tx_skb->mapped_as_page)
>> @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
>> macb_tx_skb *tx_skb)
>>  	}
>>  
>>  	if (tx_skb->skb) {
>> -		dev_kfree_skb_any(tx_skb->skb);
>> +		napi_consume_skb(tx_skb->skb, budget);
>>  		tx_skb->skb = NULL;
>>  	}
>>  }
>> @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
>> *work)
>>  		    (unsigned int)(queue - bp->queues),
>>  		    queue->tx_tail, queue->tx_head);
>>  
>> -	/* Prevent the queue IRQ handlers from running: each of them may call
>> -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
>> +	/* Prevent the queue NAPI poll from running, as it calls
>> +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
>>  	 * As explained below, we have to halt the transmission before updating
>>  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
>>  	 * network engine about the macb/gem being halted.
>>  	 */
>> +	napi_disable(&queue->napi);
>>  	spin_lock_irqsave(&bp->lock, flags);
>>  
>>  	/* Make sure nobody is trying to queue up new packets */
>> @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
>> *work)
>>  		if (ctrl & MACB_BIT(TX_USED)) {
>>  			/* skb is set for the last buffer of the frame */
>>  			while (!skb) {
>> -				macb_tx_unmap(bp, tx_skb);
>> +				macb_tx_unmap(bp, tx_skb, 0);
>>  				tail++;
>>  				tx_skb = macb_tx_skb(queue, tail);
>>  				skb = tx_skb->skb;
>> @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
>> *work)
>>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>>  		}
>>  
>> -		macb_tx_unmap(bp, tx_skb);
>> +		macb_tx_unmap(bp, tx_skb, 0);
>>  	}
>>  
>>  	/* Set end of TX queue */
>> @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
>> *work)
>>  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>  
>>  	spin_unlock_irqrestore(&bp->lock, flags);
>> +	napi_enable(&queue->napi);
>>  }
>>  
>> -static void macb_tx_interrupt(struct macb_queue *queue)
>> +static bool macb_tx_complete_pending(struct macb_queue *queue)
>> +{
>> +	if (queue->tx_head != queue->tx_tail) {
>> +		/* Make hw descriptor updates visible to CPU */
>> +		rmb();
>> +
>> +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
>> MACB_BIT(TX_USED))
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static void macb_tx_complete(struct macb_queue *queue, int budget)
>>  {
>>  	unsigned int tail;
>>  	unsigned int head;
>> -	u32 status;
>>  	struct macb *bp = queue->bp;
>>  	u16 queue_index = queue - bp->queues;
>>  
>> -	status = macb_readl(bp, TSR);
>> -	macb_writel(bp, TSR, status);
>> -
>> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>> -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
>> -
>> -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
>> -		    (unsigned long)status);
>> -
>>  	head = queue->tx_head;
>>  	for (tail = queue->tx_tail; tail != head; tail++) {
>>  		struct macb_tx_skb	*tx_skb;
>> @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>>  			}
>>  
>>  			/* Now we can safely release resources */
>> -			macb_tx_unmap(bp, tx_skb);
>> +			macb_tx_unmap(bp, tx_skb, budget);
>>  
>>  			/* skb is set only for the last buffer of the frame.
>>  			 * WARNING: at this point skb has been freed by
>> @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
>>  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
>>  }
>>  
>> +static void macb_tx_restart(struct macb_queue *queue)
>> +{
>> +	unsigned int head = queue->tx_head;
>> +	unsigned int tail = queue->tx_tail;
>> +	struct macb *bp = queue->bp;
>> +	unsigned int head_idx, tbqp;
>> +
>> +	if (head == tail)
>> +		return;
>> +
>> +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>> +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>> +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>> +
>> +	if (tbqp == head_idx)
>> +		return;
>> +
>> +	spin_lock_irq(&bp->lock);
>> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>> +	spin_unlock_irq(&bp->lock);
>> +}
>> +
>>  static int macb_poll(struct napi_struct *napi, int budget)
>>  {
>>  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
>>  	struct macb *bp = queue->bp;
>>  	int work_done;
>>  
>> +	macb_tx_complete(queue, budget);
>> +
>> +	rmb(); // ensure txubr_pending is up to date
>> +	if (queue->txubr_pending) {
>> +		queue->txubr_pending = false;
>> +		netdev_vdbg(bp->dev, "poll: tx restart\n");
>> +		macb_tx_restart(queue);
>> +	}
>> +
> 
> Thinking about this a bit more, I wonder if we could just do this tx_restart
> call unconditionally here? Then we wouldn't need this txubr_pending flag at
> all. CCing Tomas Melin who worked on this tx_restart code recently.

Not sure whether this could cause some semantic change in how this
restarting gets handled by the hardware. Looking only at the patch, it
seems like it might be possible to call it unconditionally.

But should there anyway be some condition for the tx side handling? I
suppose macb_poll() runs when there is an rx interrupt even if the tx
side has nothing to process.
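
(For reference, the only gating inside macb_tx_restart() itself is the
empty-ring check from the patch above:

	if (head == tail)
		return;

so an unconditional call would at least bail out early when nothing is
queued on the tx side.)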

Thanks,
Tomas


> 
>>  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
>>  
>>  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
>>  		    (unsigned int)(queue - bp->queues), work_done, budget);
>>  
>>  	if (work_done < budget && napi_complete_done(napi, work_done)) {
>> -		queue_writel(queue, IER, bp->rx_intr_mask);
>> +		queue_writel(queue, IER, bp->rx_intr_mask |
>> +					 MACB_BIT(TCOMP));
>>  
>>  		/* Packet completions only seem to propagate to raise
>>  		 * interrupts when interrupts are enabled at the time, so if
>> -		 * packets were received while interrupts were disabled,
>> +		 * packets were sent/received while interrupts were disabled,
>>  		 * they will not cause another interrupt to be generated when
>>  		 * interrupts are re-enabled.
>>  		 * Check for this case here to avoid losing a wakeup. This can
>> @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int
>> budget)
>>  		 * actions if an interrupt is raised just after enabling them,
>>  		 * but this should be harmless.
>>  		 */
>> -		if (macb_rx_pending(queue)) {
>> -			queue_writel(queue, IDR, bp->rx_intr_mask);
>> +		if (macb_rx_pending(queue) ||
>> +		    macb_tx_complete_pending(queue)) {
>> +			queue_writel(queue, IDR, bp->rx_intr_mask |
>> +						 MACB_BIT(TCOMP));
>>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
>> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
>> +							 MACB_BIT(TCOMP));
>>  			netdev_vdbg(bp->dev, "poll: packets pending,
>> reschedule\n");
>>  			napi_schedule(napi);
>>  		}
>> @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
>> tasklet_struct *t)
>>  	netif_tx_start_all_queues(dev);
>>  }
>>  
>> -static void macb_tx_restart(struct macb_queue *queue)
>> -{
>> -	unsigned int head = queue->tx_head;
>> -	unsigned int tail = queue->tx_tail;
>> -	struct macb *bp = queue->bp;
>> -	unsigned int head_idx, tbqp;
>> -
>> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>> -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
>> -
>> -	if (head == tail)
>> -		return;
>> -
>> -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>> -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>> -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>> -
>> -	if (tbqp == head_idx)
>> -		return;
>> -
>> -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>> -}
>> -
>>  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
>>  {
>>  	struct macb_queue *queue = dev_id;
>> @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
>> *dev_id)
>>  			    (unsigned int)(queue - bp->queues),
>>  			    (unsigned long)status);
>>  
>> -		if (status & bp->rx_intr_mask) {
>> -			/* There's no point taking any more interrupts
>> -			 * until we have processed the buffers. The
>> +		if (status & (bp->rx_intr_mask |
>> +			      MACB_BIT(TCOMP) |
>> +			      MACB_BIT(TXUBR))) {
>> +			/* There's no point taking any more RX/TX completion
>> +			 * interrupts until we have processed the buffers. The
>>  			 * scheduling call may fail if the poll routine
>>  			 * is already scheduled, so disable interrupts
>>  			 * now.
>>  			 */
>> -			queue_writel(queue, IDR, bp->rx_intr_mask);
>> +			queue_writel(queue, IDR, bp->rx_intr_mask |
>> +						 MACB_BIT(TCOMP));
>>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
>> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
>> +							 MACB_BIT(TCOMP) |
>> +							 MACB_BIT(TXUBR));
>> +
>> +			if (status & MACB_BIT(TXUBR)) {
>> +				queue->txubr_pending = true;
>> +				wmb(); // ensure softirq can see update
>> +			}
>>  
>>  			if (napi_schedule_prep(&queue->napi)) {
>> -				netdev_vdbg(bp->dev, "scheduling RX
>> softirq\n");
>> +				netdev_vdbg(bp->dev, "scheduling NAPI
>> softirq\n");
>>  				__napi_schedule(&queue->napi);
>>  			}
>>  		}
>> @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
>> *dev_id)
>>  			break;
>>  		}
>>  
>> -		if (status & MACB_BIT(TCOMP))
>> -			macb_tx_interrupt(queue);
>> -
>> -		if (status & MACB_BIT(TXUBR))
>> -			macb_tx_restart(queue);
>> -
>>  		/* Link change detection isn't possible with RMII, so we'll
>>  		 * add that if/when we get our hands on a full-blown MII PHY.
>>  		 */
>> @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>>  	for (i = queue->tx_head; i != tx_head; i++) {
>>  		tx_skb = macb_tx_skb(queue, i);
>>  
>> -		macb_tx_unmap(bp, tx_skb);
>> +		macb_tx_unmap(bp, tx_skb, 0);
>>  	}
>>  
>>  	return 0;

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
  2022-04-29 23:09   ` Robert Hancock
@ 2022-05-03  9:57   ` Tomas Melin
  2022-05-03 17:07     ` Robert Hancock
  2022-05-06 19:04   ` Robert Hancock
  2 siblings, 1 reply; 16+ messages in thread
From: Tomas Melin @ 2022-05-03  9:57 UTC (permalink / raw)
  To: Robert Hancock, netdev
  Cc: nicolas.ferre, claudiu.beznea, davem, edumazet, kuba, pabeni

Hi,


On 30/04/2022 01:31, Robert Hancock wrote:
> This driver was using the TX IRQ handler to perform all TX completion
> tasks. Under heavy TX network load, this can cause significant irqs-off
> latencies (found to be in the hundreds of microseconds using ftrace).
> This can cause other issues, such as overrunning serial UART FIFOs when
> using high baud rates with limited UART FIFO sizes.
> 
> Switch to using the NAPI poll handler to perform the TX completion work
> to get this out of hard IRQ context and avoid the IRQ latency impact.
> 
> The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> into the NAPI poll handler to maintain the proper order of operations. A
> flag is used to notify the poll handler that a UBR condition needs to be
> handled. The macb_tx_restart handler has had some locking added for global
> register access, since this could now potentially happen concurrently on
> different queues.
> 
> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> ---
>  drivers/net/ethernet/cadence/macb.h      |   1 +
>  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
>  2 files changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index f0a7d8396a4a..5355cef95a9b 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1209,6 +1209,7 @@ struct macb_queue {
>  	struct macb_tx_skb	*tx_skb;
>  	dma_addr_t		tx_ring_dma;
>  	struct work_struct	tx_error_task;
> +	bool			txubr_pending;
>  
>  	dma_addr_t		rx_ring_dma;
>  	dma_addr_t		rx_buffers_dma;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 160dc5ad84ae..1cb8afb8ef0d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
>  	return -ETIMEDOUT;
>  }
>  
> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int budget)
>  {
>  	if (tx_skb->mapping) {
>  		if (tx_skb->mapped_as_page)
> @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
>  	}
>  
>  	if (tx_skb->skb) {
> -		dev_kfree_skb_any(tx_skb->skb);
> +		napi_consume_skb(tx_skb->skb, budget);
>  		tx_skb->skb = NULL;
>  	}
>  }
> @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct *work)
>  		    (unsigned int)(queue - bp->queues),
>  		    queue->tx_tail, queue->tx_head);
>  
> -	/* Prevent the queue IRQ handlers from running: each of them may call
> -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> +	/* Prevent the queue NAPI poll from running, as it calls
> +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
>  	 * As explained below, we have to halt the transmission before updating
>  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
>  	 * network engine about the macb/gem being halted.
>  	 */
> +	napi_disable(&queue->napi);
>  	spin_lock_irqsave(&bp->lock, flags);
>  
>  	/* Make sure nobody is trying to queue up new packets */
> @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct *work)
>  		if (ctrl & MACB_BIT(TX_USED)) {
>  			/* skb is set for the last buffer of the frame */
>  			while (!skb) {
> -				macb_tx_unmap(bp, tx_skb);
> +				macb_tx_unmap(bp, tx_skb, 0);
>  				tail++;
>  				tx_skb = macb_tx_skb(queue, tail);
>  				skb = tx_skb->skb;
> @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct *work)
>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>  		}
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	/* Set end of TX queue */
> @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct *work)
>  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>  
>  	spin_unlock_irqrestore(&bp->lock, flags);
> +	napi_enable(&queue->napi);
>  }
>  
> -static void macb_tx_interrupt(struct macb_queue *queue)
> +static bool macb_tx_complete_pending(struct macb_queue *queue)

This might be a somewhat problematic approach, as the TX_USED bit could
potentially also be set if a descriptor with TX_USED is found mid-frame
(for whatever reason).
Not sure how it should be done, but it would be nice if TCOMP
were instead known on a per-queue basis.

Related note: the rx side function is named macb_rx_pending(); should
this similarly be macb_tx_pending()?


Thanks,
Tomas


> +{
> +	if (queue->tx_head != queue->tx_tail) {
> +		/* Make hw descriptor updates visible to CPU */
> +		rmb();
> +
> +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl & MACB_BIT(TX_USED))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static void macb_tx_complete(struct macb_queue *queue, int budget)
>  {
>  	unsigned int tail;
>  	unsigned int head;
> -	u32 status;
>  	struct macb *bp = queue->bp;
>  	u16 queue_index = queue - bp->queues;
>  
> -	status = macb_readl(bp, TSR);
> -	macb_writel(bp, TSR, status);
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> -
> -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> -		    (unsigned long)status);
> -
>  	head = queue->tx_head;
>  	for (tail = queue->tx_tail; tail != head; tail++) {
>  		struct macb_tx_skb	*tx_skb;
> @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  			}
>  
>  			/* Now we can safely release resources */
> -			macb_tx_unmap(bp, tx_skb);
> +			macb_tx_unmap(bp, tx_skb, budget);
>  
>  			/* skb is set only for the last buffer of the frame.
>  			 * WARNING: at this point skb has been freed by
> @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
>  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
>  }
>  
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +	unsigned int head_idx, tbqp;
> +
> +	if (head == tail)
> +		return;
> +
> +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> +
> +	if (tbqp == head_idx)
> +		return;
> +
> +	spin_lock_irq(&bp->lock);
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +	spin_unlock_irq(&bp->lock);
> +}
> +
>  static int macb_poll(struct napi_struct *napi, int budget)
>  {
>  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
>  	struct macb *bp = queue->bp;
>  	int work_done;
>  
> +	macb_tx_complete(queue, budget);
> +
> +	rmb(); // ensure txubr_pending is up to date
> +	if (queue->txubr_pending) {
> +		queue->txubr_pending = false;
> +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> +		macb_tx_restart(queue);
> +	}
> +
>  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
>  
>  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
>  		    (unsigned int)(queue - bp->queues), work_done, budget);
>  
>  	if (work_done < budget && napi_complete_done(napi, work_done)) {
> -		queue_writel(queue, IER, bp->rx_intr_mask);
> +		queue_writel(queue, IER, bp->rx_intr_mask |
> +					 MACB_BIT(TCOMP));
>  
>  		/* Packet completions only seem to propagate to raise
>  		 * interrupts when interrupts are enabled at the time, so if
> -		 * packets were received while interrupts were disabled,
> +		 * packets were sent/received while interrupts were disabled,
>  		 * they will not cause another interrupt to be generated when
>  		 * interrupts are re-enabled.
>  		 * Check for this case here to avoid losing a wakeup. This can
> @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int budget)
>  		 * actions if an interrupt is raised just after enabling them,
>  		 * but this should be harmless.
>  		 */
> -		if (macb_rx_pending(queue)) {
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +		if (macb_rx_pending(queue) ||
> +		    macb_tx_complete_pending(queue)) {
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP));
>  			netdev_vdbg(bp->dev, "poll: packets pending, reschedule\n");
>  			napi_schedule(napi);
>  		}
> @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct tasklet_struct *t)
>  	netif_tx_start_all_queues(dev);
>  }
>  
> -static void macb_tx_restart(struct macb_queue *queue)
> -{
> -	unsigned int head = queue->tx_head;
> -	unsigned int tail = queue->tx_tail;
> -	struct macb *bp = queue->bp;
> -	unsigned int head_idx, tbqp;
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> -
> -	if (head == tail)
> -		return;
> -
> -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> -
> -	if (tbqp == head_idx)
> -		return;
> -
> -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> -}
> -
>  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
>  {
>  	struct macb_queue *queue = dev_id;
> @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
>  			    (unsigned int)(queue - bp->queues),
>  			    (unsigned long)status);
>  
> -		if (status & bp->rx_intr_mask) {
> -			/* There's no point taking any more interrupts
> -			 * until we have processed the buffers. The
> +		if (status & (bp->rx_intr_mask |
> +			      MACB_BIT(TCOMP) |
> +			      MACB_BIT(TXUBR))) {
> +			/* There's no point taking any more RX/TX completion
> +			 * interrupts until we have processed the buffers. The
>  			 * scheduling call may fail if the poll routine
>  			 * is already scheduled, so disable interrupts
>  			 * now.
>  			 */
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP) |
> +							 MACB_BIT(TXUBR));
> +
> +			if (status & MACB_BIT(TXUBR)) {
> +				queue->txubr_pending = true;
> +				wmb(); // ensure softirq can see update
> +			}
>  
>  			if (napi_schedule_prep(&queue->napi)) {
> -				netdev_vdbg(bp->dev, "scheduling RX softirq\n");
> +				netdev_vdbg(bp->dev, "scheduling NAPI softirq\n");
>  				__napi_schedule(&queue->napi);
>  			}
>  		}
> @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
>  			break;
>  		}
>  
> -		if (status & MACB_BIT(TCOMP))
> -			macb_tx_interrupt(queue);
> -
> -		if (status & MACB_BIT(TXUBR))
> -			macb_tx_restart(queue);
> -
>  		/* Link change detection isn't possible with RMII, so we'll
>  		 * add that if/when we get our hands on a full-blown MII PHY.
>  		 */
> @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>  	for (i = queue->tx_head; i != tx_head; i++) {
>  		tx_skb = macb_tx_skb(queue, i);
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	return 0;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-03  9:34     ` Tomas Melin
@ 2022-05-03 17:01       ` Robert Hancock
  2022-05-09  9:14         ` Harini Katakam
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Hancock @ 2022-05-03 17:01 UTC (permalink / raw)
  To: netdev, tomas.melin
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

On Tue, 2022-05-03 at 12:34 +0300, Tomas Melin wrote:
> Hi,
> 
> On 30/04/2022 02:09, Robert Hancock wrote:
> > On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> > > This driver was using the TX IRQ handler to perform all TX completion
> > > tasks. Under heavy TX network load, this can cause significant irqs-off
> > > latencies (found to be in the hundreds of microseconds using ftrace).
> > > This can cause other issues, such as overrunning serial UART FIFOs when
> > > using high baud rates with limited UART FIFO sizes.
> > > 
> > > Switch to using the NAPI poll handler to perform the TX completion work
> > > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > > 
> > > The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> > > into the NAPI poll handler to maintain the proper order of operations. A
> > > flag is used to notify the poll handler that a UBR condition needs to be
> > > handled. The macb_tx_restart handler has had some locking added for
> > > global
> > > register access, since this could now potentially happen concurrently on
> > > different queues.
> > > 
> > > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > > ---
> > >  drivers/net/ethernet/cadence/macb.h      |   1 +
> > >  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
> > >  2 files changed, 80 insertions(+), 59 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/cadence/macb.h
> > > b/drivers/net/ethernet/cadence/macb.h
> > > index f0a7d8396a4a..5355cef95a9b 100644
> > > --- a/drivers/net/ethernet/cadence/macb.h
> > > +++ b/drivers/net/ethernet/cadence/macb.h
> > > @@ -1209,6 +1209,7 @@ struct macb_queue {
> > >  	struct macb_tx_skb	*tx_skb;
> > >  	dma_addr_t		tx_ring_dma;
> > >  	struct work_struct	tx_error_task;
> > > +	bool			txubr_pending;
> > >  
> > >  	dma_addr_t		rx_ring_dma;
> > >  	dma_addr_t		rx_buffers_dma;
> > > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > > b/drivers/net/ethernet/cadence/macb_main.c
> > > index 160dc5ad84ae..1cb8afb8ef0d 100644
> > > --- a/drivers/net/ethernet/cadence/macb_main.c
> > > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > > @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
> > >  	return -ETIMEDOUT;
> > >  }
> > >  
> > > -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> > > +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb,
> > > int
> > > budget)
> > >  {
> > >  	if (tx_skb->mapping) {
> > >  		if (tx_skb->mapped_as_page)
> > > @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> > > macb_tx_skb *tx_skb)
> > >  	}
> > >  
> > >  	if (tx_skb->skb) {
> > > -		dev_kfree_skb_any(tx_skb->skb);
> > > +		napi_consume_skb(tx_skb->skb, budget);
> > >  		tx_skb->skb = NULL;
> > >  	}
> > >  }
> > > @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  		    (unsigned int)(queue - bp->queues),
> > >  		    queue->tx_tail, queue->tx_head);
> > >  
> > > -	/* Prevent the queue IRQ handlers from running: each of them may call
> > > -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> > > +	/* Prevent the queue NAPI poll from running, as it calls
> > > +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
> > >  	 * As explained below, we have to halt the transmission before updating
> > >  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
> > >  	 * network engine about the macb/gem being halted.
> > >  	 */
> > > +	napi_disable(&queue->napi);
> > >  	spin_lock_irqsave(&bp->lock, flags);
> > >  
> > >  	/* Make sure nobody is trying to queue up new packets */
> > > @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  		if (ctrl & MACB_BIT(TX_USED)) {
> > >  			/* skb is set for the last buffer of the frame */
> > >  			while (!skb) {
> > > -				macb_tx_unmap(bp, tx_skb);
> > > +				macb_tx_unmap(bp, tx_skb, 0);
> > >  				tail++;
> > >  				tx_skb = macb_tx_skb(queue, tail);
> > >  				skb = tx_skb->skb;
> > > @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
> > >  		}
> > >  
> > > -		macb_tx_unmap(bp, tx_skb);
> > > +		macb_tx_unmap(bp, tx_skb, 0);
> > >  	}
> > >  
> > >  	/* Set end of TX queue */
> > > @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > >  
> > >  	spin_unlock_irqrestore(&bp->lock, flags);
> > > +	napi_enable(&queue->napi);
> > >  }
> > >  
> > > -static void macb_tx_interrupt(struct macb_queue *queue)
> > > +static bool macb_tx_complete_pending(struct macb_queue *queue)
> > > +{
> > > +	if (queue->tx_head != queue->tx_tail) {
> > > +		/* Make hw descriptor updates visible to CPU */
> > > +		rmb();
> > > +
> > > +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> > > MACB_BIT(TX_USED))
> > > +			return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static void macb_tx_complete(struct macb_queue *queue, int budget)
> > >  {
> > >  	unsigned int tail;
> > >  	unsigned int head;
> > > -	u32 status;
> > >  	struct macb *bp = queue->bp;
> > >  	u16 queue_index = queue - bp->queues;
> > >  
> > > -	status = macb_readl(bp, TSR);
> > > -	macb_writel(bp, TSR, status);
> > > -
> > > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> > > -
> > > -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> > > -		    (unsigned long)status);
> > > -
> > >  	head = queue->tx_head;
> > >  	for (tail = queue->tx_tail; tail != head; tail++) {
> > >  		struct macb_tx_skb	*tx_skb;
> > > @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue
> > > *queue)
> > >  			}
> > >  
> > >  			/* Now we can safely release resources */
> > > -			macb_tx_unmap(bp, tx_skb);
> > > +			macb_tx_unmap(bp, tx_skb, budget);
> > >  
> > >  			/* skb is set only for the last buffer of the frame.
> > >  			 * WARNING: at this point skb has been freed by
> > > @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue
> > > *queue)
> > >  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
> > >  }
> > >  
> > > +static void macb_tx_restart(struct macb_queue *queue)
> > > +{
> > > +	unsigned int head = queue->tx_head;
> > > +	unsigned int tail = queue->tx_tail;
> > > +	struct macb *bp = queue->bp;
> > > +	unsigned int head_idx, tbqp;
> > > +
> > > +	if (head == tail)
> > > +		return;
> > > +
> > > +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > > +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > > +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > > +
> > > +	if (tbqp == head_idx)
> > > +		return;
> > > +
> > > +	spin_lock_irq(&bp->lock);
> > > +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > +	spin_unlock_irq(&bp->lock);
> > > +}
> > > +
> > >  static int macb_poll(struct napi_struct *napi, int budget)
> > >  {
> > >  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
> > >  	struct macb *bp = queue->bp;
> > >  	int work_done;
> > >  
> > > +	macb_tx_complete(queue, budget);
> > > +
> > > +	rmb(); // ensure txubr_pending is up to date
> > > +	if (queue->txubr_pending) {
> > > +		queue->txubr_pending = false;
> > > +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> > > +		macb_tx_restart(queue);
> > > +	}
> > > +
> > 
> > Thinking about this a bit more, I wonder if we could just do this
> > tx_restart
> > call unconditionally here? Then we wouldn't need this txubr_pending flag at
> > all. CCing Tomas Melin who worked on this tx_restart code recently.
> 
> Not sure whether this could cause some semantic change in how the
> restarting gets handled by the hardware. Looking only at the patch, it
> seems it might be possible to call it unconditionally.

Originally I thought there might be a correctness issue with calling it
unconditionally, but looking at it further I don't think there really is. The
FreeBSD driver for this hardware also seems to always do the TX restart in the
interrupt handler if there are packets in the TX queue.

I think the only real issue is whether it's better performance-wise to do it
all the time rather than only after the hardware asserts a TXUBR interrupt. I
expect it would be worse to do it all the time, as that would mean an extra
MMIO read, spinlock, MMIO read and MMIO write, versus just a read barrier and
checking a flag in memory.
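
For illustration, a rough sketch of the two alternatives as they would sit
in macb_poll() (simplified; the flag-gated form is taken from the patch,
the unconditional form is the hypothetical variant discussed above):

	/* Flag-gated restart, as in the patch: the common case costs only
	 * a read barrier and a flag check; MMIO happens only when the IRQ
	 * handler actually observed TXUBR.
	 */
	rmb();	/* pairs with the wmb() after setting txubr_pending */
	if (queue->txubr_pending) {
		queue->txubr_pending = false;
		macb_tx_restart(queue);	/* TBQP read + NCR read/modify/write */
	}

	/* Hypothetical unconditional alternative: every poll with packets
	 * in the ring pays at least the TBQP MMIO read inside
	 * macb_tx_restart(), even when no used-bit condition occurred.
	 */
	macb_tx_restart(queue);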

> 
> But should there anyway be some condition for the tx side handling,
> since I suppose macb_poll() runs on an rx interrupt even if the tx side
> has nothing to process?

I opted not to do that for this case, as it should be pretty harmless and cheap
to just check the TX ring to see if any packets have been completed yet, rather
than actually tracking whether a TX completion was pending. That seems to be
the standard practice in some other drivers (r8169, etc.).

> 
> Thanks,
> Tomas
> 
> 
> > >  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
> > >  
> > >  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
> > >  		    (unsigned int)(queue - bp->queues), work_done, budget);
> > >  
> > >  	if (work_done < budget && napi_complete_done(napi, work_done)) {
> > > -		queue_writel(queue, IER, bp->rx_intr_mask);
> > > +		queue_writel(queue, IER, bp->rx_intr_mask |
> > > +					 MACB_BIT(TCOMP));
> > >  
> > >  		/* Packet completions only seem to propagate to raise
> > >  		 * interrupts when interrupts are enabled at the time, so if
> > > -		 * packets were received while interrupts were disabled,
> > > +		 * packets were sent/received while interrupts were disabled,
> > >  		 * they will not cause another interrupt to be generated when
> > >  		 * interrupts are re-enabled.
> > >  		 * Check for this case here to avoid losing a wakeup. This can
> > > @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi,
> > > int
> > > budget)
> > >  		 * actions if an interrupt is raised just after enabling them,
> > >  		 * but this should be harmless.
> > >  		 */
> > > -		if (macb_rx_pending(queue)) {
> > > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > > +		if (macb_rx_pending(queue) ||
> > > +		    macb_tx_complete_pending(queue)) {
> > > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > > +						 MACB_BIT(TCOMP));
> > >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > > +							 MACB_BIT(TCOMP));
> > >  			netdev_vdbg(bp->dev, "poll: packets pending,
> > > reschedule\n");
> > >  			napi_schedule(napi);
> > >  		}
> > > @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
> > > tasklet_struct *t)
> > >  	netif_tx_start_all_queues(dev);
> > >  }
> > >  
> > > -static void macb_tx_restart(struct macb_queue *queue)
> > > -{
> > > -	unsigned int head = queue->tx_head;
> > > -	unsigned int tail = queue->tx_tail;
> > > -	struct macb *bp = queue->bp;
> > > -	unsigned int head_idx, tbqp;
> > > -
> > > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> > > -
> > > -	if (head == tail)
> > > -		return;
> > > -
> > > -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > > -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > > -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > > -
> > > -	if (tbqp == head_idx)
> > > -		return;
> > > -
> > > -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > -}
> > > -
> > >  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
> > >  {
> > >  	struct macb_queue *queue = dev_id;
> > > @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
> > > *dev_id)
> > >  			    (unsigned int)(queue - bp->queues),
> > >  			    (unsigned long)status);
> > >  
> > > -		if (status & bp->rx_intr_mask) {
> > > -			/* There's no point taking any more interrupts
> > > -			 * until we have processed the buffers. The
> > > +		if (status & (bp->rx_intr_mask |
> > > +			      MACB_BIT(TCOMP) |
> > > +			      MACB_BIT(TXUBR))) {
> > > +			/* There's no point taking any more RX/TX completion
> > > +			 * interrupts until we have processed the buffers. The
> > >  			 * scheduling call may fail if the poll routine
> > >  			 * is already scheduled, so disable interrupts
> > >  			 * now.
> > >  			 */
> > > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > > +						 MACB_BIT(TCOMP));
> > >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > > +							 MACB_BIT(TCOMP) |
> > > +							 MACB_BIT(TXUBR));
> > > +
> > > +			if (status & MACB_BIT(TXUBR)) {
> > > +				queue->txubr_pending = true;
> > > +				wmb(); // ensure softirq can see update
> > > +			}
> > >  
> > >  			if (napi_schedule_prep(&queue->napi)) {
> > > -				netdev_vdbg(bp->dev, "scheduling RX
> > > softirq\n");
> > > +				netdev_vdbg(bp->dev, "scheduling NAPI
> > > softirq\n");
> > >  				__napi_schedule(&queue->napi);
> > >  			}
> > >  		}
> > > @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
> > > *dev_id)
> > >  			break;
> > >  		}
> > >  
> > > -		if (status & MACB_BIT(TCOMP))
> > > -			macb_tx_interrupt(queue);
> > > -
> > > -		if (status & MACB_BIT(TXUBR))
> > > -			macb_tx_restart(queue);
> > > -
> > >  		/* Link change detection isn't possible with RMII, so we'll
> > >  		 * add that if/when we get our hands on a full-blown MII PHY.
> > >  		 */
> > > @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
> > >  	for (i = queue->tx_head; i != tx_head; i++) {
> > >  		tx_skb = macb_tx_skb(queue, i);
> > >  
> > > -		macb_tx_unmap(bp, tx_skb);
> > > +		macb_tx_unmap(bp, tx_skb, 0);
> > >  	}
> > >  
> > >  	return 0;
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-03  9:57   ` Tomas Melin
@ 2022-05-03 17:07     ` Robert Hancock
  2022-05-09  5:49       ` Tomas Melin
  0 siblings, 1 reply; 16+ messages in thread
From: Robert Hancock @ 2022-05-03 17:07 UTC (permalink / raw)
  To: netdev, tomas.melin
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

On Tue, 2022-05-03 at 12:57 +0300, Tomas Melin wrote:
> Hi,
> 
> 
> On 30/04/2022 01:31, Robert Hancock wrote:
> > This driver was using the TX IRQ handler to perform all TX completion
> > tasks. Under heavy TX network load, this can cause significant irqs-off
> > latencies (found to be in the hundreds of microseconds using ftrace).
> > This can cause other issues, such as overrunning serial UART FIFOs when
> > using high baud rates with limited UART FIFO sizes.
> > 
> > Switch to using the NAPI poll handler to perform the TX completion work
> > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > 
> > The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> > into the NAPI poll handler to maintain the proper order of operations. A
> > flag is used to notify the poll handler that a UBR condition needs to be
> > handled. The macb_tx_restart handler has had some locking added for global
> > register access, since this could now potentially happen concurrently on
> > different queues.
> > 
> > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > ---
> >  drivers/net/ethernet/cadence/macb.h      |   1 +
> >  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
> >  2 files changed, 80 insertions(+), 59 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb.h
> > b/drivers/net/ethernet/cadence/macb.h
> > index f0a7d8396a4a..5355cef95a9b 100644
> > --- a/drivers/net/ethernet/cadence/macb.h
> > +++ b/drivers/net/ethernet/cadence/macb.h
> > @@ -1209,6 +1209,7 @@ struct macb_queue {
> >  	struct macb_tx_skb	*tx_skb;
> >  	dma_addr_t		tx_ring_dma;
> >  	struct work_struct	tx_error_task;
> > +	bool			txubr_pending;
> >  
> >  	dma_addr_t		rx_ring_dma;
> >  	dma_addr_t		rx_buffers_dma;
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index 160dc5ad84ae..1cb8afb8ef0d 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
> >  	return -ETIMEDOUT;
> >  }
> >  
> > -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> > +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
> > budget)
> >  {
> >  	if (tx_skb->mapping) {
> >  		if (tx_skb->mapped_as_page)
> > @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> > macb_tx_skb *tx_skb)
> >  	}
> >  
> >  	if (tx_skb->skb) {
> > -		dev_kfree_skb_any(tx_skb->skb);
> > +		napi_consume_skb(tx_skb->skb, budget);
> >  		tx_skb->skb = NULL;
> >  	}
> >  }
> > @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  		    (unsigned int)(queue - bp->queues),
> >  		    queue->tx_tail, queue->tx_head);
> >  
> > -	/* Prevent the queue IRQ handlers from running: each of them may call
> > -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> > +	/* Prevent the queue NAPI poll from running, as it calls
> > +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
> >  	 * As explained below, we have to halt the transmission before updating
> >  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
> >  	 * network engine about the macb/gem being halted.
> >  	 */
> > +	napi_disable(&queue->napi);
> >  	spin_lock_irqsave(&bp->lock, flags);
> >  
> >  	/* Make sure nobody is trying to queue up new packets */
> > @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  		if (ctrl & MACB_BIT(TX_USED)) {
> >  			/* skb is set for the last buffer of the frame */
> >  			while (!skb) {
> > -				macb_tx_unmap(bp, tx_skb);
> > +				macb_tx_unmap(bp, tx_skb, 0);
> >  				tail++;
> >  				tx_skb = macb_tx_skb(queue, tail);
> >  				skb = tx_skb->skb;
> > @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
> >  		}
> >  
> > -		macb_tx_unmap(bp, tx_skb);
> > +		macb_tx_unmap(bp, tx_skb, 0);
> >  	}
> >  
> >  	/* Set end of TX queue */
> > @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> > *work)
> >  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> >  
> >  	spin_unlock_irqrestore(&bp->lock, flags);
> > +	napi_enable(&queue->napi);
> >  }
> >  
> > -static void macb_tx_interrupt(struct macb_queue *queue)
> > +static bool macb_tx_complete_pending(struct macb_queue *queue)
> 
> This might be a somewhat problematic approach, as the check could also
> trigger if a descriptor with TX_USED set is found mid-frame (for
> whatever reason).
> Not sure how it should be done, but it would be nice if TCOMP
> were known on a per-queue basis.

I think this scenario would already have been a problem if it were possible:
by the time we detect that the TCOMP interrupt was asserted, the hardware has
likely already started sending the next frame in the ring while we are
processing the ring entries, so if a TX_USED bit were set in the middle of a
frame we would potentially already have been acting on it and likely breaking.
So it seems like a safe assumption that this doesn't happen, as that would
make it very difficult for the driver to tell when a frame was actually done.
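
To make that invariant concrete, a condensed sketch of the existing
completion walk (simplified from the macb_tx_complete() loop quoted above,
not new code; the "last" local is illustrative):

	for (tail = queue->tx_tail; tail != head; tail++) {
		desc = macb_tx_desc(queue, tail);
		rmb();	/* make hw descriptor updates visible to CPU */

		/* The hardware sets TX_USED only on the first descriptor
		 * of a completed frame, so this test sits on a frame
		 * boundary.
		 */
		if (!(desc->ctrl & MACB_BIT(TX_USED)))
			break;

		/* Free every buffer of the frame; tx_skb->skb is non-NULL
		 * only on the frame's last buffer. A stray mid-frame
		 * TX_USED would thus have us freeing buffers of a frame
		 * still in flight.
		 */
		for (;; tail++) {
			tx_skb = macb_tx_skb(queue, tail);
			last = tx_skb->skb != NULL;
			macb_tx_unmap(bp, tx_skb, budget);
			if (last)
				break;
		}
	}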

> 
> On a related note, the rx side function is named macb_rx_pending();
> should this similarly be macb_tx_pending()?

I thought that naming might be a bit misleading, as in the TX case we are not
only interested in whether frames are pending (i.e. in the TX ring waiting to
be sent) but also in whether there are completed frames in the ring that have
not been processed yet.

> 
> 
> Thanks,
> Tomas
> 
> 
> > +{
> > +	if (queue->tx_head != queue->tx_tail) {
> > +		/* Make hw descriptor updates visible to CPU */
> > +		rmb();
> > +
> > +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> > MACB_BIT(TX_USED))
> > +			return true;
> > +	}
> > +	return false;
> > +}
> > +
> > +static void macb_tx_complete(struct macb_queue *queue, int budget)
> >  {
> >  	unsigned int tail;
> >  	unsigned int head;
> > -	u32 status;
> >  	struct macb *bp = queue->bp;
> >  	u16 queue_index = queue - bp->queues;
> >  
> > -	status = macb_readl(bp, TSR);
> > -	macb_writel(bp, TSR, status);
> > -
> > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> > -
> > -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> > -		    (unsigned long)status);
> > -
> >  	head = queue->tx_head;
> >  	for (tail = queue->tx_tail; tail != head; tail++) {
> >  		struct macb_tx_skb	*tx_skb;
> > @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue
> > *queue)
> >  			}
> >  
> >  			/* Now we can safely release resources */
> > -			macb_tx_unmap(bp, tx_skb);
> > +			macb_tx_unmap(bp, tx_skb, budget);
> >  
> >  			/* skb is set only for the last buffer of the frame.
> >  			 * WARNING: at this point skb has been freed by
> > @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue
> > *queue)
> >  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
> >  }
> >  
> > +static void macb_tx_restart(struct macb_queue *queue)
> > +{
> > +	unsigned int head = queue->tx_head;
> > +	unsigned int tail = queue->tx_tail;
> > +	struct macb *bp = queue->bp;
> > +	unsigned int head_idx, tbqp;
> > +
> > +	if (head == tail)
> > +		return;
> > +
> > +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > +
> > +	if (tbqp == head_idx)
> > +		return;
> > +
> > +	spin_lock_irq(&bp->lock);
> > +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > +	spin_unlock_irq(&bp->lock);
> > +}
> > +
> >  static int macb_poll(struct napi_struct *napi, int budget)
> >  {
> >  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
> >  	struct macb *bp = queue->bp;
> >  	int work_done;
> >  
> > +	macb_tx_complete(queue, budget);
> > +
> > +	rmb(); // ensure txubr_pending is up to date
> > +	if (queue->txubr_pending) {
> > +		queue->txubr_pending = false;
> > +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> > +		macb_tx_restart(queue);
> > +	}
> > +
> >  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
> >  
> >  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
> >  		    (unsigned int)(queue - bp->queues), work_done, budget);
> >  
> >  	if (work_done < budget && napi_complete_done(napi, work_done)) {
> > -		queue_writel(queue, IER, bp->rx_intr_mask);
> > +		queue_writel(queue, IER, bp->rx_intr_mask |
> > +					 MACB_BIT(TCOMP));
> >  
> >  		/* Packet completions only seem to propagate to raise
> >  		 * interrupts when interrupts are enabled at the time, so if
> > -		 * packets were received while interrupts were disabled,
> > +		 * packets were sent/received while interrupts were disabled,
> >  		 * they will not cause another interrupt to be generated when
> >  		 * interrupts are re-enabled.
> >  		 * Check for this case here to avoid losing a wakeup. This can
> > @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int
> > budget)
> >  		 * actions if an interrupt is raised just after enabling them,
> >  		 * but this should be harmless.
> >  		 */
> > -		if (macb_rx_pending(queue)) {
> > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > +		if (macb_rx_pending(queue) ||
> > +		    macb_tx_complete_pending(queue)) {
> > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > +						 MACB_BIT(TCOMP));
> >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > +							 MACB_BIT(TCOMP));
> >  			netdev_vdbg(bp->dev, "poll: packets pending,
> > reschedule\n");
> >  			napi_schedule(napi);
> >  		}
> > @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
> > tasklet_struct *t)
> >  	netif_tx_start_all_queues(dev);
> >  }
> >  
> > -static void macb_tx_restart(struct macb_queue *queue)
> > -{
> > -	unsigned int head = queue->tx_head;
> > -	unsigned int tail = queue->tx_tail;
> > -	struct macb *bp = queue->bp;
> > -	unsigned int head_idx, tbqp;
> > -
> > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> > -
> > -	if (head == tail)
> > -		return;
> > -
> > -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > -
> > -	if (tbqp == head_idx)
> > -		return;
> > -
> > -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > -}
> > -
> >  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
> >  {
> >  	struct macb_queue *queue = dev_id;
> > @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
> > *dev_id)
> >  			    (unsigned int)(queue - bp->queues),
> >  			    (unsigned long)status);
> >  
> > -		if (status & bp->rx_intr_mask) {
> > -			/* There's no point taking any more interrupts
> > -			 * until we have processed the buffers. The
> > +		if (status & (bp->rx_intr_mask |
> > +			      MACB_BIT(TCOMP) |
> > +			      MACB_BIT(TXUBR))) {
> > +			/* There's no point taking any more RX/TX completion
> > +			 * interrupts until we have processed the buffers. The
> >  			 * scheduling call may fail if the poll routine
> >  			 * is already scheduled, so disable interrupts
> >  			 * now.
> >  			 */
> > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > +						 MACB_BIT(TCOMP));
> >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > +							 MACB_BIT(TCOMP) |
> > +							 MACB_BIT(TXUBR));
> > +
> > +			if (status & MACB_BIT(TXUBR)) {
> > +				queue->txubr_pending = true;
> > +				wmb(); // ensure softirq can see update
> > +			}
> >  
> >  			if (napi_schedule_prep(&queue->napi)) {
> > -				netdev_vdbg(bp->dev, "scheduling RX
> > softirq\n");
> > +				netdev_vdbg(bp->dev, "scheduling NAPI
> > softirq\n");
> >  				__napi_schedule(&queue->napi);
> >  			}
> >  		}
> > @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
> > *dev_id)
> >  			break;
> >  		}
> >  
> > -		if (status & MACB_BIT(TCOMP))
> > -			macb_tx_interrupt(queue);
> > -
> > -		if (status & MACB_BIT(TXUBR))
> > -			macb_tx_restart(queue);
> > -
> >  		/* Link change detection isn't possible with RMII, so we'll
> >  		 * add that if/when we get our hands on a full-blown MII PHY.
> >  		 */
> > @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
> >  	for (i = queue->tx_head; i != tx_head; i++) {
> >  		tx_skb = macb_tx_skb(queue, i);
> >  
> > -		macb_tx_unmap(bp, tx_skb);
> > +		macb_tx_unmap(bp, tx_skb, 0);
> >  	}
> >  
> >  	return 0;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-03  7:22     ` Paolo Abeni
@ 2022-05-03 18:34       ` Robert Hancock
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-05-03 18:34 UTC (permalink / raw)
  To: netdev, pabeni
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, edumazet, tomas.melin

On Tue, 2022-05-03 at 09:22 +0200, Paolo Abeni wrote:
> On Fri, 2022-04-29 at 23:09 +0000, Robert Hancock wrote:
> > On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> > > This driver was using the TX IRQ handler to perform all TX completion
> > > tasks. Under heavy TX network load, this can cause significant irqs-off
> > > latencies (found to be in the hundreds of microseconds using ftrace).
> > > This can cause other issues, such as overrunning serial UART FIFOs when
> > > using high baud rates with limited UART FIFO sizes.
> > > 
> > > Switch to using the NAPI poll handler to perform the TX completion work
> > > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > > 
> > > The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> > > into the NAPI poll handler to maintain the proper order of operations. A
> > > flag is used to notify the poll handler that a UBR condition needs to be
> > > handled. The macb_tx_restart handler has had some locking added for
> > > global
> > > register access, since this could now potentially happen concurrently on
> > > different queues.
> > > 
> > > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > > ---
> > >  drivers/net/ethernet/cadence/macb.h      |   1 +
> > >  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
> > >  2 files changed, 80 insertions(+), 59 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/cadence/macb.h
> > > b/drivers/net/ethernet/cadence/macb.h
> > > index f0a7d8396a4a..5355cef95a9b 100644
> > > --- a/drivers/net/ethernet/cadence/macb.h
> > > +++ b/drivers/net/ethernet/cadence/macb.h
> > > @@ -1209,6 +1209,7 @@ struct macb_queue {
> > >  	struct macb_tx_skb	*tx_skb;
> > >  	dma_addr_t		tx_ring_dma;
> > >  	struct work_struct	tx_error_task;
> > > +	bool			txubr_pending;
> > >  
> > >  	dma_addr_t		rx_ring_dma;
> > >  	dma_addr_t		rx_buffers_dma;
> > > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > > b/drivers/net/ethernet/cadence/macb_main.c
> > > index 160dc5ad84ae..1cb8afb8ef0d 100644
> > > --- a/drivers/net/ethernet/cadence/macb_main.c
> > > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > > @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
> > >  	return -ETIMEDOUT;
> > >  }
> > >  
> > > -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> > > +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb,
> > > int
> > > budget)
> > >  {
> > >  	if (tx_skb->mapping) {
> > >  		if (tx_skb->mapped_as_page)
> > > @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> > > macb_tx_skb *tx_skb)
> > >  	}
> > >  
> > >  	if (tx_skb->skb) {
> > > -		dev_kfree_skb_any(tx_skb->skb);
> > > +		napi_consume_skb(tx_skb->skb, budget);
> > >  		tx_skb->skb = NULL;
> > >  	}
> > >  }
> > > @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  		    (unsigned int)(queue - bp->queues),
> > >  		    queue->tx_tail, queue->tx_head);
> > >  
> > > -	/* Prevent the queue IRQ handlers from running: each of them may call
> > > -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> > > +	/* Prevent the queue NAPI poll from running, as it calls
> > > +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
> > >  	 * As explained below, we have to halt the transmission before updating
> > >  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
> > >  	 * network engine about the macb/gem being halted.
> > >  	 */
> > > +	napi_disable(&queue->napi);
> > >  	spin_lock_irqsave(&bp->lock, flags);
> > >  
> > >  	/* Make sure nobody is trying to queue up new packets */
> > > @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  		if (ctrl & MACB_BIT(TX_USED)) {
> > >  			/* skb is set for the last buffer of the frame */
> > >  			while (!skb) {
> > > -				macb_tx_unmap(bp, tx_skb);
> > > +				macb_tx_unmap(bp, tx_skb, 0);
> > >  				tail++;
> > >  				tx_skb = macb_tx_skb(queue, tail);
> > >  				skb = tx_skb->skb;
> > > @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
> > >  		}
> > >  
> > > -		macb_tx_unmap(bp, tx_skb);
> > > +		macb_tx_unmap(bp, tx_skb, 0);
> > >  	}
> > >  
> > >  	/* Set end of TX queue */
> > > @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> > > *work)
> > >  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > >  
> > >  	spin_unlock_irqrestore(&bp->lock, flags);
> > > +	napi_enable(&queue->napi);
> > >  }
> > >  
> > > -static void macb_tx_interrupt(struct macb_queue *queue)
> > > +static bool macb_tx_complete_pending(struct macb_queue *queue)
> > > +{
> > > +	if (queue->tx_head != queue->tx_tail) {
> > > +		/* Make hw descriptor updates visible to CPU */
> > > +		rmb();
> > > +
> > > +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> > > MACB_BIT(TX_USED))
> > > +			return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static void macb_tx_complete(struct macb_queue *queue, int budget)
> > >  {
> > >  	unsigned int tail;
> > >  	unsigned int head;
> > > -	u32 status;
> > >  	struct macb *bp = queue->bp;
> > >  	u16 queue_index = queue - bp->queues;
> > >  
> > > -	status = macb_readl(bp, TSR);
> > > -	macb_writel(bp, TSR, status);
> > > -
> > > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> > > -
> > > -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> > > -		    (unsigned long)status);
> > > -
> > >  	head = queue->tx_head;
> > >  	for (tail = queue->tx_tail; tail != head; tail++) {
> > >  		struct macb_tx_skb	*tx_skb;
> > > @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue
> > > *queue)
> > >  			}
> > >  
> > >  			/* Now we can safely release resources */
> > > -			macb_tx_unmap(bp, tx_skb);
> > > +			macb_tx_unmap(bp, tx_skb, budget);
> > >  
> > >  			/* skb is set only for the last buffer of the frame.
> > >  			 * WARNING: at this point skb has been freed by
> > > @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue
> > > *queue)
> > >  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
> > >  }
> > >  
> > > +static void macb_tx_restart(struct macb_queue *queue)
> > > +{
> > > +	unsigned int head = queue->tx_head;
> > > +	unsigned int tail = queue->tx_tail;
> > > +	struct macb *bp = queue->bp;
> > > +	unsigned int head_idx, tbqp;
> > > +
> > > +	if (head == tail)
> > > +		return;
> > > +
> > > +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > > +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > > +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> > > +
> > > +	if (tbqp == head_idx)
> > > +		return;
> > > +
> > > +	spin_lock_irq(&bp->lock);
> > > +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > +	spin_unlock_irq(&bp->lock);
> > > +}
> > > +
> > >  static int macb_poll(struct napi_struct *napi, int budget)
> > >  {
> > >  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
> > >  	struct macb *bp = queue->bp;
> > >  	int work_done;
> > >  
> > > +	macb_tx_complete(queue, budget);
> > > +
> > > +	rmb(); // ensure txubr_pending is up to date
> > > +	if (queue->txubr_pending) {
> > > +		queue->txubr_pending = false;
> > > +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> > > +		macb_tx_restart(queue);
> > > +	}
> > > +
> > 
> > Thinking about this a bit more, I wonder if we could just do this
> > tx_restart
> > call unconditionally here? Then we wouldn't need this txubr_pending flag at
> > all. CCing Tomas Melin who worked on this tx_restart code recently.
> 
> Looking only at the code, and lacking the NIC specs, the two
> alternatives look functionally equivalent.
> 
> Performance-wise it could matter. It depends on the relative cost of
> ISR + memory barriers vs. restarting TX: removing txubr_pending, you
> trade the former for the latter.
> 
> I guess the easiest way to check is to do performance comparisons with
> the two options. I hope you have the relevant H/W handy ;)

I've done some benchmarks with this under a moderate TX load (about 600 Mbps)
on the Xilinx ZynqMP platform. It looks like the softirq CPU usage is a few
percent higher when doing an unconditional TX restart (around 52% vs. 48-50%
with the code as submitted). So it's not earth-shattering or highly conclusive
either way, but it seems the way I had it is likely more efficient.

> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/2] MACB NAPI improvements
  2022-04-29 22:31 [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
  2022-04-29 22:31 ` [PATCH net-next 1/2] net: macb: simplify/cleanup NAPI reschedule checking Robert Hancock
  2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
@ 2022-05-05 17:56 ` Robert Hancock
  2 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-05-05 17:56 UTC (permalink / raw)
  To: netdev
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet,
	tomas.melin

On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> Simplify the logic in the Cadence MACB/GEM driver for determining
> when to reschedule NAPI processing, and update it to use NAPI for the
> TX path as well as the RX path.
> 
> Robert Hancock (2):
>   net: macb: simplify/cleanup NAPI reschedule checking
>   net: macb: use NAPI for TX completion path
> 
>  drivers/net/ethernet/cadence/macb.h      |   1 +
>  drivers/net/ethernet/cadence/macb_main.c | 193 ++++++++++++-----------
>  2 files changed, 106 insertions(+), 88 deletions(-)
> 

Looks like this patchset is marked as Changes Requested in Patchwork. There
was some discussion, but nothing that seemed to reach a conclusion that any
changes were needed. If anyone would like to see changes in another spin,
please let me know.

-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
  2022-04-29 23:09   ` Robert Hancock
  2022-05-03  9:57   ` Tomas Melin
@ 2022-05-06 19:04   ` Robert Hancock
  2 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-05-06 19:04 UTC (permalink / raw)
  To: netdev; +Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

On Fri, 2022-04-29 at 16:31 -0600, Robert Hancock wrote:
> This driver was using the TX IRQ handler to perform all TX completion
> tasks. Under heavy TX network load, this can cause significant irqs-off
> latencies (found to be in the hundreds of microseconds using ftrace).
> This can cause other issues, such as overrunning serial UART FIFOs when
> using high baud rates with limited UART FIFO sizes.
> 
> Switch to using the NAPI poll handler to perform the TX completion work
> to get this out of hard IRQ context and avoid the IRQ latency impact.
> 
> The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> into the NAPI poll handler to maintain the proper order of operations. A
> flag is used to notify the poll handler that a UBR condition needs to be
> handled. The macb_tx_restart handler has had some locking added for global
> register access, since this could now potentially happen concurrently on
> different queues.
> 
> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> ---
>  drivers/net/ethernet/cadence/macb.h      |   1 +
>  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
>  2 files changed, 80 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb.h
> b/drivers/net/ethernet/cadence/macb.h
> index f0a7d8396a4a..5355cef95a9b 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -1209,6 +1209,7 @@ struct macb_queue {
>  	struct macb_tx_skb	*tx_skb;
>  	dma_addr_t		tx_ring_dma;
>  	struct work_struct	tx_error_task;
> +	bool			txubr_pending;
>  
>  	dma_addr_t		rx_ring_dma;
>  	dma_addr_t		rx_buffers_dma;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c
> b/drivers/net/ethernet/cadence/macb_main.c
> index 160dc5ad84ae..1cb8afb8ef0d 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
>  	return -ETIMEDOUT;
>  }
>  
> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
> budget)
>  {
>  	if (tx_skb->mapping) {
>  		if (tx_skb->mapped_as_page)
> @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> macb_tx_skb *tx_skb)
>  	}
>  
>  	if (tx_skb->skb) {
> -		dev_kfree_skb_any(tx_skb->skb);
> +		napi_consume_skb(tx_skb->skb, budget);
>  		tx_skb->skb = NULL;
>  	}
>  }
> @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  		    (unsigned int)(queue - bp->queues),
>  		    queue->tx_tail, queue->tx_head);
>  
> -	/* Prevent the queue IRQ handlers from running: each of them may call
> -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
> +	/* Prevent the queue NAPI poll from running, as it calls
> +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
>  	 * As explained below, we have to halt the transmission before updating
>  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
>  	 * network engine about the macb/gem being halted.
>  	 */
> +	napi_disable(&queue->napi);
>  	spin_lock_irqsave(&bp->lock, flags);
>  
>  	/* Make sure nobody is trying to queue up new packets */
> @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  		if (ctrl & MACB_BIT(TX_USED)) {
>  			/* skb is set for the last buffer of the frame */
>  			while (!skb) {
> -				macb_tx_unmap(bp, tx_skb);
> +				macb_tx_unmap(bp, tx_skb, 0);
>  				tail++;
>  				tx_skb = macb_tx_skb(queue, tail);
>  				skb = tx_skb->skb;
> @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>  		}
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	/* Set end of TX queue */
> @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
> *work)
>  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>  
>  	spin_unlock_irqrestore(&bp->lock, flags);
> +	napi_enable(&queue->napi);
>  }
>  
> -static void macb_tx_interrupt(struct macb_queue *queue)
> +static bool macb_tx_complete_pending(struct macb_queue *queue)
> +{
> +	if (queue->tx_head != queue->tx_tail) {
> +		/* Make hw descriptor updates visible to CPU */
> +		rmb();
> +
> +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> MACB_BIT(TX_USED))
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static void macb_tx_complete(struct macb_queue *queue, int budget)
>  {
>  	unsigned int tail;
>  	unsigned int head;
> -	u32 status;
>  	struct macb *bp = queue->bp;
>  	u16 queue_index = queue - bp->queues;
>  
> -	status = macb_readl(bp, TSR);
> -	macb_writel(bp, TSR, status);
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> -
> -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> -		    (unsigned long)status);
> -
>  	head = queue->tx_head;
>  	for (tail = queue->tx_tail; tail != head; tail++) {
>  		struct macb_tx_skb	*tx_skb;
> @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue *queue)
>  			}
>  
>  			/* Now we can safely release resources */
> -			macb_tx_unmap(bp, tx_skb);
> +			macb_tx_unmap(bp, tx_skb, budget);
>  
>  			/* skb is set only for the last buffer of the frame.
>  			 * WARNING: at this point skb has been freed by
> @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue *queue)
>  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
>  }
>  
> +static void macb_tx_restart(struct macb_queue *queue)
> +{
> +	unsigned int head = queue->tx_head;
> +	unsigned int tail = queue->tx_tail;
> +	struct macb *bp = queue->bp;
> +	unsigned int head_idx, tbqp;
> +
> +	if (head == tail)
> +		return;
> +
> +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> +
> +	if (tbqp == head_idx)
> +		return;
> +
> +	spin_lock_irq(&bp->lock);
> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> +	spin_unlock_irq(&bp->lock);
> +}
> +
>  static int macb_poll(struct napi_struct *napi, int budget)
>  {
>  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
>  	struct macb *bp = queue->bp;
>  	int work_done;
>  
> +	macb_tx_complete(queue, budget);
> +
> +	rmb(); // ensure txubr_pending is up to date
> +	if (queue->txubr_pending) {
> +		queue->txubr_pending = false;
> +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> +		macb_tx_restart(queue);
> +	}
> +
>  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
>  
>  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
>  		    (unsigned int)(queue - bp->queues), work_done, budget);
>  
>  	if (work_done < budget && napi_complete_done(napi, work_done)) {
> -		queue_writel(queue, IER, bp->rx_intr_mask);
> +		queue_writel(queue, IER, bp->rx_intr_mask |
> +					 MACB_BIT(TCOMP));
>  
>  		/* Packet completions only seem to propagate to raise
>  		 * interrupts when interrupts are enabled at the time, so if
> -		 * packets were received while interrupts were disabled,
> +		 * packets were sent/received while interrupts were disabled,
>  		 * they will not cause another interrupt to be generated when
>  		 * interrupts are re-enabled.
>  		 * Check for this case here to avoid losing a wakeup. This can
> @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int
> budget)
>  		 * actions if an interrupt is raised just after enabling them,
>  		 * but this should be harmless.
>  		 */
> -		if (macb_rx_pending(queue)) {
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +		if (macb_rx_pending(queue) ||
> +		    macb_tx_complete_pending(queue)) {
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP));
>  			netdev_vdbg(bp->dev, "poll: packets pending,
> reschedule\n");
>  			napi_schedule(napi);
>  		}
> @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
> tasklet_struct *t)
>  	netif_tx_start_all_queues(dev);
>  }
>  
> -static void macb_tx_restart(struct macb_queue *queue)
> -{
> -	unsigned int head = queue->tx_head;
> -	unsigned int tail = queue->tx_tail;
> -	struct macb *bp = queue->bp;
> -	unsigned int head_idx, tbqp;
> -
> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> -
> -	if (head == tail)
> -		return;
> -
> -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
> -
> -	if (tbqp == head_idx)
> -		return;
> -
> -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> -}
> -
>  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
>  {
>  	struct macb_queue *queue = dev_id;
> @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
> *dev_id)
>  			    (unsigned int)(queue - bp->queues),
>  			    (unsigned long)status);
>  
> -		if (status & bp->rx_intr_mask) {
> -			/* There's no point taking any more interrupts
> -			 * until we have processed the buffers. The
> +		if (status & (bp->rx_intr_mask |
> +			      MACB_BIT(TCOMP) |
> +			      MACB_BIT(TXUBR))) {
> +			/* There's no point taking any more RX/TX completion
> +			 * interrupts until we have processed the buffers. The
>  			 * scheduling call may fail if the poll routine
>  			 * is already scheduled, so disable interrupts
>  			 * now.
>  			 */
> -			queue_writel(queue, IDR, bp->rx_intr_mask);
> +			queue_writel(queue, IDR, bp->rx_intr_mask |
> +						 MACB_BIT(TCOMP));
>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> +							 MACB_BIT(TCOMP) |
> +							 MACB_BIT(TXUBR));
> +
> +			if (status & MACB_BIT(TXUBR)) {
> +				queue->txubr_pending = true;
> +				wmb(); // ensure softirq can see update
> +			}
>  
>  			if (napi_schedule_prep(&queue->napi)) {
> -				netdev_vdbg(bp->dev, "scheduling RX
> softirq\n");
> +				netdev_vdbg(bp->dev, "scheduling NAPI
> softirq\n");
>  				__napi_schedule(&queue->napi);
>  			}
>  		}
> @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
> *dev_id)
>  			break;
>  		}
>  
> -		if (status & MACB_BIT(TCOMP))
> -			macb_tx_interrupt(queue);
> 

I've noticed that there's a potential issue: the processing that was being
done in macb_tx_interrupt (and is now in macb_tx_complete) is no longer
protected by a lock, so we need some new locking to protect the TX ring
pointers from being accessed concurrently in the TX start vs. TX polling
paths. Will add this in and post a new revision.

> -
> -		if (status & MACB_BIT(TXUBR))
> -			macb_tx_restart(queue);
> -
>  		/* Link change detection isn't possible with RMII, so we'll
>  		 * add that if/when we get our hands on a full-blown MII PHY.
>  		 */
> @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>  	for (i = queue->tx_head; i != tx_head; i++) {
>  		tx_skb = macb_tx_skb(queue, i);
>  
> -		macb_tx_unmap(bp, tx_skb);
> +		macb_tx_unmap(bp, tx_skb, 0);
>  	}
>  
>  	return 0;
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-03 17:07     ` Robert Hancock
@ 2022-05-09  5:49       ` Tomas Melin
  2022-05-09 18:00         ` Robert Hancock
  0 siblings, 1 reply; 16+ messages in thread
From: Tomas Melin @ 2022-05-09  5:49 UTC (permalink / raw)
  To: Robert Hancock, netdev
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

Hi Robert,

On 03/05/2022 20:07, Robert Hancock wrote:
> On Tue, 2022-05-03 at 12:57 +0300, Tomas Melin wrote:
>> Hi,
>>
>>
>> On 30/04/2022 01:31, Robert Hancock wrote:
>>> This driver was using the TX IRQ handler to perform all TX completion
>>> tasks. Under heavy TX network load, this can cause significant irqs-off
>>> latencies (found to be in the hundreds of microseconds using ftrace).
>>> This can cause other issues, such as overrunning serial UART FIFOs when
>>> using high baud rates with limited UART FIFO sizes.
>>>
>>> Switch to using the NAPI poll handler to perform the TX completion work
>>> to get this out of hard IRQ context and avoid the IRQ latency impact.
>>>
>>> The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
>>> into the NAPI poll handler to maintain the proper order of operations. A
>>> flag is used to notify the poll handler that a UBR condition needs to be
>>> handled. The macb_tx_restart handler has had some locking added for global
>>> register access, since this could now potentially happen concurrently on
>>> different queues.
>>>
>>> Signed-off-by: Robert Hancock <robert.hancock@calian.com>
>>> ---
>>>  drivers/net/ethernet/cadence/macb.h      |   1 +
>>>  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
>>>  2 files changed, 80 insertions(+), 59 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/cadence/macb.h
>>> b/drivers/net/ethernet/cadence/macb.h
>>> index f0a7d8396a4a..5355cef95a9b 100644
>>> --- a/drivers/net/ethernet/cadence/macb.h
>>> +++ b/drivers/net/ethernet/cadence/macb.h
>>> @@ -1209,6 +1209,7 @@ struct macb_queue {
>>>  	struct macb_tx_skb	*tx_skb;
>>>  	dma_addr_t		tx_ring_dma;
>>>  	struct work_struct	tx_error_task;
>>> +	bool			txubr_pending;
>>>  
>>>  	dma_addr_t		rx_ring_dma;
>>>  	dma_addr_t		rx_buffers_dma;
>>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>>> b/drivers/net/ethernet/cadence/macb_main.c
>>> index 160dc5ad84ae..1cb8afb8ef0d 100644
>>> --- a/drivers/net/ethernet/cadence/macb_main.c
>>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>>> @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
>>>  	return -ETIMEDOUT;
>>>  }
>>>  
>>> -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
>>> +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb, int
>>> budget)
>>>  {
>>>  	if (tx_skb->mapping) {
>>>  		if (tx_skb->mapped_as_page)
>>> @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
>>> macb_tx_skb *tx_skb)
>>>  	}
>>>  
>>>  	if (tx_skb->skb) {
>>> -		dev_kfree_skb_any(tx_skb->skb);
>>> +		napi_consume_skb(tx_skb->skb, budget);
>>>  		tx_skb->skb = NULL;
>>>  	}
>>>  }
>>> @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct work_struct
>>> *work)
>>>  		    (unsigned int)(queue - bp->queues),
>>>  		    queue->tx_tail, queue->tx_head);
>>>  
>>> -	/* Prevent the queue IRQ handlers from running: each of them may call
>>> -	 * macb_tx_interrupt(), which in turn may call netif_wake_subqueue().
>>> +	/* Prevent the queue NAPI poll from running, as it calls
>>> +	 * macb_tx_complete(), which in turn may call netif_wake_subqueue().
>>>  	 * As explained below, we have to halt the transmission before updating
>>>  	 * TBQP registers so we call netif_tx_stop_all_queues() to notify the
>>>  	 * network engine about the macb/gem being halted.
>>>  	 */
>>> +	napi_disable(&queue->napi);
>>>  	spin_lock_irqsave(&bp->lock, flags);
>>>  
>>>  	/* Make sure nobody is trying to queue up new packets */
>>> @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
>>> *work)
>>>  		if (ctrl & MACB_BIT(TX_USED)) {
>>>  			/* skb is set for the last buffer of the frame */
>>>  			while (!skb) {
>>> -				macb_tx_unmap(bp, tx_skb);
>>> +				macb_tx_unmap(bp, tx_skb, 0);
>>>  				tail++;
>>>  				tx_skb = macb_tx_skb(queue, tail);
>>>  				skb = tx_skb->skb;
>>> @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
>>> *work)
>>>  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
>>>  		}
>>>  
>>> -		macb_tx_unmap(bp, tx_skb);
>>> +		macb_tx_unmap(bp, tx_skb, 0);
>>>  	}
>>>  
>>>  	/* Set end of TX queue */
>>> @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct work_struct
>>> *work)
>>>  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>>  
>>>  	spin_unlock_irqrestore(&bp->lock, flags);
>>> +	napi_enable(&queue->napi);
>>>  }
>>>  
>>> -static void macb_tx_interrupt(struct macb_queue *queue)
>>> +static bool macb_tx_complete_pending(struct macb_queue *queue)
>>
>> This might be somewhat problematic approach as TX_USED bit could
>> potentially also be set if a descriptor with TX_USED is found mid-frame
>> (for whatever reason).
>> Not sure how it should be done, but it would be nice if TCOMP
>> would rather be known on per queue basis.
> 
> I think this scenario would already have been a problem if it were possible -
> i.e. when we detect the TCOMP interrupt was asserted, it has likely already
> started sending the next frame in the ring when we are processing the ring
> entries, so if a TX_USED bit was set in the middle of a frame we would
> potentially already have been acting on it and likely breaking? So it seems
> like a safe assumption that this doesn't happen, as that would make it very
> difficult for the driver to tell when a frame was actually done or not.
The consideration was that TCOMP and TX_USED are separate
interrupts, so TX_USED can be asserted even though there is no TCOMP.

Mixing terminology could become confusing here. I.e., IMHO
tx_complete_pending should check whether TCOMP is pending, and
txubr_pending whether TX_USED is.

> 
>>
>> Related note, rx side function seems to be named as macb_rx_pending(),
>> should be similary macb_tx_pending()?
> 
> I thought that naming might be a bit misleading, as in the TX case we are not
> only interested in whether frames are pending (i.e. in the TX ring to be sent)
> but also in whether there are completed frames in the ring that have not been
> processed yet.

Wouldn't tx_pending() describe that better, then? :) (That there are TX
side operations pending to be processed.)

Thanks,
Tomas



> 
>>
>>
>> Thanks,
>> Tomas
>>
>>
>>> +{
>>> +	if (queue->tx_head != queue->tx_tail) {
>>> +		/* Make hw descriptor updates visible to CPU */
>>> +		rmb();
>>> +
>>> +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
>>> MACB_BIT(TX_USED))
>>> +			return true;
>>> +	}
>>> +	return false;
>>> +}
>>> +
>>> +static void macb_tx_complete(struct macb_queue *queue, int budget)
>>>  {
>>>  	unsigned int tail;
>>>  	unsigned int head;
>>> -	u32 status;
>>>  	struct macb *bp = queue->bp;
>>>  	u16 queue_index = queue - bp->queues;
>>>  
>>> -	status = macb_readl(bp, TSR);
>>> -	macb_writel(bp, TSR, status);
>>> -
>>> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>> -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
>>> -
>>> -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
>>> -		    (unsigned long)status);
>>> -
>>>  	head = queue->tx_head;
>>>  	for (tail = queue->tx_tail; tail != head; tail++) {
>>>  		struct macb_tx_skb	*tx_skb;
>>> @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue
>>> *queue)
>>>  			}
>>>  
>>>  			/* Now we can safely release resources */
>>> -			macb_tx_unmap(bp, tx_skb);
>>> +			macb_tx_unmap(bp, tx_skb, budget);
>>>  
>>>  			/* skb is set only for the last buffer of the frame.
>>>  			 * WARNING: at this point skb has been freed by
>>> @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue
>>> *queue)
>>>  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
>>>  }
>>>  
>>> +static void macb_tx_restart(struct macb_queue *queue)
>>> +{
>>> +	unsigned int head = queue->tx_head;
>>> +	unsigned int tail = queue->tx_tail;
>>> +	struct macb *bp = queue->bp;
>>> +	unsigned int head_idx, tbqp;
>>> +
>>> +	if (head == tail)
>>> +		return;
>>> +
>>> +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>>> +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>>> +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>>> +
>>> +	if (tbqp == head_idx)
>>> +		return;
>>> +
>>> +	spin_lock_irq(&bp->lock);
>>> +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>> +	spin_unlock_irq(&bp->lock);
>>> +}
>>> +
>>>  static int macb_poll(struct napi_struct *napi, int budget)
>>>  {
>>>  	struct macb_queue *queue = container_of(napi, struct macb_queue, napi);
>>>  	struct macb *bp = queue->bp;
>>>  	int work_done;
>>>  
>>> +	macb_tx_complete(queue, budget);
>>> +
>>> +	rmb(); // ensure txubr_pending is up to date
>>> +	if (queue->txubr_pending) {
>>> +		queue->txubr_pending = false;
>>> +		netdev_vdbg(bp->dev, "poll: tx restart\n");
>>> +		macb_tx_restart(queue);
>>> +	}
>>> +
>>>  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
>>>  
>>>  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget = %d\n",
>>>  		    (unsigned int)(queue - bp->queues), work_done, budget);
>>>  
>>>  	if (work_done < budget && napi_complete_done(napi, work_done)) {
>>> -		queue_writel(queue, IER, bp->rx_intr_mask);
>>> +		queue_writel(queue, IER, bp->rx_intr_mask |
>>> +					 MACB_BIT(TCOMP));
>>>  
>>>  		/* Packet completions only seem to propagate to raise
>>>  		 * interrupts when interrupts are enabled at the time, so if
>>> -		 * packets were received while interrupts were disabled,
>>> +		 * packets were sent/received while interrupts were disabled,
>>>  		 * they will not cause another interrupt to be generated when
>>>  		 * interrupts are re-enabled.
>>>  		 * Check for this case here to avoid losing a wakeup. This can
>>> @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi, int
>>> budget)
>>>  		 * actions if an interrupt is raised just after enabling them,
>>>  		 * but this should be harmless.
>>>  		 */
>>> -		if (macb_rx_pending(queue)) {
>>> -			queue_writel(queue, IDR, bp->rx_intr_mask);
>>> +		if (macb_rx_pending(queue) ||
>>> +		    macb_tx_complete_pending(queue)) {
>>> +			queue_writel(queue, IDR, bp->rx_intr_mask |
>>> +						 MACB_BIT(TCOMP));
>>>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
>>> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
>>> +							 MACB_BIT(TCOMP));
>>>  			netdev_vdbg(bp->dev, "poll: packets pending,
>>> reschedule\n");
>>>  			napi_schedule(napi);
>>>  		}
>>> @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
>>> tasklet_struct *t)
>>>  	netif_tx_start_all_queues(dev);
>>>  }
>>>  
>>> -static void macb_tx_restart(struct macb_queue *queue)
>>> -{
>>> -	unsigned int head = queue->tx_head;
>>> -	unsigned int tail = queue->tx_tail;
>>> -	struct macb *bp = queue->bp;
>>> -	unsigned int head_idx, tbqp;
>>> -
>>> -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>> -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
>>> -
>>> -	if (head == tail)
>>> -		return;
>>> -
>>> -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
>>> -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
>>> -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
>>> -
>>> -	if (tbqp == head_idx)
>>> -		return;
>>> -
>>> -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
>>> -}
>>> -
>>>  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
>>>  {
>>>  	struct macb_queue *queue = dev_id;
>>> @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
>>> *dev_id)
>>>  			    (unsigned int)(queue - bp->queues),
>>>  			    (unsigned long)status);
>>>  
>>> -		if (status & bp->rx_intr_mask) {
>>> -			/* There's no point taking any more interrupts
>>> -			 * until we have processed the buffers. The
>>> +		if (status & (bp->rx_intr_mask |
>>> +			      MACB_BIT(TCOMP) |
>>> +			      MACB_BIT(TXUBR))) {
>>> +			/* There's no point taking any more RX/TX completion
>>> +			 * interrupts until we have processed the buffers. The
>>>  			 * scheduling call may fail if the poll routine
>>>  			 * is already scheduled, so disable interrupts
>>>  			 * now.
>>>  			 */
>>> -			queue_writel(queue, IDR, bp->rx_intr_mask);
>>> +			queue_writel(queue, IDR, bp->rx_intr_mask |
>>> +						 MACB_BIT(TCOMP));
>>>  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
>>> -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
>>> +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
>>> +							 MACB_BIT(TCOMP) |
>>> +							 MACB_BIT(TXUBR));
>>> +
>>> +			if (status & MACB_BIT(TXUBR)) {
>>> +				queue->txubr_pending = true;
>>> +				wmb(); // ensure softirq can see update
>>> +			}
>>>  
>>>  			if (napi_schedule_prep(&queue->napi)) {
>>> -				netdev_vdbg(bp->dev, "scheduling RX
>>> softirq\n");
>>> +				netdev_vdbg(bp->dev, "scheduling NAPI
>>> softirq\n");
>>>  				__napi_schedule(&queue->napi);
>>>  			}
>>>  		}
>>> @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
>>> *dev_id)
>>>  			break;
>>>  		}
>>>  
>>> -		if (status & MACB_BIT(TCOMP))
>>> -			macb_tx_interrupt(queue);
>>> -
>>> -		if (status & MACB_BIT(TXUBR))
>>> -			macb_tx_restart(queue);
>>> -
>>>  		/* Link change detection isn't possible with RMII, so we'll
>>>  		 * add that if/when we get our hands on a full-blown MII PHY.
>>>  		 */
>>> @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
>>>  	for (i = queue->tx_head; i != tx_head; i++) {
>>>  		tx_skb = macb_tx_skb(queue, i);
>>>  
>>> -		macb_tx_unmap(bp, tx_skb);
>>> +		macb_tx_unmap(bp, tx_skb, 0);
>>>  	}
>>>  
>>>  	return 0;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-03 17:01       ` Robert Hancock
@ 2022-05-09  9:14         ` Harini Katakam
  2022-05-09 19:46           ` Robert Hancock
  0 siblings, 1 reply; 16+ messages in thread
From: Harini Katakam @ 2022-05-09  9:14 UTC (permalink / raw)
  To: Robert Hancock
  Cc: netdev, tomas.melin, nicolas.ferre, davem, claudiu.beznea, kuba,
	pabeni, edumazet

Hi Robert,

Thanks for the patch.

<snip>
>
> Originally I thought there might be a correctness issue with calling it
> unconditionally, but looking at it further I don't think there really is. The
> FreeBSD driver for this hardware also seems to always do the TX restart in the
> interrupt handler if there are packets in the TX queue.
>
> I think the only real issue is whether it's better performance-wise to do it
> all the time rather than only after the hardware asserts a TXUBR interrupt. I
> expect it would be worse to do it all the time, as that would mean an extra
> MMIO read, spinlock, MMIO read and MMIO write, versus just a read barrier and
> checking a flag in memory.
>

I agree that doing TX restart only on UBR is better.
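
(To make the trade-off concrete, the unconditional alternative discussed above
would amount to something like the following on every poll -- a hypothetical
sketch, not code anyone has proposed here:

	/* unconditionally attempt a TX restart on every poll */
	macb_tx_restart(queue);	/* TBQP read plus locked NCR
				 * read-modify-write when frames are queued */

whereas the flag-based approach only pays an rmb() and a memory load when no
TXUBR was raised.)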

> >
> > But should there anyway be some condition for the TX side handling, as
> > I suppose macb_poll() runs when there is an RX interrupt even if the TX
> > side has nothing to process?
>
> I opted not to do that for this case, as it should be pretty harmless and cheap
> to just check the TX ring to see if any packets have been completed yet, rather
> than actually tracking if a TX completion was pending. That seems to be the
> standard practice in some other drivers (r8169, etc.)

In this implementation the TX interrupt bit is being cleared and TX processing
is scheduled when there is an RX interrupt, and vice versa. Could you please
consider using the check "status & MACB_BIT(TCOMP)" for the TX interrupt and
NAPI? If I understand your reply above correctly, you mention that the above
check is more expensive than parsing the TX ring for new data. In unbalanced
traffic scenarios, i.e. server-only or client-only, will this be efficient?

Regards,
Harini

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-09  5:49       ` Tomas Melin
@ 2022-05-09 18:00         ` Robert Hancock
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-05-09 18:00 UTC (permalink / raw)
  To: netdev, tomas.melin
  Cc: nicolas.ferre, davem, claudiu.beznea, kuba, pabeni, edumazet

On Mon, 2022-05-09 at 08:49 +0300, Tomas Melin wrote:
> Hi Robert,
> 
> On 03/05/2022 20:07, Robert Hancock wrote:
> > On Tue, 2022-05-03 at 12:57 +0300, Tomas Melin wrote:
> > > Hi,
> > > 
> > > 
> > > On 30/04/2022 01:31, Robert Hancock wrote:
> > > > This driver was using the TX IRQ handler to perform all TX completion
> > > > tasks. Under heavy TX network load, this can cause significant irqs-off
> > > > latencies (found to be in the hundreds of microseconds using ftrace).
> > > > This can cause other issues, such as overrunning serial UART FIFOs when
> > > > using high baud rates with limited UART FIFO sizes.
> > > > 
> > > > Switch to using the NAPI poll handler to perform the TX completion work
> > > > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > > > 
> > > > The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved
> > > > into the NAPI poll handler to maintain the proper order of operations.
> > > > A
> > > > flag is used to notify the poll handler that a UBR condition needs to
> > > > be
> > > > handled. The macb_tx_restart handler has had some locking added for
> > > > global
> > > > register access, since this could now potentially happen concurrently
> > > > on
> > > > different queues.
> > > > 
> > > > Signed-off-by: Robert Hancock <robert.hancock@calian.com>
> > > > ---
> > > >  drivers/net/ethernet/cadence/macb.h      |   1 +
> > > >  drivers/net/ethernet/cadence/macb_main.c | 138 +++++++++++++----------
> > > >  2 files changed, 80 insertions(+), 59 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/ethernet/cadence/macb.h
> > > > b/drivers/net/ethernet/cadence/macb.h
> > > > index f0a7d8396a4a..5355cef95a9b 100644
> > > > --- a/drivers/net/ethernet/cadence/macb.h
> > > > +++ b/drivers/net/ethernet/cadence/macb.h
> > > > @@ -1209,6 +1209,7 @@ struct macb_queue {
> > > >  	struct macb_tx_skb	*tx_skb;
> > > >  	dma_addr_t		tx_ring_dma;
> > > >  	struct work_struct	tx_error_task;
> > > > +	bool			txubr_pending;
> > > >  
> > > >  	dma_addr_t		rx_ring_dma;
> > > >  	dma_addr_t		rx_buffers_dma;
> > > > diff --git a/drivers/net/ethernet/cadence/macb_main.c
> > > > b/drivers/net/ethernet/cadence/macb_main.c
> > > > index 160dc5ad84ae..1cb8afb8ef0d 100644
> > > > --- a/drivers/net/ethernet/cadence/macb_main.c
> > > > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > > > @@ -959,7 +959,7 @@ static int macb_halt_tx(struct macb *bp)
> > > >  	return -ETIMEDOUT;
> > > >  }
> > > >  
> > > > -static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb)
> > > > +static void macb_tx_unmap(struct macb *bp, struct macb_tx_skb *tx_skb,
> > > > int
> > > > budget)
> > > >  {
> > > >  	if (tx_skb->mapping) {
> > > >  		if (tx_skb->mapped_as_page)
> > > > @@ -972,7 +972,7 @@ static void macb_tx_unmap(struct macb *bp, struct
> > > > macb_tx_skb *tx_skb)
> > > >  	}
> > > >  
> > > >  	if (tx_skb->skb) {
> > > > -		dev_kfree_skb_any(tx_skb->skb);
> > > > +		napi_consume_skb(tx_skb->skb, budget);
> > > >  		tx_skb->skb = NULL;
> > > >  	}
> > > >  }
> > > > @@ -1025,12 +1025,13 @@ static void macb_tx_error_task(struct
> > > > work_struct
> > > > *work)
> > > >  		    (unsigned int)(queue - bp->queues),
> > > >  		    queue->tx_tail, queue->tx_head);
> > > >  
> > > > -	/* Prevent the queue IRQ handlers from running: each of them
> > > > may call
> > > > -	 * macb_tx_interrupt(), which in turn may call
> > > > netif_wake_subqueue().
> > > > +	/* Prevent the queue NAPI poll from running, as it calls
> > > > +	 * macb_tx_complete(), which in turn may call
> > > > netif_wake_subqueue().
> > > >  	 * As explained below, we have to halt the transmission before
> > > > updating
> > > >  	 * TBQP registers so we call netif_tx_stop_all_queues() to
> > > > notify the
> > > >  	 * network engine about the macb/gem being halted.
> > > >  	 */
> > > > +	napi_disable(&queue->napi);
> > > >  	spin_lock_irqsave(&bp->lock, flags);
> > > >  
> > > >  	/* Make sure nobody is trying to queue up new packets */
> > > > @@ -1058,7 +1059,7 @@ static void macb_tx_error_task(struct work_struct
> > > > *work)
> > > >  		if (ctrl & MACB_BIT(TX_USED)) {
> > > >  			/* skb is set for the last buffer of the frame
> > > > */
> > > >  			while (!skb) {
> > > > -				macb_tx_unmap(bp, tx_skb);
> > > > +				macb_tx_unmap(bp, tx_skb, 0);
> > > >  				tail++;
> > > >  				tx_skb = macb_tx_skb(queue, tail);
> > > >  				skb = tx_skb->skb;
> > > > @@ -1088,7 +1089,7 @@ static void macb_tx_error_task(struct work_struct
> > > > *work)
> > > >  			desc->ctrl = ctrl | MACB_BIT(TX_USED);
> > > >  		}
> > > >  
> > > > -		macb_tx_unmap(bp, tx_skb);
> > > > +		macb_tx_unmap(bp, tx_skb, 0);
> > > >  	}
> > > >  
> > > >  	/* Set end of TX queue */
> > > > @@ -1118,25 +1119,28 @@ static void macb_tx_error_task(struct
> > > > work_struct
> > > > *work)
> > > >  	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > >  
> > > >  	spin_unlock_irqrestore(&bp->lock, flags);
> > > > +	napi_enable(&queue->napi);
> > > >  }
> > > >  
> > > > -static void macb_tx_interrupt(struct macb_queue *queue)
> > > > +static bool macb_tx_complete_pending(struct macb_queue *queue)
> > > 
> > > This might be somewhat problematic approach as TX_USED bit could
> > > potentially also be set if a descriptor with TX_USED is found mid-frame
> > > (for whatever reason).
> > > Not sure how it should be done, but it would be nice if TCOMP
> > > would rather be known on per queue basis.
> > 
> > I think this scenario would already have been a problem if it were possible
> > -
> > i.e. when we detect the TCOMP interrupt was asserted, it has likely already
> > started sending the next frame in the ring when we are processing the ring
> > entries, so if a TX_USED bit was set in the middle of a frame we would
> > potentially already have been acting on it and likely breaking? So it seems
> > like a safe assumption that this doesn't happen, as that would make it very
> > difficult for the driver to tell when a frame was actually done or not.
> The consideration was that TCOMP and TX_USED are separate
> interrupts, so TX_USED can be asserted even though there is no TCOMP.
> 
> Mixing terminology could become confusing here. I.e., IMHO
> tx_complete_pending should check whether TCOMP is pending, and
> txubr_pending whether TX_USED is.
> 
> > > Related note, rx side function seems to be named as macb_rx_pending(),
> > > should be similary macb_tx_pending()?
> > 
> > I thought that naming might be a bit misleading, as in the TX case we are
> > not only interested in whether frames are pending (i.e. in the TX ring to
> > be sent) but also in whether there are completed frames in the ring that
> > have not been processed yet.
> 
> Wouldn't tx_pending() describe that better, then? :) (That there are TX
> side operations pending to be processed.)

I think my explanation was unclear: in this case we only care about whether TX
packets have been completed. We don't care about incomplete TX packets that are
still in the ring, which could also be considered "pending".

> 
> Thanks,
> Tomas
> 
> 
> 
> > > 
> > > Thanks,
> > > Tomas
> > > 
> > > 
> > > > +{
> > > > +	if (queue->tx_head != queue->tx_tail) {
> > > > +		/* Make hw descriptor updates visible to CPU */
> > > > +		rmb();
> > > > +
> > > > +		if (macb_tx_desc(queue, queue->tx_tail)->ctrl &
> > > > MACB_BIT(TX_USED))
> > > > +			return true;
> > > > +	}
> > > > +	return false;
> > > > +}
> > > > +
> > > > +static void macb_tx_complete(struct macb_queue *queue, int budget)
> > > >  {
> > > >  	unsigned int tail;
> > > >  	unsigned int head;
> > > > -	u32 status;
> > > >  	struct macb *bp = queue->bp;
> > > >  	u16 queue_index = queue - bp->queues;
> > > >  
> > > > -	status = macb_readl(bp, TSR);
> > > > -	macb_writel(bp, TSR, status);
> > > > -
> > > > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > > -		queue_writel(queue, ISR, MACB_BIT(TCOMP));
> > > > -
> > > > -	netdev_vdbg(bp->dev, "macb_tx_interrupt status = 0x%03lx\n",
> > > > -		    (unsigned long)status);
> > > > -
> > > >  	head = queue->tx_head;
> > > >  	for (tail = queue->tx_tail; tail != head; tail++) {
> > > >  		struct macb_tx_skb	*tx_skb;
> > > > @@ -1182,7 +1186,7 @@ static void macb_tx_interrupt(struct macb_queue
> > > > *queue)
> > > >  			}
> > > >  
> > > >  			/* Now we can safely release resources */
> > > > -			macb_tx_unmap(bp, tx_skb);
> > > > +			macb_tx_unmap(bp, tx_skb, budget);
> > > >  
> > > >  			/* skb is set only for the last buffer of the
> > > > frame.
> > > >  			 * WARNING: at this point skb has been freed by
> > > > @@ -1569,23 +1573,55 @@ static bool macb_rx_pending(struct macb_queue
> > > > *queue)
> > > >  	return (desc->addr & MACB_BIT(RX_USED)) != 0;
> > > >  }
> > > >  
> > > > +static void macb_tx_restart(struct macb_queue *queue)
> > > > +{
> > > > +	unsigned int head = queue->tx_head;
> > > > +	unsigned int tail = queue->tx_tail;
> > > > +	struct macb *bp = queue->bp;
> > > > +	unsigned int head_idx, tbqp;
> > > > +
> > > > +	if (head == tail)
> > > > +		return;
> > > > +
> > > > +	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > > > +	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > > > +	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
> > > > head));
> > > > +
> > > > +	if (tbqp == head_idx)
> > > > +		return;
> > > > +
> > > > +	spin_lock_irq(&bp->lock);
> > > > +	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > > +	spin_unlock_irq(&bp->lock);
> > > > +}
> > > > +
> > > >  static int macb_poll(struct napi_struct *napi, int budget)
> > > >  {
> > > >  	struct macb_queue *queue = container_of(napi, struct
> > > > macb_queue, napi);
> > > >  	struct macb *bp = queue->bp;
> > > >  	int work_done;
> > > >  
> > > > +	macb_tx_complete(queue, budget);
> > > > +
> > > > +	rmb(); // ensure txubr_pending is up to date
> > > > +	if (queue->txubr_pending) {
> > > > +		queue->txubr_pending = false;
> > > > +		netdev_vdbg(bp->dev, "poll: tx restart\n");
> > > > +		macb_tx_restart(queue);
> > > > +	}
> > > > +
> > > >  	work_done = bp->macbgem_ops.mog_rx(queue, napi, budget);
> > > >  
> > > >  	netdev_vdbg(bp->dev, "poll: queue = %u, work_done = %d, budget
> > > > = %d\n",
> > > >  		    (unsigned int)(queue - bp->queues), work_done,
> > > > budget);
> > > >  
> > > >  	if (work_done < budget && napi_complete_done(napi, work_done))
> > > > {
> > > > -		queue_writel(queue, IER, bp->rx_intr_mask);
> > > > +		queue_writel(queue, IER, bp->rx_intr_mask |
> > > > +					 MACB_BIT(TCOMP));
> > > >  
> > > >  		/* Packet completions only seem to propagate to raise
> > > >  		 * interrupts when interrupts are enabled at the time,
> > > > so if
> > > > -		 * packets were received while interrupts were
> > > > disabled,
> > > > +		 * packets were sent/received while interrupts were
> > > > disabled,
> > > >  		 * they will not cause another interrupt to be
> > > > generated when
> > > >  		 * interrupts are re-enabled.
> > > >  		 * Check for this case here to avoid losing a wakeup.
> > > > This can
> > > > @@ -1593,10 +1629,13 @@ static int macb_poll(struct napi_struct *napi,
> > > > int
> > > > budget)
> > > >  		 * actions if an interrupt is raised just after
> > > > enabling them,
> > > >  		 * but this should be harmless.
> > > >  		 */
> > > > -		if (macb_rx_pending(queue)) {
> > > > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > > > +		if (macb_rx_pending(queue) ||
> > > > +		    macb_tx_complete_pending(queue)) {
> > > > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > > > +						 MACB_BIT(TCOMP));
> > > >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > > > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > > > +							 MACB_BIT(TCOMP));
> > > >  			netdev_vdbg(bp->dev, "poll: packets pending,
> > > > reschedule\n");
> > > >  			napi_schedule(napi);
> > > >  		}
> > > > @@ -1646,29 +1685,6 @@ static void macb_hresp_error_task(struct
> > > > tasklet_struct *t)
> > > >  	netif_tx_start_all_queues(dev);
> > > >  }
> > > >  
> > > > -static void macb_tx_restart(struct macb_queue *queue)
> > > > -{
> > > > -	unsigned int head = queue->tx_head;
> > > > -	unsigned int tail = queue->tx_tail;
> > > > -	struct macb *bp = queue->bp;
> > > > -	unsigned int head_idx, tbqp;
> > > > -
> > > > -	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > > -		queue_writel(queue, ISR, MACB_BIT(TXUBR));
> > > > -
> > > > -	if (head == tail)
> > > > -		return;
> > > > -
> > > > -	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
> > > > -	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
> > > > -	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp,
> > > > head));
> > > > -
> > > > -	if (tbqp == head_idx)
> > > > -		return;
> > > > -
> > > > -	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
> > > > -}
> > > > -
> > > >  static irqreturn_t macb_wol_interrupt(int irq, void *dev_id)
> > > >  {
> > > >  	struct macb_queue *queue = dev_id;
> > > > @@ -1754,19 +1770,29 @@ static irqreturn_t macb_interrupt(int irq, void
> > > > *dev_id)
> > > >  			    (unsigned int)(queue - bp->queues),
> > > >  			    (unsigned long)status);
> > > >  
> > > > -		if (status & bp->rx_intr_mask) {
> > > > -			/* There's no point taking any more interrupts
> > > > -			 * until we have processed the buffers. The
> > > > +		if (status & (bp->rx_intr_mask |
> > > > +			      MACB_BIT(TCOMP) |
> > > > +			      MACB_BIT(TXUBR))) {
> > > > +			/* There's no point taking any more RX/TX
> > > > completion
> > > > +			 * interrupts until we have processed the
> > > > buffers. The
> > > >  			 * scheduling call may fail if the poll routine
> > > >  			 * is already scheduled, so disable interrupts
> > > >  			 * now.
> > > >  			 */
> > > > -			queue_writel(queue, IDR, bp->rx_intr_mask);
> > > > +			queue_writel(queue, IDR, bp->rx_intr_mask |
> > > > +						 MACB_BIT(TCOMP));
> > > >  			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
> > > > -				queue_writel(queue, ISR, MACB_BIT(RCOMP));
> > > > +				queue_writel(queue, ISR, MACB_BIT(RCOMP) |
> > > > +							 MACB_BIT(TCOMP) |
> > > > +							 MACB_BIT(TXUBR));
> > > > +
> > > > +			if (status & MACB_BIT(TXUBR)) {
> > > > +				queue->txubr_pending = true;
> > > > +				wmb(); // ensure softirq can see update
> > > > +			}
> > > >  
> > > >  			if (napi_schedule_prep(&queue->napi)) {
> > > > -				netdev_vdbg(bp->dev, "scheduling RX
> > > > softirq\n");
> > > > +				netdev_vdbg(bp->dev, "scheduling NAPI
> > > > softirq\n");
> > > >  				__napi_schedule(&queue->napi);
> > > >  			}
> > > >  		}
> > > > @@ -1781,12 +1807,6 @@ static irqreturn_t macb_interrupt(int irq, void
> > > > *dev_id)
> > > >  			break;
> > > >  		}
> > > >  
> > > > -		if (status & MACB_BIT(TCOMP))
> > > > -			macb_tx_interrupt(queue);
> > > > -
> > > > -		if (status & MACB_BIT(TXUBR))
> > > > -			macb_tx_restart(queue);
> > > > -
> > > >  		/* Link change detection isn't possible with RMII, so
> > > > we'll
> > > >  		 * add that if/when we get our hands on a full-blown
> > > > MII PHY.
> > > >  		 */
> > > > @@ -2019,7 +2039,7 @@ static unsigned int macb_tx_map(struct macb *bp,
> > > >  	for (i = queue->tx_head; i != tx_head; i++) {
> > > >  		tx_skb = macb_tx_skb(queue, i);
> > > >  
> > > > -		macb_tx_unmap(bp, tx_skb);
> > > > +		macb_tx_unmap(bp, tx_skb, 0);
> > > >  	}
> > > >  
> > > >  	return 0;
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/2] net: macb: use NAPI for TX completion path
  2022-05-09  9:14         ` Harini Katakam
@ 2022-05-09 19:46           ` Robert Hancock
  0 siblings, 0 replies; 16+ messages in thread
From: Robert Hancock @ 2022-05-09 19:46 UTC (permalink / raw)
  To: harinik
  Cc: kuba, pabeni, tomas.melin, claudiu.beznea, edumazet, netdev,
	davem, nicolas.ferre

On Mon, 2022-05-09 at 14:44 +0530, Harini Katakam wrote:
> Hi Robert,
> 
> Thanks for the patch.
> 
> <snip>
> > Originally I thought there might be a correctness issue with calling it
> > unconditionally, but looking at it further I don't think there really is.
> > The
> > FreeBSD driver for this hardware also seems to always do the TX restart in
> > the
> > interrupt handler if there are packets in the TX queue.
> > 
> > I think the only real issue is whether it's better performance-wise to do
> > it
> > all the time rather than only after the hardware asserts a TXUBR interrupt.
> > I
> > expect it would be worse to do it all the time, as that would mean an extra
> > MMIO read, spinlock, MMIO read and MMIO write, versus just a read barrier
> > and
> > checking a flag in memory.
> > 
> 
> I agree that doing TX restart only on UBR is better.
> 
> > > But should there anyway be some condition for the TX side handling, as
> > > I suppose macb_poll() runs when there is an RX interrupt even if the TX
> > > side has nothing to process?
> > 
> > I opted not to do that for this case, as it should be pretty harmless and
> > cheap
> > to just check the TX ring to see if any packets have been completed yet,
> > rather
> > than actually tracking if a TX completion was pending. That seems to be the
> > standard practice in some other drivers (r8169, etc.)
> 
> In this implementation the TX interrupt bit is being cleared and TX
> processing is scheduled when there is an RX interrupt, and vice versa. Could
> you please consider using the check "status & MACB_BIT(TCOMP)" for the TX
> interrupt and NAPI? If I understand your reply above correctly, you mention
> that the above check is more expensive than parsing the TX ring for new
> data. In unbalanced traffic scenarios, i.e. server-only or client-only, will
> this be efficient?

In v2 of this patch set, which I'm about to submit, I'm splitting the NAPI
structures so that TX and RX have their own poll functions. This avoids any
redundant checks of the other ring's state (TX or RX) when that ring's
interrupt did not trigger the poll.
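
For reference, a minimal sketch of that kind of split -- assuming hypothetical
per-queue napi_rx/napi_tx fields and poll handlers; the names below are
illustrative, not taken from the actual v2 patches:

	#include <linux/netdevice.h>

	struct demo_queue {
		struct napi_struct napi_rx;	/* RX completion poll */
		struct napi_struct napi_tx;	/* TX completion poll */
	};

	static int demo_rx_poll(struct napi_struct *napi, int budget)
	{
		int work_done = 0;

		/* receive up to budget packets, counting into work_done */
		if (work_done < budget)
			napi_complete_done(napi, work_done);
		return work_done;
	}

	static int demo_tx_poll(struct napi_struct *napi, int budget)
	{
		/* reap TX completions; report zero RX-style work so the
		 * poll completes unconditionally in this sketch
		 */
		napi_complete(napi);
		return 0;
	}

	static void demo_napi_init(struct net_device *dev,
				   struct demo_queue *queue)
	{
		netif_napi_add(dev, &queue->napi_rx, demo_rx_poll,
			       NAPI_POLL_WEIGHT);
		netif_napi_add(dev, &queue->napi_tx, demo_tx_poll,
			       NAPI_POLL_WEIGHT);
	}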

> 
> Regards,
> Harini

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-05-09 19:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-29 22:31 [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
2022-04-29 22:31 ` [PATCH net-next 1/2] net: macb: simplify/cleanup NAPI reschedule checking Robert Hancock
2022-04-29 22:31 ` [PATCH net-next 2/2] net: macb: use NAPI for TX completion path Robert Hancock
2022-04-29 23:09   ` Robert Hancock
2022-05-03  7:22     ` Paolo Abeni
2022-05-03 18:34       ` Robert Hancock
2022-05-03  9:34     ` Tomas Melin
2022-05-03 17:01       ` Robert Hancock
2022-05-09  9:14         ` Harini Katakam
2022-05-09 19:46           ` Robert Hancock
2022-05-03  9:57   ` Tomas Melin
2022-05-03 17:07     ` Robert Hancock
2022-05-09  5:49       ` Tomas Melin
2022-05-09 18:00         ` Robert Hancock
2022-05-06 19:04   ` Robert Hancock
2022-05-05 17:56 ` [PATCH net-next 0/2] MACB NAPI improvements Robert Hancock
