linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6]:  powerpc/cell spidernet ethernet driver update
@ 2006-08-18 22:07 Linas Vepstas
  2006-08-18 22:20 ` [PATCH 1/6]: powerpc/cell spidernet burst alignment patch Linas Vepstas
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:07 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: netdev, linux-kernel, linuxppc-dev, akpm, Arnd Bergmann,
	James K Lewis, Utz Bacher, ens Osterkamp, Benjamin Herrenschmidt


Jeff,

Can you apply and forward upstream the following series of patches?
This is the same set of patches I sent only a few days ago; however,
it appears that I failed to cc you on some of them, and I failed to
ask anyone in particular to actually apply them. So I'm asking now. :-)

The maintainership of the spidernet driver is transitioning from
Utz Bacher and Jens Osterkamp to Jim Lewis; you should be receiving
a patch to the MAINTAINERS file from Jim, noting this, shortly.  
I suspect that neither Utz nor Jens will be ack'ing or nack'ing 
these patches unless prodded; let me know if you want some particlar
ack from someone before you apply these. (BenH and Arnd Bergmann
are the ones who usually keep us honest about these things.)

Although these are based on a week-old mm tree, they should 
apply cleanly just about anywhere, as there has been little/no 
change to this code base in a long while. 

The patches bring the transmit performance of the driver up
to a decent level, as reviewed in greter detail in the patch
descriptions.

--linas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6]: powerpc/cell spidernet burst alignment patch
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
@ 2006-08-18 22:20 ` Linas Vepstas
  2006-08-18 22:51   ` Arnd Bergmann
  2006-08-18 22:21 ` [PATCH 2/6]: powerpc/cell spidernet low watermark patch Linas Vepstas
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:20 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp



This patch increases the Burst Address alignment from 64 to 1024 in the
Spidernet driver. This improves transmit performance for arge packets
from about 100Mbps to 300-400Mbps.

From: James K Lewis <jklewis@us.ibm.com>
Signed-off-by: James K Lewis <jklewis@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>

----
 drivers/net/spider_net.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h	2006-08-07 14:37:10.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h	2006-08-11 11:09:57.000000000 -0500
@@ -209,7 +209,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_DMA_RX_FEND_VALUE	0x00030003
 /* to set TX_DMA_EN */
 #define SPIDER_NET_TX_DMA_EN		0x80000000
-#define SPIDER_NET_GDTDCEIDIS		0x00000002
+#define SPIDER_NET_GDTDCEIDIS		0x00000302
 #define SPIDER_NET_DMA_TX_VALUE		SPIDER_NET_TX_DMA_EN | \
 					SPIDER_NET_GDTDCEIDIS
 #define SPIDER_NET_DMA_TX_FEND_VALUE	0x00030003

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/6]: powerpc/cell spidernet low watermark patch.
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
  2006-08-18 22:20 ` [PATCH 1/6]: powerpc/cell spidernet burst alignment patch Linas Vepstas
@ 2006-08-18 22:21 ` Linas Vepstas
       [not found]   ` <200608190109.15129.arnd@arndb.de>
  2006-08-18 22:23 ` [PATCH 3/6]: powerpc/cell spidernet stop error printing patch Linas Vepstas
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:21 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp



Implement basic low-watermark support for the transmit queue.
Hardware low-watermarks allow a properly configured kernel
to continously stream data to a device and not have to handle 
any interrupts at all in doing so. Correct zero-interrupt
operation can be actually observed for this driver, when the 
socket buffer is made large enough.

The basic idea of a low-watermark interrupt is as follows.
The device driver queues up a bunch of packets for the hardware
to transmit, and then kicks the hardware to get it started.
As the hardware drains the queue of pending, untransmitted 
packets, the device driver will want to know when the queue
is almost empty, so that it can queue some more packets.

If the queue drains down to the low waterark, then an interrupt
will be generated. However, if the kernel/driver continues 
to add enough packets to keep the queue partially filled,
no interrupt will actually be generated, and the hardware 
can continue streaming packets indefinitely in this mode.

The impelmentation is done by setting the DESCR_TXDESFLG flag 
in one of the packets. When the hardware sees this flag, it will 
interrupt the device driver. Because this flag is on a fixed
packet, rather than at  fixed location in the queue, the
code below needs to move the flag as more packets are
queued up. This implementation attempts to keep the flag 
at about 1/4 from "empty".

This patch boosts driver performance from about 
300-400Mbps for 1500 byte packets, to about 710-740Mbps.


Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: James K Lewis <jklewis@us.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>

----
 drivers/net/spider_net.c |   56 ++++++++++++++++++++++++++++++++++++++++++-----
 drivers/net/spider_net.h |    6 +++--
 2 files changed, 55 insertions(+), 7 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c	2006-08-07 14:39:38.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c	2006-08-11 11:23:24.000000000 -0500
@@ -700,6 +700,39 @@ spider_net_release_tx_descr(struct spide
 	dev_kfree_skb_any(skb);
 }
 
+static void
+spider_net_set_low_watermark(struct spider_net_card *card)
+{
+	int status;
+	int cnt=0;
+	int i;
+	struct spider_net_descr *descr = card->tx_chain.tail;
+
+	/* Measure the length of the queue. */
+	while (descr != card->tx_chain.head) {
+		status = descr->dmac_cmd_status & SPIDER_NET_DESCR_NOT_IN_USE;
+		if (status == SPIDER_NET_DESCR_NOT_IN_USE)
+			break;
+		descr = descr->next;
+		cnt++;
+	}
+	if (cnt == 0)
+		return;
+
+	/* Set low-watermark 3/4th's of the way into the queue. */
+	descr = card->tx_chain.tail;
+	cnt = (cnt*3)/4;
+	for (i=0;i<cnt; i++)
+		descr = descr->next;
+
+	/* Set the new watermark, clear the old wtermark */
+	descr->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
+	if (card->low_watermark && card->low_watermark != descr)
+		card->low_watermark->dmac_cmd_status =
+		     card->low_watermark->dmac_cmd_status & ~SPIDER_NET_DESCR_TXDESFLG;
+	card->low_watermark = descr;
+}
+
 /**
  * spider_net_release_tx_chain - processes sent tx descriptors
  * @card: adapter structure
@@ -717,6 +750,7 @@ spider_net_release_tx_chain(struct spide
 {
 	struct spider_net_descr_chain *chain = &card->tx_chain;
 	int status;
+	int rc=0;
 
 	spider_net_read_reg(card, SPIDER_NET_GDTDMACCNTR);
 
@@ -729,8 +763,10 @@ spider_net_release_tx_chain(struct spide
 			break;
 
 		case SPIDER_NET_DESCR_CARDOWNED:
-			if (!brutal)
-				return 1;
+			if (!brutal) {
+				rc = 1;
+				goto done;
+			}
 			/* fallthrough, if we release the descriptors
 			 * brutally (then we don't care about
 			 * SPIDER_NET_DESCR_CARDOWNED) */
@@ -747,12 +783,15 @@ spider_net_release_tx_chain(struct spide
 
 		default:
 			card->netdev_stats.tx_dropped++;
-			return 1;
+			rc = 1;
+			goto done;
 		}
 		spider_net_release_tx_descr(card);
 	}
-
-	return 0;
+done:
+	if (rc == 1)
+		spider_net_set_low_watermark(card);
+	return rc;
 }
 
 /**
@@ -1453,6 +1492,10 @@ spider_net_interrupt(int irq, void *ptr,
 		spider_net_rx_irq_off(card);
 		netif_rx_schedule(netdev);
 	}
+	if (status_reg & SPIDER_NET_TXINT ) {
+		spider_net_cleanup_tx_ring(card);
+		netif_wake_queue(netdev);
+	}
 
 	if (status_reg & SPIDER_NET_ERRINT )
 		spider_net_handle_error_irq(card, status_reg);
@@ -1615,6 +1658,9 @@ spider_net_open(struct net_device *netde
 			card->descr,
 			PCI_DMA_TODEVICE, tx_descriptors))
 		goto alloc_tx_failed;
+
+	card->low_watermark = NULL;
+
 	if (spider_net_init_chain(card, &card->rx_chain,
 			card->descr + tx_descriptors,
 			PCI_DMA_FROMDEVICE, rx_descriptors))
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h	2006-08-11 11:09:57.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h	2006-08-11 11:19:47.000000000 -0500
@@ -47,7 +47,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_TX_DESCRIPTORS_MIN		16
 #define SPIDER_NET_TX_DESCRIPTORS_MAX		512
 
-#define SPIDER_NET_TX_TIMER			20
+#define SPIDER_NET_TX_TIMER			(HZ/5)
 
 #define SPIDER_NET_RX_CSUM_DEFAULT		1
 
@@ -209,7 +209,7 @@ extern char spider_net_driver_name[];
 #define SPIDER_NET_DMA_RX_FEND_VALUE	0x00030003
 /* to set TX_DMA_EN */
 #define SPIDER_NET_TX_DMA_EN		0x80000000
-#define SPIDER_NET_GDTDCEIDIS		0x00000302
+#define SPIDER_NET_GDTDCEIDIS		0x00000300
 #define SPIDER_NET_DMA_TX_VALUE		SPIDER_NET_TX_DMA_EN | \
 					SPIDER_NET_GDTDCEIDIS
 #define SPIDER_NET_DMA_TX_FEND_VALUE	0x00030003
@@ -349,6 +349,7 @@ enum spider_net_int2_status {
 #define SPIDER_NET_DESCR_FORCE_END		0x50000000 /* used in rx and tx */
 #define SPIDER_NET_DESCR_CARDOWNED		0xA0000000 /* used in rx and tx */
 #define SPIDER_NET_DESCR_NOT_IN_USE		0xF0000000
+#define SPIDER_NET_DESCR_TXDESFLG		0x00800000
 
 struct spider_net_descr {
 	/* as defined by the hardware */
@@ -424,6 +425,7 @@ struct spider_net_card {
 
 	struct spider_net_descr_chain tx_chain;
 	struct spider_net_descr_chain rx_chain;
+	struct spider_net_descr *low_watermark;
 
 	struct net_device_stats netdev_stats;
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/6]: powerpc/cell spidernet stop error printing patch.
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
  2006-08-18 22:20 ` [PATCH 1/6]: powerpc/cell spidernet burst alignment patch Linas Vepstas
  2006-08-18 22:21 ` [PATCH 2/6]: powerpc/cell spidernet low watermark patch Linas Vepstas
@ 2006-08-18 22:23 ` Linas Vepstas
  2006-08-18 22:25 ` [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info Linas Vepstas
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:23 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp



Turn off mis-interpretation of the queue-empty interrupt
status bit as an error. This bit is set as a part of 
the previous low-watermark patch.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: James K Lewis <jklewis@us.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>

----
 drivers/net/spider_net.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c	2006-08-11 11:23:24.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c	2006-08-11 11:34:16.000000000 -0500
@@ -1275,12 +1275,15 @@ spider_net_handle_error_irq(struct spide
 	case SPIDER_NET_PHYINT:
 	case SPIDER_NET_GMAC2INT:
 	case SPIDER_NET_GMAC1INT:
-	case SPIDER_NET_GIPSINT:
 	case SPIDER_NET_GFIFOINT:
 	case SPIDER_NET_DMACINT:
 	case SPIDER_NET_GSYSINT:
 		break; */
 
+	case SPIDER_NET_GIPSINT:
+		show_error = 0;
+		break;
+
 	case SPIDER_NET_GPWOPCMPINT:
 		/* PHY write operation completed */
 		show_error = 0;
@@ -1339,9 +1342,10 @@ spider_net_handle_error_irq(struct spide
 	case SPIDER_NET_GDTDCEINT:
 		/* chain end. If a descriptor should be sent, kick off
 		 * tx dma
-		if (card->tx_chain.tail == card->tx_chain.head)
+		if (card->tx_chain.tail != card->tx_chain.head)
 			spider_net_kick_tx_dma(card);
-		show_error = 0; */
+		*/
+		show_error = 0;
 		break;
 
 	/* case SPIDER_NET_G1TMCNTINT: not used. print a message */
@@ -1455,8 +1459,9 @@ spider_net_handle_error_irq(struct spide
 	}
 
 	if ((show_error) && (netif_msg_intr(card)))
-		pr_err("Got error interrupt, GHIINT0STS = 0x%08x, "
+		pr_err("Got error interrupt on %s, GHIINT0STS = 0x%08x, "
 		       "GHIINT1STS = 0x%08x, GHIINT2STS = 0x%08x\n",
+		       card->netdev->name,
 		       status_reg, error_reg1, error_reg2);
 
 	/* clear interrupt sources */

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info.
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
                   ` (2 preceding siblings ...)
  2006-08-18 22:23 ` [PATCH 3/6]: powerpc/cell spidernet stop error printing patch Linas Vepstas
@ 2006-08-18 22:25 ` Linas Vepstas
  2006-08-18 22:56   ` Arnd Bergmann
  2006-08-18 22:26 ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
  2006-08-18 22:29 ` [PATCH 6/6]: powerpc/cell spidernet refine locking Linas Vepstas
  5 siblings, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:25 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp



This patch adds version information as reported by 
ethtool -i to the Spidernet driver.

From: James K Lewis <jklewis@us.ibm.com>
Signed-off-by: James K Lewis <jklewis@us.ibm.com>
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>

----
 drivers/net/spider_net.c         |    3 +++
 drivers/net/spider_net.h         |    2 ++
 drivers/net/spider_net_ethtool.c |    2 +-
 3 files changed, 6 insertions(+), 1 deletion(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c	2006-08-11 11:34:16.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c	2006-08-11 11:38:48.000000000 -0500
@@ -55,6 +55,7 @@ MODULE_AUTHOR("Utz Bacher <utz.bacher@de
 	      "<Jens.Osterkamp@de.ibm.com>");
 MODULE_DESCRIPTION("Spider Southbridge Gigabit Ethernet driver");
 MODULE_LICENSE("GPL");
+MODULE_VERSION(VERSION);
 
 static int rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_DEFAULT;
 static int tx_descriptors = SPIDER_NET_TX_DESCRIPTORS_DEFAULT;
@@ -2293,6 +2294,8 @@ static struct pci_driver spider_net_driv
  */
 static int __init spider_net_init(void)
 {
+	printk("spidernet Version %s.\n",VERSION);
+
 	if (rx_descriptors < SPIDER_NET_RX_DESCRIPTORS_MIN) {
 		rx_descriptors = SPIDER_NET_RX_DESCRIPTORS_MIN;
 		pr_info("adjusting rx descriptors to %i.\n", rx_descriptors);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h	2006-08-11 11:19:47.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h	2006-08-11 11:38:48.000000000 -0500
@@ -24,6 +24,8 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
+#define VERSION "1.1 A"
+
 #include "sungem_phy.h"
 
 extern int spider_net_stop(struct net_device *netdev);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net_ethtool.c	2006-06-17 20:49:35.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net_ethtool.c	2006-08-11 11:38:48.000000000 -0500
@@ -55,7 +55,7 @@ spider_net_ethtool_get_drvinfo(struct ne
 	/* clear and fill out info */
 	memset(drvinfo, 0, sizeof(struct ethtool_drvinfo));
 	strncpy(drvinfo->driver, spider_net_driver_name, 32);
-	strncpy(drvinfo->version, "0.1", 32);
+	strncpy(drvinfo->version, VERSION, 32);
 	strcpy(drvinfo->fw_version, "no information");
 	strncpy(drvinfo->bus_info, pci_name(card->pdev), 32);
 }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 5/6]: powerpc/cell spidernet bottom half
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
                   ` (3 preceding siblings ...)
  2006-08-18 22:25 ` [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info Linas Vepstas
@ 2006-08-18 22:26 ` Linas Vepstas
  2006-08-18 23:03   ` Arnd Bergmann
  2006-08-18 22:29 ` [PATCH 6/6]: powerpc/cell spidernet refine locking Linas Vepstas
  5 siblings, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:26 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp


The recent set of low-waterark patches for the spider result in a
significant amount of computing being done in an interrupt context.
This patch moves this to a "bottom half" aka work queue, so that
the code runs in a normal kernel context. Curiously, this seems to 
result in a performance boost of about 5% for large packets.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: James K Lewis <jklewis@us.ibm.com>

----
 drivers/net/spider_net.c |   21 +++++++++++++++------
 drivers/net/spider_net.h |    4 +++-
 2 files changed, 18 insertions(+), 7 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c	2006-08-15 14:25:50.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c	2006-08-15 14:28:56.000000000 -0500
@@ -883,9 +883,10 @@ out:
  * spider_net_cleanup_tx_ring - cleans up the TX ring
  * @card: card structure
  *
- * spider_net_cleanup_tx_ring is called by the tx_timer (as we don't use
- * interrupts to cleanup our TX ring) and returns sent packets to the stack
- * by freeing them
+ * spider_net_cleanup_tx_ring is called by either the tx_timer
+ * or from a work-queue scheduled by the tx-empty interrupt.
+ * This routine releases resources associted with transmitted
+ * packets, including updating the queue tail pointer.
  */
 static void
 spider_net_cleanup_tx_ring(struct spider_net_card *card)
@@ -895,12 +896,20 @@ spider_net_cleanup_tx_ring(struct spider
 	spin_lock_irqsave(&card->tx_chain.lock, flags);
 
 	if ((spider_net_release_tx_chain(card, 0) != 0) &&
-	    (card->netdev->flags & IFF_UP))
+	    (card->netdev->flags & IFF_UP)) {
 		spider_net_kick_tx_dma(card);
+		netif_wake_queue(card->netdev);
+	}
 
 	spin_unlock_irqrestore(&card->tx_chain.lock, flags);
 }
 
+static void
+spider_net_tx_cleanup_task(void * data)
+{
+	spider_net_cleanup_tx_ring((struct spider_net_card *) data);
+}
+
 /**
  * spider_net_do_ioctl - called for device ioctls
  * @netdev: interface device structure
@@ -1499,8 +1508,7 @@ spider_net_interrupt(int irq, void *ptr,
 		netif_rx_schedule(netdev);
 	}
 	if (status_reg & SPIDER_NET_TXINT ) {
-		spider_net_cleanup_tx_ring(card);
-		netif_wake_queue(netdev);
+		schedule_work(&card->tx_cleanup_task);
 	}
 
 	if (status_reg & SPIDER_NET_ERRINT )
@@ -2117,6 +2125,7 @@ spider_net_alloc_card(void)
 	card->netdev = netdev;
 	card->msg_enable = SPIDER_NET_DEFAULT_MSG;
 	INIT_WORK(&card->tx_timeout_task, spider_net_tx_timeout_task, netdev);
+	INIT_WORK(&card->tx_cleanup_task, spider_net_tx_cleanup_task, card);
 	init_waitqueue_head(&card->waitq);
 	atomic_set(&card->tx_timeout_task_counter, 0);
 
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h	2006-08-15 14:25:50.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h	2006-08-15 14:28:56.000000000 -0500
@@ -24,7 +24,7 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
-#define VERSION "1.1 A"
+#define VERSION "1.1 B"
 
 #include "sungem_phy.h"
 
@@ -441,6 +441,8 @@ struct spider_net_card {
 	atomic_t tx_timeout_task_counter;
 	wait_queue_head_t waitq;
 
+	struct work_struct tx_cleanup_task;
+
 	/* for ethtool */
 	int msg_enable;
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 6/6]:  powerpc/cell spidernet refine locking
  2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
                   ` (4 preceding siblings ...)
  2006-08-18 22:26 ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
@ 2006-08-18 22:29 ` Linas Vepstas
  5 siblings, 0 replies; 27+ messages in thread
From: Linas Vepstas @ 2006-08-18 22:29 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: akpm, Arnd Bergmann, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, ens Osterkamp



The transmit side of the spider ethernet driver currently
places locks around some very large chunks of code. This
results in a fair amount of lock contention is some cases. 
This patch makes the locks much more fine-grained, protecting
only the cirtical sections. One lock is used to protect 
three locations: the queue head and tail pointers, and the 
queue low-watermark location.

This, with the previous patches, result in the following 
performance, using netperf, averaged over 5 minute runs:

pkt size    rate
--------    ----
1500        804 Mbits/sec
 800        701 Mbits/sec
 600        600 Mbits/sec
 300        280 Mbits/sec
  60         60 Mbits/sec


Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: James K Lewis <jklewis@us.ibm.com>

----
 drivers/net/spider_net.c |   77 ++++++++++++++++++++---------------------------
 drivers/net/spider_net.h |    2 -
 2 files changed, 35 insertions(+), 44 deletions(-)

Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.c	2006-08-15 14:28:56.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.c	2006-08-15 14:29:36.000000000 -0500
@@ -644,8 +644,9 @@ static int
 spider_net_prepare_tx_descr(struct spider_net_card *card,
 			    struct sk_buff *skb)
 {
-	struct spider_net_descr *descr = card->tx_chain.head;
+	struct spider_net_descr *descr;
 	dma_addr_t buf;
+	unsigned long flags;
 
 	buf = pci_map_single(card->pdev, skb->data, skb->len, PCI_DMA_TODEVICE);
 	if (buf == DMA_ERROR_CODE) {
@@ -655,6 +656,10 @@ spider_net_prepare_tx_descr(struct spide
 		return -ENOMEM;
 	}
 
+	spin_lock_irqsave(&card->tx_chain.lock, flags);
+	descr = card->tx_chain.head;
+	card->tx_chain.head = card->tx_chain.head->next;
+
 	descr->buf_addr = buf;
 	descr->buf_size = skb->len;
 	descr->next_descr_addr = 0;
@@ -663,6 +668,8 @@ spider_net_prepare_tx_descr(struct spide
 
 	descr->dmac_cmd_status =
 			SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
+	spin_unlock_irqrestore(&card->tx_chain.lock, flags);
+
 	if (skb->protocol == htons(ETH_P_IP))
 		switch (skb->nh.iph->protocol) {
 		case IPPROTO_TCP:
@@ -673,37 +680,16 @@ spider_net_prepare_tx_descr(struct spide
 			break;
 		}
 
+	/* Chain the bus address, so that the DMA engine finds this descr. */
 	descr->prev->next_descr_addr = descr->bus_addr;
 
 	return 0;
 }
 
-/**
- * spider_net_release_tx_descr - processes a used tx descriptor
- * @card: card structure
- * @descr: descriptor to release
- *
- * releases a used tx descriptor (unmapping, freeing of skb)
- */
-static inline void
-spider_net_release_tx_descr(struct spider_net_card *card)
-{
-	struct spider_net_descr *descr = card->tx_chain.tail;
-	struct sk_buff *skb;
-
-	card->tx_chain.tail = card->tx_chain.tail->next;
-	descr->dmac_cmd_status |= SPIDER_NET_DESCR_NOT_IN_USE;
-
-	/* unmap the skb */
-	skb = descr->skb;
-	pci_unmap_single(card->pdev, descr->buf_addr, skb->len,
-			PCI_DMA_TODEVICE);
-	dev_kfree_skb_any(skb);
-}
-
 static void
 spider_net_set_low_watermark(struct spider_net_card *card)
 {
+	unsigned long flags;
 	int status;
 	int cnt=0;
 	int i;
@@ -727,11 +713,13 @@ spider_net_set_low_watermark(struct spid
 		descr = descr->next;
 
 	/* Set the new watermark, clear the old wtermark */
+	spin_lock_irqsave(&card->tx_chain.lock, flags);
 	descr->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
 	if (card->low_watermark && card->low_watermark != descr)
 		card->low_watermark->dmac_cmd_status =
 		     card->low_watermark->dmac_cmd_status & ~SPIDER_NET_DESCR_TXDESFLG;
 	card->low_watermark = descr;
+	spin_unlock_irqrestore(&card->tx_chain.lock, flags);
 }
 
 /**
@@ -750,22 +738,30 @@ static int
 spider_net_release_tx_chain(struct spider_net_card *card, int brutal)
 {
 	struct spider_net_descr_chain *chain = &card->tx_chain;
+	struct spider_net_descr *descr;
+	struct sk_buff *skb;
+	u32 buf_addr;
+	unsigned long flags;
 	int status;
 	int rc=0;
 
 	spider_net_read_reg(card, SPIDER_NET_GDTDMACCNTR);
 
 	while (chain->tail != chain->head) {
-		status = spider_net_get_descr_status(chain->tail);
+		spin_lock_irqsave(&chain->lock, flags);
+		descr = chain->tail;
+
+		status = spider_net_get_descr_status(descr);
 		switch (status) {
 		case SPIDER_NET_DESCR_COMPLETE:
 			card->netdev_stats.tx_packets++;
-			card->netdev_stats.tx_bytes += chain->tail->skb->len;
+			card->netdev_stats.tx_bytes += descr->skb->len;
 			break;
 
 		case SPIDER_NET_DESCR_CARDOWNED:
 			if (!brutal) {
 				rc = 1;
+				spin_unlock_irqrestore(&chain->lock, flags);
 				goto done;
 			}
 			/* fallthrough, if we release the descriptors
@@ -785,9 +781,19 @@ spider_net_release_tx_chain(struct spide
 		default:
 			card->netdev_stats.tx_dropped++;
 			rc = 1;
+			spin_unlock_irqrestore(&chain->lock, flags);
 			goto done;
 		}
-		spider_net_release_tx_descr(card);
+
+		chain->tail = chain->tail->next;
+		descr->dmac_cmd_status |= SPIDER_NET_DESCR_NOT_IN_USE;
+		skb = descr->skb;
+		buf_addr = descr->buf_addr;
+		spin_unlock_irqrestore(&chain->lock, flags);
+
+		/* unmap the skb */
+		pci_unmap_single(card->pdev, buf_addr, skb->len, PCI_DMA_TODEVICE);
+		dev_kfree_skb_any(skb);
 	}
 done:
 	if (rc == 1)
@@ -844,11 +850,8 @@ spider_net_xmit(struct sk_buff *skb, str
 	struct spider_net_card *card = netdev_priv(netdev);
 	struct spider_net_descr_chain *chain = &card->tx_chain;
 	struct spider_net_descr *descr = chain->head;
-	unsigned long flags;
 	int result;
 
-	spin_lock_irqsave(&chain->lock, flags);
-
 	spider_net_release_tx_chain(card, 0);
 
 	if (chain->head->next == chain->tail->prev) {
@@ -869,12 +872,9 @@ spider_net_xmit(struct sk_buff *skb, str
 	}
 
 	result = NETDEV_TX_OK;
-
 	spider_net_kick_tx_dma(card);
-	card->tx_chain.head = card->tx_chain.head->next;
 
 out:
-	spin_unlock_irqrestore(&chain->lock, flags);
 	netif_wake_queue(netdev);
 	return result;
 }
@@ -891,17 +891,11 @@ out:
 static void
 spider_net_cleanup_tx_ring(struct spider_net_card *card)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&card->tx_chain.lock, flags);
-
 	if ((spider_net_release_tx_chain(card, 0) != 0) &&
 	    (card->netdev->flags & IFF_UP)) {
 		spider_net_kick_tx_dma(card);
 		netif_wake_queue(card->netdev);
 	}
-
-	spin_unlock_irqrestore(&card->tx_chain.lock, flags);
 }
 
 static void
@@ -1932,10 +1926,7 @@ spider_net_stop(struct net_device *netde
 	spider_net_disable_rxdmac(card);
 
 	/* release chains */
-	if (spin_trylock(&card->tx_chain.lock)) {
-		spider_net_release_tx_chain(card, 1);
-		spin_unlock(&card->tx_chain.lock);
-	}
+	spider_net_release_tx_chain(card, 1);
 
 	spider_net_free_chain(card, &card->tx_chain);
 	spider_net_free_chain(card, &card->rx_chain);
Index: linux-2.6.18-rc3-mm2/drivers/net/spider_net.h
===================================================================
--- linux-2.6.18-rc3-mm2.orig/drivers/net/spider_net.h	2006-08-15 14:28:56.000000000 -0500
+++ linux-2.6.18-rc3-mm2/drivers/net/spider_net.h	2006-08-15 14:29:36.000000000 -0500
@@ -24,7 +24,7 @@
 #ifndef _SPIDER_NET_H
 #define _SPIDER_NET_H
 
-#define VERSION "1.1 B"
+#define VERSION "1.1 C"
 
 #include "sungem_phy.h"
 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6]: powerpc/cell spidernet burst alignment patch
  2006-08-18 22:20 ` [PATCH 1/6]: powerpc/cell spidernet burst alignment patch Linas Vepstas
@ 2006-08-18 22:51   ` Arnd Bergmann
  0 siblings, 0 replies; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-18 22:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Linas Vepstas, Jeff Garzik, akpm, netdev, James K Lewis,
	linux-kernel, ens Osterkamp

On Saturday 19 August 2006 00:20, Linas Vepstas wrote:
> This patch increases the Burst Address alignment from 64 to 1024 in the
> Spidernet driver. This improves transmit performance for arge packets
> from about 100Mbps to 300-400Mbps.

Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info.
  2006-08-18 22:25 ` [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info Linas Vepstas
@ 2006-08-18 22:56   ` Arnd Bergmann
  0 siblings, 0 replies; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-18 22:56 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Linas Vepstas, Jeff Garzik, akpm, netdev, James K Lewis,
	linux-kernel, ens Osterkamp

On Saturday 19 August 2006 00:25, Linas Vepstas wrote:
> This patch adds version information as reported by 
> ethtool -i to the Spidernet driver.

Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>

except for

> @@ -2293,6 +2294,8 @@ static struct pci_driver spider_net_driv
>   */
>  static int __init spider_net_init(void)
>  {
> +       printk("spidernet Version %s.\n",VERSION);
> +

This printk is missing a level (KERN_INFO or similar). Moreover,
it is rather strange for a driver to print a message when no
device is actually used by it. I'd rather drop the version
printk completely, but I know that Jim has strong feelings about
what to do with version information. I suggest that if we decide
to keep something like that in the driver, it should be printed
in spider_net_probe().

	Arnd <><

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6]: powerpc/cell spidernet bottom half
  2006-08-18 22:26 ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
@ 2006-08-18 23:03   ` Arnd Bergmann
  2006-08-19  0:56     ` [RFC] HOWTO use NAPI to reduce TX interrupts Arnd Bergmann
  2006-08-23 21:52     ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
  0 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-18 23:03 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Linas Vepstas, Jeff Garzik, akpm, netdev, James K Lewis,
	linux-kernel, ens Osterkamp

On Saturday 19 August 2006 00:26, Linas Vepstas wrote:
> The recent set of low-waterark patches for the spider result in a
> significant amount of computing being done in an interrupt context.
> This patch moves this to a "bottom half" aka work queue, so that
> the code runs in a normal kernel context. Curiously, this seems to 
> result in a performance boost of about 5% for large packets.

I guess this one still needs some work. We already have a bottom
half mechanism in the network layer, using the NAPI poll function
that is strongly serialized.

Linas, you wrote that you have tried doing the TX descriptor cleanup
in dev->poll(), but I think you missed the point that this function
needs should then be scheduled whenever necessary.

This seems a little strange, but I think what we need to do is in the
low-watermark interrupt call netif_rx_schedule(netdev) in order to
arrange for the ->poll function to be called in the next softirq.

Someone should probably document that in 
Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
once we get it right for spidernet.

	Arnd <><

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-18 23:03   ` Arnd Bergmann
@ 2006-08-19  0:56     ` Arnd Bergmann
  2006-08-20  1:31       ` Stephen Hemminger
  2006-08-23 21:52     ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
  1 sibling, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-19  0:56 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: akpm, James K Lewis, linux-kernel, netdev, Jeff Garzik,
	Jens Osterkamp, David Miller

On Saturday 19 August 2006 01:03, Arnd Bergmann wrote:
> Someone should probably document that in 
> Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
> once we get it right for spidernet.

Oh well, what else is there to do on a Friday night ;-)

This is a first draft, I expect to have gotten some aspects wrong,
so please review well.

For those that did not follow the spidernet discussion, it turns out
that all good ethernet drivers call their TX descriptor reclaim function
from their poll() method, but there is no documentation explaining
why this is a good idea . I talked to a number of people that have
written network drivers before and most of them have not understood
this well enough. I'm sure spidernet is not the only driver that
implemented this wrong.

Thanks to benh and davem for being really patient about explaining this
to me!

	Arnd <><

diff --git a/Documentation/networking/NAPI_HOWTO.txt b/Documentation/networking/NAPI_HOWTO.txt
index 54376e8..4d333a4 100644
--- a/Documentation/networking/NAPI_HOWTO.txt
+++ b/Documentation/networking/NAPI_HOWTO.txt
@@ -738,6 +738,60 @@ USER       PID %CPU %MEM  SIZE   RSS TTY
 root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
 root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
 
+
+APPENDIX 4: Using NAPI for TX skb cleanup
+=========================================
+
+While most of the discussion is focused on optimising the receive path,
+in most drivers it is also beneficial to free TX buffers from the
+dev->poll() function. Many devices trigger an interrupt for each
+frame that has been sent out to notify the driver that it can free
+the skb. This results in a large amount of interrupt processing that
+we want to avoid.
+
+The simplistic approach of setting a long kernel timer to clean up
+descriptors results in poor throughput because a user process that
+tries to send out a lot of data then blocks on its socket send buffer,
+while the driver never frees up the skbs in that buffer until the
+timeout.
+
+In order to get optimal throughput on transmit, the sent skbs need to
+be cleaned up before the chip runs out of data to transmit, so
+relying on an end of queue interrupt means that in the window between
+the interrupt and the time that new user packets have arrived in the
+adapter, there is no outgoing data on the wire, even if user data is
+available.
+It may also be bad to defer freeing skbs too long because they may consume
+a significan amount of memory.
+
+Experience shows that combination of events that trigger skb reclaim
+works best. These events include:
+- new packets coming in through hard_start_xmit()
+- packets coming in from the network through dev->poll()
+- time has passed since the first frame was send over the wire
+  but has not been reclaimed (tx_coalesce_usecs)
+- a number of packets have been sent (tx_max_coalesced_frames)
+
+We can avoid expensive locking between these by using the poll()
+function as the only place to call skb reclaim. This also means
+that in the interrupt handler, we always call netif_rx_schedule()
+for any interrupt, regardless of whether it is an rx or tx
+event. Don't get confused by the fact that we're scheduling
+the rx softirq to do tx work.
+
+Depending on the actual hardware, slightly different methods
+for coalesced tx interrupts may be used:
+- a timer that starts with the successful transmission of a packet
+  may need to be replaced with a timer that is started at when a
+  packet is submitted to the adapter.
+- instead of an interrupt that is triggered after a fixed number
+  of transmitted packets, it may be possible to mark a specific
+  packet so it generates an interrupt after processing.
+- If the adapter knows about the number of packets that have been
+  queued, a low-watermark interrupt may be used that fires
+  when the number drops below a user-defined value.
+
+
 --------------------------------------------------------------------
 
 relevant sites:

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-20  1:31       ` Stephen Hemminger
@ 2006-08-19 11:25         ` Arnd Bergmann
  2006-08-20 17:48           ` [RFC v2] " Arnd Bergmann
  2006-08-21 23:52           ` [RFC] HOWTO use NAPI to reduce TX interrupts Linas Vepstas
  0 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-19 11:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, Jens Osterkamp, David Miller

On Sunday 20 August 2006 03:31, Stephen Hemminger wrote:
> 
> The reason reclaim via poll() is efficient is because it avoid causing a 
> softirq that is
> necessary when skb_free_irq() is done. Instead it reuses the softirq 
> from the poll() routine. 

Ok, I completely missed this point so far, thanks for the info.

> Like all Rx NAPI, using poll() for reclaim means: 
>     + aggregating multiple frames in one irq
>     - increased overhead of twiddling with the IRQ mask
>     - more ways to get driver stuck

What is the best way to treat the IRQ mask for TX interrupts?
I guess it should be roughly:

- off when we expect ->poll() to be called, i.e. after calling
  netif_rx_schedule() or returning after a partial rx from poll().
- off when there are no packets left in the TX queue
- on while RX interrupts are on and we're waiting for packets
  to be transmitted.

> Some drivers do all their irq work in the poll() routine (including PHY 
> handling).
> This is good if reading the IRQ status does an auto mask operation.
> 
> The whole NAPI documentation area is a mess and needs a good writer
> to do some major restructuring. It should also be split into reference 
> information,
> tutorial and guide sections.

I won't be able to do that work, I'm neither a good writer nor a networking
person.

Do you think we should still merge a section like the text I wrote up, even
if it makes the text even less well structured? Should I maybe add it
somewhere else than the appendix?

	Arnd <><

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-19  0:56     ` [RFC] HOWTO use NAPI to reduce TX interrupts Arnd Bergmann
@ 2006-08-20  1:31       ` Stephen Hemminger
  2006-08-19 11:25         ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2006-08-20  1:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, Jens Osterkamp, David Miller

Arnd Bergmann wrote:
> On Saturday 19 August 2006 01:03, Arnd Bergmann wrote:
>   
>> Someone should probably document that in 
>> Documentation/networking/NAPI_HOWTO.txt, I might end up doing that
>> once we get it right for spidernet
>>     

The reason reclaim via poll() is efficient is because it avoid causing a 
softirq that is
necessary when skb_free_irq() is done. Instead it reuses the softirq 
from the poll()
routine. Like all Rx NAPI, using poll() for reclaim means:
    + aggregating multiple frames in one irq
    - increased overhead of twiddling with the IRQ mask
    - more ways to get driver stuck
   
Some drivers do all their irq work in the poll() routine (including PHY 
handling).
This is good if reading the IRQ status does an auto mask operation.

The whole NAPI documentation area is a mess and needs a good writer
to do some major restructuring. It should also be split into reference 
information,
tutorial and guide sections.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.
       [not found]   ` <200608190109.15129.arnd@arndb.de>
@ 2006-08-20  6:31     ` Benjamin Herrenschmidt
  2006-08-20 10:03       ` Arnd Bergmann
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2006-08-20  6:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, ens Osterkamp


> card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> mb();
> card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> card->low_watermark = card->low_watermark->next;
> 
> when we queue another frame for TX.

I would have expected those to be racy vs. the hardware... what if the
hardware is updating dmac_cmd_status just as your are trying to and the
bit out of it ?

Ben



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.
  2006-08-20  6:31     ` Benjamin Herrenschmidt
@ 2006-08-20 10:03       ` Arnd Bergmann
  2006-08-23 21:36         ` Linas Vepstas
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-20 10:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, ens Osterkamp

On Sunday 20 August 2006 08:31, Benjamin Herrenschmidt wrote:
> > card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> > mb();
> > card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> > card->low_watermark = card->low_watermark->next;
> > 
> > when we queue another frame for TX.
> 
> I would have expected those to be racy vs. the hardware... what if the
> hardware is updating dmac_cmd_status just as your are trying to and the
> bit out of it ?

Right, that doesn't work. It is the only bit we use in that byte though,
so maybe it can be done with a single byte write.

	Arnd <><

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC v2] HOWTO use NAPI to reduce TX interrupts
  2006-08-19 11:25         ` Arnd Bergmann
@ 2006-08-20 17:48           ` Arnd Bergmann
  2006-08-21 20:40             ` NAPI documentation Stephen Hemminger
  2006-08-21 23:52           ` [RFC] HOWTO use NAPI to reduce TX interrupts Linas Vepstas
  1 sibling, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2006-08-20 17:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, Jens Osterkamp, David Miller, Linas Vepstas

A recent discussion about the spidernet driver resulted in the dicovery
that network drivers are supposed to use NAPI for both their receive and
transmit paths, but this is documented nowhere.

In order to help the next person writing a NAPI based driver, I wrote
down what I found missing about this.

Please tell me if anything in here is still wrong or could use better
wording.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
This is the second version of my mini howto, after a few comments
I got from Stephen Hemminger and  Avuton Olrich.

Index: linux-cg/Documentation/networking/NAPI_HOWTO.txt
===================================================================
--- linux-cg.orig/Documentation/networking/NAPI_HOWTO.txt	2006-08-20 16:51:12.000000000 +0200
+++ linux-cg/Documentation/networking/NAPI_HOWTO.txt	2006-08-20 19:42:20.000000000 +0200
@@ -1,11 +1,6 @@
-HISTORY:
-February 16/2002 -- revision 0.2.1:
-COR typo corrected
-February 10/2002 -- revision 0.2:
-some spell checking ;->
-January 12/2002 -- revision 0.1
-This is still work in progress so may change.
-To keep up to date please watch this space.
+Note: this document could use a serious cleanup by a good writer.
+It would be nice to split out the reference parts into a kerneldoc
+document and turn the rest into a tutorial.
 
 Introduction to NAPI
 ====================
@@ -738,6 +733,64 @@
 root         3  0.2  0.0     0     0  ?  RWN Aug 15 602:00 (ksoftirqd_CPU0)
 root       232  0.0  7.9 41400 40884  ?  S   Aug 15  74:12 gated 
 
+
+APPENDIX 4: Using NAPI for TX skb cleanup
+=========================================
+
+While most of the discussion is focused on optimizing the receive path, in
+most drivers it is also beneficial to free TX buffers from the dev->poll()
+function. Many devices trigger an interrupt for each packet that has been
+sent out to notify the driver that it can free the skb. This results in
+a large amount of interrupt processing that we want to avoid. It is also
+suboptimal to free skbs in a hardirq context, because dev_kfree_skb_irq()
+needs to schedule a softirq to do the actual work. Calling dev_kfree_skb()
+from dev->poll() directly avoids these extra softirq schedules.
+
+The simplistic approach of setting a long kernel timer to clean up
+descriptors results in poor throughput because a user process that tries
+to send out a lot of data then blocks on its socket send buffer, while
+the driver never frees up the skbs in that buffer until the timeout.
+
+Trying the cleanup every time that hard_start_xmit() is entered provides
+relatively good throughput, but typically causes extra processing overhead
+because of mmio accesses and/or spinlocks, so you normally want to batch
+skb reclaim.
+
+In order to get optimal throughput on transmit, the sent skbs need to be
+cleaned up before the chip runs out of data to transmit, so relying on
+an end of queue interrupt means that in the window between the interrupt
+and the time that new user packets have arrived in the adapter, there is
+no outgoing data on the wire, even if user data is available.  It may
+also be bad to defer freeing skbs too long because they may consume a
+significant amount of memory.
+
+Experience shows that combination of events that trigger skb reclaim
+works best. These events include:
+- new packets coming in through hard_start_xmit()
+- packets coming in from the network through dev->poll()
+- time has passed since the first packet was send over the wire
+  but has not been reclaimed (tx_coalesce_usecs)
+- a number of packets have been sent (tx_max_coalesced_frames)
+
+We can avoid expensive locking between these by using the poll() function
+as the only place to call skb reclaim. This also means that in the
+interrupt handler, we always call netif_rx_schedule() for any interrupt,
+including those for tx or e.g. PHY handling.  This is particularly
+helpful if reading the IRQ status does an auto mask operation.
+
+Depending on the actual hardware, slightly different methods for coalesced
+tx interrupts may be used:
+- a timer that starts with the successful transmission of a packet
+  may need to be replaced with a timer that is started at when a packet
+  is submitted to the adapter.
+- instead of an interrupt that is triggered after a fixed number
+  of transmitted packets, it may be possible to mark a specific packet
+  so it generates an interrupt after processing.
+- If the adapter knows about the number of packets that have been
+  queued, a low-watermark interrupt may be used that fires when the
+  number drops below a user-defined value.
+
+
 --------------------------------------------------------------------
 
 relevant sites:
@@ -764,3 +817,4 @@
 Manfred Spraul <manfred@colorfullife.com>
 Donald Becker <becker@scyld.com>
 Jeff Garzik <jgarzik@pobox.com>
+Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* NAPI documentation
  2006-08-20 17:48           ` [RFC v2] " Arnd Bergmann
@ 2006-08-21 20:40             ` Stephen Hemminger
  2006-08-21 22:05               ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2006-08-21 20:40 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, akpm, James K Lewis, linux-kernel, netdev,
	Jeff Garzik, Jens Osterkamp, David Miller, Linas Vepstas,
	Jonathan Corbet

I took this opportunity to get a start on improving NAPI documentation.
I mashed together the short text about NAPI (with permission) from lwn.net
and the existing NAPI-HOWTO, to make a page on the Linux net wiki.

I took the liberty of removing some of the bits that were out of date, or referred
to old Becker code. It still needs lots of editing to be presentable.

Please edit and improve
	http://linux-net.osdl.org/index.php/NAPI

When the page is in good shape, I will de-wiki it to place in kernel doc tree.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NAPI documentation
  2006-08-21 20:40             ` NAPI documentation Stephen Hemminger
@ 2006-08-21 22:05               ` David Miller
  2006-08-21 22:09                 ` Stephen Hemminger
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2006-08-21 22:05 UTC (permalink / raw)
  To: shemminger
  Cc: arnd, linuxppc-dev, akpm, jklewis, linux-kernel, netdev, jgarzik,
	Jens.Osterkamp, linas, corbet

From: Stephen Hemminger <shemminger@osdl.org>
Date: Mon, 21 Aug 2006 13:40:53 -0700

> Please edit and improve
> 	http://linux-net.osdl.org/index.php/NAPI
> 
> When the page is in good shape, I will de-wiki it to place in kernel doc tree.

How do I edit the introduction paragraphs at the top?  I want to edit
this sentence since it sounds awful:

	NAPI ("New API") is a modification to the packet process, ...

I want to change "packet process" to something more descriptive
and accurate.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NAPI documentation
  2006-08-21 22:05               ` David Miller
@ 2006-08-21 22:09                 ` Stephen Hemminger
  2006-08-21 22:17                   ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2006-08-21 22:09 UTC (permalink / raw)
  To: David Miller
  Cc: arnd, linuxppc-dev, akpm, jklewis, linux-kernel, netdev, jgarzik,
	Jens.Osterkamp, linas, corbet

On Mon, 21 Aug 2006 15:05:09 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@osdl.org>
> Date: Mon, 21 Aug 2006 13:40:53 -0700
> 
> > Please edit and improve
> > 	http://linux-net.osdl.org/index.php/NAPI
> > 
> > When the page is in good shape, I will de-wiki it to place in kernel doc tree.
> 
> How do I edit the introduction paragraphs at the top? 

Click edit on wiki and it is the first sentence.


> I want to edit
> this sentence since it sounds awful:
> 
> 	NAPI ("New API") is a modification to the packet process, ...
> 
> I want to change "packet process" to something more descriptive
> and accurate.
>

Yes.

Also, does Jamal have a link to his UKUUG paper?

-- 
Stephen Hemminger <shemminger@osdl.org>

All non-trivial abstractions, to some degree, are leaky.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: NAPI documentation
  2006-08-21 22:09                 ` Stephen Hemminger
@ 2006-08-21 22:17                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2006-08-21 22:17 UTC (permalink / raw)
  To: shemminger
  Cc: arnd, linuxppc-dev, akpm, jklewis, linux-kernel, netdev, jgarzik,
	Jens.Osterkamp, linas, corbet

From: Stephen Hemminger <shemminger@osdl.org>
Date: Mon, 21 Aug 2006 15:09:35 -0700

> Click edit on wiki and it is the first sentence.

Thanks a lot.

> Also, does Jamal have a link to his UKUUG paper?

I don't think there is a copy online right now.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-19 11:25         ` Arnd Bergmann
  2006-08-20 17:48           ` [RFC v2] " Arnd Bergmann
@ 2006-08-21 23:52           ` Linas Vepstas
  2006-08-21 23:56             ` David Miller
  1 sibling, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-21 23:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Stephen Hemminger, akpm, netdev, James K Lewis, linux-kernel,
	linuxppc-dev, Jens Osterkamp, Jeff Garzik, David Miller

On Sat, Aug 19, 2006 at 01:25:18PM +0200, Arnd Bergmann wrote:
> 
> What is the best way to treat the IRQ mask for TX interrupts?
> I guess it should be roughly:
> 
> - off when we expect ->poll() to be called, i.e. after calling
>   netif_rx_schedule() or returning after a partial rx from poll().

Under what circumstance does one turn TX interrupts back on?
I couldn't quite figure that out.

--linas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-21 23:52           ` [RFC] HOWTO use NAPI to reduce TX interrupts Linas Vepstas
@ 2006-08-21 23:56             ` David Miller
  2006-08-22  0:29               ` Roland Dreier
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2006-08-21 23:56 UTC (permalink / raw)
  To: linas
  Cc: arnd, shemminger, akpm, netdev, jklewis, linux-kernel,
	linuxppc-dev, Jens.Osterkamp, jgarzik

From: linas@austin.ibm.com (Linas Vepstas)
Date: Mon, 21 Aug 2006 18:52:44 -0500

> Under what circumstance does one turn TX interrupts back on?
> I couldn't quite figure that out.

Don't touch interrupts until both RX and TX queue work is
fullydepleted.  You seem to have this notion that RX and TX interrupts
are seperate.  They aren't, even if your device can generate those
events individually.  Whatever interrupt you get, you shut down all
interrupt sources and schedule the ->poll().  Then ->poll() does
something like:

	all_tx_completion_work();
	ret = as_much_rx_work_as_budget_and_quota_allows();
	if (!ret)
		reenable_interrupts_and_complet_napi_poll();

You always run the TX completion work fully, then you do the RX work
within the quota/budget.

See the tg3 driver for details, really...


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-21 23:56             ` David Miller
@ 2006-08-22  0:29               ` Roland Dreier
  2006-08-22  0:32                 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Roland Dreier @ 2006-08-22  0:29 UTC (permalink / raw)
  To: David Miller
  Cc: linas, arnd, shemminger, akpm, netdev, jklewis, linux-kernel,
	linuxppc-dev, Jens.Osterkamp, jgarzik

    David> Don't touch interrupts until both RX and TX queue work is
    David> fullydepleted.  You seem to have this notion that RX and TX
    David> interrupts are seperate.  They aren't, even if your device
    David> can generate those events individually.  Whatever interrupt
    David> you get, you shut down all interrupt sources and schedule
    David> the ->poll().  Then ->poll() does something like:

This is a digression from spidernet, but what if a device is able to
generate separate MSIs for TX and RX?  Some people from IBM have
suggested that it is beneficial for throughput to handle TX work and
RX work for IP-over-InfiniBand in parallel on separate CPUs, and
handling everything through the ->poll() method would defeat this.

 - R.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC] HOWTO use NAPI to reduce TX interrupts
  2006-08-22  0:29               ` Roland Dreier
@ 2006-08-22  0:32                 ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2006-08-22  0:32 UTC (permalink / raw)
  To: rdreier
  Cc: linas, arnd, shemminger, akpm, netdev, jklewis, linux-kernel,
	linuxppc-dev, Jens.Osterkamp, jgarzik

From: Roland Dreier <rdreier@cisco.com>
Date: Mon, 21 Aug 2006 17:29:05 -0700

> This is a digression from spidernet, but what if a device is able to
> generate separate MSIs for TX and RX?  Some people from IBM have
> suggested that it is beneficial for throughput to handle TX work and
> RX work for IP-over-InfiniBand in parallel on separate CPUs, and
> handling everything through the ->poll() method would defeat this.

The TX work is so incredibly cheap, relatively speaking, compared
to the full input packet processing path that the RX side runs
that I see no real benefit.

In fact, you might even get better locality due to the way the
locking can be performed if TX reclaim runs inside of ->poll()

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.
  2006-08-20 10:03       ` Arnd Bergmann
@ 2006-08-23 21:36         ` Linas Vepstas
  2006-08-23 22:03           ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Linas Vepstas @ 2006-08-23 21:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Benjamin Herrenschmidt, netdev, James K Lewis, linux-kernel,
	linuxppc-dev

On Sun, Aug 20, 2006 at 12:03:14PM +0200, Arnd Bergmann wrote:
> On Sunday 20 August 2006 08:31, Benjamin Herrenschmidt wrote:
> > > card->low_watermark->next->dmac_cmd_status |= SPIDER_NET_DESCR_TXDESFLG;
> > > mb();
> > > card->low_watermark->dmac_cmd_status &= ~SPIDER_NET_DESCR_TXDESFLG;
> > > card->low_watermark = card->low_watermark->next;
> > > 
> > > when we queue another frame for TX.
> > 
> > I would have expected those to be racy vs. the hardware... what if the
> > hardware is updating dmac_cmd_status just as your are trying to and the
> > bit out of it ?
> 
> Right, that doesn't work. It is the only bit we use in that byte though,
> so maybe it can be done with a single byte write.

Thanks, you're right, I missed that.  I'll change this to byte access 
shortly.  Any recommendations for style/api for byte access? 

I could create a searate patch to change struct descr {} to split 
the u32 into several u8's; there's a dozen spots that get touched.

Alternatel, I could do a cheesy cast to char[4] and access that way.
Opinions?

--linas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6]: powerpc/cell spidernet bottom half
  2006-08-18 23:03   ` Arnd Bergmann
  2006-08-19  0:56     ` [RFC] HOWTO use NAPI to reduce TX interrupts Arnd Bergmann
@ 2006-08-23 21:52     ` Linas Vepstas
  1 sibling, 0 replies; 27+ messages in thread
From: Linas Vepstas @ 2006-08-23 21:52 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, Jeff Garzik, akpm, netdev, James K Lewis,
	linux-kernel, ens Osterkamp

On Sat, Aug 19, 2006 at 01:03:04AM +0200, Arnd Bergmann wrote:
> using the NAPI poll function

Still fiddling with this. Getting side-tracked after noticing
that the RX side generates a *huge* numbe of interrupts, despite
code in the driver which superficially appears to be RX NAPI.  
One step forward, two steps back, isn't there a dance like that?

--linas

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6]: powerpc/cell spidernet low watermark patch.
  2006-08-23 21:36         ` Linas Vepstas
@ 2006-08-23 22:03           ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2006-08-23 22:03 UTC (permalink / raw)
  To: linas; +Cc: arnd, benh, netdev, jklewis, linux-kernel, linuxppc-dev

From: linas@austin.ibm.com (Linas Vepstas)
Date: Wed, 23 Aug 2006 16:36:42 -0500

> I could create a searate patch to change struct descr {} to split 
> the u32 into several u8's; there's a dozen spots that get touched.
> 
> Alternatel, I could do a cheesy cast to char[4] and access that way.
> Opinions?

The most portable scheme would be a "u32/u8[4]" union with
appropriate endianness checks when determining which byte
to access in the u8[] view.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-08-23 22:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-18 22:07 [PATCH 0/6]: powerpc/cell spidernet ethernet driver update Linas Vepstas
2006-08-18 22:20 ` [PATCH 1/6]: powerpc/cell spidernet burst alignment patch Linas Vepstas
2006-08-18 22:51   ` Arnd Bergmann
2006-08-18 22:21 ` [PATCH 2/6]: powerpc/cell spidernet low watermark patch Linas Vepstas
     [not found]   ` <200608190109.15129.arnd@arndb.de>
2006-08-20  6:31     ` Benjamin Herrenschmidt
2006-08-20 10:03       ` Arnd Bergmann
2006-08-23 21:36         ` Linas Vepstas
2006-08-23 22:03           ` David Miller
2006-08-18 22:23 ` [PATCH 3/6]: powerpc/cell spidernet stop error printing patch Linas Vepstas
2006-08-18 22:25 ` [PATCH 4/6]: powerpc/cell spidernet ethtool -i version number info Linas Vepstas
2006-08-18 22:56   ` Arnd Bergmann
2006-08-18 22:26 ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
2006-08-18 23:03   ` Arnd Bergmann
2006-08-19  0:56     ` [RFC] HOWTO use NAPI to reduce TX interrupts Arnd Bergmann
2006-08-20  1:31       ` Stephen Hemminger
2006-08-19 11:25         ` Arnd Bergmann
2006-08-20 17:48           ` [RFC v2] " Arnd Bergmann
2006-08-21 20:40             ` NAPI documentation Stephen Hemminger
2006-08-21 22:05               ` David Miller
2006-08-21 22:09                 ` Stephen Hemminger
2006-08-21 22:17                   ` David Miller
2006-08-21 23:52           ` [RFC] HOWTO use NAPI to reduce TX interrupts Linas Vepstas
2006-08-21 23:56             ` David Miller
2006-08-22  0:29               ` Roland Dreier
2006-08-22  0:32                 ` David Miller
2006-08-23 21:52     ` [PATCH 5/6]: powerpc/cell spidernet bottom half Linas Vepstas
2006-08-18 22:29 ` [PATCH 6/6]: powerpc/cell spidernet refine locking Linas Vepstas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).