* Generic rx-recycling and emergency skb pool
@ 2010-07-02 19:20 Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 1/8] net: implement generic rx recycling Sebastian Andrzej Siewior
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx


This is version two of the generic rx-recycling series, followed by version
one of the emergency skb pools, which are built on top of rx-recycling.
The change since v1 of generic rx-recycling is that the list access is now
unlocked instead of locked.
Patch six, which introduces the emergency pools, adds the locking back.
This is required since we now have two users that are not serialized. In
order not to punish everyone, patch eight removes this locking again. That
patch converts only two drivers, so you get an idea of what I think is
required to get the locking removed.

The idea behind emergency pools is to have pre-allocated skbs for TX and
RX. Using the memory allocator for this leads to latencies during memory
pressure. The pre-allocated skbs are "tagged" and return to the pool once
they have passed through the stack, so the pool should never get
exhausted.
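
On the free path the tagging boils down to this (a simplified sketch of
the __kfree_skb() hook added in patch six):

	/* tagged skbs return to their device pool instead of the slab */
	if (skb->emerg_dev) {
		net_recycle_add(skb->emerg_dev, skb);
		return;
	}
	skb_release_all(skb);
	kfree_skbmem(skb);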

While it was easy to convert the drivers which share the same concept of
rx-recycling to use the emergency pools, it was difficult to hook up the
more complex drivers like e1000e. The e1000e can use split skbs / a frag
list, which is different from the allocation scheme currently used. So
instead of forcing all drivers to do things the same way, I've been
thinking about providing a dedicated callback for skb allocation and for
checking whether a given skb is "good enough". This is not yet implemented.
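
Such a callback could look roughly like this (purely hypothetical sketch,
names made up, none of it is implemented):

	/* hypothetical ndo hooks for driver-specific pool allocation */
	struct sk_buff *(*ndo_emerg_alloc)(struct net_device *dev);
	int (*ndo_emerg_check)(struct net_device *dev, struct sk_buff *skb);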

I would be glad to receive some feedback on this patch series before I go
any further. Unfortunately I'm on vacation for the next two weeks, so I
can't respond before then. tglx is on Cc and should be able to respond
sooner :)

Sebastian


* [PATCH 1/8] net: implement generic rx recycling
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 2/8] net/gianfar: use generic recycling infrastructure Sebastian Andrzej Siewior
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The code is basically what other drivers, such as gianfar, are already
using.
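
A converted driver then uses the helpers roughly like this (lifted from
the gianfar conversion in the next patch; the constants are driver
specific):

	/* probe: declare pool depth and buffer size */
	net_recycle_init(dev, DEFAULT_RX_RING_SIZE,
			priv->rx_buffer_size + RXBUF_ALIGNMENT);

	/* rx refill: reuse a pooled skb, fall back to netdev_alloc_skb() */
	skb = net_recycle_get(dev);

	/* tx completion: return the skb to the pool if it still fits */
	net_recycle_add(dev, skb);

	/* teardown: drop all pooled skbs */
	net_recycle_cleanup(dev);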

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/netdevice.h |   67 +++++++++++++++++++++++++++++++++++++--------
 net/core/dev.c            |    1 +
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fa5e5a..4fa400b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -595,6 +595,18 @@ struct netdev_rx_queue {
 } ____cacheline_aligned_in_smp;
 #endif /* CONFIG_RPS */
 
+/* Use this variant when it is known for sure that it
+ * is executing from hardware interrupt context or with hardware interrupts
+ * disabled.
+ */
+extern void dev_kfree_skb_irq(struct sk_buff *skb);
+
+/* Use this variant in places where it could be invoked
+ * from either hardware interrupt or other context, with hardware interrupts
+ * either disabled or enabled.
+ */
+extern void dev_kfree_skb_any(struct sk_buff *skb);
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1077,9 +1089,52 @@ struct net_device {
 #endif
 	/* n-tuple filter list attached to this device */
 	struct ethtool_rx_ntuple_list ethtool_ntuple_list;
+	struct sk_buff_head rx_recycle;
+	u32 rx_rec_skbs_max;
+	u32 rx_rec_skb_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
+{
+	dev->rx_rec_skbs_max = qlen;
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_size(struct net_device *dev, u32 size)
+{
+	dev->rx_rec_skb_size = size;
+}
+
+static inline void net_recycle_qlen(struct net_device *dev, u32 qlen)
+{
+	dev->rx_rec_skbs_max = qlen;
+}
+
+static inline void net_recycle_cleanup(struct net_device *dev)
+{
+	skb_queue_purge(&dev->rx_recycle);
+}
+
+static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		__skb_queue_head(&dev->rx_recycle, skb);
+	else
+		dev_kfree_skb_any(skb);
+}
+
+static inline struct sk_buff *net_recycle_get(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	skb = __skb_dequeue(&dev->rx_recycle);
+	if (skb)
+		return skb;
+	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
+}
+
 #define	NETDEV_ALIGN		32
 
 static inline
@@ -1672,18 +1727,6 @@ static inline int netif_is_multiqueue(const struct net_device *dev)
 	return (dev->num_tx_queues > 1);
 }
 
-/* Use this variant when it is known for sure that it
- * is executing from hardware interrupt context or with hardware interrupts
- * disabled.
- */
-extern void dev_kfree_skb_irq(struct sk_buff *skb);
-
-/* Use this variant in places where it could be invoked
- * from either hardware interrupt or other context, with hardware interrupts
- * either disabled or enabled.
- */
-extern void dev_kfree_skb_any(struct sk_buff *skb);
-
 #define HAVE_NETIF_RX 1
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index e85cc5f..db9acd5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5404,6 +5404,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	netdev_init_queues(dev);
 
+	skb_queue_head_init(&dev->rx_recycle);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
-- 
1.6.6.1



* [PATCH 2/8] net/gianfar: use generic recycling infrastructure
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 1/8] net: implement generic rx recycling Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 3/8] net/mv643xx: use generic recycling infrastructure Sebastian Andrzej Siewior
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c         |   28 ++++++++--------------------
 drivers/net/gianfar.h         |    2 --
 drivers/net/gianfar_ethtool.c |    1 +
 3 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index fccb7a3..1a1a249 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1148,6 +1148,9 @@ static int gfar_probe(struct of_device *ofdev,
 		priv->rx_queue[i]->rxic = DEFAULT_RXIC;
 	}
 
+	net_recycle_init(dev, DEFAULT_RX_RING_SIZE,
+			priv->rx_buffer_size + RXBUF_ALIGNMENT);
+
 	/* enable filer if using multiple RX queues*/
 	if(priv->num_rx_queues > 1)
 		priv->rx_filer_enable = 1;
@@ -1768,7 +1771,7 @@ static void free_skb_resources(struct gfar_private *priv)
 			sizeof(struct rxbd8) * priv->total_rx_ring_size,
 			priv->tx_queue[0]->tx_bd_base,
 			priv->tx_queue[0]->tx_bd_dma_base);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->ndev);
 }
 
 void gfar_start(struct net_device *dev)
@@ -1949,8 +1952,6 @@ static int gfar_enet_open(struct net_device *dev)
 
 	enable_napi(priv);
 
-	skb_queue_head_init(&priv->rx_recycle);
-
 	/* Initialize a bunch of registers */
 	init_registers(dev);
 
@@ -2366,6 +2367,7 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 		stop_gfar(dev);
 
 	priv->rx_buffer_size = tempsize;
+	net_recycle_size(dev, tempsize + RXBUF_ALIGNMENT);
 
 	dev->mtu = new_mtu;
 
@@ -2498,16 +2500,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 			bdp = next_txbd(bdp, base, tx_ring_size);
 		}
 
-		/*
-		 * If there's room in the queue (limit it to rx_buffer_size)
-		 * we add this skb back into the pool, if it's the right size
-		 */
-		if (skb_queue_len(&priv->rx_recycle) < rx_queue->rx_ring_size &&
-				skb_recycle_check(skb, priv->rx_buffer_size +
-					RXBUF_ALIGNMENT))
-			__skb_queue_head(&priv->rx_recycle, skb);
-		else
-			dev_kfree_skb_any(skb);
+		net_recycle_add(dev, skb);
 
 		tx_queue->tx_skbuff[skb_dirtytx] = NULL;
 
@@ -2573,14 +2566,9 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 struct sk_buff * gfar_new_skb(struct net_device *dev)
 {
 	unsigned int alignamount;
-	struct gfar_private *priv = netdev_priv(dev);
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&priv->rx_recycle);
-	if (!skb)
-		skb = netdev_alloc_skb(dev,
-				priv->rx_buffer_size + RXBUF_ALIGNMENT);
-
+	skb = net_recycle_get(dev);
 	if (!skb)
 		return NULL;
 
@@ -2753,7 +2741,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 				 * recycle list.
 				 */
 				skb_reserve(skb, -GFAR_CB(skb)->alignamount);
-				__skb_queue_head(&priv->rx_recycle, skb);
+				net_recycle_add(dev, skb);
 			}
 		} else {
 			/* Increment the number of packets */
diff --git a/drivers/net/gianfar.h b/drivers/net/gianfar.h
index 710810e..66f8c04 100644
--- a/drivers/net/gianfar.h
+++ b/drivers/net/gianfar.h
@@ -1068,8 +1068,6 @@ struct gfar_private {
 
 	u32 cur_filer_idx;
 
-	struct sk_buff_head rx_recycle;
-
 	struct vlan_group *vlgrp;
 
 
diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
index 9bda023..8a6d567 100644
--- a/drivers/net/gianfar_ethtool.c
+++ b/drivers/net/gianfar_ethtool.c
@@ -508,6 +508,7 @@ static int gfar_sringparam(struct net_device *dev, struct ethtool_ringparam *rva
 		priv->tx_queue[i]->tx_ring_size = rvals->tx_pending;
 		priv->tx_queue[i]->num_txbdfree = priv->tx_queue[i]->tx_ring_size;
 	}
+	net_recycle_qlen(dev, rvals->rx_pending);
 
 	/* Rebuild the rings with the new size */
 	if (dev->flags & IFF_UP) {
-- 
1.6.6.1



* [PATCH 3/8] net/mv643xx: use generic recycling infrastructure
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 1/8] net: implement generic rx recycling Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 2/8] net/gianfar: use generic recycling infrastructure Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 4/8] net/stmmac: " Sebastian Andrzej Siewior
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/mv643xx_eth.c |   27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 82b720f..a58ba48 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -404,8 +404,6 @@ struct mv643xx_eth_private {
 	u8 work_rx_refill;
 
 	int skb_size;
-	struct sk_buff_head rx_recycle;
-
 	/*
 	 * RX state.
 	 */
@@ -649,6 +647,7 @@ err:
 static int rxq_refill(struct rx_queue *rxq, int budget)
 {
 	struct mv643xx_eth_private *mp = rxq_to_mp(rxq);
+	struct net_device *dev = mp->dev;
 	int refilled;
 
 	refilled = 0;
@@ -658,10 +657,7 @@ static int rxq_refill(struct rx_queue *rxq, int budget)
 		struct rx_desc *rx_desc;
 		int size;
 
-		skb = __skb_dequeue(&mp->rx_recycle);
-		if (skb == NULL)
-			skb = dev_alloc_skb(mp->skb_size);
-
+		skb = net_recycle_get(dev);
 		if (skb == NULL) {
 			mp->oom = 1;
 			goto oom;
@@ -921,6 +917,7 @@ out:
 static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 {
 	struct mv643xx_eth_private *mp = txq_to_mp(txq);
+	struct net_device *dev = mp->dev;
 	struct netdev_queue *nq = netdev_get_tx_queue(mp->dev, txq->index);
 	int reclaimed;
 
@@ -967,14 +964,8 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force)
 				       desc->byte_cnt, DMA_TO_DEVICE);
 		}
 
-		if (skb != NULL) {
-			if (skb_queue_len(&mp->rx_recycle) <
-					mp->rx_ring_size &&
-			    skb_recycle_check(skb, mp->skb_size))
-				__skb_queue_head(&mp->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-		}
+		if (skb)
+			net_recycle_add(dev, skb);
 	}
 
 	__netif_tx_unlock(nq);
@@ -1563,7 +1554,7 @@ mv643xx_eth_set_ringparam(struct net_device *dev, struct ethtool_ringparam *er)
 
 	mp->rx_ring_size = er->rx_pending < 4096 ? er->rx_pending : 4096;
 	mp->tx_ring_size = er->tx_pending < 4096 ? er->tx_pending : 4096;
-
+	net_recycle_qlen(dev, mp->rx_ring_size);
 	if (netif_running(dev)) {
 		mv643xx_eth_stop(dev);
 		if (mv643xx_eth_open(dev)) {
@@ -2344,9 +2335,9 @@ static int mv643xx_eth_open(struct net_device *dev)
 
 	mv643xx_eth_recalc_skb_size(mp);
 
-	napi_enable(&mp->napi);
+	net_recycle_init(mp->dev, mp->rx_ring_size, mp->skb_size);
 
-	skb_queue_head_init(&mp->rx_recycle);
+	napi_enable(&mp->napi);
 
 	mp->int_mask = INT_EXT;
 
@@ -2442,7 +2433,7 @@ static int mv643xx_eth_stop(struct net_device *dev)
 	mib_counters_update(mp);
 	del_timer_sync(&mp->mib_counters_timer);
 
-	skb_queue_purge(&mp->rx_recycle);
+	net_recycle_cleanup(dev);
 
 	for (i = 0; i < mp->rxq_count; i++)
 		rxq_deinit(mp->rxq + i);
-- 
1.6.6.1



* [PATCH 4/8] net/stmmac: use generic recycling infrastructure
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 3/8] net/mv643xx: use generic recycling infrastructure Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 5/8] net/ucc_geth: " Sebastian Andrzej Siewior
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/stmmac/stmmac.h      |    1 -
 drivers/net/stmmac/stmmac_main.c |   26 +++++++-------------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/stmmac/stmmac.h b/drivers/net/stmmac/stmmac.h
index ebebc64..dbf9f95 100644
--- a/drivers/net/stmmac/stmmac.h
+++ b/drivers/net/stmmac/stmmac.h
@@ -44,7 +44,6 @@ struct stmmac_priv {
 	unsigned int dirty_rx;
 	struct sk_buff **rx_skbuff;
 	dma_addr_t *rx_skbuff_dma;
-	struct sk_buff_head rx_recycle;
 
 	struct net_device *dev;
 	int is_gmac;
diff --git a/drivers/net/stmmac/stmmac_main.c b/drivers/net/stmmac/stmmac_main.c
index a31d580..722a5e6 100644
--- a/drivers/net/stmmac/stmmac_main.c
+++ b/drivers/net/stmmac/stmmac_main.c
@@ -636,18 +636,7 @@ static void stmmac_tx(struct stmmac_priv *priv)
 			p->des3 = 0;
 
 		if (likely(skb != NULL)) {
-			/*
-			 * If there's room in the queue (limit it to size)
-			 * we add this skb back into the pool,
-			 * if it's the right size.
-			 */
-			if ((skb_queue_len(&priv->rx_recycle) <
-				priv->dma_rx_size) &&
-				skb_recycle_check(skb, priv->dma_buf_sz))
-				__skb_queue_head(&priv->rx_recycle, skb);
-			else
-				dev_kfree_skb(skb);
-
+			net_recycle_add(priv->dev, skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
 
@@ -843,6 +832,9 @@ static int stmmac_open(struct net_device *dev)
 	priv->dma_buf_sz = STMMAC_ALIGN(buf_sz);
 	init_dma_desc_rings(dev);
 
+	net_recycle_init(priv->dev, priv->dma_rx_size, priv->dma_buf_sz +
+			NET_IP_ALIGN);
+
 	/* DMA initialization and SW reset */
 	if (unlikely(priv->hw->dma->init(ioaddr, priv->pbl, priv->dma_tx_phy,
 					 priv->dma_rx_phy) < 0)) {
@@ -894,7 +886,6 @@ static int stmmac_open(struct net_device *dev)
 		phy_start(priv->phydev);
 
 	napi_enable(&priv->napi);
-	skb_queue_head_init(&priv->rx_recycle);
 	netif_start_queue(dev);
 	return 0;
 }
@@ -925,7 +916,7 @@ static int stmmac_release(struct net_device *dev)
 		kfree(priv->tm);
 #endif
 	napi_disable(&priv->napi);
-	skb_queue_purge(&priv->rx_recycle);
+	net_recycle_cleanup(priv->dev);
 
 	/* Free the IRQ lines */
 	free_irq(dev->irq, dev);
@@ -1157,13 +1148,10 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv)
 		if (likely(priv->rx_skbuff[entry] == NULL)) {
 			struct sk_buff *skb;
 
-			skb = __skb_dequeue(&priv->rx_recycle);
-			if (skb == NULL)
-				skb = netdev_alloc_skb_ip_align(priv->dev,
-								bfsize);
-
+			skb = net_recycle_get(priv->dev);
 			if (unlikely(skb == NULL))
 				break;
+			skb_reserve(skb, NET_IP_ALIGN);
 
 			priv->rx_skbuff[entry] = skb;
 			priv->rx_skbuff_dma[entry] =
-- 
1.6.6.1



* [PATCH 5/8] net/ucc_geth: use generic recycling infrastructure
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 4/8] net/stmmac: " Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 6/8] net: implement emergency pools Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/ucc_geth.c |   30 ++++++++----------------------
 drivers/net/ucc_geth.h |    2 --
 2 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index dc32a62..9d6097b 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -210,10 +210,7 @@ static struct sk_buff *get_new_skb(struct ucc_geth_private *ugeth,
 {
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&ugeth->rx_recycle);
-	if (!skb)
-		skb = dev_alloc_skb(ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+	skb = net_recycle_get(ugeth->ndev);
 	if (skb == NULL)
 		return NULL;
 
@@ -1992,8 +1989,6 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 		iounmap(ugeth->ug_regs);
 		ugeth->ug_regs = NULL;
 	}
-
-	skb_queue_purge(&ugeth->rx_recycle);
 }
 
 static void ucc_geth_set_multi(struct net_device *dev)
@@ -2069,6 +2064,7 @@ static void ucc_geth_stop(struct ucc_geth_private *ugeth)
 	ugeth->phydev = NULL;
 
 	ucc_geth_memclean(ugeth);
+	net_recycle_cleanup(ugeth->ndev);
 }
 
 static int ucc_struct_init(struct ucc_geth_private *ugeth)
@@ -2205,9 +2201,6 @@ static int ucc_struct_init(struct ucc_geth_private *ugeth)
 			ugeth_err("%s: Failed to ioremap regs.", __func__);
 		return -ENOMEM;
 	}
-
-	skb_queue_head_init(&ugeth->rx_recycle);
-
 	return 0;
 }
 
@@ -3213,12 +3206,8 @@ static int ucc_geth_rx(struct ucc_geth_private *ugeth, u8 rxQ, int rx_work_limit
 			if (netif_msg_rx_err(ugeth))
 				ugeth_err("%s, %d: ERROR!!! skb - 0x%08x",
 					   __func__, __LINE__, (u32) skb);
-			if (skb) {
-				skb->data = skb->head + NET_SKB_PAD;
-				skb->len = 0;
-				skb_reset_tail_pointer(skb);
-				__skb_queue_head(&ugeth->rx_recycle, skb);
-			}
+			if (skb)
+				net_recycle_add(dev, skb);
 
 			ugeth->rx_skbuff[rxQ][ugeth->skb_currx[rxQ]] = NULL;
 			dev->stats.rx_dropped++;
@@ -3288,13 +3277,7 @@ static int ucc_geth_tx(struct net_device *dev, u8 txQ)
 
 		dev->stats.tx_packets++;
 
-		if (skb_queue_len(&ugeth->rx_recycle) < RX_BD_RING_LEN &&
-			     skb_recycle_check(skb,
-				    ugeth->ug_info->uf_info.max_rx_buf_length +
-				    UCC_GETH_RX_DATA_BUF_ALIGNMENT))
-			__skb_queue_head(&ugeth->rx_recycle, skb);
-		else
-			dev_kfree_skb(skb);
+		net_recycle_add(dev, skb);
 
 		ugeth->tx_skbuff[txQ][ugeth->skb_dirtytx[txQ]] = NULL;
 		ugeth->skb_dirtytx[txQ] =
@@ -3929,6 +3912,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const struct of_device_id *ma
 	netif_napi_add(dev, &ugeth->napi, ucc_geth_poll, 64);
 	dev->mtu = 1500;
 
+	net_recycle_init(dev, RX_BD_RING_LEN, ug_info->uf_info.max_rx_buf_length
+			+ UCC_GETH_RX_DATA_BUF_ALIGNMENT);
+
 	ugeth->msg_enable = netif_msg_init(debug.msg_enable, UGETH_MSG_DEFAULT);
 	ugeth->phy_interface = phy_interface;
 	ugeth->max_speed = max_speed;
diff --git a/drivers/net/ucc_geth.h b/drivers/net/ucc_geth.h
index 05a9558..07c0816 100644
--- a/drivers/net/ucc_geth.h
+++ b/drivers/net/ucc_geth.h
@@ -1213,8 +1213,6 @@ struct ucc_geth_private {
 	/* index of the first skb which hasn't been transmitted yet. */
 	u16 skb_dirtytx[NUM_TX_QUEUES];
 
-	struct sk_buff_head rx_recycle;
-
 	struct ugeth_mii_info *mii_info;
 	struct phy_device *phydev;
 	phy_interface_t phy_interface;
-- 
1.6.6.1



* [PATCH 6/8] net: implement emergency pools
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (4 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 5/8] net/ucc_geth: " Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 7/8] net/emergency_skb: create a deep copy on clone Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This patch implements emergency pools which are bound to a specific
network device. They can be activated via the socket interface and used
for a specific socket.
The pools are built on top of rx-recycling. The socket interface allows
setting the number of skbs in the pool and activating the pool.
The size of the skbs which are accepted / added to the pool cannot be
changed: it is set by the network driver and is altered on MTU changes,
which requires dropping the current pool and re-allocating it. If the
driver does not set the skb size, the emergency pools cannot be used.
Once the emergency pools are activated, all rx-skb allocations by the
network driver are taken from the pool. tx-skbs are allocated from the
emergency pool only for the relevant socket, i.e. the one which
activated emergency mode.
Since the socket _and_ the driver can add/remove skbs to/from the pool,
the list operations now use skb_queue_head() and skb_dequeue(). A later
patch in the series restores the old lockless behavior when the
emergency pools are not in use.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/alpha/include/asm/socket.h   |    4 +
 arch/arm/include/asm/socket.h     |    3 +
 arch/avr32/include/asm/socket.h   |    3 +
 arch/cris/include/asm/socket.h    |    5 +-
 arch/frv/include/asm/socket.h     |    4 +-
 arch/h8300/include/asm/socket.h   |    3 +
 arch/ia64/include/asm/socket.h    |    3 +
 arch/m32r/include/asm/socket.h    |    3 +
 arch/m68k/include/asm/socket.h    |    3 +
 arch/mips/include/asm/socket.h    |    3 +
 arch/mn10300/include/asm/socket.h |    3 +
 arch/parisc/include/asm/socket.h  |    3 +
 arch/powerpc/include/asm/socket.h |    3 +
 arch/s390/include/asm/socket.h    |    3 +
 arch/sparc/include/asm/socket.h   |    3 +
 arch/xtensa/include/asm/socket.h  |    3 +
 include/asm-generic/socket.h      |    4 +
 include/linux/netdevice.h         |   52 +++++++------
 include/linux/skbuff.h            |    1 +
 include/net/sock.h                |    2 +
 net/core/skbuff.c                 |    8 ++
 net/core/sock.c                   |  142 +++++++++++++++++++++++++++++++++++++
 22 files changed, 234 insertions(+), 27 deletions(-)

diff --git a/arch/alpha/include/asm/socket.h b/arch/alpha/include/asm/socket.h
index 06edfef..ea49db3 100644
--- a/arch/alpha/include/asm/socket.h
+++ b/arch/alpha/include/asm/socket.h
@@ -69,6 +69,10 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
+
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/arm/include/asm/socket.h b/arch/arm/include/asm/socket.h
index 90ffd04..b827010 100644
--- a/arch/arm/include/asm/socket.h
+++ b/arch/arm/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/avr32/include/asm/socket.h b/arch/avr32/include/asm/socket.h
index c8d1fae..64a7d45 100644
--- a/arch/avr32/include/asm/socket.h
+++ b/arch/avr32/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/asm/socket.h b/arch/cris/include/asm/socket.h
index 1a4a619..9b8e7ed 100644
--- a/arch/cris/include/asm/socket.h
+++ b/arch/cris/include/asm/socket.h
@@ -64,6 +64,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
-
diff --git a/arch/frv/include/asm/socket.h b/arch/frv/include/asm/socket.h
index a6b2688..15a262f 100644
--- a/arch/frv/include/asm/socket.h
+++ b/arch/frv/include/asm/socket.h
@@ -62,5 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
-
diff --git a/arch/h8300/include/asm/socket.h b/arch/h8300/include/asm/socket.h
index 04c0f45..d46d64e 100644
--- a/arch/h8300/include/asm/socket.h
+++ b/arch/h8300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/asm/socket.h b/arch/ia64/include/asm/socket.h
index 51427ea..04983aa 100644
--- a/arch/ia64/include/asm/socket.h
+++ b/arch/ia64/include/asm/socket.h
@@ -71,4 +71,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/asm/socket.h b/arch/m32r/include/asm/socket.h
index 469787c..a0e5431 100644
--- a/arch/m32r/include/asm/socket.h
+++ b/arch/m32r/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/m68k/include/asm/socket.h b/arch/m68k/include/asm/socket.h
index 9bf49c8..7018ceb 100644
--- a/arch/m68k/include/asm/socket.h
+++ b/arch/m68k/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/mips/include/asm/socket.h b/arch/mips/include/asm/socket.h
index 9de5190..9f9d93a 100644
--- a/arch/mips/include/asm/socket.h
+++ b/arch/mips/include/asm/socket.h
@@ -82,6 +82,9 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #ifdef __KERNEL__
 
 /** sock_type - Socket types
diff --git a/arch/mn10300/include/asm/socket.h b/arch/mn10300/include/asm/socket.h
index 4e60c42..70476eb 100644
--- a/arch/mn10300/include/asm/socket.h
+++ b/arch/mn10300/include/asm/socket.h
@@ -62,4 +62,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/asm/socket.h b/arch/parisc/include/asm/socket.h
index 225b7d6..a4706d0 100644
--- a/arch/parisc/include/asm/socket.h
+++ b/arch/parisc/include/asm/socket.h
@@ -61,6 +61,9 @@
 
 #define SO_RXQ_OVFL             0x4021
 
+#define SO_EPOOL_QLEN		0x4022
+#define SO_EPOOL_SIZE		0x4023
+#define SO_EPOOL_MODE		0x4024
 /* O_NONBLOCK clashes with the bits used for socket types.  Therefore we
  * have to define SOCK_NONBLOCK to a different value here.
  */
diff --git a/arch/powerpc/include/asm/socket.h b/arch/powerpc/include/asm/socket.h
index 866f760..dce10f9 100644
--- a/arch/powerpc/include/asm/socket.h
+++ b/arch/powerpc/include/asm/socket.h
@@ -69,4 +69,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/asm/socket.h b/arch/s390/include/asm/socket.h
index fdff1e9..73d0117 100644
--- a/arch/s390/include/asm/socket.h
+++ b/arch/s390/include/asm/socket.h
@@ -70,4 +70,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/asm/socket.h b/arch/sparc/include/asm/socket.h
index 9d3fefc..39eea91 100644
--- a/arch/sparc/include/asm/socket.h
+++ b/arch/sparc/include/asm/socket.h
@@ -58,6 +58,9 @@
 
 #define SO_RXQ_OVFL             0x0024
 
+#define SO_EPOOL_QLEN           0x0025
+#define SO_EPOOL_SIZE           0x0026
+#define SO_EPOOL_MODE           0x0027
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/asm/socket.h b/arch/xtensa/include/asm/socket.h
index cbdf2ff..161a2e5 100644
--- a/arch/xtensa/include/asm/socket.h
+++ b/arch/xtensa/include/asm/socket.h
@@ -73,4 +73,7 @@
 
 #define SO_RXQ_OVFL             40
 
+#define SO_EPOOL_QLEN           41
+#define SO_EPOOL_SIZE           42
+#define SO_EPOOL_MODE           43
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 9a6115e..fa9ccbb 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -64,4 +64,8 @@
 #define SO_DOMAIN		39
 
 #define SO_RXQ_OVFL             40
+
+#define SO_EPOOL_QLEN		41
+#define SO_EPOOL_SIZE		42
+#define SO_EPOOL_MODE		43
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4fa400b..fa7e951 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1095,6 +1095,28 @@ struct net_device {
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+/**
+ *	dev_put - release reference to device
+ *	@dev: network device
+ *
+ * Release reference to device to allow it to be freed.
+ */
+static inline void dev_put(struct net_device *dev)
+{
+	atomic_dec(&dev->refcnt);
+}
+
+/**
+ *	dev_hold - get reference to device
+ *	@dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+static inline void dev_hold(struct net_device *dev)
+{
+	atomic_inc(&dev->refcnt);
+}
+
 static inline void net_recycle_init(struct net_device *dev, u32 qlen, u32 size)
 {
 	dev->rx_rec_skbs_max = qlen;
@@ -1118,9 +1140,13 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	if (skb->emerg_dev) {
+		dev_put(skb->emerg_dev);
+		skb->emerg_dev = NULL;
+	}
 	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
 			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		__skb_queue_head(&dev->rx_recycle, skb);
+		skb_queue_head(&dev->rx_recycle, skb);
 	else
 		dev_kfree_skb_any(skb);
 }
@@ -1129,7 +1155,7 @@ static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = __skb_dequeue(&dev->rx_recycle);
+	skb = skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
@@ -1783,28 +1809,6 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
-/**
- *	dev_put - release reference to device
- *	@dev: network device
- *
- * Release reference to device to allow it to be freed.
- */
-static inline void dev_put(struct net_device *dev)
-{
-	atomic_dec(&dev->refcnt);
-}
-
-/**
- *	dev_hold - get reference to device
- *	@dev: network device
- *
- * Hold reference to device to keep it from being freed.
- */
-static inline void dev_hold(struct net_device *dev)
-{
-	atomic_inc(&dev->refcnt);
-}
-
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
  * who is responsible for serialization of these calls.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ac74ee0..caee62c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -319,6 +319,7 @@ struct sk_buff {
 
 	struct sock		*sk;
 	struct net_device	*dev;
+	struct net_device	*emerg_dev;
 
 	/*
 	 * This is the control buffer. It is free to use for every
diff --git a/include/net/sock.h b/include/net/sock.h
index 4f26f2f..3f3518a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -314,6 +314,8 @@ struct sock {
 #endif
 	__u32			sk_mark;
 	u32			sk_classid;
+	u32			emerg_en;
+	/* XXX 4 bytes hole on 64 bit */
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 34432b4..f02737d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -425,6 +425,13 @@ static void skb_release_all(struct sk_buff *skb)
 
 void __kfree_skb(struct sk_buff *skb)
 {
+	struct net_device *ndev = skb->emerg_dev;
+
+	if (ndev) {
+		net_recycle_add(ndev, skb);
+		return;
+	}
+
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
@@ -563,6 +570,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 {
 #define C(x) n->x = skb->x
 
+	n->emerg_dev = NULL;
 	n->next = n->prev = NULL;
 	n->sk = NULL;
 	__copy_skb_header(n, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fef2434..33aa1a5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -472,6 +472,71 @@ static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
 		sock_reset_flag(sk, bit);
 }
 
+static int sock_epool_set_qlen(struct sock *sk, int val)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+
+	net_recycle_qlen(dev, val);
+	dev_put(dev);
+	return 0;
+}
+
+static int sock_epool_set_mode(struct sock *sk, int val)
+{
+	int ret;
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+
+	if (!val) {
+		sk->emerg_en = 0;
+		return 0;
+	}
+	if (sk->emerg_en && val)
+		return -EBUSY;
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+	if (!sk->sk_bound_dev_if)
+		return -ENODEV;
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return -ENODEV;
+	ret = -ENODEV;
+	if (!dev->rx_rec_skb_size)
+		goto out;
+
+	do {
+		struct sk_buff *skb;
+
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max) {
+			ret = 0;
+			break;
+		}
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb) {
+			ret = -ENOMEM;
+			break;
+		}
+		net_recycle_add(dev, skb);
+	} while (1);
+
+	if (!ret)
+		sk->emerg_en = 1;
+out:
+	dev_put(dev);
+	return ret;
+}
+
 /*
  *	This is meant for all protocols to use and covers goings on
  *	at the socket level. Everything here is generic.
@@ -740,6 +805,15 @@ set_rcvbuf:
 		else
 			sock_reset_flag(sk, SOCK_RXQ_OVFL);
 		break;
+	case SO_EPOOL_QLEN:
+		ret = sock_epool_set_qlen(sk, val);
+		break;
+	case SO_EPOOL_SIZE:
+		ret = -EINVAL;
+		break;
+	case SO_EPOOL_MODE:
+		ret = sock_epool_set_mode(sk, valbool);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -961,6 +1035,35 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
 		break;
 
+	case SO_EPOOL_QLEN:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skbs_max;
+		break;
+	}
+	case SO_EPOOL_SIZE:
+	{
+		struct net *net = sock_net(sk);
+		struct net_device *dev;
+
+		if (!sk->sk_bound_dev_if)
+			return -ENODEV;
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (!dev)
+			return -ENODEV;
+		v.val = dev->rx_rec_skb_size;
+		break;
+	}
+	case SO_EPOOL_MODE:
+		v.val = sk->emerg_en;
+		break;
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1459,6 +1562,37 @@ static long sock_wait_for_wmem(struct sock *sk, long timeo)
 	return timeo;
 }
 
+static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
+{
+	struct net *net = sock_net(sk);
+	struct net_device *dev;
+	int err;
+	struct sk_buff *skb;
+
+	err = -ENODEV;
+	if (!sk->sk_bound_dev_if)
+		return ERR_PTR(err);
+	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+	if (!dev)
+		return ERR_PTR(err);
+	err = -EINVAL;
+	if (dev->rx_rec_skb_size < skb_len) {
+		dev_put(dev);
+		return ERR_PTR(err);
+	}
+	skb = skb_dequeue(&dev->rx_recycle);
+	if (!skb) {
+		dev_put(dev);
+		err = -ENOBUFS;
+		return ERR_PTR(err);
+	}
+	/*
+	 * dev will be put once the skb is back from
+	 * its journey.
+	 */
+	skb->emerg_dev = dev;
+	return skb;
+}
 
 /*
  *	Generic send/receive buffer handlers
@@ -1488,6 +1622,14 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 			goto failure;
 
 		if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf) {
+			if (sk->emerg_en) {
+				skb = alloc_emerg_skb(sk, header_len + data_len);
+				if (IS_ERR(skb)) {
+					err = PTR_ERR(skb);
+					goto failure;
+				}
+				break;
+			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
 				int npages;
-- 
1.6.6.1



* [PATCH 7/8] net/emergency_skb: create a deep copy on clone
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (5 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 6/8] net: implement emergency pools Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-02 19:20 ` [PATCH 8/8] net/emergency: remove locking from recycling pool if emergency pools are not used Sebastian Andrzej Siewior
  2010-07-03  6:23 ` Generic rx-recycling and emergency skb pool Eric Dumazet
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

skb_clone() creates a clone of the skb: a new head is allocated from the
slab cache and the reference counter for the data part is incremented.
For skbs from the emergency pool we don't really want to clone them
that way:
- talking to slab may lead to lock contention, which in turn increases
  the latency.
- the original (with the data part) may return to the pool earlier than
  the clone. In that case we would "lose" the skb from the emergency
  pool.

Instead we copy head and data into an skb taken from the emergency
pool.
This patch splits the bare copying work out of pskb_copy() into a helper
function; pskb_copy() itself now just allocates a new skb and calls that
helper.
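
In short, skb_clone() now starts out like this (simplified from the hunk
below; on a failed copy, n goes back to the pool and we fall through to
the normal clone path):

	if (skb->emerg_dev) {
		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
		if (n && !__pskb_copy(skb, n)) {
			/* deep copy into a second pool skb */
			n->emerg_dev = skb->emerg_dev;
			dev_hold(skb->emerg_dev);
			return n;
		}
	}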

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/skbuff.c |   80 +++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f02737d..9e094fc 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -613,6 +613,7 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
 }
 EXPORT_SYMBOL_GPL(skb_morph);
 
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n);
 /**
  *	skb_clone	-	duplicate an sk_buff
  *	@skb: buffer to clone
@@ -631,6 +632,20 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
+	if (skb->emerg_dev) {
+		n = skb_dequeue(&skb->emerg_dev->rx_recycle);
+		if (!n)
+			goto norm_clone;
+		/* remove earlier skb_reserve() */
+		skb_reserve(n, - skb_headroom(n));
+		if (!__pskb_copy(skb, n)) {
+			n->emerg_dev = skb->emerg_dev;
+			dev_hold(skb->emerg_dev);
+			return n;
+		}
+		net_recycle_add(skb->emerg_dev, n);
+	}
+norm_clone:
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
 	    n->fclone == SKB_FCLONE_UNAVAILABLE) {
@@ -720,31 +735,22 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 EXPORT_SYMBOL(skb_copy);
 
 /**
- *	pskb_copy	-	create copy of an sk_buff with private head.
- *	@skb: buffer to copy
- *	@gfp_mask: allocation priority
+ *      __pskb_copy     -       create copy of an sk_buff with private head.
+ *      @skb: buffer to copy
+ *      @n: skb to copy into
  *
- *	Make a copy of both an &sk_buff and part of its data, located
- *	in header. Fragmented data remain shared. This is used when
- *	the caller wishes to modify only header of &sk_buff and needs
- *	private copy of the header to alter. Returns %NULL on failure
- *	or the pointer to the buffer on success.
- *	The returned buffer has a reference count of 1.
+ *      This function behaves like pskb_copy() except that it takes
+ *      an already allocated skb into which it copies head and data.
+ *      The returned buffer has a reference count of 1.
  */
-
-struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+static int __pskb_copy(struct sk_buff *skb, struct sk_buff *n)
 {
-	/*
-	 *	Allocate the copy buffer
-	 */
-	struct sk_buff *n;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
-	n = alloc_skb(skb->end, gfp_mask);
+	if (skb->end > n->end)
 #else
-	n = alloc_skb(skb->end - skb->head, gfp_mask);
+	if ((skb->end - skb->head) > (n->end - n->head))
 #endif
-	if (!n)
-		goto out;
+		return -EMSGSIZE;
 
 	/* Set the data pointer */
 	skb_reserve(n, skb->data - skb->head);
@@ -773,8 +779,40 @@ struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	copy_skb_header(n, skb);
-out:
-	return n;
+	return 0;
+}
+
+/**
+ *	pskb_copy	-	create copy of an sk_buff with private head.
+ *	@skb: buffer to copy
+ *	@gfp_mask: allocation priority
+ *
+ *	Make a copy of both an &sk_buff and part of its data, located
+ *	in header. Fragmented data remain shared. This is used when
+ *	the caller wishes to modify only header of &sk_buff and needs
+ *	private copy of the header to alter. Returns %NULL on failure
+ *	or the pointer to the buffer on success.
+ *	The returned buffer has a reference count of 1.
+ */
+
+struct sk_buff *pskb_copy(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	/*
+	 *	Allocate the copy buffer
+	 */
+	struct sk_buff *n;
+#ifdef NET_SKBUFF_DATA_USES_OFFSET
+	n = alloc_skb(skb->end, gfp_mask);
+#else
+	n = alloc_skb(skb->end - skb->head, gfp_mask);
+#endif
+	if (!n)
+		return NULL;
+	if (!__pskb_copy(skb, n))
+		return n;
+	kfree_skb(n);
+	return NULL;
+
 }
 EXPORT_SYMBOL(pskb_copy);
 
-- 
1.6.6.1



* [PATCH 8/8] net/emergency: remove locking from recycling pool if emergency pools are not used
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (6 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 7/8] net/emergency_skb: create a deep copy on clone Sebastian Andrzej Siewior
@ 2010-07-02 19:20 ` Sebastian Andrzej Siewior
  2010-07-03  6:23 ` Generic rx-recycling and emergency skb pool Eric Dumazet
  8 siblings, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-07-02 19:20 UTC (permalink / raw)
  To: netdev; +Cc: tglx, Sebastian Andrzej Siewior

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Right now both users (socket and network driver) access the emergency
pool under a lock. If there is no socket user, only the NIC driver
touches the pool, from within its napi callback, and the locking is not
required since the driver has its own locking.

Disabling the emergency pools results in emerg_skb_users dropping to
zero. Once the driver notices this, further allocations are lockless.
The transition is performed while holding the list lock of the pool, to
ensure that all users which accessed the pool under the lock are gone.
Enabling the pools for the socket user requires that the NIC driver is
not touching the pool locklessly. This is ensured by the
ndo_emerg_reload callback, which performs a lightweight disable/enable
of the card.

As a side effect, the emergency mode of the NIC is now deactivated once
the last user is gone.
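
The hot path in net_recycle_add() then boils down to (from the
netdevice.h hunk below):

	if (atomic_read(&dev->emerg_skb_users))
		/* shared with socket users: locked */
		skb_queue_head(&dev->rx_recycle, skb);
	else
		/* NIC driver only: lockless */
		__skb_queue_head(&dev->rx_recycle, skb);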

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/gianfar.c     |   10 ++++
 drivers/net/ucc_geth.c    |    8 ++++
 include/linux/netdevice.h |   55 ++++++++++++++++++++++--
 net/core/dev.c            |    1 +
 net/core/skbuff.c         |   19 ++++++++-
 net/core/sock.c           |  101 ++++++++++++++++++++++++++++++++++++++-------
 6 files changed, 172 insertions(+), 22 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 1a1a249..0c891ea 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -116,6 +116,7 @@ static void gfar_new_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 		struct sk_buff *skb);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
+static void gfar_emerg_reload(struct net_device *dev);
 static irqreturn_t gfar_error(int irq, void *dev_id);
 static irqreturn_t gfar_transmit(int irq, void *dev_id);
 static irqreturn_t gfar_interrupt(int irq, void *dev_id);
@@ -464,6 +465,7 @@ static const struct net_device_ops gfar_netdev_ops = {
 	.ndo_start_xmit = gfar_start_xmit,
 	.ndo_stop = gfar_close,
 	.ndo_change_mtu = gfar_change_mtu,
+	.ndo_emerg_reload = gfar_emerg_reload,
 	.ndo_set_multicast_list = gfar_set_multi,
 	.ndo_tx_timeout = gfar_timeout,
 	.ndo_do_ioctl = gfar_ioctl,
@@ -1918,6 +1920,8 @@ int startup_gfar(struct net_device *ndev)
 	if (err)
 		return err;
 
+	fill_emerg_pool(ndev);
+
 	gfar_init_mac(ndev);
 
 	for (i = 0; i < priv->num_grps; i++) {
@@ -2393,6 +2397,12 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
+static void gfar_emerg_reload(struct net_device *dev)
+{
+	stop_gfar(dev);
+	startup_gfar(dev);
+}
+
 /* gfar_reset_task gets scheduled when a packet has not been
  * transmitted after a set amount of time.
  * For now, assume that clearing out all the structures, and
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 9d6097b..6280226 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -1991,6 +1991,12 @@ static void ucc_geth_memclean(struct ucc_geth_private *ugeth)
 	}
 }
 
+static void ucc_emerg_reload(struct net_device *dev)
+{
+	ucc_geth_close(dev);
+	ucc_geth_open(dev);
+}
+
 static void ucc_geth_set_multi(struct net_device *dev)
 {
 	struct ucc_geth_private *ugeth;
@@ -3499,6 +3505,7 @@ static int ucc_geth_open(struct net_device *dev)
 				  dev->name);
 		goto err;
 	}
+	fill_emerg_pool(dev);
 
 	err = request_irq(ugeth->ug_info->uf_info.irq, ucc_geth_irq_handler,
 			  0, "UCC Geth", dev);
@@ -3707,6 +3714,7 @@ static const struct net_device_ops ucc_geth_netdev_ops = {
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= ucc_geth_set_mac_addr,
 	.ndo_change_mtu		= eth_change_mtu,
+	.ndo_emerg_reload	= ucc_emerg_reload,
 	.ndo_set_multicast_list	= ucc_geth_set_multi,
 	.ndo_tx_timeout		= ucc_geth_timeout,
 	.ndo_do_ioctl		= ucc_geth_ioctl,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa7e951..2606156 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -717,6 +717,10 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
  *			  struct nlattr *port[]);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * void (*ndo_emerg_reload)(struct net_device *dev);
+ *	If the card supports emergency pools then this function performs a
+ *	lightweight reload to ensure that the card is not accessing the
+ *	emergency pool locklessly for recycling purposes.
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -788,6 +792,7 @@ struct net_device_ops {
 	int			(*ndo_fcoe_get_wwn)(struct net_device *dev,
 						    u64 *wwn, int type);
 #endif
+	void			(*ndo_emerg_reload)(struct net_device *dev);
 };
 
 /*
@@ -1092,6 +1097,7 @@ struct net_device {
 	struct sk_buff_head rx_recycle;
 	u32 rx_rec_skbs_max;
 	u32 rx_rec_skb_size;
+	atomic_t emerg_skb_users;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -1138,24 +1144,63 @@ static inline void net_recycle_cleanup(struct net_device *dev)
 	skb_queue_purge(&dev->rx_recycle);
 }
 
+static inline int recycle_possible(struct net_device *dev, struct sk_buff *skb)
+{
+	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
+			skb_recycle_check(skb, dev->rx_rec_skb_size))
+		return 1;
+	else
+		return 0;
+}
+
 static inline void net_recycle_add(struct net_device *dev, struct sk_buff *skb)
 {
+	int emerg_active;
+
 	if (skb->emerg_dev) {
 		dev_put(skb->emerg_dev);
 		skb->emerg_dev = NULL;
 	}
-	if (skb_queue_len(&dev->rx_recycle) < dev->rx_rec_skbs_max &&
-			skb_recycle_check(skb, dev->rx_rec_skb_size))
-		skb_queue_head(&dev->rx_recycle, skb);
-	else
+
+	emerg_active = atomic_read(&dev->emerg_skb_users);
+
+	if (recycle_possible(dev, skb)) {
+		if (emerg_active)
+			skb_queue_head(&dev->rx_recycle, skb);
+		else
+			__skb_queue_head(&dev->rx_recycle, skb);
+	} else {
 		dev_kfree_skb_any(skb);
+	}
+}
+
+static inline void fill_emerg_pool(struct net_device *dev)
+{
+	struct sk_buff *skb;
+
+	if (atomic_read(&dev->emerg_skb_users) == 0)
+		return;
+
+	do {
+		if (skb_queue_len(&dev->rx_recycle) >= dev->rx_rec_skbs_max)
+			return;
+
+		skb = __netdev_alloc_skb(dev, dev->rx_rec_skb_size, GFP_KERNEL);
+		if (!skb)
+			return;
+
+		net_recycle_add(dev, skb);
+	} while (1);
 }
 
 static inline struct sk_buff *net_recycle_get(struct net_device *dev)
 {
 	struct sk_buff *skb;
 
-	skb = skb_dequeue(&dev->rx_recycle);
+	if (atomic_read(&dev->emerg_skb_users) > 0)
+		skb = skb_dequeue(&dev->rx_recycle);
+	else
+		skb = __skb_dequeue(&dev->rx_recycle);
 	if (skb)
 		return skb;
 	return netdev_alloc_skb(dev, dev->rx_rec_skb_size);
diff --git a/net/core/dev.c b/net/core/dev.c
index db9acd5..5dbe356 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5405,6 +5405,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	netdev_init_queues(dev);
 
 	skb_queue_head_init(&dev->rx_recycle);
+	atomic_set(&dev->emerg_skb_users, 0);
 	INIT_LIST_HEAD(&dev->ethtool_ntuple_list.list);
 	dev->ethtool_ntuple_list.count = 0;
 	INIT_LIST_HEAD(&dev->napi_list);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9e094fc..374d353 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -428,10 +428,25 @@ void __kfree_skb(struct sk_buff *skb)
 	struct net_device *ndev = skb->emerg_dev;
 
 	if (ndev) {
-		net_recycle_add(ndev, skb);
+		int emerg_en;
+		unsigned long flags;
+		int can_recycle;
+
+		skb->emerg_dev = NULL;
+		can_recycle = recycle_possible(ndev, skb);
+		spin_lock_irqsave(&ndev->rx_recycle.lock, flags);
+		emerg_en = atomic_read(&ndev->emerg_skb_users);
+		if (!emerg_en || !can_recycle) {
+			spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+			dev_put(ndev);
+			goto free_it;
+		}
+		__skb_queue_head(&ndev->rx_recycle, skb);
+		spin_unlock_irqrestore(&ndev->rx_recycle.lock, flags);
+		dev_put(ndev);
 		return;
 	}
-
+free_it:
 	skb_release_all(skb);
 	kfree_skbmem(skb);
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 33aa1a5..409d069 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -424,6 +424,10 @@ static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 	if (optlen < 0)
 		goto out;
 
+	ret = -EBUSY;
+	if (sk->emerg_en)
+		goto out;
+
 	/* Bind this socket to a particular device like "eth0",
 	 * as specified in the passed interface name. If the
 	 * name is "" or the option length is zero the socket
@@ -497,10 +501,6 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
 
-	if (!val) {
-		sk->emerg_en = 0;
-		return 0;
-	}
 	if (sk->emerg_en && val)
 		return -EBUSY;
 	if (!capable(CAP_NET_ADMIN))
@@ -510,10 +510,45 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 	dev = dev_get_by_index(net, sk->sk_bound_dev_if);
 	if (!dev)
 		return -ENODEV;
+	if (!val) {
+		ret = 0;
+		if (!sk->emerg_en)
+			goto out;
+		/*
+		 * new skbs for this socket won't be taken from the emergency
+		 * pool anymore.
+		 */
+		sk->emerg_en = 0;
+		smp_rmb();
+		/* get rid of anyone who got into the critical section before
+		 * the flag was changed and is still there
+		 */
+		spin_lock_irq(&dev->rx_recycle.lock);
+		/*
+		 * if we fall down to 0 users, kfree will no longer add
+		 * packets to the pool but simply free them. Also the recycle
+		 * code which takes tx packets and adds them to the pool will
+	 * start working locklessly since there are no further users
+	 * except the NIC driver.
+		 */
+		atomic_dec(&dev->emerg_skb_users);
+		spin_unlock_irq(&dev->rx_recycle.lock);
+		goto out;
+	}
 	ret = -ENODEV;
+
 	if (!dev->rx_rec_skb_size)
 		goto out;
 
+	if (!dev->netdev_ops->ndo_emerg_reload)
+		goto out;
+
+	rtnl_lock();
+	ret = atomic_add_return(1, &dev->emerg_skb_users);
+	if (ret == 1 && (dev->flags & IFF_UP))
+		dev->netdev_ops->ndo_emerg_reload(dev);
+	rtnl_unlock();
+
 	do {
 		struct sk_buff *skb;
 
@@ -532,6 +567,8 @@ static int sock_epool_set_mode(struct sock *sk, int val)
 
 	if (!ret)
 		sk->emerg_en = 1;
+	else
+		atomic_dec(&dev->emerg_skb_users);
 out:
 	dev_put(dev);
 	return ret;
@@ -1223,6 +1260,8 @@ EXPORT_SYMBOL(sk_alloc);
 static void __sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
+	struct net_device *dev;
+	struct net *net = sock_net(sk);
 
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
@@ -1244,6 +1283,18 @@ static void __sk_free(struct sock *sk)
 	if (sk->sk_peer_cred)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
+
+	if (sk->emerg_en) {
+		dev = dev_get_by_index(net, sk->sk_bound_dev_if);
+		if (dev) {
+			spin_lock_irq(&dev->rx_recycle.lock);
+			atomic_dec(&dev->emerg_skb_users);
+			spin_unlock_irq(&dev->rx_recycle.lock);
+			WARN_ON(atomic_read(&dev->emerg_skb_users) < 0);
+			dev_put(dev);
+		}
+	}
+
 	put_net(sock_net(sk));
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
@@ -1566,6 +1617,7 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 {
 	struct net *net = sock_net(sk);
 	struct net_device *dev;
+	unsigned long flags;
 	int err;
 	struct sk_buff *skb;
 
@@ -1580,18 +1632,33 @@ static struct sk_buff *alloc_emerg_skb(struct sock *sk, unsigned int skb_len)
 		dev_put(dev);
 		return ERR_PTR(err);
 	}
-	skb = skb_dequeue(&dev->rx_recycle);
-	if (!skb) {
-		dev_put(dev);
-		err = -ENOBUFS;
-		return ERR_PTR(err);
+
+	spin_lock_irqsave(&dev->rx_recycle.lock, flags);
+	if (sk->emerg_en) {
+		skb = __skb_dequeue(&dev->rx_recycle);
+		spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+		if (!skb) {
+			dev_put(dev);
+			err = -ENOBUFS;
+			return ERR_PTR(err);
+		}
+		/* remove earlier skb_reserve() */
+		skb_reserve(skb, - skb_headroom(skb));
+		skb->emerg_dev = dev;
+		/*
+		 * dev will be put once the skb is back from
+		 * its journey.
+		 */
+		return skb;
 	}
 	/*
-	 * dev will be put once the skb is back from
-	 * its journey.
+	 * We got called but the emergency pools are not activated. This might
+	 * happen if the pools got deactivated between checking the emerg_en
+	 * flag and taking the lock.
 	 */
-	skb->emerg_dev = dev;
-	return skb;
+	spin_unlock_irqrestore(&dev->rx_recycle.lock, flags);
+	dev_put(dev);
+	return NULL;
 }
 
 /*
@@ -1627,8 +1694,12 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				if (IS_ERR(skb)) {
 					err = PTR_ERR(skb);
 					goto failure;
-				}
-				break;
+
+				} else if (skb)
+					break;
+				/*
+				 * else skb is NULL because emerg_en is not set
+				 */
 			}
 			skb = alloc_skb(header_len, gfp_mask);
 			if (skb) {
-- 
1.6.6.1
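
As a usage sketch, not part of the posted series: the pool is enabled
per socket via a socket option, and the socket has to be bound to a
device first, since sock_epool_set_mode() resolves sk_bound_dev_if.
SO_EMERG_POOL is a placeholder name here; the real option constant is
defined elsewhere in the series, and enabling it needs CAP_NET_ADMIN.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define SO_EMERG_POOL	64	/* placeholder; real value defined in the series */

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int on = 1;

	/* the pool is per device, so bind the socket to one first */
	setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, "eth0",
		   strlen("eth0") + 1);
	/* needs CAP_NET_ADMIN; fails with -EBUSY if already enabled */
	if (setsockopt(fd, SOL_SOCKET, SO_EMERG_POOL, &on, sizeof(on)) < 0)
		perror("emergency pool enable");
	return 0;
}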


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Generic rx-recycling and emergency skb pool
  2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
                   ` (7 preceding siblings ...)
  2010-07-02 19:20 ` [PATCH 8/8] net/emergency: remove locking from recycling pool if emergency pools are not used Sebastian Andrzej Siewior
@ 2010-07-03  6:23 ` Eric Dumazet
  2010-07-03  6:46   ` David Miller
  2010-08-26 17:31   ` Sebastian Andrzej Siewior
  8 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-07-03  6:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: netdev, tglx

On Friday, 02 July 2010 at 21:20 +0200, Sebastian Andrzej Siewior wrote:
> This is version two of generic rx-recycling followed by version one of
> emergency skb pools which are built on top of rx-recycling.
> The change from v1 of generic rx-recycling is that the list access is
> unlocked instead of locked.
> Patch six which introduces the emergency pools adds the locking back.
> This is required since we now have two not serialized users. In order
> not to punish everyone patch eight removes this locking again. That
> patch converts only two drivers so you have an idea what I think is
> required to get the locking removed.
> 
> The idea behind emergency pools is to have pre-allocated skbs for TX and
> RX. Using the memory allocator for it leads to latencies during memory
> pressure. The pre-allocated skb are "tagged" and should get back to the
> pool once they are through the stack so the pool should never get
> exhausted.
> 
> While it was easy to convert the drivers which share the same concept of
> rx-recycling to use the emergency pools it was difficult to hook up the
> more complex drivers like e1000e. The e1000e can use split skbs / a frag
> list which is different from the allocation currently used. So instead of
> forcing all drivers to use the same way of doing things I've been thinking
> about providing a dedicated callback for skb allocation and checking if
> this skb is "good enough". This is not yet implemented.
> 
> I would be glad to receive some feedback on this patch series before I go
> any further. Unfortunately I'm on vacation for the next two weeks so I
> can't respond earlier. tglx is on Cc and should be able respond earlier :)
> 
> Sebastian

Sebastian

I read all the patches, and my initial feeling is that all of this is
very complex and has many shortcomings.

Most modern NICs are multiqueue, so that each CPU can use a queue of
its own without slowing down the other CPUs.

Yet rx recycling has one queue per device, defeating part of the
multiqueue goal.

Patch 6/8 even touches dev->refcnt on each emerg packet
Patch 6/8 adds 8 bytes (emerg_dev) to skb. Oh well...

Adding cache layers, especially dumb ones like this one, is probably
a sign that something more fundamental is broken somewhere.


I do believe, for example, that netdev_alloc_skb() should not try to
use the node affinity of the device, but should use the current CPU's
node for the sk_buff at least, and possibly for the data part too.
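
A minimal sketch of that policy, assuming the __alloc_skb() signature
with a node argument; netdev_alloc_skb_local() is a made-up name, not
an existing kernel function:

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/topology.h>

/* Sketch only: take the sk_buff from the node of the CPU doing the
 * allocation rather than from the device's node.
 */
static inline struct sk_buff *netdev_alloc_skb_local(struct net_device *dev,
						     unsigned int length)
{
	struct sk_buff *skb;

	skb = __alloc_skb(length + NET_SKB_PAD, GFP_ATOMIC, 0,
			  numa_node_id());
	if (skb) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}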

One other problem with skbs is the two memory blocks involved, and
the fact that the first one (the skb itself) is already very big and
fat, and is filled/dirtied many cycles before its use in the RX path.

Maybe it's time to provide a new API, so that a driver can build an
skb at the time the RX interrupt is handled, not at the time the RX
ring buffer is replenished. The RX ring should only provide the data
part to the NIC, and the skb should be built when the NIC delivers
the frame, so that we hand the IP stack a really hot skb.
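
To make the proposal concrete, a rough sketch of a driver RX path
under such an API; netdev_build_skb() and the foo_* helpers are
hypothetical names, not existing interfaces:

/* Hypothetical RX path: the ring is refilled with bare data buffers,
 * and the sk_buff is only built when the NIC has delivered a frame,
 * on the CPU that handles the interrupt.
 */
static int foo_poll(struct napi_struct *napi, int budget)
{
	struct foo_ring *ring = container_of(napi, struct foo_ring, napi);
	int work = 0;

	while (work < budget && foo_rx_desc_done(ring)) {
		void *data = foo_rx_buffer(ring);
		unsigned int len = foo_rx_len(ring);
		struct sk_buff *skb;

		/* build a hot skb around the DMA-filled data buffer */
		skb = netdev_build_skb(ring->dev, data, len);
		if (unlikely(!skb))
			break;

		skb_put(skb, len);
		skb->protocol = eth_type_trans(skb, ring->dev);
		netif_receive_skb(skb);
		work++;
	}
	if (work < budget)
		napi_complete(napi);
	return work;
}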




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Generic rx-recycling and emergency skb pool
  2010-07-03  6:23 ` Generic rx-recycling and emergency skb pool Eric Dumazet
@ 2010-07-03  6:46   ` David Miller
  2010-07-03  7:31     ` Eric Dumazet
  2010-08-26 17:31   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2010-07-03  6:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sebastian, netdev, tglx

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 03 Jul 2010 08:23:25 +0200

> Maybe it's time to provide a new API, so that a driver can build an
> skb at the time the RX interrupt is handled, not at the time the RX
> ring buffer is replenished. The RX ring should only provide the data
> part to the NIC, and the skb should be built when the NIC delivers
> the frame, so that we hand the IP stack a really hot skb.

Drivers do this already, and in fact I am quite sure I've mentioned
this to you at least one other time in the past, and one such driver
is NIU :-)

It's trivial to do with devices which work on power-of-2 chopped up
pages.  In fact I'm surprised that this RX buffer management scheme
is not more prevalent in network devices.
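
A rough sketch of that buffer scheme, with made-up identifiers: one
page is carved into power-of-2 chunks, each posted to the hardware as
a separate RX buffer:

#define RX_CHUNK_SIZE	2048	/* power of two: two chunks per 4K page */

static int foo_refill_rx(struct foo_ring *ring)
{
	struct page *page = alloc_page(GFP_ATOMIC);
	unsigned int off;

	if (!page)
		return -ENOMEM;
	for (off = 0; off < PAGE_SIZE; off += RX_CHUNK_SIZE) {
		get_page(page);		/* one reference per chunk */
		/* foo_post_rx_buffer() is a made-up hook */
		foo_post_rx_buffer(ring, page, off, RX_CHUNK_SIZE);
	}
	put_page(page);			/* drop the initial reference */
	return 0;
}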

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Generic rx-recycling and emergency skb pool
  2010-07-03  6:46   ` David Miller
@ 2010-07-03  7:31     ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-07-03  7:31 UTC (permalink / raw)
  To: David Miller; +Cc: sebastian, netdev, tglx

On Friday, 02 July 2010 at 23:46 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 03 Jul 2010 08:23:25 +0200
> 
> > Maybe it's time to provide a new API, so that a driver can build an
> > skb at the time the RX interrupt is handled, not at the time the RX
> > ring buffer is replenished. The RX ring should only provide the data
> > part to the NIC, and the skb should be built when the NIC delivers
> > the frame, so that we hand the IP stack a really hot skb.
> 
> Drivers do this already, and in fact I am quite sure I've mentioned
> this to you at least one other time in the past, and one such driver
> is NIU :-)

You did indeed. I should have said "provide a generic and universal
API", usable by the average NIC driver, not only the big ones. (NIU
is more than 10,000 lines of code ;) )

NIU still uses the netdev_alloc_skb() API. I was thinking of a
function to allocate and populate the sk_buff, with the data part
provided by the caller.

> 
> It's trivial to do with devices which work on power-of-2 chopped up
> pages.  In fact I'm surprised that this RX buffer management scheme
> is not more prevalent in network devices.

Many of them were written in the last century :)




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Generic rx-recycling and emergency skb pool
  2010-07-03  6:23 ` Generic rx-recycling and emergency skb pool Eric Dumazet
  2010-07-03  6:46   ` David Miller
@ 2010-08-26 17:31   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2010-08-26 17:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, tglx

* Eric Dumazet | 2010-07-03 08:23:25 [+0200]:

>Sebastian
Eric,

>Most modern NICs are multiqueue, so that each CPU can use a queue of
>its own without slowing down the other CPUs.
True. gianfar uses multiqueue, however in its NAPI poll one CPU
grabs a lock and inspects every queue one by one.

>Yet rx recycling has one queue per device, defeating part of the
>multiqueue goal.
Well, we could add rx recycling per queue if you prefer it that way.
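
Roughly, that would mean moving the list from struct net_device into
the per-queue structure, along these lines (placement is a guess, not
code from the series):

/* Sketch of per-queue recycling: the recycle list moves into the
 * per-RX-queue structure, so each queue (and thus each CPU on a
 * well-behaved NIC) recycles without contending with the others.
 */
struct netdev_rx_queue {
	/* ... existing RPS fields ... */
	struct sk_buff_head	rx_recycle;
	unsigned int		rx_rec_skb_size;
} ____cacheline_aligned_in_smp;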

>Patch 6/8 even touches dev->refcnt on each emerg packet
>Patch 6/8 adds 8 bytes (emerg_dev) to skb. Oh well...
>
>Adding cache layers, especially dumb ones like this one, is probably
>a sign that something more fundamental is broken somewhere.

The memory for the skbs is coming from slab, and slab's latency gets
quite high under memory pressure. Therefore I try to avoid talking to
slab while allocating a new skb, so I created a pool of pre-allocated
skbs.

>I do believe, for example, that netdev_alloc_skb() should not try to
>use the node affinity of the device, but should use the current CPU's
>node for the sk_buff at least, and possibly for the data part too.
>
>One other problem with skbs is the two memory blocks involved, and
>the fact that the first one (the skb itself) is already very big and
>fat, and is filled/dirtied many cycles before its use in the RX path.
>
>Maybe it's time to provide a new API, so that a driver can build an
>skb at the time the RX interrupt is handled, not at the time the RX
>ring buffer is replenished. The RX ring should only provide the data
>part to the NIC, and the skb should be built when the NIC delivers
>the frame, so that we hand the IP stack a really hot skb.

And where are you allocating the memory from? Remember, SLAB could be
slow :)

Sebastian

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-08-26 17:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-02 19:20 Generic rx-recycling and emergency skb pool Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 1/8] net: implement generic rx recycling Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 2/8] net/gianfar: use generic recycling infrastructure Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 3/8] net/mv643xx: use generic recycling infrastructure Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 4/8] net/stmmac: " Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 5/8] net/ucc_geth: " Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 6/8] net: implement emergency pools Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 7/8] net/emergency_skb: create a deep copy on clone Sebastian Andrzej Siewior
2010-07-02 19:20 ` [PATCH 8/8] net/emergency: remove locking from recycling pool if emergency pools are not used Sebastian Andrzej Siewior
2010-07-03  6:23 ` Generic rx-recycling and emergency skb pool Eric Dumazet
2010-07-03  6:46   ` David Miller
2010-07-03  7:31     ` Eric Dumazet
2010-08-26 17:31   ` Sebastian Andrzej Siewior
