* [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping
@ 2015-01-31 12:28 Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 1/4] enic: implement frag allocator Govindarajulu Varadarajan
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

The following series tries to address these two problems in rq buff allocation.

* Memory wastage because of large 9k allocation using kmalloc:
  For a 9k mtu buffer, netdev_alloc_skb_ip_align internally calls kmalloc for
  sizes > 4096. For a 9k buff, kmalloc returns an order-2 page, i.e. 16k,
  of which we use only ~9k; 7k is wasted. Using the frag allocator in
  patch 1/4 we can allocate three 9k buffs from one 32k page.
  A typical enic configuration has 8 rq, each with a desc ring of size 4096.
  That's 8 * 4096 * (16*1024) = 512 MB. Using this frag allocator:
  8 * 4096 * (32*1024/3) = 341 MB. That's 171 MB of memory saved (see the
  arithmetic sketch below).

* frequent dma_map() calls:
  We call dma_map() for every buff we allocate. When the iommu is on, this is
  very cpu time consuming. From my testing, most of the cpu cycles are wasted
  spinning on spin_lock_irqsave(&iovad->iova_rbtree_lock, flags) in
  intel_map_page() .. -> ..__alloc_and_insert_iova_range()

  With this patch, we call dma_map() once per 32k page, i.e. once for every
  three 9k descs, and once for every twenty 1500-byte descs.
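
  A back-of-the-envelope check of the arithmetic above. This is only an
  illustrative user-space program, not part of the series:

	#include <stdio.h>

	int main(void)
	{
		long rqs = 8, ring = 4096;
		long per_buf_kmalloc = 16 * 1024;  /* order-2 page per 9k buff    */
		long per_buf_frag = 32 * 1024 / 3; /* three 9k buffs per 32k page */

		printf("kmalloc: %ld MB\n", rqs * ring * per_buf_kmalloc >> 20);
		printf("frag:    %ld MB\n", rqs * ring * per_buf_frag >> 20);
		/* dma_map() calls needed to fill one 4096-entry ring at 9k mtu */
		printf("dma_map calls: %ld vs %ld\n", ring, ring / 3);
		return 0;
	}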

Here are the test results with 8 rq, 4096 ring size and 9k mtu. The irq of
each rq is affinitized to a different CPU. Ran iperf with 32 threads. Link is
10G. iommu is on.

                CPU utilization         throughput

without patch   100%                    1.8 Gbps
with patch      13%                     9.8 Gbps

Govindarajulu Varadarajan (4):
  enic: implement frag allocator
  enic: Add rq allocation failure stats
  ethtool: add RX_ALLOC_ORDER to tunable
  enic: add ethtool support for changing alloc order

 drivers/net/ethernet/cisco/enic/enic.h         |  16 +++
 drivers/net/ethernet/cisco/enic/enic_ethtool.c |  17 +++
 drivers/net/ethernet/cisco/enic/enic_main.c    | 177 +++++++++++++++++++++----
 drivers/net/ethernet/cisco/enic/vnic_rq.c      |  13 ++
 drivers/net/ethernet/cisco/enic/vnic_rq.h      |   2 +
 drivers/net/ethernet/cisco/enic/vnic_stats.h   |   2 +
 include/uapi/linux/ethtool.h                   |   1 +
 net/core/ethtool.c                             |   5 +
 8 files changed, 209 insertions(+), 24 deletions(-)

-- 
2.2.2

* [PATCH net-next 1/4] enic: implement frag allocator
  2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
@ 2015-01-31 12:28 ` Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 2/4] enic: Add rq allocation failure stats Govindarajulu Varadarajan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

This patch implements a frag allocator for rq buffers. It is based on the
__alloc_page_frag & __page_frag_refill implementation in net/core/skbuff.c.

In addition to frag allocation from an order(3) page, as in __alloc_page_frag,
we also maintain the dma address of the page. While allocating a frag for an
rx buffer we return va + offset as the virtual address of the frag and
pa + offset as its dma address. This reduces the number of calls to dma_map()
to 1/3 for 9k mtu and to 1/20 for 1500 mtu.

__alloc_page_frag is limited to a max buffer size of PAGE_SIZE, i.e. 4096 in
most cases. So a 9k buffer allocation goes through kmalloc, which returns an
order-2 page, 16k. We waste 7k bytes for every 9k buffer.

We maintain a dma_count variable which is incremented when we allocate a frag.
enic_unmap_dma decrements dma_count and unmaps the page when there is no user
of that page left in the rx ring.

This reduces the memory utilization for 9k mtu by 33%.

The enic_alloc_cache struct, which stores the page details, is declared per
rq, and all calls to allocation, free and dma unmap are serialized, so we do
not need locks.
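
The accounting can be summarised with a minimal sketch (hypothetical names in
kernel-style C; the page refcount/pagecnt_bias handling is omitted, the real
logic is in enic_alloc_frag()/enic_unmap_dma() below):

	struct frag_cache {
		struct page *page;
		void *va;
		dma_addr_t pa;
		unsigned int size;	/* PAGE_SIZE << order, mapped once */
		unsigned int offset;	/* next free byte, counts down */
		int dma_count;		/* buffers from this page still in the ring */
	};

	/* Hand out one rx buffer from an already dma-mapped compound page. */
	static bool frag_get(struct frag_cache *c, unsigned int sz,
			     void **buf_va, dma_addr_t *buf_pa)
	{
		if (c->offset < sz)
			return false;		/* caller maps a fresh page */
		c->offset -= sz;
		c->dma_count++;			/* page is unmapped only when this
						 * drops back to zero */
		*buf_va = c->va + c->offset;	/* handed to build_skb() */
		*buf_pa = c->pa + c->offset;	/* posted in the rx descriptor */
		return true;
	}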

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic.h      |  16 +++
 drivers/net/ethernet/cisco/enic/enic_main.c | 156 +++++++++++++++++++++++-----
 drivers/net/ethernet/cisco/enic/vnic_rq.c   |  13 +++
 drivers/net/ethernet/cisco/enic/vnic_rq.h   |   2 +
 4 files changed, 163 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 84b6a2b..7fd3db1 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -20,6 +20,11 @@
 #ifndef _ENIC_H_
 #define _ENIC_H_
 
+#include <linux/if.h>
+#include <linux/if_link.h>
+#include <linux/if_ether.h>
+#include <linux/netdevice.h>
+
 #include "vnic_enet.h"
 #include "vnic_dev.h"
 #include "vnic_wq.h"
@@ -176,6 +181,7 @@ struct enic {
 	u64 rq_truncated_pkts;
 	u64 rq_bad_fcs;
 	struct napi_struct napi[ENIC_RQ_MAX + ENIC_WQ_MAX];
+	u8 alloc_order;
 
 	/* interrupt resource cache line section */
 	____cacheline_aligned struct vnic_intr intr[ENIC_INTR_MAX];
@@ -191,6 +197,16 @@ struct enic {
 	struct vnic_gen_stats gen_stats;
 };
 
+#define ENIC_ALLOC_ORDER		get_order(32 * 1024)
+
+struct enic_alloc_cache {
+	struct page_frag	frag;
+	unsigned int		pagecnt_bias;
+	int			dma_count;
+	void			*va;
+	dma_addr_t		pa;
+};
+
 static inline struct device *enic_get_dev(struct enic *enic)
 {
 	return &(enic->pdev->dev);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index ee44c82..d9cad93 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -950,6 +950,105 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+struct enic_alloc_cache *enic_page_refill(struct enic *enic, size_t sz,
+					  gfp_t gfp)
+{
+	struct enic_alloc_cache *ec;
+	gfp_t gfp_comp = gfp | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
+	u8 order = enic->alloc_order;
+
+	ec = kzalloc(sizeof(*ec), GFP_ATOMIC);
+	if (unlikely(!ec))
+		goto no_ec;
+	ec->frag.page = alloc_pages_node(NUMA_NO_NODE, gfp_comp, order);
+	if (unlikely(!ec->frag.page)) {
+		order = get_order(sz);
+		ec->frag.page = alloc_pages_node(NUMA_NO_NODE, gfp, order);
+		if (!ec->frag.page)
+			goto free_ec;
+	}
+
+	ec->frag.size = (PAGE_SIZE << order);
+	ec->va = page_address(ec->frag.page);
+	ec->pa = pci_map_single(enic->pdev, ec->va, ec->frag.size,
+				PCI_DMA_FROMDEVICE);
+	if (unlikely(enic_dma_map_check(enic, ec->pa)))
+		goto free_page;
+	atomic_add(ec->frag.size - 1, &ec->frag.page->_count);
+	ec->pagecnt_bias = ec->frag.size;
+	ec->frag.offset = ec->frag.size;
+
+	return ec;
+
+free_page:
+	__free_pages(ec->frag.page, order);
+free_ec:
+	kfree(ec);
+no_ec:
+	return NULL;
+}
+
+struct enic_alloc_cache *enic_alloc_frag(struct vnic_rq *rq, size_t sz)
+{
+	struct enic *enic = vnic_dev_priv(rq->vdev);
+	struct enic_alloc_cache *ec = rq->ec;
+	int offset;
+
+	if (unlikely(!ec)) {
+refill:
+		ec = enic_page_refill(enic, sz, GFP_ATOMIC);
+		rq->ec = ec;
+
+		if (unlikely(!ec))
+			return NULL;
+	}
+
+	offset = ec->frag.offset - sz;
+	if (offset < 0) {
+		if (!atomic_sub_and_test(ec->pagecnt_bias,
+					 &ec->frag.page->_count)) {
+			/* rq cleanup service has processed all the frags
+			 * belonging to this page. Since page->_count is not 0
+			 * and ec->dma_count is 0 these frags should be in
+			 * stack. We should unmap the page here.
+			 */
+			if (!ec->dma_count) {
+				pci_unmap_single(enic->pdev, ec->pa,
+						 ec->frag.size,
+						 PCI_DMA_FROMDEVICE);
+				kfree(ec);
+			} else {
+			/* frags from this page are still in rx queue. Let the
+			 * rx cleanup service unmap the page in enic_unmap_dma.
+			 */
+				ec->pagecnt_bias = 0;
+			}
+			goto refill;
+		}
+		WARN_ON(ec->dma_count);
+		atomic_set(&ec->frag.page->_count, ec->frag.size);
+		ec->pagecnt_bias = ec->frag.size;
+		offset = ec->frag.size - sz;
+	}
+	ec->pagecnt_bias--;
+	ec->dma_count++;
+	ec->frag.offset = offset;
+
+	return ec;
+}
+
+void enic_unmap_dma(struct enic *enic, struct enic_alloc_cache *ec)
+{
+	/* enic_alloc_frag is done using this page. We should be free to unmap
+	 * the page if there are no pending frags in the queue.
+	 */
+	if (!--ec->dma_count && !ec->pagecnt_bias) {
+		pci_unmap_single(enic->pdev, ec->pa, ec->frag.size,
+				 PCI_DMA_FROMDEVICE);
+		kfree(ec);
+	}
+}
+
 static void enic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf)
 {
 	struct enic *enic = vnic_dev_priv(rq->vdev);
@@ -957,8 +1056,7 @@ static void enic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf)
 	if (!buf->os_buf)
 		return;
 
-	pci_unmap_single(enic->pdev, buf->dma_addr,
-		buf->len, PCI_DMA_FROMDEVICE);
+	enic_unmap_dma(enic, buf->ec);
 	dev_kfree_skb_any(buf->os_buf);
 	buf->os_buf = NULL;
 }
@@ -968,10 +1066,12 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
 	struct enic *enic = vnic_dev_priv(rq->vdev);
 	struct net_device *netdev = enic->netdev;
 	struct sk_buff *skb;
-	unsigned int len = netdev->mtu + VLAN_ETH_HLEN;
+	unsigned int len;
 	unsigned int os_buf_index = 0;
 	dma_addr_t dma_addr;
 	struct vnic_rq_buf *buf = rq->to_use;
+	struct enic_alloc_cache *ec;
+	void *va;
 
 	if (buf->os_buf) {
 		enic_queue_rq_desc(rq, buf->os_buf, os_buf_index, buf->dma_addr,
@@ -979,21 +1079,33 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
 
 		return 0;
 	}
-	skb = netdev_alloc_skb_ip_align(netdev, len);
-	if (!skb)
-		return -ENOMEM;
 
-	dma_addr = pci_map_single(enic->pdev, skb->data, len,
-				  PCI_DMA_FROMDEVICE);
-	if (unlikely(enic_dma_map_check(enic, dma_addr))) {
-		dev_kfree_skb(skb);
-		return -ENOMEM;
-	}
+	len = netdev->mtu + VLAN_ETH_HLEN + NET_IP_ALIGN + NET_SKB_PAD;
+	len = SKB_DATA_ALIGN(len) +
+	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	enic_queue_rq_desc(rq, skb, os_buf_index,
-		dma_addr, len);
+	ec = enic_alloc_frag(rq, len);
+	if (unlikely(!ec))
+		goto alloc_fail;
+	va = ec->va + ec->frag.offset;
+	skb = build_skb(va, len);
+	if (unlikely(!skb)) {
+		ec->pagecnt_bias++;
+		ec->frag.offset += len;
+		ec->dma_count--;
+
+		goto alloc_fail;
+	}
+	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+	dma_addr = ec->pa + ec->frag.offset + NET_SKB_PAD + NET_IP_ALIGN;
+	buf->ec = ec;
+	enic_queue_rq_desc(rq, skb, os_buf_index, dma_addr,
+			   netdev->mtu + VLAN_ETH_HLEN);
 
 	return 0;
+
+alloc_fail:
+	return -ENOMEM;
 }
 
 static void enic_intr_update_pkt_size(struct vnic_rx_bytes_counter *pkt_size,
@@ -1016,8 +1128,6 @@ static bool enic_rxcopybreak(struct net_device *netdev, struct sk_buff **skb,
 	new_skb = netdev_alloc_skb_ip_align(netdev, len);
 	if (!new_skb)
 		return false;
-	pci_dma_sync_single_for_cpu(enic->pdev, buf->dma_addr, len,
-				    DMA_FROM_DEVICE);
 	memcpy(new_skb->data, (*skb)->data, len);
 	*skb = new_skb;
 
@@ -1065,8 +1175,7 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 				enic->rq_truncated_pkts++;
 		}
 
-		pci_unmap_single(enic->pdev, buf->dma_addr, buf->len,
-				 PCI_DMA_FROMDEVICE);
+		enic_unmap_dma(enic, buf->ec);
 		dev_kfree_skb_any(skb);
 		buf->os_buf = NULL;
 
@@ -1077,11 +1186,11 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 
 		/* Good receive
 		 */
-
+		pci_dma_sync_single_for_cpu(enic->pdev, buf->dma_addr,
+					    bytes_written, DMA_FROM_DEVICE);
 		if (!enic_rxcopybreak(netdev, &skb, buf, bytes_written)) {
 			buf->os_buf = NULL;
-			pci_unmap_single(enic->pdev, buf->dma_addr, buf->len,
-					 PCI_DMA_FROMDEVICE);
+			enic_unmap_dma(enic, buf->ec);
 		}
 		prefetch(skb->data - NET_IP_ALIGN);
 
@@ -1122,9 +1231,7 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
 
 		/* Buffer overflow
 		 */
-
-		pci_unmap_single(enic->pdev, buf->dma_addr, buf->len,
-				 PCI_DMA_FROMDEVICE);
+		enic_unmap_dma(enic, buf->ec);
 		dev_kfree_skb_any(skb);
 		buf->os_buf = NULL;
 	}
@@ -2637,6 +2744,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_out_dev_deinit;
 	}
 	enic->rx_copybreak = RX_COPYBREAK_DEFAULT;
+	enic->alloc_order = ENIC_ALLOC_ORDER;
 
 	return 0;
 
diff --git a/drivers/net/ethernet/cisco/enic/vnic_rq.c b/drivers/net/ethernet/cisco/enic/vnic_rq.c
index 36a2ed6..c31669f 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_rq.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_rq.c
@@ -26,6 +26,7 @@
 
 #include "vnic_dev.h"
 #include "vnic_rq.h"
+#include "enic.h"
 
 static int vnic_rq_alloc_bufs(struct vnic_rq *rq)
 {
@@ -199,6 +200,18 @@ void vnic_rq_clean(struct vnic_rq *rq,
 		rq->ring.desc_avail++;
 	}
 
+	if (rq->ec) {
+		struct enic *enic = vnic_dev_priv(rq->vdev);
+		struct enic_alloc_cache *ec = rq->ec;
+
+		WARN_ON(ec->dma_count);
+		pci_unmap_single(enic->pdev, ec->pa, ec->frag.size,
+				 PCI_DMA_FROMDEVICE);
+		atomic_sub(ec->pagecnt_bias - 1, &ec->frag.page->_count);
+		__free_pages(ec->frag.page, get_order(ec->frag.size));
+		kfree(ec);
+		rq->ec = NULL;
+	}
 	/* Use current fetch_index as the ring starting point */
 	fetch_index = ioread32(&rq->ctrl->fetch_index);
 
diff --git a/drivers/net/ethernet/cisco/enic/vnic_rq.h b/drivers/net/ethernet/cisco/enic/vnic_rq.h
index 8111d52..2e4815a 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_rq.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_rq.h
@@ -73,6 +73,7 @@ struct vnic_rq_buf {
 	unsigned int index;
 	void *desc;
 	uint64_t wr_id;
+	struct enic_alloc_cache	*ec;
 };
 
 struct vnic_rq {
@@ -100,6 +101,7 @@ struct vnic_rq {
 	unsigned int bpoll_state;
 	spinlock_t bpoll_lock;
 #endif /* CONFIG_NET_RX_BUSY_POLL */
+	struct enic_alloc_cache	*ec;
 };
 
 static inline unsigned int vnic_rq_desc_avail(struct vnic_rq *rq)
-- 
2.2.2

* [PATCH net-next 2/4] enic: Add rq allocation failure stats
  2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 1/4] enic: implement frag allocator Govindarajulu Varadarajan
@ 2015-01-31 12:28 ` Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable Govindarajulu Varadarajan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

This patch adds rq buff allocation failure stats.

cache_alloc_err: incremented when the higher-order page allocation fails.

rq_alloc_err: incremented when rq buff allocation in enic_rq_alloc_buf fails,
either due to a page alloc failure or a build_skb failure.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 2 ++
 drivers/net/ethernet/cisco/enic/enic_main.c    | 3 +++
 drivers/net/ethernet/cisco/enic/vnic_stats.h   | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
index 0c396c1..3f9d91b 100644
--- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c
+++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
@@ -86,6 +86,8 @@ static const struct enic_stat enic_rx_stats[] = {
 
 static const struct enic_stat enic_gen_stats[] = {
 	ENIC_GEN_STAT(dma_map_error),
+	ENIC_GEN_STAT(cache_alloc_err),
+	ENIC_GEN_STAT(rq_alloc_err),
 };
 
 static const unsigned int enic_n_tx_stats = ARRAY_SIZE(enic_tx_stats);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index d9cad93..f15687d 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -962,6 +962,7 @@ struct enic_alloc_cache *enic_page_refill(struct enic *enic, size_t sz,
 		goto no_ec;
 	ec->frag.page = alloc_pages_node(NUMA_NO_NODE, gfp_comp, order);
 	if (unlikely(!ec->frag.page)) {
+		enic->gen_stats.cache_alloc_err++;
 		order = get_order(sz);
 		ec->frag.page = alloc_pages_node(NUMA_NO_NODE, gfp, order);
 		if (!ec->frag.page)
@@ -1105,6 +1106,8 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
 	return 0;
 
 alloc_fail:
+	enic->gen_stats.rq_alloc_err++;
+
 	return -ENOMEM;
 }
 
diff --git a/drivers/net/ethernet/cisco/enic/vnic_stats.h b/drivers/net/ethernet/cisco/enic/vnic_stats.h
index 74c81ed..b2a4528 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_stats.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_stats.h
@@ -65,6 +65,8 @@ struct vnic_rx_stats {
 /* Generic statistics */
 struct vnic_gen_stats {
 	u64 dma_map_error;
+	u64 cache_alloc_err;	/* alloc_pages(enic->order) failures */
+	u64 rq_alloc_err;	/* rq buf + skb alloc failures */
 };
 
 struct vnic_stats {
-- 
2.2.2

* [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable
  2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 1/4] enic: implement frag allocator Govindarajulu Varadarajan
  2015-01-31 12:28 ` [PATCH net-next 2/4] enic: Add rq allocation failure stats Govindarajulu Varadarajan
@ 2015-01-31 12:28 ` Govindarajulu Varadarajan
  2015-02-03  3:21   ` David Miller
  2015-01-31 12:28 ` [PATCH net-next 4/4] enic: add ethtool support for changing alloc order Govindarajulu Varadarajan
  2015-02-02 15:56 ` [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping David Laight
  4 siblings, 1 reply; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 include/uapi/linux/ethtool.h | 1 +
 net/core/ethtool.c           | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 5f66d9c..59362f8 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -213,6 +213,7 @@ enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
+	ETHTOOL_RX_ALLOC_ORDER,
 };
 
 enum tunable_type_id {
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 91f74f3..5fd7ebf 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1670,6 +1670,11 @@ static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
 		    tuna->type_id != ETHTOOL_TUNABLE_U32)
 			return -EINVAL;
 		break;
+	case ETHTOOL_RX_ALLOC_ORDER:
+		if (tuna->len != sizeof(u8) ||
+		    tuna->type_id != ETHTOOL_TUNABLE_U8)
+			return -EINVAL;
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.2.2

* [PATCH net-next 4/4] enic: add ethtool support for changing alloc order
  2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
                   ` (2 preceding siblings ...)
  2015-01-31 12:28 ` [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable Govindarajulu Varadarajan
@ 2015-01-31 12:28 ` Govindarajulu Varadarajan
  2015-02-02 15:56 ` [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping David Laight
  4 siblings, 0 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-01-31 12:28 UTC (permalink / raw)
  To: davem, netdev; +Cc: ssujith, benve, edumazet, ben, Govindarajulu Varadarajan

Adds support for changing the page order of the enic frag allocator.

When the mtu is changed, if the buffer size required by the new mtu is greater
than the size of the compound page allocated by the frag allocator, we change
the order to the minimum order required for the new mtu. The mtu change takes
priority over the configured order value.
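
For completeness, a user-space sketch of how the tunable could be set through
the standard ETHTOOL_STUNABLE ioctl (ETHTOOL_RX_ALLOC_ORDER is the id proposed
in patch 3/4; the helper name and the minimal error handling are illustrative
only):

	#include <string.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <linux/ethtool.h>
	#include <linux/sockios.h>

	/* Set the rx buffer allocation order of @dev to @order. */
	static int set_rx_alloc_order(const char *dev, unsigned char order)
	{
		struct ethtool_tunable *tuna;
		struct ifreq ifr = {0};
		int fd, ret;

		tuna = calloc(1, sizeof(*tuna) + sizeof(order));
		if (!tuna)
			return -1;
		tuna->cmd = ETHTOOL_STUNABLE;
		tuna->id = ETHTOOL_RX_ALLOC_ORDER;
		tuna->type_id = ETHTOOL_TUNABLE_U8;
		tuna->len = sizeof(order);
		memcpy(tuna->data, &order, sizeof(order));

		strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
		ifr.ifr_data = (char *)tuna;

		fd = socket(AF_INET, SOCK_DGRAM, 0);
		ret = ioctl(fd, SIOCETHTOOL, &ifr); /* fails with EINVAL if order
						     * is below the mtu minimum */
		close(fd);
		free(tuna);
		return ret;
	}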

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 15 +++++++++++++++
 drivers/net/ethernet/cisco/enic/enic_main.c    | 18 ++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
index 3f9d91b..e29423c 100644
--- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c
+++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
@@ -18,6 +18,7 @@
 
 #include <linux/netdevice.h>
 #include <linux/ethtool.h>
+#include <linux/if_vlan.h>
 
 #include "enic_res.h"
 #include "enic.h"
@@ -409,6 +410,9 @@ static int enic_get_tunable(struct net_device *dev,
 	case ETHTOOL_RX_COPYBREAK:
 		*(u32 *)data = enic->rx_copybreak;
 		break;
+	case ETHTOOL_RX_ALLOC_ORDER:
+		*(u8 *)data = enic->alloc_order;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -428,6 +432,17 @@ static int enic_set_tunable(struct net_device *dev,
 	case ETHTOOL_RX_COPYBREAK:
 		enic->rx_copybreak = *(u32 *)data;
 		break;
+	case ETHTOOL_RX_ALLOC_ORDER:
+		ret = dev->mtu + VLAN_ETH_HLEN + NET_IP_ALIGN + NET_SKB_PAD;
+		ret = SKB_DATA_ALIGN(ret) +
+		      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+		if (*(u8 *)data < get_order(ret)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = 0;
+		enic->alloc_order = *(u8 *)data;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index f15687d..4a759a0 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1872,7 +1872,11 @@ static int enic_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct enic *enic = netdev_priv(netdev);
 	int running = netif_running(netdev);
+	size_t len;
 
+	len = new_mtu + VLAN_ETH_HLEN + NET_IP_ALIGN + NET_SKB_PAD;
+	len = SKB_DATA_ALIGN(len) +
+	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	if (new_mtu < ENIC_MIN_MTU || new_mtu > ENIC_MAX_MTU)
 		return -EINVAL;
 
@@ -1884,6 +1888,11 @@ static int enic_change_mtu(struct net_device *netdev, int new_mtu)
 
 	netdev->mtu = new_mtu;
 
+	if (len > (PAGE_SIZE << enic->alloc_order)) {
+		enic->alloc_order = get_order(len);
+		netdev_warn(netdev, "new mtu is greater than size of rx alloc_page, resetting enic->alloc_order to :%d\n",
+			    enic->alloc_order);
+	}
 	if (netdev->mtu > enic->port_mtu)
 		netdev_warn(netdev,
 			"interface MTU (%d) set higher than port MTU (%d)\n",
@@ -1902,8 +1911,12 @@ static void enic_change_mtu_work(struct work_struct *work)
 	int new_mtu = vnic_dev_mtu(enic->vdev);
 	int err;
 	unsigned int i;
+	size_t len;
 
 	new_mtu = max_t(int, ENIC_MIN_MTU, min_t(int, ENIC_MAX_MTU, new_mtu));
+	len = new_mtu + VLAN_ETH_HLEN + NET_IP_ALIGN + NET_SKB_PAD;
+	len = SKB_DATA_ALIGN(len) +
+	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	rtnl_lock();
 
@@ -1927,6 +1940,11 @@ static void enic_change_mtu_work(struct work_struct *work)
 
 	/* Fill RQ with new_mtu-sized buffers */
 	netdev->mtu = new_mtu;
+	if (len > (PAGE_SIZE << enic->alloc_order)) {
+		enic->alloc_order = get_order(len);
+		netdev_warn(netdev, "new mtu is greater than size of rx alloc_page, resetting enic->alloc_order to :%d\n",
+			    enic->alloc_order);
+	}
 	vnic_rq_fill(&enic->rq[0], enic_rq_alloc_buf);
 	/* Need at least one buffer on ring to get going */
 	if (vnic_rq_desc_used(&enic->rq[0]) == 0) {
-- 
2.2.2

* RE: [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping
  2015-01-31 12:28 [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping Govindarajulu Varadarajan
                   ` (3 preceding siblings ...)
  2015-01-31 12:28 ` [PATCH net-next 4/4] enic: add ethtool support for changing alloc order Govindarajulu Varadarajan
@ 2015-02-02 15:56 ` David Laight
  2015-02-02 17:49   ` Govindarajulu Varadarajan
  4 siblings, 1 reply; 10+ messages in thread
From: David Laight @ 2015-02-02 15:56 UTC (permalink / raw)
  To: 'Govindarajulu Varadarajan', davem, netdev
  Cc: ssujith, benve, edumazet, ben

From: Govindarajulu Varadarajan
> The following series tries to address these two problem in rq buff allocation.
> 
> * Memory wastage because of large 9k allocation using kmalloc:
>   For 9k mtu buffer, netdev_alloc_skb_ip_align internally calls kmalloc for
>   size > 4096. In case of 9k buff, kmalloc returns pages for order 2, 16k.
>   And we use only ~9k of 16k. 7k memory wasted. Using the frag the frag
>   allocator in patch 1/2, we can allocate three 9k buffs in a 32k page size.
>   Typical enic configuration has 8 rq, and desc ring of size 4096.
>   Thats 8 * 4096 * (16*1024) = 512 MB. Using this frag allocator:
>   8 * 4096 * (32*1024/3) = 341 MB. Thats 171 MB of memory save.
> 
> * frequent dma_map() calls:
>   we call dma_map() for every buff we allocate. When iommu is on, This is very
>   cpu time consuming. From my testing, most of the cpu cycles are wasted
>   spinning on spin_lock_irqsave(&iovad->iova_rbtree_lock, flags) in
>   intel_map_page() .. -> ..__alloc_and_insert_iova_range()
> 
>   With this patch, we call dma_map() once for 32k page. i.e once for every three
>   9k desc, and once every twenty 1500 bytes desc.

Two questions:
1) How are you handling the skb's truesize?
2) Memory fragmentation could easily make the allocation of 32k fail.

	David

* RE: [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping
  2015-02-02 15:56 ` [PATCH net-next 0/4] enic: improve rq buff allocation and reduce dma mapping David Laight
@ 2015-02-02 17:49   ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-02-02 17:49 UTC (permalink / raw)
  To: David Laight
  Cc: 'Govindarajulu Varadarajan',
	davem, netdev, ssujith, benve, edumazet, ben

On Mon, 2 Feb 2015, David Laight wrote:
> Two questions:
> 1) How are you handling the skb's true_size ?

skb->truesize is set to ksize(data) for data allocated using alloc_page and
kmalloc. For frags we set it to the size of the frag. See build_skb().

> 2) Memory fragmentation could easily make the allocation of 32k fail.
>

With the large memory systems have these days, order-3 allocation failure is
quite rare. In my testing on a system with 8G memory I have never encountered
an order-3 failure. This is probably why __netdev_alloc_frag tries an order-3
page allocation first.

If the order-3 page allocation fails, we drop to the minimum order required
for the given size.

* Re: [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable
  2015-01-31 12:28 ` [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable Govindarajulu Varadarajan
@ 2015-02-03  3:21   ` David Miller
  2015-02-03  9:49     ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2015-02-03  3:21 UTC (permalink / raw)
  To: _govind; +Cc: netdev, ssujith, benve, edumazet, ben

From: Govindarajulu Varadarajan <_govind@gmx.com>
Date: Sat, 31 Jan 2015 17:58:09 +0530

> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>

This is terrible.

You haven't explained what this means.

And to tell you the truth, from what I can tell this tunable is
very specific to how you have implemented RX frags in the enic
driver in this series and won't necessarily translate to how
other drivers manage RX buffers.

You need to actually design this facility properly, understand
what the needs are of other drivers and how this facility
can be relevant for more drivers than your own.

* Re: [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable
  2015-02-03  3:21   ` David Miller
@ 2015-02-03  9:49     ` Govindarajulu Varadarajan
  2015-02-05 17:28       ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-02-03  9:49 UTC (permalink / raw)
  To: David Miller; +Cc: _govind, netdev, ssujith, benve, edumazet, ben

On Mon, 2 Feb 2015, David Miller wrote:

> From: Govindarajulu Varadarajan <_govind@gmx.com>
> Date: Sat, 31 Jan 2015 17:58:09 +0530
>
>> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>
> This is terrible.
>
> You haven't explained what this means.
>
> And to tell you the truth, from what I can tell this tunable is
> very specific to how you have implemented RX frags in the enic
> driver in this series and won't necessarily translate to how
> other drivers manage RX buffers.
>

Yes, I agree that this is too specific to the driver. To my knowledge, mlx4
also uses a similar frag allocation from a large-order page. Other than these
two drivers, this facility may not make sense.

On systems with huge memory we can go higher than order-3, and if the user
sees that order-3 allocations are failing, he should be able to reduce the
order.

Since this is very driver specific, are you fine if I move this to device
sysfs?


> You need to actually design this facility properly, understand
> what the needs are of other drivers and how this facility
> can be relevant for more drivers than your own.
>

* Re: [PATCH net-next 3/4] ethtool: add RX_ALLOC_ORDER to tunable
  2015-02-03  9:49     ` Govindarajulu Varadarajan
@ 2015-02-05 17:28       ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 10+ messages in thread
From: Govindarajulu Varadarajan @ 2015-02-05 17:28 UTC (permalink / raw)
  To: David Miller; +Cc: _govind, netdev, ssujith, benve, edumazet, ben

On Tue, 3 Feb 2015, Govindarajulu Varadarajan wrote:

> On Mon, 2 Feb 2015, David Miller wrote:
>
>> From: Govindarajulu Varadarajan <_govind@gmx.com>
>> Date: Sat, 31 Jan 2015 17:58:09 +0530
>> 
>>> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
>> 
>> This is terrible.
>> 
>> You haven't explained what this means.
>> 
>> And to tell you the truth, from what I can tell this tunable is
>> very specific to how you have implemented RX frags in the enic
>> driver in this series and won't necessarily translate to how
>> other drivers manage RX buffers.
>> 
>
> Yes I agree that this is too specific to driver. From my knowledge mlx4 also
> uses similar frag allocation from large order page. Other that these two
> drivers, this facility may not make sense to other drivers.
>
> On systems with huge memory we can go higher than order-3, and if user sees 
> that
> order-3 are failing, he should be able to reduce the order.
>
> Since this is very driver specific, are you fine if I move this to device 
> sysfs?
>

I hope there are no issues with patches 1 & 2. Should I drop the
'changing order' patches and resend patches 1 & 2 alone?
