* [PATCH v9 net-next 0/7] net: low latency Ethernet device polling
@ 2013-06-05 10:34 Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

And here is v9.
Except for typo fixes in comments/description, only 2/7 and 5/7 were changed.

Thanks to everyone for their input.

-Eliezer

Change log:
v9
- correct sysctl proc_handler, reported by Eric Dumazet and Amir Vadai.
- more int -> bool changes, reported by Eric Dumazet.
- better mask testing in sock_poll(), reported by Eric Dumazet.

v8
- split out udp and select/poll into separate patches.
  what used to be patch 2/5 is now three patches.
- type corrections from Amir Vadai and Cong Wang:
  one unsigned long that was left when changing to cycles_t
  int -> bool 
- more detailed patch descriptions.

v7
- suggested by Ben Hutchings and Eric Dumazet:
  type fixes, static for globals in net/core.c,
  avoid napi_id collisions in napi_hash_add()

v6
- many small fixes suggested by Eric Dumazet:
  data locality, typos, documentation
  protect napi_hash insert/delete with a spinlock (napi_gen_id is no
  longer atomic_t since it's only accessed with the spinlock held.)
- added IPv6 TCP and UDP support (only minimally tested)

v5
- corrections suggested by Ben Hutchings:
  fixed typos, moved the config option and sysctl value from IPv4 to net
- moved sk_mark_ll() to the protocol handlers
- removed global id mechanism, replaced with a hashed napi_id.
  based on code sample from Eric Dumazet
  Note that ixgbe_free_q_vector() already waits an rcu grace period
  before freeing the q_vector, so nothing additional needs to be done
  when adding a call to napi_hash_del().
- simple poll/select support

v4
- removed separate config option for TCP as suggested by Eric Dumazet.
- added linux mib counter for packets received through the low latency path,
  as suggested by Andi Kleen.
- re-allow module unloading, remove module param, use a global generation id
  instead to prevent the use of a stale napi pointer, as suggested
  by Eric Dumazet
- updated Documentation/networking/ip-sysctl.txt text

v3
- coding style changes suggested by Dave Miller

v2
- the sysctl knob is now in microseconds. The default value is now 0 (off).
- for now the code depends at configure time on CONFIG_X86_TSC
- the napi reference in struct skb is now a union with the dma cookie
  since the former is only used on RX and the latter on TX,
  as suggested by Eric Dumazet.
- we do a better job at honoring non-blocking operations.
- removed busy-polling support for tcp_read_sock()
- remove dynamic disabling of GRO
- coding style fixes
- disallow unloading the device module after the feature has been used

Credit:
Jesse Brandeburg, Arun Chekhov Ilango, Julie Cummings,
Alexander Duyck, Eric Geisler, Jason Neighbors, Yadong Li,
Mike Polehn, Anil Vasudevan, Don Wood
Special thanks for finding bugs in earlier versions:
Willem de Bruijn and Andi Kleen

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 1/7] net: add napi_id and hash
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:34   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Adds a napi_id and a hashing mechanism to lookup a napi by id.
This will be used by subsequent patches to implement low latency
Ethernet device polling.
Based on a code sample by Eric Dumazet.

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 include/linux/netdevice.h |   29 ++++++++++++++++++++++
 net/core/dev.c            |   59 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8f967e3..39bbd46 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,12 +324,15 @@ struct napi_struct {
 	struct sk_buff		*gro_list;
 	struct sk_buff		*skb;
 	struct list_head	dev_list;
+	struct hlist_node	napi_hash_node;
+	unsigned int		napi_id;
 };
 
 enum {
 	NAPI_STATE_SCHED,	/* Poll is scheduled */
 	NAPI_STATE_DISABLE,	/* Disable pending */
 	NAPI_STATE_NPSVC,	/* Netpoll - don't dequeue from poll_list */
+	NAPI_STATE_HASHED,	/* In NAPI hash */
 };
 
 enum gro_result {
@@ -446,6 +449,32 @@ extern void __napi_complete(struct napi_struct *n);
 extern void napi_complete(struct napi_struct *n);
 
 /**
+ *	napi_by_id - lookup a NAPI by napi_id
+ *	@napi_id: hashed napi_id
+ *
+ * lookup @napi_id in napi_hash table
+ * must be called under rcu_read_lock()
+ */
+extern struct napi_struct *napi_by_id(unsigned int napi_id);
+
+/**
+ *	napi_hash_add - add a NAPI to global hashtable
+ *	@napi: napi context
+ *
+ * generate a new napi_id and store a @napi under it in napi_hash
+ */
+extern void napi_hash_add(struct napi_struct *napi);
+
+/**
+ *	napi_hash_del - remove a NAPI from global table
+ *	@napi: napi context
+ *
+ * Warning: caller must observe rcu grace period
+ * before freeing memory containing @napi
+ */
+extern void napi_hash_del(struct napi_struct *napi);
+
+/**
  *	napi_disable - prevent NAPI from scheduling
  *	@n: napi context
  *
diff --git a/net/core/dev.c b/net/core/dev.c
index 9c18557..fa007db 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -129,6 +129,7 @@
 #include <linux/inetdevice.h>
 #include <linux/cpu_rmap.h>
 #include <linux/static_key.h>
+#include <linux/hashtable.h>
 
 #include "net-sysfs.h"
 
@@ -166,6 +167,12 @@ static struct list_head offload_base __read_mostly;
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
 
+/* protects napi_hash addition/deletion and napi_gen_id */
+static DEFINE_SPINLOCK(napi_hash_lock);
+
+static unsigned int napi_gen_id;
+static DEFINE_HASHTABLE(napi_hash, 8);
+
 seqcount_t devnet_rename_seq;
 
 static inline void dev_base_seq_inc(struct net *net)
@@ -4136,6 +4143,58 @@ void napi_complete(struct napi_struct *n)
 }
 EXPORT_SYMBOL(napi_complete);
 
+/* must be called under rcu_read_lock(), as we don't take a reference */
+struct napi_struct *napi_by_id(unsigned int napi_id)
+{
+	unsigned int hash = napi_id % HASH_SIZE(napi_hash);
+	struct napi_struct *napi;
+
+	hlist_for_each_entry_rcu(napi, &napi_hash[hash], napi_hash_node)
+		if (napi->napi_id == napi_id)
+			return napi;
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(napi_by_id);
+
+void napi_hash_add(struct napi_struct *napi)
+{
+	if (!test_and_set_bit(NAPI_STATE_HASHED, &napi->state)) {
+
+		spin_lock(&napi_hash_lock);
+
+		/* 0 is not a valid id, we also skip an id that is taken
+		 * we expect both events to be extremely rare
+		 */
+		napi->napi_id = 0;
+		while (!napi->napi_id) {
+			napi->napi_id = ++napi_gen_id;
+			if (napi_by_id(napi->napi_id))
+				napi->napi_id = 0;
+		}
+
+		hlist_add_head_rcu(&napi->napi_hash_node,
+			&napi_hash[napi->napi_id % HASH_SIZE(napi_hash)]);
+
+		spin_unlock(&napi_hash_lock);
+	}
+}
+EXPORT_SYMBOL_GPL(napi_hash_add);
+
+/* Warning : caller is responsible to make sure rcu grace period
+ * is respected before freeing memory containing @napi
+ */
+void napi_hash_del(struct napi_struct *napi)
+{
+	spin_lock(&napi_hash_lock);
+
+	if (test_and_clear_bit(NAPI_STATE_HASHED, &napi->state))
+		hlist_del_rcu(&napi->napi_hash_node);
+
+	spin_unlock(&napi_hash_lock);
+}
+EXPORT_SYMBOL_GPL(napi_hash_del);
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
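
The id allocation done by napi_hash_add() above can be modelled in
userspace. What follows is a minimal sketch, not the kernel code: the
demo_* names and the plain singly linked buckets are assumptions made
for illustration, and the spinlock and RCU protection that the patch
relies on are left out entirely. It only shows the two rules stated in
the patch comment: 0 is never handed out, and an id that is already in
the table is skipped.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HASH_BITS	8
#define DEMO_HASH_SIZE	(1u << DEMO_HASH_BITS)

/* Hypothetical stand-in for struct napi_struct (illustration only). */
struct demo_napi {
	unsigned int id;
	struct demo_napi *next;	/* bucket chain, replaces hlist_node */
	int hashed;		/* replaces the NAPI_STATE_HASHED bit */
};

static struct demo_napi *demo_hash[DEMO_HASH_SIZE];
static unsigned int demo_gen_id;

/* Look up an entry by id; returns NULL when the id is not in the table. */
static struct demo_napi *demo_by_id(unsigned int id)
{
	struct demo_napi *n;

	for (n = demo_hash[id % DEMO_HASH_SIZE]; n; n = n->next)
		if (n->id == id)
			return n;
	return NULL;
}

/* Assign a fresh non-zero id that is not in use, then insert. */
static void demo_hash_add(struct demo_napi *napi)
{
	if (napi->hashed)
		return;
	napi->hashed = 1;

	napi->id = 0;
	while (!napi->id) {
		napi->id = ++demo_gen_id;	/* may wrap around to 0 */
		if (demo_by_id(napi->id))
			napi->id = 0;		/* taken, try the next one */
	}
	assert(napi->id != 0);

	napi->next = demo_hash[napi->id % DEMO_HASH_SIZE];
	demo_hash[napi->id % DEMO_HASH_SIZE] = napi;
}

/* Unlink an entry; a no-op when it was never added. */
static void demo_hash_del(struct demo_napi *napi)
{
	struct demo_napi **pp;

	if (!napi->hashed)
		return;
	napi->hashed = 0;

	for (pp = &demo_hash[napi->id % DEMO_HASH_SIZE]; *pp; pp = &(*pp)->next) {
		if (*pp == napi) {
			*pp = napi->next;
			return;
		}
	}
}
```

If the generation counter is forced to its maximum value before a second
add, the counter wraps to 0 (rejected), then to 1 (already taken by the
first entry), and the second entry ends up with id 2. In the kernel the
same loop runs with napi_hash_lock held, which is why napi_gen_id does
not need to be atomic.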


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:34   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Adds an ndo_ll_poll method and the code that supports it.
This method can be used by low latency applications to busy-poll
Ethernet device queues directly from the socket code.
sysctl_net_ll_poll controls how many microseconds to poll.
Default is zero (disabled).
Individual protocol support will be added by subsequent patches.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 Documentation/sysctl/net.txt |    7 ++
 include/linux/netdevice.h    |    3 +
 include/linux/skbuff.h       |    8 ++
 include/net/ll_poll.h        |  148 ++++++++++++++++++++++++++++++++++++++++++
 include/net/sock.h           |    4 +
 include/uapi/linux/snmp.h    |    1 
 net/Kconfig                  |   12 +++
 net/core/skbuff.c            |    4 +
 net/core/sock.c              |    6 ++
 net/core/sysctl_net_core.c   |   10 +++
 net/ipv4/proc.c              |    1 
 net/socket.c                 |    6 ++
 12 files changed, 208 insertions(+), 2 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index c1f8640..85ab72d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -50,6 +50,13 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64
 
+low_latency_poll
+----------------
+Low latency busy poll timeout. (needs CONFIG_NET_LL_RX_POLL)
+Approximate time in us to spin waiting for packets on the device queue.
+Recommended value is 50. May increase power usage.
+Default: 0 (off)
+
 rmem_default
 ------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 39bbd46..2ecb96d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -972,6 +972,9 @@ struct net_device_ops {
 						     gfp_t gfp);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+	int			(*ndo_ll_poll)(struct napi_struct *dev);
+#endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
 						  int queue, u8 *mac);
 	int			(*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b999790..908246b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,6 +386,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  *	@dma_cookie: a cookie to one of several possible DMA operations
  *		done by skb DMA functions
+  *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
  *	@dropcount: total number of sk_receive_queue overflows
@@ -500,8 +501,11 @@ struct sk_buff {
 	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-	dma_cookie_t		dma_cookie;
+#if defined CONFIG_NET_DMA || defined CONFIG_NET_LL_RX_POLL
+	union {
+		unsigned int	napi_id;
+		dma_cookie_t	dma_cookie;
+	};
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
new file mode 100644
index 0000000..bc262f8
--- /dev/null
+++ b/include/net/ll_poll.h
@@ -0,0 +1,148 @@
+/*
+ * Low Latency Sockets
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Author: Eliezer Tamir
+ *
+ * Contact Information:
+ * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ */
+
+/*
+ * For now this depends on CONFIG_X86_TSC
+ */
+
+#ifndef _LINUX_NET_LL_POLL_H
+#define _LINUX_NET_LL_POLL_H
+
+#include <linux/netdevice.h>
+#include <net/ip.h>
+
+#ifdef CONFIG_NET_LL_RX_POLL
+
+struct napi_struct;
+extern unsigned long sysctl_net_ll_poll __read_mostly;
+
+/* return values from ndo_ll_poll */
+#define LL_FLUSH_FAILED		-1
+#define LL_FLUSH_BUSY		-2
+
+/* we don't mind a ~2.5% imprecision */
+#define TSC_MHZ (tsc_khz >> 10)
+
+static inline cycles_t ll_end_time(void)
+{
+	return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
+}
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return sysctl_net_ll_poll && sk->sk_napi_id &&
+	       !need_resched() && !signal_pending(current);
+}
+
+static inline bool can_poll_ll(cycles_t end_time)
+{
+	return !time_after((unsigned long)get_cycles(),
+			    (unsigned long)end_time);
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	cycles_t end_time = ll_end_time();
+	const struct net_device_ops *ops;
+	struct napi_struct *napi;
+	int rc = false;
+
+	/*
+	 * rcu read lock for napi hash
+	 * bh so we don't race with net_rx_action
+	 */
+	rcu_read_lock_bh();
+
+	napi = napi_by_id(sk->sk_napi_id);
+	if (!napi)
+		goto out;
+
+	ops = napi->dev->netdev_ops;
+	if (!ops->ndo_ll_poll)
+		goto out;
+
+	do {
+
+		rc = ops->ndo_ll_poll(napi);
+
+		if (rc == LL_FLUSH_FAILED)
+			break; /* permanent failure */
+
+		if (rc > 0)
+			/* local bh are disabled so it is ok to use _BH */
+			NET_ADD_STATS_BH(sock_net(sk),
+					 LINUX_MIB_LOWLATENCYRXPACKETS, rc);
+
+	} while (skb_queue_empty(&sk->sk_receive_queue)
+			&& can_poll_ll(end_time) && !nonblock);
+
+	rc = !skb_queue_empty(&sk->sk_receive_queue);
+out:
+	rcu_read_unlock_bh();
+	return rc;
+}
+
+/* used in the NIC receive handler to mark the skb */
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+	skb->napi_id = napi->napi_id;
+}
+
+/* used in the protocol handler to propagate the napi_id to the socket */
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+	sk->sk_napi_id = skb->napi_id;
+}
+
+#else /* CONFIG_NET_LL_RX_POLL */
+
+static inline cycles_t ll_end_time(void)
+{
+	return 0;
+}
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return false;
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	return false;
+}
+
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+}
+
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+}
+
+static inline bool can_poll_ll(cycles_t end_time)
+{
+	return false;
+}
+
+#endif /* CONFIG_NET_LL_RX_POLL */
+#endif /* _LINUX_NET_LL_POLL_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 66772cf..ac8e181 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -229,6 +229,7 @@ struct cg_proto;
   *	@sk_omem_alloc: "o" is "option" or "other"
   *	@sk_wmem_queued: persistent queue size
   *	@sk_forward_alloc: space allocated forward
+  *	@sk_napi_id: id of the last napi context to receive data for sk
   *	@sk_allocation: allocation mode
   *	@sk_sndbuf: size of send buffer in bytes
   *	@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
@@ -325,6 +326,9 @@ struct sock {
 #ifdef CONFIG_RPS
 	__u32			sk_rxhash;
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+	unsigned int		sk_napi_id;
+#endif
 	atomic_t		sk_drops;
 	int			sk_rcvbuf;
 
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index df2e8b4..26cbf76 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -253,6 +253,7 @@ enum
 	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
 	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
 	LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES, /* TCPSpuriousRtxHostQueues */
+	LINUX_MIB_LOWLATENCYRXPACKETS,		/* LowLatencyRxPackets */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/Kconfig b/net/Kconfig
index 523e43e..d6a9ce6 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -243,6 +243,18 @@ config NETPRIO_CGROUP
 	  Cgroup subsystem for use in assigning processes to network priorities on
 	  a per-interface basis
 
+config NET_LL_RX_POLL
+	bool "Low Latency Receive Poll"
+	depends on X86_TSC
+	default n
+	---help---
+	  Support Low Latency Receive Queue Poll.
+	  (For network card drivers which support this option.)
+	  When waiting for data in read or poll, call directly into the device driver
+	  to flush packets which may be pending on the device queues into the stack.
+
+	  If unsure, say N.
+
 config BQL
 	boolean
 	depends on SYSFS
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6b1b52c..73ef74d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -733,6 +733,10 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->vlan_tci		= old->vlan_tci;
 
 	skb_copy_secmark(new, old);
+
+#ifdef CONFIG_NET_LL_RX_POLL
+	new->napi_id	= old->napi_id;
+#endif
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index 6ba327d..804fd5b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -139,6 +139,8 @@
 #include <net/tcp.h>
 #endif
 
+#include <net/ll_poll.h>
+
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
@@ -2284,6 +2286,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_stamp = ktime_set(-1L, 0);
 
+#ifdef CONFIG_NET_LL_RX_POLL
+	sk->sk_napi_id		=	0;
+#endif
+
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 741db5fc..4b48f39 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -19,6 +19,7 @@
 #include <net/ip.h>
 #include <net/sock.h>
 #include <net/net_ratelimit.h>
+#include <net/ll_poll.h>
 
 static int one = 1;
 
@@ -284,6 +285,15 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= flow_limit_table_len_sysctl
 	},
 #endif /* CONFIG_NET_FLOW_LIMIT */
+#ifdef CONFIG_NET_LL_RX_POLL
+	{
+		.procname	= "low_latency_poll",
+		.data		= &sysctl_net_ll_poll,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 2a5bf86..6577a11 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -273,6 +273,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPFastOpenListenOverflow", LINUX_MIB_TCPFASTOPENLISTENOVERFLOW),
 	SNMP_MIB_ITEM("TCPFastOpenCookieReqd", LINUX_MIB_TCPFASTOPENCOOKIEREQD),
 	SNMP_MIB_ITEM("TCPSpuriousRtxHostQueues", LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES),
+	SNMP_MIB_ITEM("LowLatencyRxPackets", LINUX_MIB_LOWLATENCYRXPACKETS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/socket.c b/net/socket.c
index 6b94633..721f4e7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -104,6 +104,12 @@
 #include <linux/route.h>
 #include <linux/sockios.h>
 #include <linux/atalk.h>
+#include <net/ll_poll.h>
+
+#ifdef CONFIG_NET_LL_RX_POLL
+unsigned long sysctl_net_ll_poll __read_mostly;
+EXPORT_SYMBOL_GPL(sysctl_net_ll_poll);
+#endif
 
 static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
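
The deadline arithmetic and loop shape of sk_poll_ll() in this patch can
be tried out in userspace. The sketch below is a simplified model under
stated assumptions: every demo_* name is hypothetical, the cycle counter
is a fake variable instead of get_cycles(), and the mock driver simply
"delivers" one packet once the fake clock reaches a chosen value. Only
the tsc_khz >> 10 approximation, the wrap-safe deadline test in the
style of time_after(), and the do/while structure are taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for tsc_khz and get_cycles(). */
static uint64_t demo_tsc_khz = 3000000;	/* models a 3 GHz TSC */
static uint64_t demo_now;		/* fake cycle counter */

/* Mock receive state: one packet shows up when the clock reaches this. */
static uint64_t demo_packet_at;
static int demo_queue_len;

/* kHz >> 10 approximates cycles per microsecond (divide by 1024, not 1000). */
static uint64_t demo_end_time(uint64_t poll_us)
{
	return (demo_tsc_khz >> 10) * poll_us + demo_now;
}

/* Wrap-safe "now is not after end_time", equivalent to !time_after(). */
static bool demo_can_poll(uint64_t end_time)
{
	return (int64_t)(end_time - demo_now) >= 0;
}

/* Mock driver poll: each call costs 1000 fake cycles. */
static int demo_ll_poll(void)
{
	demo_now += 1000;
	if (demo_queue_len == 0 && demo_now >= demo_packet_at) {
		demo_queue_len = 1;
		return 1;	/* one packet moved to the socket queue */
	}
	return 0;
}

/* Poll at least once, then keep going until data, deadline, or nonblock. */
static bool demo_sk_poll(uint64_t poll_us, bool nonblock)
{
	uint64_t end_time = demo_end_time(poll_us);

	do {
		demo_ll_poll();
	} while (demo_queue_len == 0 && demo_can_poll(end_time) && !nonblock);

	return demo_queue_len != 0;
}
```

With a 3 GHz clock the shift gives 2929 cycles per microsecond instead
of 3000, so a 50 us setting spins for 146450 cycles. That is a little
over 2% short of the exact figure, in line with the imprecision the
patch comment accepts. A non-blocking caller still gets exactly one pass
through the driver poll, because the condition is checked at the bottom
of the loop.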


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 2/7] net: add low latency socket poll
@ 2013-06-05 10:34   ` Eliezer Tamir
  0 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: Willem de Bruijn, Or Gerlitz, e1000-devel, netdev, HPA,
	Amir Vadai, linux-kernel, Eliezer Tamir, Jesse Brandeburg,
	Andi Kleen, Ben Hutchings, Eric Dumazet, Eilon Greenstien

Adds an ndo_ll_poll method and the code that supports it.
This method can be used by low latency applications to busy-poll
Ethernet device queues directly from the socket code.
sysctl_net_ll_poll controls how many microseconds to poll.
Default is zero (disabled).
Individual protocol support will be added by subsequent patches.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 Documentation/sysctl/net.txt |    7 ++
 include/linux/netdevice.h    |    3 +
 include/linux/skbuff.h       |    8 ++
 include/net/ll_poll.h        |  148 ++++++++++++++++++++++++++++++++++++++++++
 include/net/sock.h           |    4 +
 include/uapi/linux/snmp.h    |    1 
 net/Kconfig                  |   12 +++
 net/core/skbuff.c            |    4 +
 net/core/sock.c              |    6 ++
 net/core/sysctl_net_core.c   |   10 +++
 net/ipv4/proc.c              |    1 
 net/socket.c                 |    6 ++
 12 files changed, 208 insertions(+), 2 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index c1f8640..85ab72d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -50,6 +50,13 @@ The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable.
 Default: 64
 
+low_latency_poll
+----------------
+Low latency busy poll timeout. (needs CONFIG_NET_LL_RX_POLL)
+Approximate time in us to spin waiting for packets on the device queue.
+Recommended value is 50. May increase power usage.
+Default: 0 (off)
+
 rmem_default
 ------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 39bbd46..2ecb96d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -972,6 +972,9 @@ struct net_device_ops {
 						     gfp_t gfp);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+	int			(*ndo_ll_poll)(struct napi_struct *dev);
+#endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
 						  int queue, u8 *mac);
 	int			(*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b999790..908246b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,6 +386,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  *	@dma_cookie: a cookie to one of several possible DMA operations
  *		done by skb DMA functions
+  *	@napi_id: id of the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
  *	@dropcount: total number of sk_receive_queue overflows
@@ -500,8 +501,11 @@ struct sk_buff {
 	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-	dma_cookie_t		dma_cookie;
+#if defined CONFIG_NET_DMA || defined CONFIG_NET_LL_RX_POLL
+	union {
+		unsigned int	napi_id;
+		dma_cookie_t	dma_cookie;
+	};
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
new file mode 100644
index 0000000..bc262f8
--- /dev/null
+++ b/include/net/ll_poll.h
@@ -0,0 +1,148 @@
+/*
+ * Low Latency Sockets
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Author: Eliezer Tamir
+ *
+ * Contact Information:
+ * e1000-devel Mailing List <e1000-devel@lists.sourceforge.net>
+ */
+
+/*
+ * For now this depends on CONFIG_X86_TSC
+ */
+
+#ifndef _LINUX_NET_LL_POLL_H
+#define _LINUX_NET_LL_POLL_H
+
+#include <linux/netdevice.h>
+#include <net/ip.h>
+
+#ifdef CONFIG_NET_LL_RX_POLL
+
+struct napi_struct;
+extern unsigned long sysctl_net_ll_poll __read_mostly;
+
+/* return values from ndo_ll_poll */
+#define LL_FLUSH_FAILED		-1
+#define LL_FLUSH_BUSY		-2
+
+/* we don't mind a ~2.5% imprecision */
+#define TSC_MHZ (tsc_khz >> 10)
+
+static inline cycles_t ll_end_time(void)
+{
+	return TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll) + get_cycles();
+}
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return sysctl_net_ll_poll && sk->sk_napi_id &&
+	       !need_resched() && !signal_pending(current);
+}
+
+static inline bool can_poll_ll(cycles_t end_time)
+{
+	return !time_after((unsigned long)get_cycles(),
+			    (unsigned long)end_time);
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	cycles_t end_time = ll_end_time();
+	const struct net_device_ops *ops;
+	struct napi_struct *napi;
+	int rc = false;
+
+	/*
+	 * rcu read lock for napi hash
+	 * bh so we don't race with net_rx_action
+	 */
+	rcu_read_lock_bh();
+
+	napi = napi_by_id(sk->sk_napi_id);
+	if (!napi)
+		goto out;
+
+	ops = napi->dev->netdev_ops;
+	if (!ops->ndo_ll_poll)
+		goto out;
+
+	do {
+
+		rc = ops->ndo_ll_poll(napi);
+
+		if (rc == LL_FLUSH_FAILED)
+			break; /* permanent failure */
+
+		if (rc > 0)
+			/* local bh are disabled so it is ok to use _BH */
+			NET_ADD_STATS_BH(sock_net(sk),
+					 LINUX_MIB_LOWLATENCYRXPACKETS, rc);
+
+	} while (skb_queue_empty(&sk->sk_receive_queue)
+			&& can_poll_ll(end_time) && !nonblock);
+
+	rc = !skb_queue_empty(&sk->sk_receive_queue);
+out:
+	rcu_read_unlock_bh();
+	return rc;
+}
+
+/* used in the NIC receive handler to mark the skb */
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+	skb->napi_id = napi->napi_id;
+}
+
+/* used in the protocol handler to propagate the napi_id to the socket */
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+	sk->sk_napi_id = skb->napi_id;
+}
+
+#else /* CONFIG_NET_LL_RX_POLL */
+
+static inline cycles_t ll_end_time(void)
+{
+	return 0;
+}
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return false;
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	return false;
+}
+
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+}
+
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+}
+
+static inline bool can_poll_ll(cycles_t end_time)
+{
+	return false;
+}
+
+#endif /* CONFIG_NET_LL_RX_POLL */
+#endif /* _LINUX_NET_LL_POLL_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 66772cf..ac8e181 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -229,6 +229,7 @@ struct cg_proto;
   *	@sk_omem_alloc: "o" is "option" or "other"
   *	@sk_wmem_queued: persistent queue size
   *	@sk_forward_alloc: space allocated forward
+  *	@sk_napi_id: id of the last napi context to receive data for sk
   *	@sk_allocation: allocation mode
   *	@sk_sndbuf: size of send buffer in bytes
   *	@sk_flags: %SO_LINGER (l_onoff), %SO_BROADCAST, %SO_KEEPALIVE,
@@ -325,6 +326,9 @@ struct sock {
 #ifdef CONFIG_RPS
 	__u32			sk_rxhash;
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+	unsigned int		sk_napi_id;
+#endif
 	atomic_t		sk_drops;
 	int			sk_rcvbuf;
 
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index df2e8b4..26cbf76 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -253,6 +253,7 @@ enum
 	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
 	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
 	LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES, /* TCPSpuriousRtxHostQueues */
+	LINUX_MIB_LOWLATENCYRXPACKETS,		/* LowLatencyRxPackets */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/Kconfig b/net/Kconfig
index 523e43e..d6a9ce6 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -243,6 +243,18 @@ config NETPRIO_CGROUP
 	  Cgroup subsystem for use in assigning processes to network priorities on
 	  a per-interface basis
 
+config NET_LL_RX_POLL
+	bool "Low Latency Receive Poll"
+	depends on X86_TSC
+	default n
+	---help---
+	  Support Low Latency Receive Queue Poll.
+	  (For network card drivers which support this option.)
+	  When waiting for data in read or poll, call directly into the device driver
+	  to flush packets which may be pending on the device queues into the stack.
+
+	  If unsure, say N.
+
 config BQL
 	boolean
 	depends on SYSFS
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6b1b52c..73ef74d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -733,6 +733,10 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->vlan_tci		= old->vlan_tci;
 
 	skb_copy_secmark(new, old);
+
+#ifdef CONFIG_NET_LL_RX_POLL
+	new->napi_id	= old->napi_id;
+#endif
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index 6ba327d..804fd5b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -139,6 +139,8 @@
 #include <net/tcp.h>
 #endif
 
+#include <net/ll_poll.h>
+
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
@@ -2284,6 +2286,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_stamp = ktime_set(-1L, 0);
 
+#ifdef CONFIG_NET_LL_RX_POLL
+	sk->sk_napi_id		=	0;
+#endif
+
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 741db5fc..4b48f39 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -19,6 +19,7 @@
 #include <net/ip.h>
 #include <net/sock.h>
 #include <net/net_ratelimit.h>
+#include <net/ll_poll.h>
 
 static int one = 1;
 
@@ -284,6 +285,15 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= flow_limit_table_len_sysctl
 	},
 #endif /* CONFIG_NET_FLOW_LIMIT */
+#ifdef CONFIG_NET_LL_RX_POLL
+	{
+		.procname	= "low_latency_poll",
+		.data		= &sysctl_net_ll_poll,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 2a5bf86..6577a11 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -273,6 +273,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPFastOpenListenOverflow", LINUX_MIB_TCPFASTOPENLISTENOVERFLOW),
 	SNMP_MIB_ITEM("TCPFastOpenCookieReqd", LINUX_MIB_TCPFASTOPENCOOKIEREQD),
 	SNMP_MIB_ITEM("TCPSpuriousRtxHostQueues", LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES),
+	SNMP_MIB_ITEM("LowLatencyRxPackets", LINUX_MIB_LOWLATENCYRXPACKETS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/socket.c b/net/socket.c
index 6b94633..721f4e7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -104,6 +104,12 @@
 #include <linux/route.h>
 #include <linux/sockios.h>
 #include <linux/atalk.h>
+#include <net/ll_poll.h>
+
+#ifdef CONFIG_NET_LL_RX_POLL
+unsigned long sysctl_net_ll_poll __read_mostly;
+EXPORT_SYMBOL_GPL(sysctl_net_ll_poll);
+#endif
 
 static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,

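The deadline arithmetic in ll_end_time() and can_poll_ll() above can be tried out in userspace. What follows is a minimal sketch, not kernel code: every model_* name and the 2.4 GHz TSC value are assumptions made for illustration, and model_time_after() only imitates the kernel's time_after() macro. Since cycles-per-usec multiplied by the sysctl value yields a cycle count, the sketch assumes the sysctl is expressed in microseconds.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Imitates time_after(a, b): true when a is later than b. The signed
 * difference keeps the answer right when the counter wraps around. */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Imitates TSC_MHZ: dividing by 1024 instead of 1000 is a cheap shift
 * that undershoots the true cycles-per-usec figure by a little over 2%. */
static unsigned long model_cycles_per_usec(unsigned long tsc_khz)
{
	return tsc_khz >> 10;
}

/* Imitates ll_end_time(): deadline = now + budget, both in cycles. */
static unsigned long model_end_time(unsigned long now, unsigned long tsc_khz,
				    unsigned long poll_usecs)
{
	return model_cycles_per_usec(tsc_khz) * poll_usecs + now;
}

/* Imitates can_poll_ll(): polling is allowed up to and including the deadline. */
static bool model_can_poll(unsigned long now, unsigned long end_time)
{
	return !model_time_after(now, end_time);
}

/* Usage: a deadline computed just before the counter wraps still works. */
static void model_selfcheck(void)
{
	unsigned long now = ULONG_MAX - 10;
	unsigned long end = model_end_time(now, 2400000UL, 1);

	assert(end < now);			/* the deadline wrapped */
	assert(model_can_poll(now, end));	/* still inside the window */
	assert(!model_can_poll(end + 1, end));	/* one cycle too late */
}
```

The wraparound case is the reason a plain "now <= end_time" comparison would not be enough here: once the deadline wraps it is numerically smaller than the current counter value, yet it still lies in the future.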

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 3/7] udp: add low latency socket poll support
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:34   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Add support for busy-polling on UDP sockets.
In __udp[46]_lib_rcv() add a call to sk_mark_ll() to copy the napi_id
from the skb into the sk.
This is done at the earliest possible moment, right after we identify
which socket this skb is for.
In __skb_recv_datagram(), when there is no data and the user
tries to read, we busy poll.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 net/core/datagram.c |    4 ++++
 net/ipv4/udp.c      |    6 +++++-
 net/ipv6/udp.c      |    6 +++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b71423d..9cbaba9 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -56,6 +56,7 @@
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <trace/events/skb.h>
+#include <net/ll_poll.h>
 
 /*
  *	Is a socket 'connection oriented' ?
@@ -207,6 +208,9 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
 
+		if (sk_valid_ll(sk) && sk_poll_ll(sk, flags & MSG_DONTWAIT))
+			continue;
+
 		/* User doesn't want to wait */
 		error = -EAGAIN;
 		if (!timeo)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c7338ec..2955b25 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -109,6 +109,7 @@
 #include <trace/events/udp.h>
 #include <linux/static_key.h>
 #include <trace/events/skb.h>
+#include <net/ll_poll.h>
 #include "udp_impl.h"
 
 struct udp_table udp_table __read_mostly;
@@ -1709,7 +1710,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 
 	if (sk != NULL) {
-		int ret = udp_queue_rcv_skb(sk, skb);
+		int ret;
+
+		sk_mark_ll(sk, skb);
+		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
 		/* a return value > 0 means to resubmit the input, but
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b580853..f77e34c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -46,6 +46,7 @@
 #include <net/ip6_checksum.h>
 #include <net/xfrm.h>
 #include <net/inet6_hashtables.h>
+#include <net/ll_poll.h>
 
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -841,7 +842,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk != NULL) {
-		int ret = udpv6_queue_rcv_skb(sk, skb);
+		int ret;
+
+		sk_mark_ll(sk, skb);
+		ret = udpv6_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
 		/* a return value > 0 means to resubmit the input, but

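The control flow that the datagram.c hunk adds to the receive loop can be modelled in userspace. This is only a sketch under stated assumptions: the model_* names, the poll budget that stands in for the cycle deadline, and the arrives_after field are all hypothetical, and the real code decides between -EAGAIN and sleeping from the socket timeout rather than from a flag.

```c
#include <stdbool.h>

enum model_result { MODEL_GOT_DATA, MODEL_WOULD_BLOCK, MODEL_MUST_SLEEP };

struct model_rx {
	int queued;		/* packets already on the socket receive queue */
	int arrives_after;	/* packet reaches the queue on this poll; 0 = never */
	int budget;		/* polls allowed, standing in for the time limit */
	bool ll_valid;		/* stand-in for sk_valid_ll() */
};

/* Stand-in for sk_poll_ll(): always poll once, repeat only when blocking. */
static bool model_busy_poll(struct model_rx *rx, bool nonblock)
{
	int polls = 0;

	do {
		polls++;
		if (rx->arrives_after > 0 && polls >= rx->arrives_after) {
			rx->queued++;
			rx->arrives_after = 0;
		}
	} while (rx->queued == 0 && polls < rx->budget && !nonblock);

	return rx->queued > 0;
}

/* Stand-in for the receive loop: dequeue, else busy poll, else give up. */
static enum model_result model_recv(struct model_rx *rx, bool nonblock)
{
	for (;;) {
		if (rx->queued > 0) {
			rx->queued--;
			return MODEL_GOT_DATA;
		}
		if (rx->ll_valid && model_busy_poll(rx, nonblock))
			continue;	/* data showed up, retry the dequeue */
		return nonblock ? MODEL_WOULD_BLOCK : MODEL_MUST_SLEEP;
	}
}
```

A non-blocking caller still gets exactly one pass over the device queue, which mirrors the do/while shape of sk_poll_ll(): the nonblock flag only prevents spinning, it does not skip the first poll.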

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 4/7] tcp: add low latency socket poll support.
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:34   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Adds low latency socket poll support for TCP.
In tcp_v[46]_rcv() add a call to sk_mark_ll() to copy the napi_id
from the skb to the sk.
In tcp_recvmsg(), when there is no data in the socket we busy-poll.
This is a good example of how to add busy-poll support to more protocols.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 net/ipv4/tcp.c      |    5 +++++
 net/ipv4/tcp_ipv4.c |    2 ++
 net/ipv6/tcp_ipv6.c |    2 ++
 3 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b5d4ad9..bf09f6b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <net/ll_poll.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 
@@ -1553,6 +1554,10 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
+	if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue)
+	    && (sk->sk_state == TCP_ESTABLISHED))
+		sk_poll_ll(sk, nonblock);
+
 	lock_sock(sk);
 
 	err = -ENOTCONN;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d20ede0..35fd8bc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -75,6 +75,7 @@
 #include <net/netdma.h>
 #include <net/secure_seq.h>
 #include <net/tcp_memcontrol.h>
+#include <net/ll_poll.h>
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
@@ -2011,6 +2012,7 @@ process:
 	if (sk_filter(sk, skb))
 		goto discard_and_relse;
 
+	sk_mark_ll(sk, skb);
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0a17ed9..5cffa5c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -63,6 +63,7 @@
 #include <net/inet_common.h>
 #include <net/secure_seq.h>
 #include <net/tcp_memcontrol.h>
+#include <net/ll_poll.h>
 
 #include <asm/uaccess.h>
 
@@ -1498,6 +1499,7 @@ process:
 	if (sk_filter(sk, skb))
 		goto discard_and_relse;
 
+	sk_mark_ll(sk, skb);
 	skb->dev = NULL;
 
 	bh_lock_sock_nested(sk);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:34   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:34 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Add very naive select/poll busy-poll support.
Add busy-polling to sock_poll().
When poll/select have nothing to report, call the low-level
sock_poll() again until we are out of time or we find something.
Right now we poll every socket once; this is suboptimal
but improves latency when the number of sockets polled is not large.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 fs/select.c  |    7 +++++++
 net/socket.c |   10 +++++++++-
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8c1c96c..f116bf0 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -27,6 +27,7 @@
 #include <linux/rcupdate.h>
 #include <linux/hrtimer.h>
 #include <linux/sched/rt.h>
+#include <net/ll_poll.h>
 
 #include <asm/uaccess.h>
 
@@ -400,6 +401,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 	poll_table *wait;
 	int retval, i, timed_out = 0;
 	unsigned long slack = 0;
+	cycles_t ll_time = ll_end_time();
 
 	rcu_read_lock();
 	retval = max_select_fd(n, fds);
@@ -486,6 +488,8 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
 			break;
 		}
 
+		if (can_poll_ll(ll_time))
+			continue;
 		/*
 		 * If this is the first loop and we have a timeout
 		 * given, then we convert to ktime_t and set the to
@@ -750,6 +754,7 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
 	ktime_t expire, *to = NULL;
 	int timed_out = 0, count = 0;
 	unsigned long slack = 0;
+	cycles_t ll_time = ll_end_time();
 
 	/* Optimise the no-wait case */
 	if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -795,6 +800,8 @@ static int do_poll(unsigned int nfds,  struct poll_list *list,
 		if (count || timed_out)
 			break;
 
+		if (can_poll_ll(ll_time))
+			continue;
 		/*
 		 * If this is the first loop and we have a timeout
 		 * given, then we convert to ktime_t and set the to
diff --git a/net/socket.c b/net/socket.c
index 721f4e7..c34dad0 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1148,13 +1148,21 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static unsigned int sock_poll(struct file *file, poll_table *wait)
 {
+	unsigned int poll_result;
 	struct socket *sock;
 
 	/*
 	 *      We can't return errors to poll, so it's either yes or no.
 	 */
 	sock = file->private_data;
-	return sock->ops->poll(file, sock, wait);
+
+	poll_result = sock->ops->poll(file, sock, wait);
+
+	if (wait && !(poll_result & wait->_key) &&
+		sk_valid_ll(sock->sk) && sk_poll_ll(sock->sk, 1))
+			poll_result = sock->ops->poll(file, sock, NULL);
+
+	return poll_result;
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)

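The mask test in the new sock_poll() can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel implementation: the model_* names and MODEL_POLL* constants are hypothetical, the wanted pointer stands in for the poll_table and its _key, and a single counter transfer stands in for one sk_poll_ll(sk, 1) pass.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_POLLIN	0x1u
#define MODEL_POLLOUT	0x4u

struct model_sock {
	int rx_queued;	/* packets on the socket receive queue */
	int on_device;	/* packets still waiting in the NIC queue */
	bool writable;
	bool ll_valid;	/* stand-in for sk_valid_ll() */
};

/* Stand-in for sock->ops->poll(): report what is ready right now. */
static unsigned int model_proto_poll(const struct model_sock *s)
{
	unsigned int mask = 0;

	if (s->rx_queued > 0)
		mask |= MODEL_POLLIN;
	if (s->writable)
		mask |= MODEL_POLLOUT;
	return mask;
}

/* Stand-in for sk_poll_ll(sk, 1): one non-blocking flush of the device. */
static bool model_busy_poll_once(struct model_sock *s)
{
	s->rx_queued += s->on_device;
	s->on_device = 0;
	return s->rx_queued > 0;
}

/* Busy poll only when the caller waits for events that are not ready yet. */
static unsigned int model_sock_poll(struct model_sock *s,
				    const unsigned int *wanted)
{
	unsigned int res = model_proto_poll(s);

	if (wanted && !(res & *wanted) && s->ll_valid &&
	    model_busy_poll_once(s))
		res = model_proto_poll(s);
	return res;
}
```

With this test a caller that only waits for writability on an already writable socket never touches the device queue, while a caller waiting for input gets one flush before the result is reported.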

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 6/7] ixgbe: add support for ndo_ll_poll
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:35   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:35 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

Add the ixgbe driver code implementing ndo_ll_poll.
Adds ndo_ll_poll method and locking between it and the napi poll.
When receiving a packet we use skb_mark_ll to record the napi it came from.
Add each napi to the napi_hash right after netif_napi_add().

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  120 +++++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   63 +++++++++++--
 3 files changed, 177 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index ca93238..e9d9862 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -52,6 +52,8 @@
 #include <linux/dca.h>
 #endif
 
+#include <net/ll_poll.h>
+
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -356,9 +358,127 @@ struct ixgbe_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[IFNAMSIZ + 9];
 
+#ifdef CONFIG_NET_LL_RX_POLL
+	unsigned int state;
+#define IXGBE_QV_STATE_IDLE        0
+#define IXGBE_QV_STATE_NAPI	   1    /* NAPI owns this QV */
+#define IXGBE_QV_STATE_POLL	   2    /* poll owns this QV */
+#define IXGBE_QV_LOCKED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
+#define IXGBE_QV_STATE_NAPI_YIELD  4    /* NAPI yielded this QV */
+#define IXGBE_QV_STATE_POLL_YIELD  8    /* poll yielded this QV */
+#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
+#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
+	spinlock_t lock;
+#endif  /* CONFIG_NET_LL_RX_POLL */
+
 	/* for dynamic allocation of rings associated with this q_vector */
 	struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
 };
+#ifdef CONFIG_NET_LL_RX_POLL
+static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
+{
+
+	spin_lock_init(&q_vector->lock);
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+}
+
+/* called from the device poll routine to get ownership of a q_vector */
+static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock(&q_vector->lock);
+	if (q_vector->state & IXGBE_QV_LOCKED) {
+		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
+		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
+		rc = false;
+	} else
+		/* we don't care if someone yielded */
+		q_vector->state = IXGBE_QV_STATE_NAPI;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while napi had it */
+static inline bool ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
+			       IXGBE_QV_STATE_NAPI_YIELD));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* called from ixgbe_low_latency_poll() */
+static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock_bh(&q_vector->lock);
+	if ((q_vector->state & IXGBE_QV_LOCKED)) {
+		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
+		rc = false;
+	} else
+		/* preserve yield marks */
+		q_vector->state |= IXGBE_QV_STATE_POLL;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while it was locked */
+static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock_bh(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* true if a socket is polling, even if it did not get the lock */
+static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+{
+	WARN_ON(!(q_vector->state & IXGBE_QV_LOCKED));
+	return q_vector->state & IXGBE_QV_USER_PEND;
+}
+#else /* CONFIG_NET_LL_RX_POLL */
+static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
+{
+}
+
+static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
+{
+	return true;
+}
+
+static inline bool ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
+{
+	return false;
+}
+
+static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
+{
+	return false;
+}
+
+static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
+{
+	return false;
+}
+
+static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+{
+	return false;
+}
+#endif /* CONFIG_NET_LL_RX_POLL */
+
 #ifdef CONFIG_IXGBE_HWMON
 
 #define IXGBE_HWMON_TYPE_LOC		0
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index ef5f7a6..90b4e10 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -811,6 +811,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 	/* initialize NAPI */
 	netif_napi_add(adapter->netdev, &q_vector->napi,
 		       ixgbe_poll, 64);
+	napi_hash_add(&q_vector->napi);
 
 	/* tie q_vector and adapter together */
 	adapter->q_vector[v_idx] = q_vector;
@@ -931,6 +932,7 @@ static void ixgbe_free_q_vector(struct ixgbe_adapter *adapter, int v_idx)
 		adapter->rx_ring[ring->queue_index] = NULL;
 
 	adapter->q_vector[v_idx] = NULL;
+	napi_hash_del(&q_vector->napi);
 	netif_napi_del(&q_vector->napi);
 
 	/*
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d30fbdd..9a7dc40 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1504,7 +1504,9 @@ static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 
-	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
+	if (ixgbe_qv_ll_polling(q_vector))
+		netif_receive_skb(skb);
+	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 		napi_gro_receive(&q_vector->napi, skb);
 	else
 		netif_rx(skb);
@@ -1892,9 +1894,9 @@ dma_sync:
  * expensive overhead for IOMMU access this provides a means of avoiding
  * it by maintaining the mapping of the page to the syste.
  *
- * Returns true if all work is completed without reaching budget
+ * Returns amount of work completed
  **/
-static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       struct ixgbe_ring *rx_ring,
 			       const int budget)
 {
@@ -1976,6 +1978,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 #endif /* IXGBE_FCOE */
+		skb_mark_ll(skb, &q_vector->napi);
 		ixgbe_rx_skb(q_vector, skb);
 
 		/* update budget accounting */
@@ -1992,9 +1995,37 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (cleaned_count)
 		ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
 
-	return (total_rx_packets < budget);
+	return total_rx_packets;
 }
 
+#ifdef CONFIG_NET_LL_RX_POLL
+/* must be called with local_bh_disable()d */
+static int ixgbe_low_latency_recv(struct napi_struct *napi)
+{
+	struct ixgbe_q_vector *q_vector =
+			container_of(napi, struct ixgbe_q_vector, napi);
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring  *ring;
+	int found = 0;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return LL_FLUSH_FAILED;
+
+	if (!ixgbe_qv_lock_poll(q_vector))
+		return LL_FLUSH_BUSY;
+
+	ixgbe_for_each_ring(ring, q_vector->rx) {
+		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+		if (found)
+			break;
+	}
+
+	ixgbe_qv_unlock_poll(q_vector);
+
+	return found;
+}
+#endif	/* CONFIG_NET_LL_RX_POLL */
+
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -2550,6 +2581,9 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 	ixgbe_for_each_ring(ring, q_vector->tx)
 		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
 
+	if (!ixgbe_qv_lock_napi(q_vector))
+		return budget;
+
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	if (q_vector->rx.count > 1)
@@ -2558,9 +2592,10 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx)
-		clean_complete &= ixgbe_clean_rx_irq(q_vector, ring,
-						     per_ring_budget);
+		clean_complete &= (ixgbe_clean_rx_irq(q_vector, ring,
+				   per_ring_budget) < per_ring_budget);
 
+	ixgbe_qv_unlock_napi(q_vector);
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
 		return budget;
@@ -3747,16 +3782,25 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
+		ixgbe_qv_init_lock(adapter->q_vector[q_idx]);
 		napi_enable(&adapter->q_vector[q_idx]->napi);
+	}
 }
 
 static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	local_bh_disable(); /* for ixgbe_qv_lock_napi() */
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
 		napi_disable(&adapter->q_vector[q_idx]->napi);
+		while (!ixgbe_qv_lock_napi(adapter->q_vector[q_idx])) {
+			pr_info("QV %d locked\n", q_idx);
+			mdelay(1);
+		}
+	}
+	local_bh_enable();
 }
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7177,6 +7221,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
+#ifdef CONFIG_NET_LL_RX_POLL
+	.ndo_ll_poll		= ixgbe_low_latency_recv,
+#endif
 #ifdef IXGBE_FCOE
 	.ndo_fcoe_ddp_setup = ixgbe_fcoe_ddp_get,
 	.ndo_fcoe_ddp_target = ixgbe_fcoe_ddp_target,
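
The ownership rules behind the ixgbe_qv_* helpers above can be modelled in plain userspace C. What follows is only a sketch under stated assumptions: it reuses the bit values of the IXGBE_QV_* defines from this patch, but the qv_model type and the qv_* function names are hypothetical, the spinlock is left out (so it is single-threaded only), and the two unlock helpers are folded into one qv_unlock().

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit layout as the IXGBE_QV_* defines in the patch. */
enum {
	QV_STATE_IDLE       = 0,
	QV_STATE_NAPI       = 1,	/* NAPI owns the vector */
	QV_STATE_POLL       = 2,	/* socket poll owns the vector */
	QV_STATE_NAPI_YIELD = 4,	/* NAPI found it locked and backed off */
	QV_STATE_POLL_YIELD = 8,	/* poll found it locked and backed off */
	QV_LOCKED    = QV_STATE_NAPI | QV_STATE_POLL,
	QV_USER_PEND = QV_STATE_POLL | QV_STATE_POLL_YIELD,
};

/* Hypothetical stand-in for the state word in struct ixgbe_q_vector. */
struct qv_model {
	unsigned int state;
};

/* NAPI side: take ownership, or record that NAPI had to yield. */
static bool qv_lock_napi(struct qv_model *qv)
{
	if (qv->state & QV_LOCKED) {
		qv->state |= QV_STATE_NAPI_YIELD;
		return false;
	}
	qv->state = QV_STATE_NAPI;	/* stale yield marks are dropped */
	return true;
}

/* Socket side: take ownership, or record that the poller had to yield. */
static bool qv_lock_poll(struct qv_model *qv)
{
	if (qv->state & QV_LOCKED) {
		qv->state |= QV_STATE_POLL_YIELD;
		return false;
	}
	qv->state |= QV_STATE_POLL;	/* yield marks are preserved */
	return true;
}

/* Release by either owner; reports whether a socket poller was turned away. */
static bool qv_unlock(struct qv_model *qv)
{
	bool poll_yielded = qv->state & QV_STATE_POLL_YIELD;

	qv->state = QV_STATE_IDLE;
	return poll_yielded;
}

/* True if a socket is polling, even if it did not get the lock. */
static bool qv_ll_polling(const struct qv_model *qv)
{
	return qv->state & QV_USER_PEND;
}
```

In this model, a socket poller that arrives while NAPI owns the vector leaves QV_STATE_POLL_YIELD behind, so qv_ll_polling() reports true for the rest of that NAPI pass. That is the condition ixgbe_rx_skb() tests to choose netif_receive_skb() over napi_gro_receive().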


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH v9 net-next 7/7] ixgbe: add extra stats for ndo_ll_poll
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-06-05 10:35   ` Eliezer Tamir
  2013-06-05 10:34   ` Eliezer Tamir
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 10:35 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Eric Dumazet, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

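Add additional statistics to the ixgbe driver for ndo_ll_poll:
per-ring yields, misses and cleaned counters, exported through ethtool.
They are compiled in only when LL_EXTENDED_STATS is defined.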

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   14 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |    6 +++
 3 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index e9d9862..fb098b4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -54,6 +54,9 @@
 
 #include <net/ll_poll.h>
 
+#ifdef CONFIG_NET_LL_RX_POLL
+#define LL_EXTENDED_STATS
+#endif
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -184,6 +187,11 @@ struct ixgbe_rx_buffer {
 struct ixgbe_queue_stats {
 	u64 packets;
 	u64 bytes;
+#ifdef LL_EXTENDED_STATS
+	u64 yields;
+	u64 misses;
+	u64 cleaned;
+#endif  /* LL_EXTENDED_STATS */
 };
 
 struct ixgbe_tx_queue_stats {
@@ -391,6 +399,9 @@ static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
 		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
 		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->tx.ring->stats.yields++;
+#endif
 	} else
 		/* we don't care if someone yielded */
 		q_vector->state = IXGBE_QV_STATE_NAPI;
@@ -421,6 +432,9 @@ static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
 	if ((q_vector->state & IXGBE_QV_LOCKED)) {
 		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->rx.ring->stats.yields++;
+#endif
 	} else
 		/* preserve yield marks */
 		q_vector->state |= IXGBE_QV_STATE_POLL;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index d375472..24e2e7a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1054,6 +1054,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1063,6 +1069,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 	for (j = 0; j < IXGBE_NUM_RX_QUEUES; j++) {
 		ring = adapter->rx_ring[j];
@@ -1070,6 +1082,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1079,6 +1097,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 
 	for (j = 0; j < IXGBE_MAX_PACKET_BUFFERS; j++) {
@@ -1115,12 +1139,28 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "tx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "tx_q_%u_napi_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_NUM_RX_QUEUES; i++) {
 			sprintf(p, "rx_queue_%u_packets", i);
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "rx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "rx_q_%u_ll_poll_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) {
 			sprintf(p, "tx_pb_%u_pxon", i);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9a7dc40..047ebaa 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2016,6 +2016,12 @@ static int ixgbe_low_latency_recv(struct napi_struct *napi)
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
 		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+#ifdef LL_EXTENDED_STATS
+		if (found)
+			ring->stats.cleaned += found;
+		else
+			ring->stats.misses++;
+#endif
 		if (found)
 			break;
 	}
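
The accounting added above can be illustrated outside the driver. The sketch below is an assumption-laden model, not driver code: ring_stats, ll_recv_model() and export_stats() are hypothetical names, and the found[] array stands in for the values ixgbe_clean_rx_irq() would return. It shows how a low-latency pass bumps misses on empty rings and cleaned on the first ring that yields packets, and how the per-ring counters flatten into five u64 slots in the same order the ethtool hunks use (packets, bytes, yields, misses, cleaned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of ixgbe_queue_stats with LL_EXTENDED_STATS defined. */
struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t yields;
	uint64_t misses;
	uint64_t cleaned;
};

/*
 * Model of one low-latency receive pass. found[r] plays the role of the
 * packet count returned for ring r. The pass stops at the first ring that
 * has packets, as the loop in ixgbe_low_latency_recv() does.
 */
static int ll_recv_model(struct ring_stats *rings, const int *found,
			 size_t nrings)
{
	for (size_t r = 0; r < nrings; r++) {
		if (found[r]) {
			rings[r].cleaned += (uint64_t)found[r];
			return found[r];
		}
		rings[r].misses++;
	}
	return 0;
}

/* Flatten the counters, five slots per ring; returns slots written. */
static size_t export_stats(const struct ring_stats *rings, size_t nrings,
			   uint64_t *data)
{
	size_t i = 0;

	for (size_t r = 0; r < nrings; r++) {
		data[i++] = rings[r].packets;
		data[i++] = rings[r].bytes;
		data[i++] = rings[r].yields;
		data[i++] = rings[r].misses;
		data[i++] = rings[r].cleaned;
	}
	return i;
}
```

Rings after the first hit are not visited in that pass, so neither their misses nor their cleaned counters move.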


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 1/7] net: add napi_id and hash
  2013-06-05 10:34   ` Eliezer Tamir
  (?)
@ 2013-06-05 13:18   ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:18 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> Adds a napi_id and a hashing mechanism to look up a napi by id.
> This will be used by subsequent patches to implement low latency
> Ethernet device polling.
> Based on a code sample by Eric Dumazet.
> 
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

Signed-off-by: Eric Dumazet <edumazet@google.com>
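
For readers without patch 1/7 at hand, looking up a polling context by id can be sketched as a small chained hash table. This is a userspace illustration only, not the kernel implementation: poll_ctx, id_table and the id_* functions are hypothetical names, the bucket count is an arbitrary assumption, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define ID_HASH_BUCKETS 16	/* assumption: any small bucket count works */

/* Hypothetical stand-in for a napi context that carries an id. */
struct poll_ctx {
	unsigned int id;	/* 0 means "not hashed" in this sketch */
	struct poll_ctx *next;
};

struct id_table {
	struct poll_ctx *bucket[ID_HASH_BUCKETS];
	unsigned int next_id;
};

/* Walk one bucket chain looking for a matching id. */
static struct poll_ctx *id_lookup(const struct id_table *t, unsigned int id)
{
	for (struct poll_ctx *c = t->bucket[id % ID_HASH_BUCKETS]; c; c = c->next)
		if (c->id == id)
			return c;
	return NULL;
}

/* Assign the next nonzero id that is not already in use, then hash it. */
static void id_add(struct id_table *t, struct poll_ctx *c)
{
	do {
		if (++t->next_id == 0)
			t->next_id = 1;
	} while (id_lookup(t, t->next_id));

	c->id = t->next_id;
	c->next = t->bucket[c->id % ID_HASH_BUCKETS];
	t->bucket[c->id % ID_HASH_BUCKETS] = c;
}

/* Unlink the context from its bucket and clear its id. */
static void id_del(struct id_table *t, struct poll_ctx *c)
{
	struct poll_ctx **pp = &t->bucket[c->id % ID_HASH_BUCKETS];

	while (*pp && *pp != c)
		pp = &(*pp)->next;
	if (*pp)
		*pp = c->next;
	c->id = 0;
	c->next = NULL;
}
```

The retry loop in id_add() is the sketch's take on the "avoid napi_id collisions" item from the v7 change log: an id is handed out only if a lookup for it comes back empty.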



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 10:34   ` Eliezer Tamir
@ 2013-06-05 13:23     ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:23 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> Adds an ndo_ll_poll method and the code that supports it.
> This method can be used by low latency applications to busy-poll
> Ethernet device queues directly from the socket code.
> sysctl_net_ll_poll controls how many microseconds to poll.
> Default is zero (disabled).
> Individual protocol support will be added by subsequent patches.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

Are you sure this version was tested by Willem ?

Acked-by: Eric Dumazet <edumazet@google.com>
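
The behaviour described in the quoted patch text (poll for up to a configured number of microseconds, with zero meaning disabled) can be sketched in userspace as a deadline-bounded loop. This is a minimal sketch, assuming a POSIX system with clock_gettime(): ll_poll_fn, busy_poll() and third_time_lucky() are hypothetical names, and the LL_FLUSH_* return codes from the series are not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical device poll callback: returns the number of packets found. */
typedef int (*ll_poll_fn)(void *ctx);

/* Monotonic time in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Spin on the callback until it finds packets or budget_us microseconds
 * have passed. A budget of zero disables polling entirely.
 */
static int busy_poll(ll_poll_fn poll, void *ctx, unsigned int budget_us)
{
	uint64_t deadline;

	if (budget_us == 0)
		return 0;

	deadline = now_us() + budget_us;
	do {
		int found = poll(ctx);

		if (found > 0)
			return found;
	} while (now_us() < deadline);

	return 0;
}

/* Example callback: pretends a packet shows up on the third attempt. */
static int third_time_lucky(void *ctx)
{
	int *calls = ctx;

	return ++*calls >= 3 ? 1 : 0;
}
```

With a budget of zero the callback is never invoked, which matches the "default is zero (disabled)" wording; with a nonzero budget the loop polls at least once before checking the clock.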




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 3/7] udp: add low latency socket poll support
  2013-06-05 10:34   ` Eliezer Tamir
@ 2013-06-05 13:25     ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:25 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> Add support for busy-polling on UDP sockets.
> In __udp[46]_lib_rcv add a call to sk_mark_ll() to copy the napi_id
> from the skb into the sk.
> This is done at the earliest possible moment, right after we identify
> which socket this skb is for.
> In __skb_recv_datagram(), when there is no data and the user
> tries to read, we busy-poll.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 4/7] tcp: add low latency socket poll support.
  2013-06-05 10:34   ` Eliezer Tamir
@ 2013-06-05 13:25     ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:25 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> Adds low latency socket poll support for TCP.
> In tcp_v[46]_rcv() add a call to sk_mark_ll() to copy the napi_id
> from the skb to the sk.
> In tcp_recvmsg(), when there is no data in the socket we busy-poll.
> This is a good example of how to add busy-poll support to more protocols.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---
Acked-by: Eric Dumazet <edumazet@google.com>



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 10:34   ` Eliezer Tamir
@ 2013-06-05 13:30     ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:30 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> A very naive select/poll busy-poll support.
> Add busy-polling to sock_poll().
> When poll/select have nothing to report, call the low-level
> sock_poll() again until we are out of time or we find something.
> Right now we poll every socket once, this is suboptimal
> but improves latency when the number of sockets polled is not large.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---

I am a bit uneasy with this one, because an application polling one
thousand file descriptors using select()/poll() will call sk_poll_ll()
one thousand times.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:30     ` Eric Dumazet
@ 2013-06-05 13:41       ` Eliezer Tamir
  -1 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 13:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 16:30, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
>> A very naive select/poll busy-poll support.
>> Add busy-polling to sock_poll().
>> When poll/select have nothing to report, call the low-level
>> sock_poll() again until we are out of time or we find something.
>> Right now we poll every socket once, this is suboptimal
>> but improves latency when the number of sockets polled is not large.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> Tested-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>> ---
>
> I am a bit uneasy with this one, because an application polling one
> thousand file descriptors using select()/poll() will call sk_poll_ll()
> one thousand times.

But we call sk_poll_ll() with nonblock set, so it will only test once
for each socket and not loop.

I think this is not as bad as it sounds.
We still honor the time limit on how long to poll.

When we busy-wait on a single socket we call sk_poll_ll() repeatedly
until we time out or we have something to report.

Here, on the other hand, we call sk_poll_ll() once for each file, so we
loop on the files. We moved the loop from inside sk_poll_ll() to select/poll.

I also plan on improving this in the next stage.

The plan is to let select/poll/epoll control whether sk_poll_ll() is
called, so the caller has even more control.

-Eliezer
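
The difference between the two loop structures described above can be
sketched in user-space C. This is only an illustration under stated
assumptions: fake_sock, poll_once(), busy_wait_single() and
busy_wait_many() are hypothetical names, not kernel code, and an attempt
counter stands in for the time limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct fake_sock {
	int ready_after;	/* data "arrives" on this poll attempt */
	int polls;		/* how many times the queue was polled */
};

/* One poll attempt that never loops, like sk_poll_ll() with nonblock set. */
static bool poll_once(struct fake_sock *sk)
{
	sk->polls++;
	return sk->polls >= sk->ready_after;
}

/*
 * Single socket: retry the same socket until data shows up or the
 * budget (an assumed stand-in for the time limit) runs out.
 */
static bool busy_wait_single(struct fake_sock *sk, int budget)
{
	while (budget-- > 0)
		if (poll_once(sk))
			return true;
	return false;
}

/*
 * select/poll style: the loop is over the sockets, one attempt each per
 * pass. Returns the index of the first ready socket, or -1 when the
 * passes run out.
 */
static int busy_wait_many(struct fake_sock *socks, size_t n, int max_passes)
{
	assert(socks != NULL || n == 0);
	for (int pass = 0; pass < max_passes; pass++)
		for (size_t i = 0; i < n; i++)
			if (poll_once(&socks[i]))
				return (int)i;
	return -1;
}
```

In this model every pass costs one poll attempt per socket, so the work
per pass grows with the number of sockets even though each socket is
only tested once.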



^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:30     ` Eric Dumazet
@ 2013-06-05 13:49       ` David Laight
  -1 siblings, 0 replies; 51+ messages in thread
From: David Laight @ 2013-06-05 13:49 UTC (permalink / raw)
  To: Eric Dumazet, Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

> I am a bit uneasy with this one, because an application polling one
> thousand file descriptors using select()/poll() will call sk_poll_ll()
> one thousand times.

Anything calling poll() on 1000 fds probably has performance
issues already! Which is why kevent schemes have been added.

At least the Linux code doesn't use a linked list for
the fd -> 'struct file' map. On some versions of SVR4 that
made poll() O(n^2), and getting to that number of open
fds O(n^3).

	David

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:41       ` Eliezer Tamir
@ 2013-06-05 13:56         ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 13:56 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
> On 05/06/2013 16:30, Eric Dumazet wrote:
> > On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> >> A very naive select/poll busy-poll support.
> >> Add busy-polling to sock_poll().
> >> When poll/select have nothing to report, call the low-level
> >> sock_poll() again until we are out of time or we find something.
> >> Right now we poll every socket once, this is suboptimal
> >> but improves latency when the number of sockets polled is not large.
> >>
> >> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> >> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> >> Tested-by: Willem de Bruijn <willemb@google.com>
> >> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> >> ---
> >
> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> But we call sk_poll_ll() with nonblock set, so it will only test once
> for each socket and not loop.
> 
> I think this is not as bad as it sounds.
> We still honor the time limit on how long to poll.
> 
> When we busy-wait on a single socket we call sk_poll_ll() repeatedly
> until we time out or we have something to report.
> 
> Here, on the other hand, we call sk_poll_ll() once for each file, so we
> loop on the files. We moved the loop from inside sk_poll_ll() to select/poll.
> 
> I also plan on improving this in the next stage.
> 
> The plan is to let select/poll/epoll control whether sk_poll_ll() is
> called, so the caller has even more control.

This looks quite easy: add to include/uapi/asm-generic/poll.h

#define POLL_LL 0x8000

and do the sk_poll_ll() call only if the flag is set.

I do not think we have to support select(), as it is a legacy interface, and
people wanting ll should really use epoll() or poll().
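
A minimal user-space sketch of that gating, assuming the POLL_LL value
proposed above. fake_sock_poll() and fake_sk_poll_ll() are hypothetical
stand-ins, not the kernel's sock_poll() or sk_poll_ll(), and the
readiness mask computation is left out.

```c
#include <assert.h>
#include <poll.h>

/* Value proposed in this mail; guarded in case a header already defines it. */
#ifndef POLL_LL
#define POLL_LL 0x8000
#endif

static int ll_poll_calls;	/* counts simulated busy-poll attempts */

/* Hypothetical stand-in for sk_poll_ll(): only records that it ran. */
static void fake_sk_poll_ll(void)
{
	ll_poll_calls++;
}

/*
 * Hypothetical stand-in for sock_poll(): busy-poll only when the caller
 * asked for it by setting POLL_LL in the requested events.
 */
static void fake_sock_poll(unsigned int requested_events)
{
	if (requested_events & POLL_LL)
		fake_sk_poll_ll();
}
```

A caller that passes only POLLIN never triggers the busy-poll path here,
while POLLIN | POLL_LL does, so existing poll() users would be unaffected
under this scheme.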




^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:49       ` David Laight
@ 2013-06-05 14:00         ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 14:00 UTC (permalink / raw)
  To: David Laight
  Cc: Eliezer Tamir, David Miller, linux-kernel, netdev,
	Jesse Brandeburg, Don Skidmore, e1000-devel, Willem de Bruijn,
	Ben Hutchings, Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz,
	Amir Vadai, Eliezer Tamir

On Wed, 2013-06-05 at 14:49 +0100, David Laight wrote:
> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> Anything calling poll() on 1000 fds probably has performance
> issues already! Which is why kevent schemes have been added.
> 

You'd be surprised, but many applications still use poll() rather
than epoll() or some other OS-specific interface, because those
interfaces are non-portable or buggy. (I played with FreeBSD, and kevent
crashed easily at 64,000 fds, while the epoll() version reached
4,000,000 fds with no problems.)





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:56         ` Eric Dumazet
@ 2013-06-05 14:17           ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 14:17 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 06:56 -0700, Eric Dumazet wrote:

> This looks quite easy: add to include/uapi/asm-generic/poll.h
> 
> #define POLL_LL 0x8000
> 
> and do the sk_poll_ll() call only if the flag is set.
> 
> I do not think we have to support select(), as it is a legacy interface, and
> people wanting ll should really use epoll() or poll().

Alternatively, add a per-socket flag to enable/disable ll.

This global enable assumes the application owns the host anyway.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 14:17           ` Eric Dumazet
  (?)
@ 2013-06-05 14:56           ` Eliezer Tamir
  -1 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 14:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 17:17, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 06:56 -0700, Eric Dumazet wrote:
>
>> This looks quite easy: add to include/uapi/asm-generic/poll.h
>>
>> #define POLL_LL 0x8000
>>
>> and do the sk_poll_ll() call only if the flag is set.
>>
>> I do not think we have to support select(), as it is a legacy interface, and
>> people wanting ll should really use epoll() or poll().
>
> Alternatively, add a per-socket flag to enable/disable ll.
>
> This global enable assumes the application owns the host anyway.
>

I plan on adding a socket option in the next stage.
I'm also testing a patch much like the one you described, with a poll flag.
select()/poll() set it to indicate that they want to busy-poll.
sock_poll() sets it to indicate that this socket can (at the moment)
busy-poll.

If you think the way things are done right now is unacceptable,
even as an experimental feature, I would much prefer to drop this patch
and have the rest applied rather than bring in new code that is not
fully tested at this stage.

-Eliezer
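
The two-way use of the flag described above might look roughly like the
following user-space sketch. Everything here is an assumption made for
illustration: the flag name and value are borrowed from the POLL_LL
suggestion, and fake_sock, fake_sock_poll() and poll_pass() are
hypothetical, not the patch under test.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag value, taken from the earlier suggestion in this thread. */
#ifndef POLL_LL
#define POLL_LL 0x8000
#endif

/* Hypothetical socket state; not the kernel's struct sock. */
struct fake_sock {
	bool can_busy_poll;	/* assumption: e.g. a napi_id was recorded */
	bool has_data;
};

/*
 * Returns the ready events. POLL_LL is echoed back only if the caller
 * asked for busy polling and this socket can currently do it.
 */
static unsigned int fake_sock_poll(const struct fake_sock *sk,
				   unsigned int wanted)
{
	unsigned int mask = sk->has_data ? POLLIN : 0;

	if ((wanted & POLL_LL) && sk->can_busy_poll)
		mask |= POLL_LL;
	return mask;
}

/*
 * One pass over all sockets. Returns how many have data and sets *retry
 * when nothing is ready but at least one socket reported that it can
 * busy-poll, i.e. another pass may be worthwhile.
 */
static int poll_pass(const struct fake_sock *socks, size_t n, bool *retry)
{
	int ready = 0;
	bool can_ll = false;

	for (size_t i = 0; i < n; i++) {
		unsigned int mask = fake_sock_poll(&socks[i], POLLIN | POLL_LL);

		if (mask & POLLIN)
			ready++;
		if (mask & POLL_LL)
			can_ll = true;
	}
	*retry = (ready == 0) && can_ll;
	return ready;
}
```

With this shape, a set of sockets where none can busy-poll falls
straight through to the normal sleeping path, because no socket echoes
the flag back.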


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 13:41       ` Eliezer Tamir
@ 2013-06-05 15:20         ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 15:20 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
> On 05/06/2013 16:30, Eric Dumazet wrote:

> > I am a bit uneasy with this one, because an application polling one
> > thousand file descriptors using select()/poll() will call sk_poll_ll()
> > one thousand times.
> 
> But we call sk_poll_ll() with nonblock set, so it will only test once
> for each socket and not loop.
> 
> I think this is not as bad as it sounds.
> We still honor the time limit on how long to poll.

We still call ndo_ll_poll() a thousand times, and probably do a
spinlock/unlock a thousand times in the driver.

I would definitely be convinced if you gave us some performance numbers
for a poll() on a thousand TCP sockets, for example.

See my following mail about sk_poll_ll().
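
One way to collect such numbers from user space is a small timing
harness around poll(). The sketch below is a starting point under loud
assumptions, not a benchmark of the patch: it uses idle UDP sockets
rather than TCP connections (a fresh UDP socket reports no POLLIN, so
every call scans the whole array), the helper names are made up, and it
measures only whatever kernel it runs on.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Opens up to `want` UDP sockets; returns how many were actually opened. */
static int open_udp_sockets(struct pollfd *fds, int want)
{
	int n = 0;

	while (n < want) {
		int fd = socket(AF_INET, SOCK_DGRAM, 0);

		if (fd < 0)
			break;	/* e.g. the file descriptor limit was hit */
		fds[n].fd = fd;
		fds[n].events = POLLIN;
		fds[n].revents = 0;
		n++;
	}
	return n;
}

static void close_sockets(struct pollfd *fds, int n)
{
	for (int i = 0; i < n; i++)
		close(fds[i].fd);
}

/*
 * Average nanoseconds per zero-timeout poll() call over `iters` calls,
 * or -1.0 if a system call fails.
 */
static double avg_poll_ns(struct pollfd *fds, int n, int iters)
{
	struct timespec t0, t1;

	assert(iters > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
		return -1.0;
	for (int i = 0; i < iters; i++)
		if (poll(fds, (nfds_t)n, 0) < 0)
			return -1.0;
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		return -1.0;
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Opening a thousand sockets needs a file descriptor limit above that
number. Running avg_poll_ns() once with sysctl_net_ll_poll at zero and
once with it set to a non-zero value would give a first comparison,
although real traffic on TCP connections is what the question above is
really about.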


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 10:34   ` Eliezer Tamir
  (?)
  (?)
@ 2013-06-05 15:21   ` Eric Dumazet
  2013-06-05 15:30     ` Eliezer Tamir
  -1 siblings, 1 reply; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 15:21 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:


This is probably too big to be inlined, and nonblock should be a bool


It would also make sense to give end_time as a parameter, so that the
polling() code could really give an end_time for the whole duration of
poll().

(You then should test can_poll_ll(end_time) _before_ call to
ndo_ll_poll())


> +static inline bool sk_poll_ll(struct sock *sk, int nonblock)
> +{
> +	cycles_t end_time = ll_end_time();
> +	const struct net_device_ops *ops;
> +	struct napi_struct *napi;
> +	int rc = false;
> +
> +	/*
> +	 * rcu read lock for napi hash
> +	 * bh so we don't race with net_rx_action
> +	 */
> +	rcu_read_lock_bh();
> +
> +	napi = napi_by_id(sk->sk_napi_id);
> +	if (!napi)
> +		goto out;
> +
> +	ops = napi->dev->netdev_ops;
> +	if (!ops->ndo_ll_poll)
> +		goto out;
> +
> +	do {
> +
> +		rc = ops->ndo_ll_poll(napi);
> +
> +		if (rc == LL_FLUSH_FAILED)
> +			break; /* permanent failure */
> +
> +		if (rc > 0)
> +			/* local bh are disabled so it is ok to use _BH */
> +			NET_ADD_STATS_BH(sock_net(sk),
> +					 LINUX_MIB_LOWLATENCYRXPACKETS, rc);
> +
> +	} while (skb_queue_empty(&sk->sk_receive_queue)
> +			&& can_poll_ll(end_time) && !nonblock);
> +
> +	rc = !skb_queue_empty(&sk->sk_receive_queue);
> +out:
> +	rcu_read_unlock_bh();
> +	return rc;
> +}




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 13:23     ` Eric Dumazet
  (?)
@ 2013-06-05 15:28     ` Willem de Bruijn
  2013-06-05 15:31       ` Eliezer Tamir
  -1 siblings, 1 reply; 51+ messages in thread
From: Willem de Bruijn @ 2013-06-05 15:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eliezer Tamir, David Miller, linux-kernel, netdev,
	Jesse Brandeburg, Don Skidmore, e1000-devel, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, Jun 5, 2013 at 9:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
>> Adds an ndo_ll_poll method and the code that supports it.
>> This method can be used by low latency applications to busy-poll
>> Ethernet device queues directly from the socket code.
>> sysctl_net_ll_poll controls how many microseconds to poll.
>> Default is zero (disabled).
>> Individual protocol support will be added by subsequent patches.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> Tested-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>> ---
>
> Are you sure this version was tested by Willem ?

I hadn't tested the latest revisions, indeed. Am testing this one right now.

> Acked-by: Eric Dumazet <edumazet@google.com>
>
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:21   ` Eric Dumazet
@ 2013-06-05 15:30     ` Eliezer Tamir
  2013-06-05 15:39         ` Eric Dumazet
  0 siblings, 1 reply; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 15:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir


On 05/06/2013 18:21, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
>
>
> This is probably too big to be inlined, and nonblock should be a bool


> It would also make sense to give end_time as a parameter, so that the
> polling() code could really give an end_time for the whole duration of
> poll().
>
> (You then should test can_poll_ll(end_time) _before_ call to
> ndo_ll_poll())

how would you handle a nonblocking operation in that case?
I guess if we have a socket option, then we don't need to handle nonblocking
any differently, since the user specified exactly how much time to
waste polling. right?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:28     ` Willem de Bruijn
@ 2013-06-05 15:31       ` Eliezer Tamir
  0 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 15:31 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Eric Dumazet, David Miller, linux-kernel, netdev,
	Jesse Brandeburg, Don Skidmore, e1000-devel, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 18:28, Willem de Bruijn wrote:
> On Wed, Jun 5, 2013 at 9:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
>>> Adds an ndo_ll_poll method and the code that supports it.
>>> This method can be used by low latency applications to busy-poll
>>> Ethernet device queues directly from the socket code.
>>> sysctl_net_ll_poll controls how many microseconds to poll.
>>> Default is zero (disabled).
>>> Individual protocol support will be added by subsequent patches.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>>> Tested-by: Willem de Bruijn <willemb@google.com>
>>> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>>> ---
>>
>> Are you sure this version was tested by Willem ?
>
> I hadn't tested the latest revisions, indeed. Am testing this one right now.

Sorry, I missed that.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:30     ` Eliezer Tamir
@ 2013-06-05 15:39         ` Eric Dumazet
  0 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 15:39 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
> On 05/06/2013 18:21, Eric Dumazet wrote:
> > On Wed, 2013-06-05 at 13:34 +0300, Eliezer Tamir wrote:
> >
> >
> > This is probably too big to be inlined, and nonblock should be a bool
> 
> 
> > It would also make sense to give end_time as a parameter, so that the
> > polling() code could really give an end_time for the whole duration of
> > poll().
> >
> > (You then should test can_poll_ll(end_time) _before_ call to
> > ndo_ll_poll())
> 
> how would you handle a nonblocking operation in that case?
> I guess if we have a socket option, then we don't need to handle nonblocking
> any differently, since the user specified exactly how much time to
> waste polling. right?

If the thread already spent 50us in the poll() system call, it for sure
should not call any ndo_ll_poll(). This makes no more sense at this
point.




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:39         ` Eric Dumazet
@ 2013-06-05 15:46           ` Eliezer Tamir
  -1 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 15:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 18:39, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
>> On 05/06/2013 18:21, Eric Dumazet wrote:

>>> It would also make sense to give end_time as a parameter, so that the
>>> polling() code could really give an end_time for the whole duration of
>>> poll().
>>>
>>> (You then should test can_poll_ll(end_time) _before_ call to
>>> ndo_ll_poll())
>>
>> how would you handle a nonblocking operation in that case?
>> I guess if we have a socket option, then we don't need to handle nonblocking
>> any differently, since the user specified exactly how much time to
>> waste polling. right?
>
> If the thread already spent 50us in the poll() system call, it for sure
> should not call any ndo_ll_poll(). This makes no more sense at this
> point.

what about a non-blocking read from a socket?
Right now we assume this means poll only once since the application will 
repeat as needed.

maybe add a "once" parameter that will cause sk_poll_ll() to ignore end 
time and only try once?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll
  2013-06-05 15:20         ` Eric Dumazet
  (?)
@ 2013-06-05 15:47         ` Eliezer Tamir
  -1 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-05 15:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 18:20, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 16:41 +0300, Eliezer Tamir wrote:
>> On 05/06/2013 16:30, Eric Dumazet wrote:
>
>>> I am a bit uneasy with this one, because an application polling() on one
>>> thousand file descriptors using select()/poll(), will call sk_poll_ll()
>>> one thousand times.
>>
>> But we call sk_poll_ll() with nonblock set, so it will only test once
>> for each socket and not loop.
>>
>> I think this is not as bad as it sounds.
>> We still honor the time limit on how long to poll.
>
> We still call ndo_ll_poll() a thousand times, and probably do a
> spinlock/unlock a thousand times in the driver.
>
> I would definitely be convinced if you give us some performance numbers
> of a poll() on a thousand tcp sockets for example.

So with 1000 sockets this is definitely not a win.
sockperf with 1000 UDP sockets:

sysctl     50          0
select   178.5 us    130.0 us
poll     188.6 us    130.0 us


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:46           ` Eliezer Tamir
@ 2013-06-05 15:59             ` Eric Dumazet
  -1 siblings, 0 replies; 51+ messages in thread
From: Eric Dumazet @ 2013-06-05 15:59 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On Wed, 2013-06-05 at 18:46 +0300, Eliezer Tamir wrote:
> On 05/06/2013 18:39, Eric Dumazet wrote:
> > On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
> >> On 05/06/2013 18:21, Eric Dumazet wrote:
> 
> >>> It would also make sense to give end_time as a parameter, so that the
> >>> polling() code could really give an end_time for the whole duration of
> >>> poll().
> >>>
> >>> (You then should test can_poll_ll(end_time) _before_ call to
> >>> ndo_ll_poll())
> >>
> >> how would you handle a nonblocking operation in that case?
> >> I guess if we have a socket option, then we don't need to handle nonblocking
> >> any differently, since the user specified exactly how much time to
> >> waste polling. right?
> >
> > If the thread already spent 50us in the poll() system call, it for sure
> > should not call any ndo_ll_poll(). This makes no more sense at this
> > point.
> 
> what about a non-blocking read from a socket?
> Right now we assume this means poll only once since the application will 
> repeat as needed.
> 
> maybe add a "once" parameter that will cause sk_poll_ll() to ignore end 
> time and only try once?

extern bool __sk_poll_ll(struct sock *sk, bool nonblock, cycles_t end);

static inline bool sk_poll_ll(struct sock *sk, bool nonblock)
{
	return __sk_poll_ll(sk, nonblock, ll_end_time());
}

In the poll() code, we should call ll_end_time() once, even if we poll
1000 fds.
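
A minimal user-space sketch of that idea, assuming a simplified model
rather than the kernel code: the deadline is computed once by the caller
and every socket is checked against it before the device is touched.
struct fake_sock, can_poll(), and poll_all_once() are hypothetical names
used only for illustration; 'now' and 'cost' stand in for the cycle
counter and the time spent in one ndo_ll_poll() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket in this model. */
struct fake_sock {
	bool has_data;        /* models a non-empty receive queue */
	unsigned int polls;   /* counts modelled device polls for this socket */
};

/* Models can_poll_ll(): true while the shared deadline has not passed. */
static bool can_poll(uint64_t now, uint64_t end)
{
	return now < end;
}

/*
 * Visit every socket once against ONE deadline supplied by the caller.
 * Each modelled device poll advances *now by 'cost'.
 * Returns the number of sockets that have data.
 */
static size_t poll_all_once(struct fake_sock *socks, size_t n,
			    uint64_t *now, uint64_t end, uint64_t cost)
{
	size_t ready = 0;

	assert(socks != NULL && now != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Test the deadline before touching the device. */
		if (!socks[i].has_data && can_poll(*now, end)) {
			socks[i].polls++;
			*now += cost;
		}
		if (socks[i].has_data)
			ready++;
	}
	return ready;
}
```

With a budget of two time units and a cost of one unit per device poll,
only the first two idle sockets are polled; the remaining ones are
skipped because the shared deadline has already expired.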





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 2/7] net: add low latency socket poll
  2013-06-05 15:59             ` Eric Dumazet
  (?)
@ 2013-06-06 12:50             ` Eliezer Tamir
  -1 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-06 12:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Ben Hutchings,
	Andi Kleen, HPA, Eilon Greenstien, Or Gerlitz, Amir Vadai,
	Eliezer Tamir

On 05/06/2013 18:59, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 18:46 +0300, Eliezer Tamir wrote:
>> On 05/06/2013 18:39, Eric Dumazet wrote:
>>> On Wed, 2013-06-05 at 18:30 +0300, Eliezer Tamir wrote:
>>>> On 05/06/2013 18:21, Eric Dumazet wrote:
>>
>>>>> It would also make sense to give end_time as a parameter, so that the
>>>>> polling() code could really give an end_time for the whole duration of
>>>>> poll().
>>>>>
>>>>> (You then should test can_poll_ll(end_time) _before_ call to
>>>>> ndo_ll_poll())
>>>>
>>>> how would you handle a nonblocking operation in that case?
>>>> I guess if we have a socket option, then we don't need to handle nonblocking
>>>> any differently, since the user specified exactly how much time to
>>>> waste polling. right?
>>>
>>> If the thread already spent 50us in the poll() system call, it for sure
>>> should not call any ndo_ll_poll(). This makes no more sense at this
>>> point.
>>
>> what about a non-blocking read from a socket?
>> Right now we assume this means poll only once since the application will
>> repeat as needed.
>>
>> maybe add a "once" parameter that will cause sk_poll_ll() to ignore end
>> time and only try once?
>
> extern bool __sk_poll_ll(struct sock *sk, bool nonblock, cycles_t end);
>
> static inline bool sk_poll_ll(struct sock *sk, bool nonblock)
> {
> 	return __sk_poll_ll(sk, nonblock, ll_end_time());
> }
>
> In the poll() code, we should call ll_end_time() once, even if we poll
> 1000 fds.

Right now we have three uses for sk_poll_ll

1. blocking read - In this case we loop until:
   a !skb_queue_empty(&sk->sk_receive_queue)
or
   b !can_poll_ll(end_time)

2. non-blocking read - only try once, ignoring end time.

3. poll/select - for each socket we only try once (nonblock==1),
  we loop in poll/select until we are lucky or run out of time.

For 1 we want to loop inside sk_poll_ll() but for 3 we loop in poll/select.

So it seems all we need is for sk_poll_ll() to not call ll_end_time() if 
nonblock is set.

( something like cycles_t end_time = nonblock ? 0 : ll_end_time(); )

Or we could move the looping out to the calling function in all cases.
Does this mean we should push rcu_read_lock_bh() out into the caller
as well?

-Eliezer

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 0/7] net: low latency Ethernet device polling
  2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
                   ` (6 preceding siblings ...)
  2013-06-05 10:35   ` Eliezer Tamir
@ 2013-06-07 21:48 ` David Miller
  2013-06-08 18:06   ` Eliezer Tamir
  7 siblings, 1 reply; 51+ messages in thread
From: David Miller @ 2013-06-07 21:48 UTC (permalink / raw)
  To: eliezer.tamir
  Cc: linux-kernel, netdev, jesse.brandeburg, donald.c.skidmore,
	e1000-devel, willemb, erdnetdev, bhutchings, andi, hpa, eilong,
	or.gerlitz, amirv, eliezer

From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Date: Wed, 05 Jun 2013 13:34:00 +0300

> And here is v9.
> Except for typo fixes in comments/description, only 2/7 and 5/7 were changed.
> 
> Thanks to everyone for their input.

Since there is some discussion about the way the poll() bits work,
might I suggest you make a v10 without the poll stuff, which I will
apply to net-next, and then you can build the poll stuff on top of
that?

Thanks!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v9 net-next 0/7] net: low latency Ethernet device polling
  2013-06-07 21:48 ` [PATCH v9 net-next 0/7] net: low latency Ethernet device polling David Miller
@ 2013-06-08 18:06   ` Eliezer Tamir
  0 siblings, 0 replies; 51+ messages in thread
From: Eliezer Tamir @ 2013-06-08 18:06 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, jesse.brandeburg, donald.c.skidmore,
	e1000-devel, willemb, erdnetdev, bhutchings, andi, hpa, eilong,
	or.gerlitz, amirv, eliezer

On 08/06/2013 00:48, David Miller wrote:
> From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> Date: Wed, 05 Jun 2013 13:34:00 +0300
>
>> And here is v9.
>> Except for typo fixes in comments/description, only 2/7 and 5/7 were changed.
>>
>> Thanks to everyone for their input.
>
> Since there is some discussion about the way the poll() bits work,
> might I suggest you make a v10 without the poll stuff, which I will
> apply to net-next, and then you can build the poll stuff on top of
> that?

I will do that.

Thanks,
Eliezer

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2013-06-08 18:06 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-05 10:34 [PATCH v9 net-next 0/7] net: low latency Ethernet device polling Eliezer Tamir
2013-06-05 10:34 ` [PATCH v9 net-next 1/7] net: add napi_id and hash Eliezer Tamir
2013-06-05 10:34   ` Eliezer Tamir
2013-06-05 13:18   ` Eric Dumazet
2013-06-05 10:34 ` [PATCH v9 net-next 2/7] net: add low latency socket poll Eliezer Tamir
2013-06-05 10:34   ` Eliezer Tamir
2013-06-05 13:23   ` Eric Dumazet
2013-06-05 13:23     ` Eric Dumazet
2013-06-05 15:28     ` Willem de Bruijn
2013-06-05 15:31       ` Eliezer Tamir
2013-06-05 15:21   ` Eric Dumazet
2013-06-05 15:30     ` Eliezer Tamir
2013-06-05 15:39       ` Eric Dumazet
2013-06-05 15:39         ` Eric Dumazet
2013-06-05 15:46         ` Eliezer Tamir
2013-06-05 15:46           ` Eliezer Tamir
2013-06-05 15:59           ` Eric Dumazet
2013-06-05 15:59             ` Eric Dumazet
2013-06-06 12:50             ` Eliezer Tamir
2013-06-05 10:34 ` [PATCH v9 net-next 3/7] udp: add low latency socket poll support Eliezer Tamir
2013-06-05 10:34   ` Eliezer Tamir
2013-06-05 13:25   ` Eric Dumazet
2013-06-05 13:25     ` Eric Dumazet
2013-06-05 10:34 ` [PATCH v9 net-next 4/7] tcp: " Eliezer Tamir
2013-06-05 10:34   ` Eliezer Tamir
2013-06-05 13:25   ` Eric Dumazet
2013-06-05 13:25     ` Eric Dumazet
2013-06-05 10:34 ` [PATCH v9 net-next 5/7] net: simple poll/select low latency socket poll Eliezer Tamir
2013-06-05 10:34   ` Eliezer Tamir
2013-06-05 13:30   ` Eric Dumazet
2013-06-05 13:30     ` Eric Dumazet
2013-06-05 13:41     ` Eliezer Tamir
2013-06-05 13:41       ` Eliezer Tamir
2013-06-05 13:56       ` Eric Dumazet
2013-06-05 13:56         ` Eric Dumazet
2013-06-05 14:17         ` Eric Dumazet
2013-06-05 14:17           ` Eric Dumazet
2013-06-05 14:56           ` Eliezer Tamir
2013-06-05 15:20       ` Eric Dumazet
2013-06-05 15:20         ` Eric Dumazet
2013-06-05 15:47         ` Eliezer Tamir
2013-06-05 13:49     ` David Laight
2013-06-05 13:49       ` David Laight
2013-06-05 14:00       ` Eric Dumazet
2013-06-05 14:00         ` Eric Dumazet
2013-06-05 10:35 ` [PATCH v9 net-next 6/7] ixgbe: add support for ndo_ll_poll Eliezer Tamir
2013-06-05 10:35   ` Eliezer Tamir
2013-06-05 10:35 ` [PATCH v9 net-next 7/7] ixgbe: add extra stats " Eliezer Tamir
2013-06-05 10:35   ` Eliezer Tamir
2013-06-07 21:48 ` [PATCH v9 net-next 0/7] net: low latency Ethernet device polling David Miller
2013-06-08 18:06   ` Eliezer Tamir
