netdev.vger.kernel.org archive mirror
* [PATCH v4 net-next 0/4] net: low latency Ethernet device polling
@ 2013-05-21 14:26 Eliezer Tamir
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 14:26 UTC (permalink / raw)
  To: Dave Miller
  Cc: Willem de Bruijn, Or Gerlitz, e1000-devel, netdev, HPA,
	linux-kernel, Eliezer Tamir, Jesse Brandeburg, Andi Kleen,
	Eilon Greenstien

Hello Dave,

I believe I have addressed the issues that were raised.
Please take a look and let me know if you have further comments.

Thank you all for your input.

To prevent the use of a stale napi pointer, I implemented a global id
that is incremented whenever a napi is freed.
I used the free space in the skb's second bitfield (7 bits) since I did
not want to increase the size of the structure. In an earlier attempt I
chopped the global id itself down to seven bits, but in testing that
crashed on wrap-around.
Now, if the seven bits stored in the skb match the low seven bits of the
global id, we save the full, un-chopped id in the socket.
(This removes the module parameter and the limit on unloading.)
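
Roughly, the scheme looks like this (an illustration only; the real
helpers in patch 1/4 are skb_mark_ll(), sk_mark_ll() and sk_valid_ll()):

	/* driver RX path: tag the skb with its napi and the low seven
	 * bits of the current generation id
	 */
	skb->dev_ref = napi;
	skb->ll_gen_id = ll_global_gen_id & 0x7F;

	/* socket RX path: trust the reference only while the seven bits
	 * still match, then remember the full, un-chopped id in the socket
	 */
	if (skb->dev_ref && skb->ll_gen_id == (ll_global_gen_id & 0x7F)) {
		sk->dev_ref = skb->dev_ref;
		sk->ll_gen_id = ll_global_gen_id;
	} else {
		sk->dev_ref = NULL;	/* clear expired ref */
	}

	/* before busy polling: the full id saved in the socket must still
	 * match the global one, otherwise the napi pointer may be stale
	 */
	if (sk->dev_ref && sk->ll_gen_id == ll_global_gen_id)
		poll_device_queue(sk->dev_ref);	/* hypothetical helper */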


Is this how you prefer the change log?

change log
v4
- removed the separate config option for TCP busy-polling, as suggested
  by Eric Dumazet.
- added a Linux MIB counter for packets received through the low latency path.
- re-allowed module unloading; removed the module param and instead use a
  global generation id to prevent the use of a stale napi pointer, as
  suggested by Eric Dumazet
- updated Documentation/networking/ip-sysctl.txt text

v3
- coding style changes suggested by Dave Miller

v2
- the sysctl knob is now in microseconds. The default value is now 0 (off).
- for now the code depends at configure time on CONFIG_X86_TSC
- the napi reference in struct skb is now a union with the dma cookie
  since the former is only used on RX and the latter on TX,
  as suggested by Eric Dumazet.
- we do a better job at honoring non-blocking operations.
- removed busy-polling support for tcp_read_sock()
- remove dynamic disabling of GRO
- coding style fixes
- disallow unloading the device module after the feature has been used

Thanks,
Eliezer


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 net-next 1/4] net: implement support for low latency socket polling
  2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-05-21 14:26 ` Eliezer Tamir
  2013-05-21 14:35   ` Eric Dumazet
                     ` (2 more replies)
  2013-05-21 14:27 ` [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll Eliezer Tamir
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 14:26 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Or Gerlitz,
	Eilon Greenstien, Eliezer Tamir

Adds a new ndo_ll_poll method and the code that supports and uses it.
This method can be used by low latency applications to busy poll ethernet
device queues directly from the socket code. The ip_low_latency_poll sysctl
entry controls how many cycles to poll. Set to zero to disable.
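
For example (illustrative usage; the knob is registered under net.ipv4,
so the value of 50 recommended in the documentation can be set with):

	sysctl -w net.ipv4.ip_low_latency_poll=50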

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 Documentation/networking/ip-sysctl.txt |    6 ++
 include/linux/netdevice.h              |    3 +
 include/linux/skbuff.h                 |   13 +++-
 include/net/ll_poll.h                  |  117 ++++++++++++++++++++++++++++++++
 include/net/sock.h                     |    4 +
 include/uapi/linux/snmp.h              |    1 
 net/core/datagram.c                    |    7 ++
 net/core/skbuff.c                      |    4 +
 net/core/sock.c                        |    6 ++
 net/ipv4/Kconfig                       |   12 +++
 net/ipv4/proc.c                        |    1 
 net/ipv4/sysctl_net_ipv4.c             |   10 +++
 net/socket.c                           |   26 +++++++
 13 files changed, 206 insertions(+), 4 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f98ca63..16a36b3 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -19,6 +19,12 @@ ip_no_pmtu_disc - BOOLEAN
 	Disable Path MTU Discovery.
 	default FALSE
 
+ip_low_latency_poll - INTEGER
+	Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
+	Approximate time in ms to spin waiting for packets on the device queue.
+	Recommended value is 50. May increase power usage.
+	default 0
+
 min_pmtu - INTEGER
 	default 552 - minimum discovered Path MTU
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a94a5a0..e25f798 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -943,6 +943,9 @@ struct net_device_ops {
 						     gfp_t gfp);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_INET_LL_RX_POLL
+	int			(*ndo_ll_poll)(struct napi_struct *dev);
+#endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
 						  int queue, u8 *mac);
 	int			(*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2e0ced1..0a61a6e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -384,6 +384,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  *	@dma_cookie: a cookie to one of several possible DMA operations
  *		done by skb DMA functions
+  *	@dev_ref: the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
  *	@dropcount: total number of sk_receive_queue overflows
@@ -494,11 +495,17 @@ struct sk_buff {
 	 * headers if needed
 	 */
 	__u8			encapsulation:1;
-	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
+#ifdef CONFIG_INET_LL_RX_POLL
+	__u8			ll_gen_id:7;
+#endif
+	/* 0-2 bit hole (depending on ndisc_nodetype and ll_gen_id) */
 	kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-	dma_cookie_t		dma_cookie;
+#if defined CONFIG_NET_DMA || defined CONFIG_INET_LL_RX_POLL
+	union {
+		struct napi_struct	*dev_ref;
+		dma_cookie_t		dma_cookie;
+	};
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
new file mode 100644
index 0000000..a31b2b4
--- /dev/null
+++ b/include/net/ll_poll.h
@@ -0,0 +1,117 @@
+/*
+ * low latency network device queue flush
+ * Copyright(c) 2013 Intel Corporation.
+ * Author: Eliezer Tamir
+ *
+ * For now this depends on CONFIG_I86_TSC
+ */
+
+#ifndef _LINUX_NET_LL_POLL_H
+#define _LINUX_NET_LL_POLL_H
+
+#ifdef CONFIG_INET_LL_RX_POLL
+#include <linux/netdevice.h>
+#include <net/ip.h>
+
+struct napi_struct;
+extern int sysctl_net_ll_poll __read_mostly;
+extern unsigned int ll_global_gen_id __read_mostly;
+
+/* we only have room for 7 bits of generation id in the skb */
+#define SKB_LL_GEN_MASK	0x7F
+#define SKB_LL_GEN(id)		(id & SKB_LL_GEN_MASK)
+
+/* return values from ndo_ll_poll */
+#define LL_FLUSH_FAILED		-1
+#define LL_FLUSH_BUSY		-2
+
+/* we don't mind a ~2.5% imprecision */
+#define TSC_MHZ (tsc_khz >> 10)
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return sysctl_net_ll_poll && sk->dev_ref &&
+		sk->ll_gen_id == ll_global_gen_id &&
+		!need_resched() && !signal_pending(current);
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	unsigned long end_time = TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll)
+					+ get_cycles();
+	struct napi_struct *napi = sk->dev_ref;
+	const struct net_device_ops *ops;
+	int rc;
+
+	if (!napi->dev->netdev_ops->ndo_ll_poll)
+		return false;
+
+	local_bh_disable();
+
+	ops = napi->dev->netdev_ops;
+	while (skb_queue_empty(&sk->sk_receive_queue) &&
+			!time_after((unsigned long)get_cycles(), end_time)) {
+		rc = ops->ndo_ll_poll(napi);
+
+		if (rc == LL_FLUSH_FAILED)
+			break; /* premanent failure */
+
+		if (rc > 0)
+			/* local bh are disabled so it is ok to use _BH */
+			NET_ADD_STATS_BH(sock_net(sk),
+				LINUX_MIB_LOWLATENCYRXPACKETS, rc);
+		if (nonblock)
+			break;
+	}
+
+	local_bh_enable();
+
+	return !skb_queue_empty(&sk->sk_receive_queue);
+}
+
+/* should be called when destroying a napi struct */
+static inline void inc_ll_gen_id(void)
+{
+	ll_global_gen_id++;
+}
+
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+	skb->dev_ref = napi;
+	skb->ll_gen_id = SKB_LL_GEN(ll_global_gen_id);
+}
+
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+	if (skb->dev_ref && skb->ll_gen_id == SKB_LL_GEN(ll_global_gen_id)) {
+		sk->dev_ref = skb->dev_ref;
+		sk->ll_gen_id = ll_global_gen_id;
+	} else
+		sk->dev_ref = NULL; /* clear expired ref */
+}
+
+#else /* CONFIG_INET_LL_RX_POLL */
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return 0;
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	return 0;
+}
+
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+}
+
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+}
+
+static inline void inc_ll_gen_id(void)
+{
+}
+#endif /* CONFIG_INET_LL_RX_POLL */
+#endif /* _LINUX_NET_LL_POLL_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 66772cf..1ccf1e6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -399,6 +399,10 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void                    (*sk_destruct)(struct sock *sk);
+#ifdef CONFIG_INET_LL_RX_POLL
+	struct napi_struct	*dev_ref;
+	unsigned int		ll_gen_id;
+#endif
 };
 
 /*
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index df2e8b4..26cbf76 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -253,6 +253,7 @@ enum
 	LINUX_MIB_TCPFASTOPENLISTENOVERFLOW,	/* TCPFastOpenListenOverflow */
 	LINUX_MIB_TCPFASTOPENCOOKIEREQD,	/* TCPFastOpenCookieReqd */
 	LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES, /* TCPSpuriousRtxHostQueues */
+	LINUX_MIB_LOWLATENCYRXPACKETS,		/* LowLatencyRxPackets */
 	__LINUX_MIB_MAX
 };
 
diff --git a/net/core/datagram.c b/net/core/datagram.c
index b71423d..df3dab8 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -56,6 +56,7 @@
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <trace/events/skb.h>
+#include <net/ll_poll.h>
 
 /*
  *	Is a socket 'connection oriented' ?
@@ -201,12 +202,18 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			} else
 				__skb_unlink(skb, queue);
 
+			sk_mark_ll(sk, skb);
 			spin_unlock_irqrestore(&queue->lock, cpu_flags);
 			*off = _off;
 			return skb;
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
 
+#ifdef CONFIG_INET_LL_RX_POLL
+		if (sk_valid_ll(sk) && sk_poll_ll(sk, flags & MSG_DONTWAIT))
+			continue;
+#endif
+
 		/* User doesn't want to wait */
 		error = -EAGAIN;
 		if (!timeo)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index af9185d..4efd230 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -739,6 +739,10 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->vlan_tci		= old->vlan_tci;
 
 	skb_copy_secmark(new, old);
+
+#ifdef CONFIG_INET_LL_RX_POLL
+	new->dev_ref		= old->dev_ref;
+#endif
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index 6ba327d..d8058ce 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -139,6 +139,8 @@
 #include <net/tcp.h>
 #endif
 
+#include <net/ll_poll.h>
+
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
@@ -2284,6 +2286,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_stamp = ktime_set(-1L, 0);
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	sk->dev_ref	=	NULL;
+#endif
+
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 8603ca8..d209f0f 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -409,6 +409,18 @@ config INET_LRO
 
 	  If unsure, say Y.
 
+config INET_LL_RX_POLL
+	bool "Low Latency Receive Poll"
+	depends on X86_TSC
+	default n
+	---help---
+	  Support Low Latency Receive Queue Poll.
+	  (For network card drivers which support this option.)
+	  When waiting for data in read or poll, call directly into the device driver
+	  to flush packets which may be pending on the device queues into the stack.
+
+	  If unsure, say N.
+
 config INET_DIAG
 	tristate "INET: socket monitoring interface"
 	default y
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 2a5bf86..6577a11 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -273,6 +273,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPFastOpenListenOverflow", LINUX_MIB_TCPFASTOPENLISTENOVERFLOW),
 	SNMP_MIB_ITEM("TCPFastOpenCookieReqd", LINUX_MIB_TCPFASTOPENCOOKIEREQD),
 	SNMP_MIB_ITEM("TCPSpuriousRtxHostQueues", LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES),
+	SNMP_MIB_ITEM("LowLatencyRxPackets", LINUX_MIB_LOWLATENCYRXPACKETS),
 	SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index fa2f63f..d0fcaaf 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -25,6 +25,7 @@
 #include <net/inet_frag.h>
 #include <net/ping.h>
 #include <net/tcp_memcontrol.h>
+#include <net/ll_poll.h>
 
 static int zero;
 static int one = 1;
@@ -326,6 +327,15 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_INET_LL_RX_POLL
+	{
+		.procname	= "ip_low_latency_poll",
+		.data		= &sysctl_net_ll_poll,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+#endif
 	{
 		.procname	= "tcp_syn_retries",
 		.data		= &sysctl_tcp_syn_retries,
diff --git a/net/socket.c b/net/socket.c
index 6b94633..626f6f7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -105,6 +105,14 @@
 #include <linux/sockios.h>
 #include <linux/atalk.h>
 
+#ifdef CONFIG_INET_LL_RX_POLL
+#include <net/ll_poll.h>
+int sysctl_net_ll_poll __read_mostly;
+EXPORT_SYMBOL_GPL(sysctl_net_ll_poll);
+unsigned int ll_global_gen_id __read_mostly;
+EXPORT_SYMBOL_GPL(ll_global_gen_id);
+#endif
+
 static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos);
@@ -1142,13 +1150,29 @@ EXPORT_SYMBOL(sock_create_lite);
 /* No kernel lock held - perfect */
 static unsigned int sock_poll(struct file *file, poll_table *wait)
 {
+	unsigned int poll_result;
 	struct socket *sock;
 
 	/*
 	 *      We can't return errors to poll, so it's either yes or no.
 	 */
 	sock = file->private_data;
-	return sock->ops->poll(file, sock, wait);
+
+	poll_result = sock->ops->poll(file, sock, wait);
+
+#ifdef CONFIG_INET_LL_RX_POLL
+	if (wait &&
+	    !(poll_result & (POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP))) {
+		struct sock *sk = sock->sk;
+
+		/* only try once per poll */
+		if (sk_valid_ll(sk) && sk_poll_ll(sk, 1))
+			poll_result = sock->ops->poll(file, sock, wait);
+
+	}
+#endif /* CONFIG_INET_LL_RX_POLL */
+
+	return poll_result;
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll.
  2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
@ 2013-05-21 14:27 ` Eliezer Tamir
  2013-05-21 17:30   ` Ben Hutchings
  2013-05-21 14:27 ` [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll Eliezer Tamir
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 14:27 UTC (permalink / raw)
  To: Dave Miller
  Cc: Willem de Bruijn, Or Gerlitz, e1000-devel, netdev, HPA,
	linux-kernel, Eliezer Tamir, Jesse Brandeburg, Andi Kleen,
	Eilon Greenstien

Adds busy-poll support for TCP.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 net/ipv4/tcp.c       |    9 +++++++++
 net/ipv4/tcp_input.c |    4 ++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dcb116d..b9cc512 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <net/ll_poll.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 
@@ -1504,6 +1505,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			if (offset + 1 != skb->len)
 				continue;
 		}
+		sk_mark_ll(sk, skb);
 		if (tcp_hdr(skb)->fin) {
 			sk_eat_skb(sk, skb, false);
 			++seq;
@@ -1551,6 +1553,12 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue)
+	    && (sk->sk_state == TCP_ESTABLISHED))
+		sk_poll_ll(sk, nonblock);
+#endif
+
 	lock_sock(sk);
 
 	err = -ENOTCONN;
@@ -1855,6 +1863,7 @@ do_prequeue:
 					break;
 				}
 			}
+			sk_mark_ll(sk, skb);
 		}
 
 		*seq += used;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b358e8c..f3f293b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -74,6 +74,7 @@
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
 #include <net/netdma.h>
+#include <net/ll_poll.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -4329,6 +4330,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 				tp->copied_seq += chunk;
 				eaten = (chunk == skb->len);
 				tcp_rcv_space_adjust(sk);
+				sk_mark_ll(sk, skb);
 			}
 			local_bh_disable();
 		}
@@ -4896,6 +4898,7 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 		tp->ucopy.len -= chunk;
 		tp->copied_seq += chunk;
 		tcp_rcv_space_adjust(sk);
+		sk_mark_ll(sk, skb);
 	}
 
 	local_bh_disable();
@@ -4955,6 +4958,7 @@ static bool tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
 		tp->ucopy.len -= chunk;
 		tp->copied_seq += chunk;
 		tcp_rcv_space_adjust(sk);
+		sk_mark_ll(sk, skb);
 
 		if ((tp->ucopy.len == 0) ||
 		    (tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll
  2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
  2013-05-21 14:27 ` [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll Eliezer Tamir
@ 2013-05-21 14:27 ` Eliezer Tamir
  2013-05-21 17:33   ` Ben Hutchings
  2013-05-21 14:27 ` [PATCH v4 net-next 4/4] ixgbe: add extra stats " Eliezer Tamir
  2013-05-23 11:00 ` [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Alex Rosenbaum
  4 siblings, 1 reply; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 14:27 UTC (permalink / raw)
  To: Dave Miller
  Cc: Willem de Bruijn, Or Gerlitz, e1000-devel, netdev, HPA,
	linux-kernel, Eliezer Tamir, Jesse Brandeburg, Andi Kleen,
	Eilon Greenstien

Add the ixgbe driver code implementing ndo_ll_poll.
It should be easy for other drivers to do something similar
in order to enable support for CONFIG_INET_LL_RX_POLL.
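
In outline (the names below are only an illustration; the real ixgbe
implementation is in this patch), a driver needs to:

	/* 1) advertise the hook in its net_device_ops */
	.ndo_ll_poll		= mydrv_low_latency_recv,	/* hypothetical name */

	/* 2) tag every received skb with the napi it came from */
	skb_mark_ll(skb, &q_vector->napi);

	/* 3) bump the global generation id whenever napi contexts are freed */
	inc_ll_gen_id();

	/* 4) implement the hook itself: with bh disabled, clean a small RX
	 * budget and return the number of packets found, LL_FLUSH_BUSY if
	 * napi currently owns the ring, or LL_FLUSH_FAILED if the device
	 * is going down
	 */
	static int mydrv_low_latency_recv(struct napi_struct *napi);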

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   98 +++++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |    2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   63 ++++++++++++++--
 3 files changed, 155 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index ca93238..a2fd08b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -52,6 +52,8 @@
 #include <linux/dca.h>
 #endif
 
+#include <net/ll_poll.h>
+
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -356,9 +358,105 @@ struct ixgbe_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[IFNAMSIZ + 9];
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	unsigned int state;
+#define IXGBE_QV_STATE_IDLE        0
+#define IXGBE_QV_STATE_NAPI	   1    /* NAPI owns this QV */
+#define IXGBE_QV_STATE_POLL	   2    /* poll owns this QV */
+#define IXGBE_QV_LOCKED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
+#define IXGBE_QV_STATE_NAPI_YIELD  4    /* NAPI yielded this QV */
+#define IXGBE_QV_STATE_POLL_YIELD  8    /* poll yielded this QV */
+#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
+#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
+	spinlock_t lock;
+#endif  /* CONFIG_INET_LL_RX_POLL */
+
 	/* for dynamic allocation of rings associated with this q_vector */
 	struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
 };
+#ifdef CONFIG_INET_LL_RX_POLL
+static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
+{
+
+	spin_lock_init(&q_vector->lock);
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+}
+
+/* called from the device poll routine to get ownership of a q_vector */
+static inline int ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock(&q_vector->lock);
+	if (q_vector->state & IXGBE_QV_LOCKED) {
+		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
+		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
+		rc = false;
+	} else
+		/* we don't care if someone yielded */
+		q_vector->state = IXGBE_QV_STATE_NAPI;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while napi had it */
+static inline int ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
+			       IXGBE_QV_STATE_NAPI_YIELD));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* called from ixgbe_low_latency_poll() */
+static inline int ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock_bh(&q_vector->lock);
+	if ((q_vector->state & IXGBE_QV_LOCKED)) {
+		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
+		rc = false;
+	} else
+		/* preserve yield marks */
+		q_vector->state |= IXGBE_QV_STATE_POLL;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while it was locked */
+static inline int ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock_bh(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* true if a socket is polling, even if it did not get the lock */
+static inline int ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+{
+	WARN_ON(!(q_vector->state & IXGBE_QV_LOCKED));
+	return q_vector->state & IXGBE_QV_USER_PEND;
+}
+#else
+#define ixgbe_qv_init_lock(qv) do {} while (0)
+#define ixgbe_qv_lock_napi(qv) 1
+#define ixgbe_qv_unlock_napi(qv) 0
+#define ixgbe_qv_lock_poll(qv) 0
+#define ixgbe_qv_unlock_poll(qv) 0
+#define ixgbe_qv_ll_polling(qv) 0
+#endif /* CONFIG_INET_LL_RX_POLL */
+
 #ifdef CONFIG_IXGBE_HWMON
 
 #define IXGBE_HWMON_TYPE_LOC		0
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index ef5f7a6..e7ca6e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -1019,6 +1019,8 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 	adapter->num_rx_queues = 0;
 	adapter->num_q_vectors = 0;
 
+	inc_ll_gen_id();
+
 	while (v_idx--)
 		ixgbe_free_q_vector(adapter, v_idx);
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d30fbdd..5e43258 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1504,7 +1504,9 @@ static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 
-	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
+	if (ixgbe_qv_ll_polling(q_vector))
+		netif_receive_skb(skb);
+	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 		napi_gro_receive(&q_vector->napi, skb);
 	else
 		netif_rx(skb);
@@ -1892,9 +1894,9 @@ dma_sync:
  * expensive overhead for IOMMU access this provides a means of avoiding
  * it by maintaining the mapping of the page to the syste.
  *
- * Returns true if all work is completed without reaching budget
+ * Returns amount of work completed
  **/
-static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       struct ixgbe_ring *rx_ring,
 			       const int budget)
 {
@@ -1976,6 +1978,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 #endif /* IXGBE_FCOE */
+		skb_mark_ll(skb, &q_vector->napi);
 		ixgbe_rx_skb(q_vector, skb);
 
 		/* update budget accounting */
@@ -1992,9 +1995,37 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (cleaned_count)
 		ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
 
-	return (total_rx_packets < budget);
+	return total_rx_packets;
 }
 
+#ifdef CONFIG_INET_LL_RX_POLL
+/* must be called with local_bh_disable()d */
+static int ixgbe_low_latency_recv(struct napi_struct *napi)
+{
+	struct ixgbe_q_vector *q_vector =
+			container_of(napi, struct ixgbe_q_vector, napi);
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring  *ring;
+	int found = 0;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return LL_FLUSH_FAILED;
+
+	if (!ixgbe_qv_lock_poll(q_vector))
+		return LL_FLUSH_BUSY;
+
+	ixgbe_for_each_ring(ring, q_vector->rx) {
+		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+		if (found)
+			break;
+	}
+
+	ixgbe_qv_unlock_poll(q_vector);
+
+	return found;
+}
+#endif	/* CONFIG_INET_LL_RX_POLL */
+
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -2550,6 +2581,9 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 	ixgbe_for_each_ring(ring, q_vector->tx)
 		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
 
+	if (!ixgbe_qv_lock_napi(q_vector))
+		return budget;
+
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	if (q_vector->rx.count > 1)
@@ -2558,9 +2592,10 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx)
-		clean_complete &= ixgbe_clean_rx_irq(q_vector, ring,
-						     per_ring_budget);
+		clean_complete &= (ixgbe_clean_rx_irq(q_vector, ring,
+				   per_ring_budget) < per_ring_budget);
 
+	ixgbe_qv_unlock_napi(q_vector);
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
 		return budget;
@@ -3747,16 +3782,25 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
+		ixgbe_qv_init_lock(adapter->q_vector[q_idx]);
 		napi_enable(&adapter->q_vector[q_idx]->napi);
+	}
 }
 
 static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	local_bh_disable(); /* for ixgbe_qv_lock_napi() */
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
 		napi_disable(&adapter->q_vector[q_idx]->napi);
+		while (!ixgbe_qv_lock_napi(adapter->q_vector[q_idx])) {
+			pr_info("QV %d locked\n", q_idx);
+			mdelay(1);
+		}
+	}
+	local_bh_enable();
 }
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7177,6 +7221,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
+#ifdef CONFIG_INET_LL_RX_POLL
+	.ndo_ll_poll		= ixgbe_low_latency_recv,
+#endif
 #ifdef IXGBE_FCOE
 	.ndo_fcoe_ddp_setup = ixgbe_fcoe_ddp_get,
 	.ndo_fcoe_ddp_target = ixgbe_fcoe_ddp_target,



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 net-next 4/4] ixgbe: add extra stats for ndo_ll_poll
  2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
                   ` (2 preceding siblings ...)
  2013-05-21 14:27 ` [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll Eliezer Tamir
@ 2013-05-21 14:27 ` Eliezer Tamir
  2013-05-23 11:00 ` [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Alex Rosenbaum
  4 siblings, 0 replies; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 14:27 UTC (permalink / raw)
  To: Dave Miller
  Cc: Willem de Bruijn, Or Gerlitz, e1000-devel, netdev, HPA,
	linux-kernel, Eliezer Tamir, Jesse Brandeburg, Andi Kleen,
	Eilon Greenstien

Add additional statistics to the ixgbe driver for ndo_ll_poll,
defined under LL_EXTENDED_STATS.
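
With this, the new per-queue counters show up in ethtool -S output, for
example (counter names as added in ixgbe_get_strings(); values made up):

	rx_q_0_ll_poll_yield: 0
	rx_q_0_misses: 0
	rx_q_0_cleaned: 0
	tx_q_0_napi_yield: 0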

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   14 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |    6 +++
 3 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index a2fd08b..58ac602 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -54,6 +54,9 @@
 
 #include <net/ll_poll.h>
 
+#ifdef CONFIG_INET_LL_RX_POLL
+#define LL_EXTENDED_STATS
+#endif
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -184,6 +187,11 @@ struct ixgbe_rx_buffer {
 struct ixgbe_queue_stats {
 	u64 packets;
 	u64 bytes;
+#ifdef LL_EXTENDED_STATS
+	u64 yields;
+	u64 misses;
+	u64 cleaned;
+#endif  /* LL_EXTENDED_STATS */
 };
 
 struct ixgbe_tx_queue_stats {
@@ -391,6 +399,9 @@ static inline int ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
 		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
 		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->tx.ring->stats.yields++;
+#endif
 	} else
 		/* we don't care if someone yielded */
 		q_vector->state = IXGBE_QV_STATE_NAPI;
@@ -421,6 +432,9 @@ static inline int ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
 	if ((q_vector->state & IXGBE_QV_LOCKED)) {
 		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->rx.ring->stats.yields++;
+#endif
 	} else
 		/* preserve yield marks */
 		q_vector->state |= IXGBE_QV_STATE_POLL;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index d375472..24e2e7a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1054,6 +1054,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1063,6 +1069,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 	for (j = 0; j < IXGBE_NUM_RX_QUEUES; j++) {
 		ring = adapter->rx_ring[j];
@@ -1070,6 +1082,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1079,6 +1097,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 
 	for (j = 0; j < IXGBE_MAX_PACKET_BUFFERS; j++) {
@@ -1115,12 +1139,28 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "tx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "tx_q_%u_napi_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_NUM_RX_QUEUES; i++) {
 			sprintf(p, "rx_queue_%u_packets", i);
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "rx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "rx_q_%u_ll_poll_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) {
 			sprintf(p, "tx_pb_%u_pxon", i);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 5e43258..4c6ba4b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2016,6 +2016,12 @@ static int ixgbe_low_latency_recv(struct napi_struct *napi)
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
 		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+#ifdef LL_EXTENDED_STATS
+		if (found)
+			ring->stats.cleaned += found;
+		else
+			ring->stats.misses++;
+#endif
 		if (found)
 			break;
 	}



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 1/4] net: implement support for low latency socket polling
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
@ 2013-05-21 14:35   ` Eric Dumazet
  2013-05-21 17:23   ` Ben Hutchings
  2013-05-21 17:27   ` Ben Hutchings
  2 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-05-21 14:35 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Willem de Bruijn, Don, Or Gerlitz, e1000-devel, netdev, HPA,
	Jesse Brandeburg, Eliezer Tamir, linux-kernel, Andi Kleen,
	Eilon Greenstien, Dave Miller

On Tue, 2013-05-21 at 17:26 +0300, Eliezer Tamir wrote:

> +/* should be called when destroying a napi struct */
> +static inline void inc_ll_gen_id(void)
> +{
> +	ll_global_gen_id++;
> +}
> +
> +static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
> +{
> +	skb->dev_ref = napi;
> +	skb->ll_gen_id = SKB_LL_GEN(ll_global_gen_id);
> +}
> +
> +static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
> +{
> +	if (skb->dev_ref && skb->ll_gen_id == SKB_LL_GEN(ll_global_gen_id)) {
> +		sk->dev_ref = skb->dev_ref;
> +		sk->ll_gen_id = ll_global_gen_id;
> +	} else
> +		sk->dev_ref = NULL; /* clear expired ref */
> +}
> +

That's really hacky.

Please don't rush sending a new patch set every day.





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 1/4] net: implement support for low latency socket polling
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
  2013-05-21 14:35   ` Eric Dumazet
@ 2013-05-21 17:23   ` Ben Hutchings
  2013-05-21 17:28     ` Eliezer Tamir
  2013-05-21 17:27   ` Ben Hutchings
  2 siblings, 1 reply; 12+ messages in thread
From: Ben Hutchings @ 2013-05-21 17:23 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Or Gerlitz, Eilon Greenstien, Eliezer Tamir

On Tue, 2013-05-21 at 17:26 +0300, Eliezer Tamir wrote:
> Adds a new ndo_ll_poll method and the code that supports and uses it.
> This method can be used by low latency applications to busy poll ethernet
> device queues directly from the socket code. The ip_low_latency_poll sysctl
> entry controls how many cycles to poll. Set to zero to disable.

Microseconds, not cycles.

[...]
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -19,6 +19,12 @@ ip_no_pmtu_disc - BOOLEAN
>  	Disable Path MTU Discovery.
>  	default FALSE
>  
> +ip_low_latency_poll - INTEGER
> +	Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
> +	Approximate time in ms to spin waiting for packets on the device queue.

us, not ms.

[...]
> --- /dev/null
> +++ b/include/net/ll_poll.h
> @@ -0,0 +1,117 @@
> +/*
> + * low latency network device queue flush
> + * Copyright(c) 2013 Intel Corporation.
> + * Author: Eliezer Tamir
> + *
> + * For now this depends on CONFIG_I86_TSC

CONFIG_X86_TSC

[...]
> +static inline bool sk_poll_ll(struct sock *sk, int nonblock)
> +{
> +	unsigned long end_time = TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll)
> +					+ get_cycles();
> +	struct napi_struct *napi = sk->dev_ref;
> +	const struct net_device_ops *ops;
> +	int rc;
> +
> +	if (!napi->dev->netdev_ops->ndo_ll_poll)
> +		return false;
> +
> +	local_bh_disable();
> +
> +	ops = napi->dev->netdev_ops;

Should be done before testing ndo_ll_poll above, so you can avoid
repeating the indirections.

> +	while (skb_queue_empty(&sk->sk_receive_queue) &&
> +			!time_after((unsigned long)get_cycles(), end_time)) {
> +		rc = ops->ndo_ll_poll(napi);
> +
> +		if (rc == LL_FLUSH_FAILED)
> +			break; /* premanent failure */
[...]

Typo: 'permanent'.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 1/4] net: implement support for low latency socket polling
  2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
  2013-05-21 14:35   ` Eric Dumazet
  2013-05-21 17:23   ` Ben Hutchings
@ 2013-05-21 17:27   ` Ben Hutchings
  2 siblings, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2013-05-21 17:27 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Willem de Bruijn, Don, Or Gerlitz, e1000-devel, netdev, HPA,
	Jesse Brandeburg, Eliezer Tamir, linux-kernel, Andi Kleen,
	Eilon Greenstien, Dave Miller

On Tue, 2013-05-21 at 17:26 +0300, Eliezer Tamir wrote:
> Adds a new ndo_ll_poll method and the code that supports and uses it.
> This method can be used by low latency applications to busy poll ethernet
> device queues directly from the socket code. The ip_low_latency_poll sysctl
> entry controls how many cycles to poll. Set to zero to disable.
[...]

One more general point: why is this treated as an IPv4 option (in
Kconfig and sysctls)?  I don't have any particular expectation that it
will be used with network protocols other than IPv4 and v6 any time
soon, but logically it's not dependent on them.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 1/4] net: implement support for low latency socket polling
  2013-05-21 17:23   ` Ben Hutchings
@ 2013-05-21 17:28     ` Eliezer Tamir
  0 siblings, 0 replies; 12+ messages in thread
From: Eliezer Tamir @ 2013-05-21 17:28 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Or Gerlitz, Eilon Greenstien, Eliezer Tamir

On 21/05/2013 20:23, Ben Hutchings wrote:
> On Tue, 2013-05-21 at 17:26 +0300, Eliezer Tamir wrote:

> Microseconds, not cycles.

> us, not ms.

> CONFIG_X86_TSC

> Should be done before testing ndo_ll_poll above, so you can avoid
> repeating the indirections.

> Typo: 'permanent'.
>
> Ben.
>
Thanks!
-Eliezer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll.
  2013-05-21 14:27 ` [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll Eliezer Tamir
@ 2013-05-21 17:30   ` Ben Hutchings
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2013-05-21 17:30 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Willem de Bruijn, Don, Or Gerlitz, e1000-devel, netdev, HPA,
	Jesse Brandeburg, Eliezer Tamir, linux-kernel, Andi Kleen,
	Eilon Greenstien, Dave Miller

On Tue, 2013-05-21 at 17:27 +0300, Eliezer Tamir wrote:
> adds busy-poll support for TCP.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> ---
> 
>  net/ipv4/tcp.c       |    9 +++++++++
>  net/ipv4/tcp_input.c |    4 ++++
>  2 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index dcb116d..b9cc512 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
[...]
> @@ -1551,6 +1553,12 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  	struct sk_buff *skb;
>  	u32 urg_hole = 0;
>  
> +#ifdef CONFIG_INET_LL_RX_POLL
> +	if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue)
> +	    && (sk->sk_state == TCP_ESTABLISHED))
> +		sk_poll_ll(sk, nonblock);
> +#endif
[...]

I don't think the #ifdef is needed; this should compile down to nothing
if the config option is disabled.
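
For illustration only: since ll_poll.h already provides stubs that return
0 when the option is off, the unguarded form should be enough and the
compiler can drop the whole block:

	if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue) &&
	    (sk->sk_state == TCP_ESTABLISHED))
		sk_poll_ll(sk, nonblock);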

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll
  2013-05-21 14:27 ` [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll Eliezer Tamir
@ 2013-05-21 17:33   ` Ben Hutchings
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2013-05-21 17:33 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Willem de Bruijn, Don, Or Gerlitz, e1000-devel, netdev, HPA,
	Jesse Brandeburg, Eliezer Tamir, linux-kernel, Andi Kleen,
	Eilon Greenstien, Dave Miller

On Tue, 2013-05-21 at 17:27 +0300, Eliezer Tamir wrote:
> Add the ixgbe driver code implementing ndo_ll_poll.
> It should be easy for other drivers to do something similar
> in order to enable support for CONFIG_INET_LL_RX_POLL
[...]
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -1019,6 +1019,8 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
>  	adapter->num_rx_queues = 0;
>  	adapter->num_q_vectors = 0;
>  
> +	inc_ll_gen_id();

I think that should be handled by the networking core (somewhere) and
not by drivers.

Ben.

>  	while (v_idx--)
>  		ixgbe_free_q_vector(adapter, v_idx);
>  }
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 net-next 0/4] net: low latency Ethernet device polling
  2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
                   ` (3 preceding siblings ...)
  2013-05-21 14:27 ` [PATCH v4 net-next 4/4] ixgbe: add extra stats " Eliezer Tamir
@ 2013-05-23 11:00 ` Alex Rosenbaum
  4 siblings, 0 replies; 12+ messages in thread
From: Alex Rosenbaum @ 2013-05-23 11:00 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Or Gerlitz, Eilon Greenstien, Eliezer Tamir

Eliezer,

With AmirV's help we got this working on our NIC as well, and it looks
nice. We too see the nice performance gain.

I tested with epoll and as expected there is no performance improvement. 
I don't think there is any point delaying this feature commit due to 
this fact. Future development should handle that.

I also tested LLS with different message rates.
Using sockperf you can set a ping send rate (--msp) and measure latency 
at different rates (I don't think netperf can do this).
In the financial trading sector, low latency at 100 msp is just as
important as at 50K msp (or higher). Market orders go out at these
low rates.
I noticed a penalty in latency as I go lower in msp. I don't think it
is related to the LLS code, but it is more obvious than without it since
the latencies reached are lower.

These numbers are for sockperf TCP ping-pong at different msp for a 12 
byte payload.
I verified the LLS hit counter was at 100% for all message rates on
both the server and client side.
The machine is an x86_64 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz with 64GB RAM.

  rate (msp) | LLS-on RTT (usec) | LLS-off RTT (usec)
       10000 |              14.0 |               21.8
        1000 |              15.6 |               23.0
         100 |              16.6 |               24.4

You can see that as I go lower in send message rate the latency increases.

* Don't consider these numbers best-case results; they are from a random
machine with some tuning effort and core isolation. I saw this performance
hit at lower message rates on several machines elsewhere, and I am sure it
will reproduce on your tuned machine so you can notice it as well.

Again, this should not block your feature commit, but it is interesting
for me to understand, and I thought someone here might have a good explanation.

thanks,
Alex Rosenbaum
Director R&D Application Acceleration
Mellanox Technologies | Raanana, Israel | +972 (74) 712-9215

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-05-23 11:00 UTC | newest]

Thread overview: 12+ messages
2013-05-21 14:26 [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
2013-05-21 14:26 ` [PATCH v4 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
2013-05-21 14:35   ` Eric Dumazet
2013-05-21 17:23   ` Ben Hutchings
2013-05-21 17:28     ` Eliezer Tamir
2013-05-21 17:27   ` Ben Hutchings
2013-05-21 14:27 ` [PATCH v4 net-next 2/4] tcp: add TCP support for low latency receive poll Eliezer Tamir
2013-05-21 17:30   ` Ben Hutchings
2013-05-21 14:27 ` [PATCH v4 net-next 3/4] ixgbe: Add support for ndo_ll_poll Eliezer Tamir
2013-05-21 17:33   ` Ben Hutchings
2013-05-21 14:27 ` [PATCH v4 net-next 4/4] ixgbe: add extra stats " Eliezer Tamir
2013-05-23 11:00 ` [PATCH v4 net-next 0/4] net: low latency Ethernet device polling Alex Rosenbaum
