* [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
From: Eliezer Tamir @ 2013-05-19 10:25 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Eliezer Tamir

Dave, 

Please consider applying to net-next.

Thanks,
Eliezer

This is an updated version of the code we posted in February.

Patch 1 adds ndo_ll_poll and the IP code to use it.
Patch 2 is an example of how TCP can use ndo_ll_poll.
Patch 3 shows how this method would be implemented for the ixgbe driver.
Patch 4 adds statistics to the ixgbe driver for ndo_ll_poll events.

Changes from previous version:
1. The sysctl knob is now in microseconds; we no longer adjust it for CPU
clock changes. The default value is now 0 (off).
The recommended value is around 50.

2. For now the code depends at configure time on CONFIG_X86_TSC to
satisfy both the need for a high-precision get_cycles() and a 64-bit
cycles_t. I looked into using sched_clock(); it does not appear to
have the required precision on all architectures. With a Kconfig
dependency it would be easy to add other architectures once some
testing has been done on them.
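
For reference, the deadline computation in patch 1 reduces to the
following (a sketch only; data_ready() is a placeholder for the real
receive-queue check):

	/* ~2.5% imprecision from shifting by 10 instead of dividing
	 * by 1000 is fine */
	#define TSC_MHZ (tsc_khz >> 10)

	unsigned long end_time = TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll)
					+ get_cycles();
	while (!data_ready() &&
	       !time_after((unsigned long)get_cycles(), end_time))
		; /* keep calling ndo_ll_poll() */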

3. The napi reference in struct sk_buff is now a union with the dma cookie,
since the former is only used on RX and the latter only on TX, as suggested
by Eric Dumazet.
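
The resulting layout in struct sk_buff (quoted from patch 1, comments
added here):

	#if defined CONFIG_NET_DMA || defined CONFIG_INET_LL_RX_POLL
		union {
			struct napi_struct	*dev_ref;   /* RX: napi this skb came from */
			dma_cookie_t		dma_cookie; /* TX: NET_DMA cookie */
		};
	#endif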

4. We do a better job at honoring non-blocking operations.

5. Removed busy-polling support for tcp_read_sock().
Doing a release_sock() followed by a lock_sock() to get the backlog 
processed is unsafe there.
If there is interest in tcp_read_sock() support we would need another
way to get backlog processing done.
BTW, I was not able to find a microbenchmark that uses tcp_read_sock();
any suggestions?

6. To avoid the overhead of reference counting napi structs by skbs
and sockets in the fastpath, and to avoid increasing the size of
struct sk_buff, we no longer allow unloading the module once this
feature has been used.

It seems that for most of the people interested in busy-polling, giving
up the ability to blindly remove the module for a slight but measurable
performance gain is a good tradeoff.
(There is a module parameter to override this behavior and if you know
what you are doing and are careful to stop the processes you can safely
unload, but we don't enforce this.)
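
The pattern patch 3 uses for this is a one-shot, one-way module
reference (sketch, taken from the ixgbe changes below):

	if (unlikely(!unsafe_to_remove)) {
		unsafe_to_remove = 1;
		if (!allow_unsafe_removal) {
			pr_info("module may no longer be removed\n");
			/* taken once and never released */
			try_module_get(THIS_MODULE);
		}
	}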

7. We no longer try to dynamically turn GRO off when someone is busy-
polling, since this sometimes caused reordering with packets left on
the napi->gro_list by NAPI. For most workloads you should probably start
by globally disabling GRO with ethtool; in some cases, though, the
performance gain of GRO greatly outweighs the cost of reordering.
Your mileage may vary.
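
For example (eth0 is an assumed interface name):

	ethtool -K eth0 gro off lro off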

8. Many small changes suggested by people here. I would like to thank
all of the people that took the time to review our code.

The performance is about the same as last time.
I promised Rick Jones CPU utilization numbers, so here are some examples
with those numbers added.

Performance numbers:
        setup                      TCP_RR               UDP_RR
kernel  Config     C3/6 rx-usecs  tps  cpu% S.dem      tps  cpu% S.dem
patched optimized* on   100       87k  3.13 11.4       94k  3.17 10.7
patched optimized* on   0         71k  3.12 14.0       84k  3.19 12.0
patched optimized* on   adaptive  80k  3.13 12.5       90k  3.46 12.2
patched typical    on   100       72k  3.13 14.0       79k  3.17 12.8
patched typical    on   0         60k  2.13 16.5       71k  3.18 14.0
patched typical    on   adaptive  67k  3.51 16.7       75k  3.36 14.5
3.9     optimized* on   adaptive  25k  1.0  12.7       28k  0.98 11.2
3.9     typical    off  0         48k  1.09  7.3       52k  1.11  4.18
3.9     typical    off  adaptive  35k  1.12  4.08      38k  0.65  5.49
3.9     optimized* off  adaptive  40k  0.82  4.83      43k  0.70  5.23
3.9     optimized* off  0         57k  1.17  4.08      62k  1.04  3.95
*not the same config as the one used in v1.

Test setup details:
Machines: each with two Intel Xeon 2680 CPUs and X520 (82599) optical NICs
Tests: Netperf tcp_rr and udp_rr, 1 byte (round trips per second)
Kernel: unmodified 3.9 and patched 3.9
Config: typical is derived from RH6.2, optimized is a stripped down config.
Interrupt coalescing (ethtool rx-usecs) settings: 0=off, 1=adaptive, 100 us
When C3/6 states were turned on (via BIOS) the performance governor was used.

This is not the exact same optimized config that I used last time.
When I tried it on kernel 3.9, my machines would not boot.
So I re-did it and removed a slightly different set of options.
As a result it is a bit faster on the patched kernel.
This is also probably the explanation for a slight regression in 
the performance of the unpatched 3.9 kernel with the optimized config
compared to the 3.8 results.

How to test:
(changes from v1 are highlighted with ***)

1. The patchset should apply cleanly to net-next.
If someone wants a set for 3.9 I can give it to them.
(don't forget to configure INET_LL_RX_POLL and INET_LL_TCP_POLL).

2. The ethtool -c setting for rx-usecs should be on the order of 100.

3. *** Use ethtool -K to disable GRO and LRO
(You are encouraged to try it both ways. If you find that your workload
does better with GRO on, do tell us.)

4. *** The sysctl value net.ipv4.ip_low_latency_poll controls how long
(in microseconds) to busy-wait for more data. You are encouraged to play
with this and see what works for you. The default is now 0, so you need to
set it to turn the feature on. I recommend a value around 50.
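
Putting items 2-4 together, a reasonable starting point might look like
this (eth0 is an assumed interface name; ethtool -C is the set form of
the -c query):

	ethtool -C eth0 rx-usecs 100
	ethtool -K eth0 gro off lro off
	sysctl -w net.ipv4.ip_low_latency_poll=50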

5. The benchmark thread and the IRQ should be bound to separate cores.
Both cores should be on the same CPU NUMA node as the NIC.
When the app and the IRQ run on the same core you get a ~5% penalty.
If interrupt coalescing is set to a low value this penalty
can be very large.
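
For example, assuming a hypothetical queue IRQ number 42 and cores 2
and 3 on the NIC's NUMA node:

	echo 4 > /proc/irq/42/smp_affinity    # IRQ on core 2 (mask 0x4)
	taskset -c 3 netperf -H <server> -t TCP_RR -- -r 1,1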

6. If you suspect that your machine is not configured properly,
use numademo to make sure that the CPU to memory BW is OK.
numademo 128m memcpy local copy numbers should be more than
8GB/s on a properly configured machine.

Credit:
Jesse Brandeburg, Arun Chekhov Ilango, Julie Cummings,
Alexander Duyck, Eric Geisler, Jason Neighbors, Yadong Li,
Mike Polehn, Anil Vasudevan, Don Wood
Special thanks for finding bugs in earlier versions:
Willem de Bruijn and Andi Kleen


* [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
From: Eliezer Tamir @ 2013-05-19 10:25 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Eliezer Tamir

Adds a new ndo_ll_poll method and the code that supports and uses it.
This method can be used by low latency applications to busy-poll Ethernet
device queues directly from the socket code. The ip_low_latency_poll sysctl
entry controls how long to poll, in microseconds. Set to zero to disable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 Documentation/networking/ip-sysctl.txt |    5 ++
 include/linux/netdevice.h              |    3 +
 include/linux/skbuff.h                 |    8 +++-
 include/net/ll_poll.h                  |   73 ++++++++++++++++++++++++++++++++
 include/net/sock.h                     |    3 +
 net/core/datagram.c                    |    7 +++
 net/core/skbuff.c                      |    4 ++
 net/core/sock.c                        |    6 +++
 net/ipv4/Kconfig                       |   12 +++++
 net/ipv4/sysctl_net_ipv4.c             |   10 ++++
 net/socket.c                           |   25 +++++++++++
 11 files changed, 153 insertions(+), 3 deletions(-)
 create mode 100644 include/net/ll_poll.h

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f98ca63..cfcf0ea 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -19,6 +19,11 @@ ip_no_pmtu_disc - BOOLEAN
 	Disable Path MTU Discovery.
 	default FALSE
 
+ip_low_latency_poll - INTEGER
+	Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
+	Approximate time in us to spin waiting for packets on the device queue.
+	default 0
+
 min_pmtu - INTEGER
 	default 552 - minimum discovered Path MTU
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a94a5a0..e25f798 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -943,6 +943,9 @@ struct net_device_ops {
 						     gfp_t gfp);
 	void			(*ndo_netpoll_cleanup)(struct net_device *dev);
 #endif
+#ifdef CONFIG_INET_LL_RX_POLL
+	int			(*ndo_ll_poll)(struct napi_struct *dev);
+#endif
 	int			(*ndo_set_vf_mac)(struct net_device *dev,
 						  int queue, u8 *mac);
 	int			(*ndo_set_vf_vlan)(struct net_device *dev,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2e0ced1..4047e1e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -384,6 +384,7 @@ typedef unsigned char *sk_buff_data_t;
  *	@no_fcs:  Request NIC to treat last 4 bytes as Ethernet FCS
  *	@dma_cookie: a cookie to one of several possible DMA operations
  *		done by skb DMA functions
+ *      @dev_ref: the NAPI struct this skb came from
  *	@secmark: security marking
  *	@mark: Generic packet mark
  *	@dropcount: total number of sk_receive_queue overflows
@@ -497,8 +498,11 @@ struct sk_buff {
 	/* 7/9 bit hole (depending on ndisc_nodetype presence) */
 	kmemcheck_bitfield_end(flags2);
 
-#ifdef CONFIG_NET_DMA
-	dma_cookie_t		dma_cookie;
+#if defined CONFIG_NET_DMA || defined CONFIG_INET_LL_RX_POLL
+	union {
+		struct napi_struct	*dev_ref;
+		dma_cookie_t		dma_cookie;
+	};
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
new file mode 100644
index 0000000..6b5c03f
--- /dev/null
+++ b/include/net/ll_poll.h
@@ -0,0 +1,73 @@
+/*
+ * low latency network device queue flush
+ * Copyright(c) 2013 Intel Corporation.
+ * Author: Eliezer Tamir
+ *
+ * For now this depends on CONFIG_X86_TSC
+ */
+
+#ifndef _LINUX_NET_LL_POLL_H
+#define _LINUX_NET_LL_POLL_H
+#ifdef CONFIG_INET_LL_RX_POLL
+#include <linux/netdevice.h>
+struct napi_struct;
+extern int sysctl_net_ll_poll __read_mostly;
+
+/* return values from ndo_ll_poll */
+#define LL_FLUSH_DONE		0
+#define LL_FLUSH_FAILED		1
+#define LL_FLUSH_BUSY		2
+
+/* we don't mind a ~2.5% imprecision */
+#define TSC_MHZ (tsc_khz >> 10)
+
+static inline bool sk_valid_ll(struct sock *sk)
+{
+	return sysctl_net_ll_poll && sk->dev_ref &&
+		!need_resched() && !signal_pending(current);
+}
+
+static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+{
+	struct napi_struct *napi = sk->dev_ref;
+	const struct net_device_ops *ops;
+	unsigned long end_time = TSC_MHZ * ACCESS_ONCE(sysctl_net_ll_poll)
+					+ get_cycles();
+
+	if (!napi->dev->netdev_ops->ndo_ll_poll)
+		return false;
+
+	local_bh_disable();
+
+	ops = napi->dev->netdev_ops;
+	while (skb_queue_empty(&sk->sk_receive_queue) &&
+			!time_after((unsigned long)get_cycles(), end_time)) {
+		if (ops->ndo_ll_poll(napi) == LL_FLUSH_FAILED)
+			break; /* permanent failure */
+		if (nonblock)
+			break;
+	}
+
+	local_bh_enable();
+
+	return !skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
+{
+	skb->dev_ref = napi;
+}
+
+static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
+{
+	sk->dev_ref = skb->dev_ref;
+}
+#else /* CONFIG_INET_LL_RX_POLL */
+
+#define sk_valid_ll(sk) 0
+#define sk_poll_ll(sk, nonblock) 0
+#define skb_mark_ll(skb, napi) do {} while (0)
+#define sk_mark_ll(sk, skb) do {} while (0)
+
+#endif /* CONFIG_INET_LL_RX_POLL */
+#endif /* _LINUX_NET_LL_POLL_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 66772cf..eeabdcd4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -399,6 +399,9 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void                    (*sk_destruct)(struct sock *sk);
+#ifdef CONFIG_INET_LL_RX_POLL
+	struct napi_struct	*dev_ref;
+#endif
 };
 
 /*
diff --git a/net/core/datagram.c b/net/core/datagram.c
index b71423d..df3dab8 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -56,6 +56,7 @@
 #include <net/sock.h>
 #include <net/tcp_states.h>
 #include <trace/events/skb.h>
+#include <net/ll_poll.h>
 
 /*
  *	Is a socket 'connection oriented' ?
@@ -201,12 +202,18 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 			} else
 				__skb_unlink(skb, queue);
 
+			sk_mark_ll(sk, skb);
 			spin_unlock_irqrestore(&queue->lock, cpu_flags);
 			*off = _off;
 			return skb;
 		}
 		spin_unlock_irqrestore(&queue->lock, cpu_flags);
 
+#ifdef CONFIG_INET_LL_RX_POLL
+		if (sk_valid_ll(sk) && sk_poll_ll(sk, flags & MSG_DONTWAIT))
+			continue;
+#endif
+
 		/* User doesn't want to wait */
 		error = -EAGAIN;
 		if (!timeo)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index af9185d..4efd230 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -739,6 +739,10 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->vlan_tci		= old->vlan_tci;
 
 	skb_copy_secmark(new, old);
+
+#ifdef CONFIG_INET_LL_RX_POLL
+	new->dev_ref		= old->dev_ref;
+#endif
 }
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index 6ba327d..d8058ce 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -139,6 +139,8 @@
 #include <net/tcp.h>
 #endif
 
+#include <net/ll_poll.h>
+
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
 
@@ -2284,6 +2286,10 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_stamp = ktime_set(-1L, 0);
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	sk->dev_ref	=	NULL;
+#endif
+
 	/*
 	 * Before updating sk_refcnt, we must commit prior changes to memory
 	 * (Documentation/RCU/rculist_nulls.txt for details)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 8603ca8..d209f0f 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -409,6 +409,18 @@ config INET_LRO
 
 	  If unsure, say Y.
 
+config INET_LL_RX_POLL
+	bool "Low Latency Receive Poll"
+	depends on X86_TSC
+	default n
+	---help---
+	  Support Low Latency Receive Queue Poll.
+	  (For network card drivers which support this option.)
+	  When waiting for data in read or poll, call directly into the device driver
+	  to flush packets which may be pending on the device queues into the stack.
+
+	  If unsure, say N.
+
 config INET_DIAG
 	tristate "INET: socket monitoring interface"
 	default y
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index fa2f63f..d0fcaaf 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -25,6 +25,7 @@
 #include <net/inet_frag.h>
 #include <net/ping.h>
 #include <net/tcp_memcontrol.h>
+#include <net/ll_poll.h>
 
 static int zero;
 static int one = 1;
@@ -326,6 +327,15 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_INET_LL_RX_POLL
+	{
+		.procname	= "ip_low_latency_poll",
+		.data		= &sysctl_net_ll_poll,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+#endif
 	{
 		.procname	= "tcp_syn_retries",
 		.data		= &sysctl_tcp_syn_retries,
diff --git a/net/socket.c b/net/socket.c
index 6b94633..4afe504 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -105,6 +105,12 @@
 #include <linux/sockios.h>
 #include <linux/atalk.h>
 
+#ifdef CONFIG_INET_LL_RX_POLL
+#include <net/ll_poll.h>
+int sysctl_net_ll_poll __read_mostly;
+EXPORT_SYMBOL_GPL(sysctl_net_ll_poll);
+#endif
+
 static int sock_no_open(struct inode *irrelevant, struct file *dontcare);
 static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			 unsigned long nr_segs, loff_t pos);
@@ -1143,12 +1149,29 @@ EXPORT_SYMBOL(sock_create_lite);
 static unsigned int sock_poll(struct file *file, poll_table *wait)
 {
 	struct socket *sock;
+	unsigned int poll_result;
 
 	/*
 	 *      We can't return errors to poll, so it's either yes or no.
 	 */
 	sock = file->private_data;
-	return sock->ops->poll(file, sock, wait);
+
+	poll_result = sock->ops->poll(file, sock, wait);
+
+#ifdef CONFIG_INET_LL_RX_POLL
+	if (wait &&
+	    !(poll_result & (POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP))) {
+
+		struct sock *sk = sock->sk;
+
+		/* only try once per poll */
+		if (sk_valid_ll(sk) && sk_poll_ll(sk, 1))
+			poll_result = sock->ops->poll(file, sock, wait);
+
+	}
+#endif /* CONFIG_INET_LL_RX_POLL */
+
+	return poll_result;
 }
 
 static int sock_mmap(struct file *file, struct vm_area_struct *vma)


* [PATCH v2 net-next 2/4] tcp: add TCP support for low latency receive poll.
From: Eliezer Tamir @ 2013-05-19 10:25 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Eliezer Tamir

An example of how one could add support for ndo_ll_poll to TCP.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 net/ipv4/Kconfig     |   11 +++++++++++
 net/ipv4/tcp.c       |    9 +++++++++
 net/ipv4/tcp_input.c |    4 ++++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index d209f0f..8a3239e 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -421,6 +421,17 @@ config INET_LL_RX_POLL
 
 	  If unsure, say N.
 
+config INET_LL_TCP_POLL
+	bool "Low Latency TCP Receive Poll"
+	depends on INET_LL_RX_POLL
+	default n
+	---help---
+	  TCP support for Low Latency TCP Queue Poll.
+	  (For network cards that support this option.)
+	  Add support to the TCP stack for direct polling of the network card.
+
+	  If unsure, say N.
+
 config INET_DIAG
 	tristate "INET: socket monitoring interface"
 	default y
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dcb116d..85b8040 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/ioctls.h>
+#include <net/ll_poll.h>
 
 int sysctl_tcp_fin_timeout __read_mostly = TCP_FIN_TIMEOUT;
 
@@ -1504,6 +1505,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
 			if (offset + 1 != skb->len)
 				continue;
 		}
+		sk_mark_ll(sk, skb);
 		if (tcp_hdr(skb)->fin) {
 			sk_eat_skb(sk, skb, false);
 			++seq;
@@ -1551,6 +1553,12 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct sk_buff *skb;
 	u32 urg_hole = 0;
 
+#ifdef CONFIG_INET_LL_TCP_POLL
+	if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue)
+	    && (sk->sk_state == TCP_ESTABLISHED))
+		sk_poll_ll(sk, nonblock);
+#endif
+
 	lock_sock(sk);
 
 	err = -ENOTCONN;
@@ -1855,6 +1863,7 @@ do_prequeue:
 					break;
 				}
 			}
+			sk_mark_ll(sk, skb);
 		}
 
 		*seq += used;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b358e8c..f3f293b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -74,6 +74,7 @@
 #include <linux/ipsec.h>
 #include <asm/unaligned.h>
 #include <net/netdma.h>
+#include <net/ll_poll.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
@@ -4329,6 +4330,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 				tp->copied_seq += chunk;
 				eaten = (chunk == skb->len);
 				tcp_rcv_space_adjust(sk);
+				sk_mark_ll(sk, skb);
 			}
 			local_bh_disable();
 		}
@@ -4896,6 +4898,7 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
 		tp->ucopy.len -= chunk;
 		tp->copied_seq += chunk;
 		tcp_rcv_space_adjust(sk);
+		sk_mark_ll(sk, skb);
 	}
 
 	local_bh_disable();
@@ -4955,6 +4958,7 @@ static bool tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
 		tp->ucopy.len -= chunk;
 		tp->copied_seq += chunk;
 		tcp_rcv_space_adjust(sk);
+		sk_mark_ll(sk, skb);
 
 		if ((tp->ucopy.len == 0) ||
 		    (tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||



* [PATCH v2 net-next 3/4] ixgbe: Add support for ndo_ll_poll
From: Eliezer Tamir @ 2013-05-19 10:25 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Eliezer Tamir

Add the ixgbe driver code implementing ndo_ll_poll.
It should be easy for other drivers to do something similar
in order to enable support for CONFIG_INET_LL_RX_POLL.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   96 +++++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   80 +++++++++++++++++++--
 2 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index ca93238..72be661 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -356,9 +356,105 @@ struct ixgbe_q_vector {
 	struct rcu_head rcu;	/* to avoid race with update stats on free */
 	char name[IFNAMSIZ + 9];
 
+#ifdef CONFIG_INET_LL_RX_POLL
+	unsigned int state;
+#define IXGBE_QV_STATE_IDLE        0
+#define IXGBE_QV_STATE_NAPI	   1    /* NAPI owns this QV */
+#define IXGBE_QV_STATE_POLL	   2    /* poll owns this QV */
+#define IXGBE_QV_LOCKED (IXGBE_QV_STATE_NAPI | IXGBE_QV_STATE_POLL)
+#define IXGBE_QV_STATE_NAPI_YIELD  4    /* NAPI yielded this QV */
+#define IXGBE_QV_STATE_POLL_YIELD  8    /* poll yielded this QV */
+#define IXGBE_QV_YIELD (IXGBE_QV_STATE_NAPI_YIELD | IXGBE_QV_STATE_POLL_YIELD)
+#define IXGBE_QV_USER_PEND (IXGBE_QV_STATE_POLL | IXGBE_QV_STATE_POLL_YIELD)
+	spinlock_t lock;
+#endif  /* CONFIG_INET_LL_RX_POLL */
+
 	/* for dynamic allocation of rings associated with this q_vector */
 	struct ixgbe_ring ring[0] ____cacheline_internodealigned_in_smp;
 };
+#ifdef CONFIG_INET_LL_RX_POLL
+static inline void ixgbe_qv_init_lock(struct ixgbe_q_vector *q_vector)
+{
+
+	spin_lock_init(&q_vector->lock);
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+}
+
+/* called from the device poll routine to get ownership of a q_vector */
+static inline int ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock(&q_vector->lock);
+	if (q_vector->state & IXGBE_QV_LOCKED) {
+		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
+		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
+		rc = false;
+	} else
+		/* we don't care if someone yielded */
+		q_vector->state = IXGBE_QV_STATE_NAPI;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while NAPI had it */
+static inline int ixgbe_qv_unlock_napi(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_POLL |
+			       IXGBE_QV_STATE_NAPI_YIELD));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock(&q_vector->lock);
+	return rc;
+}
+
+/* called from ixgbe_low_latency_poll() */
+static inline int ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = true;
+	spin_lock_bh(&q_vector->lock);
+	if ((q_vector->state & IXGBE_QV_LOCKED)) {
+		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
+		rc = false;
+	} else
+		/* preserve yield marks */
+		q_vector->state |= IXGBE_QV_STATE_POLL;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* returns true if someone tried to get the qv while it was locked */
+static inline int ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
+{
+	int rc = false;
+	spin_lock_bh(&q_vector->lock);
+	WARN_ON(q_vector->state & (IXGBE_QV_STATE_NAPI));
+
+	if (q_vector->state & IXGBE_QV_STATE_POLL_YIELD)
+		rc = true;
+	q_vector->state = IXGBE_QV_STATE_IDLE;
+	spin_unlock_bh(&q_vector->lock);
+	return rc;
+}
+
+/* true if a socket is polling, even if it did not get the lock */
+static inline int ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+{
+	WARN_ON(!(q_vector->state & IXGBE_QV_LOCKED));
+	return q_vector->state & IXGBE_QV_USER_PEND;
+}
+#else
+#define ixgbe_qv_init_lock(qv) do {} while (0)
+#define ixgbe_qv_lock_napi(qv) 1
+#define ixgbe_qv_unlock_napi(qv) 0
+#define ixgbe_qv_lock_poll(qv) 0
+#define ixgbe_qv_unlock_poll(qv) 0
+#define ixgbe_qv_ll_polling(qv) 0
+#endif /* CONFIG_INET_LL_RX_POLL */
+
 #ifdef CONFIG_IXGBE_HWMON
 
 #define IXGBE_HWMON_TYPE_LOC		0
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d30fbdd..628b7b1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -47,6 +47,7 @@
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
 #include <scsi/fc/fc_fcoe.h>
+#include <net/ll_poll.h>
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -144,6 +145,14 @@ static int debug = -1;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+#ifdef CONFIG_INET_LL_RX_POLL
+static int allow_unsafe_removal;
+static int unsafe_to_remove;
+module_param(allow_unsafe_removal, int, 0);
+MODULE_PARM_DESC(allow_unsafe_removal,
+	"Allow removal of module after low latency receive was used");
+#endif
+
 MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
 MODULE_DESCRIPTION("Intel(R) 10 Gigabit PCI Express Network Driver");
 MODULE_LICENSE("GPL");
@@ -1504,7 +1513,9 @@ static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 
-	if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
+	if (ixgbe_qv_ll_polling(q_vector))
+		netif_receive_skb(skb);
+	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 		napi_gro_receive(&q_vector->napi, skb);
 	else
 		netif_rx(skb);
@@ -1892,9 +1903,9 @@ dma_sync:
  * expensive overhead for IOMMU access this provides a means of avoiding
 * it by maintaining the mapping of the page to the system.
  *
- * Returns true if all work is completed without reaching budget
+ * Returns amount of work completed
  **/
-static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       struct ixgbe_ring *rx_ring,
 			       const int budget)
 {
@@ -1976,6 +1987,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 #endif /* IXGBE_FCOE */
+		skb_mark_ll(skb, &q_vector->napi);
 		ixgbe_rx_skb(q_vector, skb);
 
 		/* update budget accounting */
@@ -1992,9 +2004,45 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	if (cleaned_count)
 		ixgbe_alloc_rx_buffers(rx_ring, cleaned_count);
 
-	return (total_rx_packets < budget);
+	return total_rx_packets;
 }
 
+#ifdef CONFIG_INET_LL_RX_POLL
+/* must be called with local_bh_disable()d */
+static int ixgbe_low_latency_recv(struct napi_struct *napi)
+{
+	struct ixgbe_q_vector *q_vector =
+			container_of(napi, struct ixgbe_q_vector, napi);
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring  *ring;
+	int found;
+
+	if (unlikely(!unsafe_to_remove)) {
+		unsafe_to_remove = 1;
+		if (!allow_unsafe_removal) {
+			pr_info("module may no longer be removed\n");
+			try_module_get(THIS_MODULE);
+		}
+	}
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return LL_FLUSH_FAILED;
+
+	if (!ixgbe_qv_lock_poll(q_vector))
+		return LL_FLUSH_BUSY;
+
+	ixgbe_for_each_ring(ring, q_vector->rx) {
+		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+		if (found)
+			break;
+	}
+
+	ixgbe_qv_unlock_poll(q_vector);
+
+	return LL_FLUSH_DONE;
+}
+#endif	/* CONFIG_INET_LL_RX_POLL */
+
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -2550,6 +2598,9 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 	ixgbe_for_each_ring(ring, q_vector->tx)
 		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
 
+	if (!ixgbe_qv_lock_napi(q_vector))
+		return budget;
+
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	if (q_vector->rx.count > 1)
@@ -2558,9 +2609,10 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx)
-		clean_complete &= ixgbe_clean_rx_irq(q_vector, ring,
-						     per_ring_budget);
+		clean_complete &= (ixgbe_clean_rx_irq(q_vector, ring,
+				   per_ring_budget) < per_ring_budget);
 
+	ixgbe_qv_unlock_napi(q_vector);
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
 		return budget;
@@ -3747,16 +3799,25 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
+		ixgbe_qv_init_lock(adapter->q_vector[q_idx]);
 		napi_enable(&adapter->q_vector[q_idx]->napi);
+	}
 }
 
 static void ixgbe_napi_disable_all(struct ixgbe_adapter *adapter)
 {
 	int q_idx;
 
-	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++)
+	local_bh_disable(); /* for ixgbe_qv_lock_napi() */
+	for (q_idx = 0; q_idx < adapter->num_q_vectors; q_idx++) {
 		napi_disable(&adapter->q_vector[q_idx]->napi);
+		while (!ixgbe_qv_lock_napi(adapter->q_vector[q_idx])) {
+			pr_info("QV %d locked\n", q_idx);
+			mdelay(1);
+		}
+	}
+	local_bh_enable();
 }
 
 #ifdef CONFIG_IXGBE_DCB
@@ -7177,6 +7238,9 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ixgbe_netpoll,
 #endif
+#ifdef CONFIG_INET_LL_RX_POLL
+	.ndo_ll_poll		= ixgbe_low_latency_recv,
+#endif
 #ifdef IXGBE_FCOE
 	.ndo_fcoe_ddp_setup = ixgbe_fcoe_ddp_get,
 	.ndo_fcoe_ddp_target = ixgbe_fcoe_ddp_target,



* [PATCH v2 net-next 4/4] ixgbe: add extra stats for ndo_ll_poll
  2013-05-19 10:25 [PATCH v2 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
                   ` (2 preceding siblings ...)
  2013-05-19 10:25   ` Eliezer Tamir
@ 2013-05-19 10:26 ` Eliezer Tamir
  2013-05-19 19:06   ` Or Gerlitz
  4 siblings, 0 replies; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-19 10:26 UTC (permalink / raw)
  To: Dave Miller
  Cc: linux-kernel, netdev, Jesse Brandeburg, Don Skidmore,
	e1000-devel, Willem de Bruijn, Andi Kleen, HPA, Eliezer Tamir

Add additional statistics to the ixgbe driver for ndo_ll_poll events,
defined under LL_EXTENDED_STATS.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
---

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   14 ++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   40 ++++++++++++++++++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |    6 +++
 3 files changed, 60 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 72be661..2a7de7c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -52,6 +52,9 @@
 #include <linux/dca.h>
 #endif
 
+#ifdef CONFIG_INET_LL_RX_POLL
+#define LL_EXTENDED_STATS
+#endif  /* CONFIG_INET_LL_RX_POLL */
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -182,6 +185,11 @@ struct ixgbe_rx_buffer {
 struct ixgbe_queue_stats {
 	u64 packets;
 	u64 bytes;
+#ifdef LL_EXTENDED_STATS
+	u64 yields;
+	u64 misses;
+	u64 cleaned;
+#endif  /* LL_EXTENDED_STATS */
 };
 
 struct ixgbe_tx_queue_stats {
@@ -389,6 +397,9 @@ static inline int ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
 		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
 		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->tx.ring->stats.yields++;
+#endif
 	} else
 		/* we don't care if someone yielded */
 		q_vector->state = IXGBE_QV_STATE_NAPI;
@@ -419,6 +430,9 @@ static inline int ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
 	if ((q_vector->state & IXGBE_QV_LOCKED)) {
 		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
 		rc = false;
+#ifdef LL_EXTENDED_STATS
+		q_vector->rx.ring->stats.yields++;
+#endif
 	} else
 		/* preserve yield marks */
 		q_vector->state |= IXGBE_QV_STATE_POLL;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index d375472..24e2e7a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1054,6 +1054,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1063,6 +1069,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 	for (j = 0; j < IXGBE_NUM_RX_QUEUES; j++) {
 		ring = adapter->rx_ring[j];
@@ -1070,6 +1082,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
+#ifdef LL_EXTENDED_STATS
+			data[i] = 0;
+			data[i+1] = 0;
+			data[i+2] = 0;
+			i += 3;
+#endif
 			continue;
 		}
 
@@ -1079,6 +1097,12 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
+#ifdef LL_EXTENDED_STATS
+		data[i] = ring->stats.yields;
+		data[i+1] = ring->stats.misses;
+		data[i+2] = ring->stats.cleaned;
+		i += 3;
+#endif
 	}
 
 	for (j = 0; j < IXGBE_MAX_PACKET_BUFFERS; j++) {
@@ -1115,12 +1139,28 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "tx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "tx_q_%u_napi_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "tx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_NUM_RX_QUEUES; i++) {
 			sprintf(p, "rx_queue_%u_packets", i);
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "rx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
+#ifdef LL_EXTENDED_STATS
+			sprintf(p, "rx_q_%u_ll_poll_yield", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_misses", i);
+			p += ETH_GSTRING_LEN;
+			sprintf(p, "rx_q_%u_cleaned", i);
+			p += ETH_GSTRING_LEN;
+#endif /* LL_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) {
 			sprintf(p, "tx_pb_%u_pxon", i);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 628b7b1..1b2214b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2033,6 +2033,12 @@ static int ixgbe_low_latency_recv(struct napi_struct *napi)
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
 		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
+#ifdef LL_EXTENDED_STATS
+		if (found)
+			ring->stats.cleaned += found;
+		else
+			ring->stats.misses++;
+#endif
 		if (found)
 			break;
 	}
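
For reference, once applied these counters surface through the normal
ethtool stats path, so "ethtool -S <iface>" should list entries such as
rx_q_0_ll_poll_yield, rx_q_0_misses and rx_q_0_cleaned alongside the
existing per-queue packet and byte counts (interface name and queue
numbering here are illustrative).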


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 10:25 [PATCH v2 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
@ 2013-05-19 19:06   ` Or Gerlitz
  2013-05-19 10:25   ` Eliezer Tamir
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Or Gerlitz @ 2013-05-19 19:06 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Eliezer Tamir

On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
<eliezer.tamir@linux.intel.com> wrote:
> This is an updated version of the code we posted on February.

Last time you placed a copy of the patchset in the rfc branch of
git://github.com/jbrandeb/lls.git - can you repost v2 there too?

Or.

^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:06   ` Or Gerlitz
@ 2013-05-19 19:20     ` Eliezer Tamir
  -1 siblings, 0 replies; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-19 19:20 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Eliezer Tamir

On 19/05/2013 22:06, Or Gerlitz wrote:
> On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
> <eliezer.tamir@linux.intel.com> wrote:
>> This is an updated version of the code we posted on February.
>
> Last time you placed a copy of the patchset in the rfc branch of
> git://github.com/jbrandeb/lls.git - can you repost v2 there too?
>
> Or.
>
Yes, but it will have to wait for tomorrow.
It's on Jesse's GitHub account, and for him it's still the middle of the
weekend.

-Eliezer

^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:06   ` Or Gerlitz
  (?)
  (?)
@ 2013-05-19 19:25   ` Eliezer Tamir
  2013-05-19 19:56     ` Or Gerlitz
  -1 siblings, 1 reply; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-19 19:25 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Eliezer Tamir

On 19/05/2013 22:06, Or Gerlitz wrote:
> Last time you placed a copy of the patchset in the rfc branch of
> git://github.com/jbrandeb/lls.git - can you repost v2 there too?
>
> Or.
>
BTW did you try the last version on your HW?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:25   ` Eliezer Tamir
@ 2013-05-19 19:56     ` Or Gerlitz
  2013-05-20 10:26       ` Eliezer Tamir
  0 siblings, 1 reply; 29+ messages in thread
From: Or Gerlitz @ 2013-05-19 19:56 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Willem de Bruijn, e1000-devel, netdev, HPA, Jesse Brandeburg,
	linux-kernel, Andi Kleen, Eliezer Tamir, Dave Miller


On Sun, May 19, 2013 at 10:25 PM, Eliezer Tamir <
eliezer.tamir@linux.intel.com> wrote:

> On 19/05/2013 22:06, Or Gerlitz wrote:
>
>> Last time you placed a copy of the patchset in the rfc branch of
>> git://github.com/jbrandeb/lls.git - can you repost v2 there too?
>>
>> Or.
>
> BTW did you try the last version on your HW?


nope, you didn't provide an mlx4 patch :( we will look into that...


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-19 10:25   ` Eliezer Tamir
  (?)
@ 2013-05-20  7:54   ` David Miller
  2013-05-20  8:29       ` Joe Perches
  2013-05-20  9:39       ` Eliezer Tamir
  -1 siblings, 2 replies; 29+ messages in thread
From: David Miller @ 2013-05-20  7:54 UTC (permalink / raw)
  To: eliezer.tamir
  Cc: linux-kernel, netdev, jesse.brandeburg, donald.c.skidmore,
	e1000-devel, willemb, andi, hpa, eliezer

From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Date: Sun, 19 May 2013 13:25:33 +0300

> +#ifndef _LINUX_NET_LL_POLL_H
> +#define _LINUX_NET_LL_POLL_H
> +#ifdef CONFIG_INET_LL_RX_POLL

Please put an empty line before the final ifdef test here.

> +static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
> +{
> +		sk->dev_ref = skb->dev_ref;
	^^^

One tab too many.

> +#else /* CONFIG_INET_LL_RX_FLUSH */
> +
> +#define sk_valid_ll(sk) 0
> +#define sk_poll_ll(sk, nonblock) do {} while (0)
> +#define skb_mark_ll(napi, skb) do {} while (0)
> +#define sk_mark_ll(sk, skb) do {} while (0)

Make these inline functions too, so that even if
CONFIG_INET_LL_RX_POLL is disabled, the arguments and return values
are still properly type checked.

>  {
>  	struct socket *sock;
> +	unsigned int poll_result;

Please order local variable declarations from longest line to
shortest line.

> +	    !(poll_result & (POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP))) {
> +
> +		struct sock *sk = sock->sk;

Please remove the empty line before the variable declaration.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-20  7:54   ` David Miller
@ 2013-05-20  8:29       ` Joe Perches
  2013-05-20  9:39       ` Eliezer Tamir
  1 sibling, 0 replies; 29+ messages in thread
From: Joe Perches @ 2013-05-20  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: eliezer.tamir, linux-kernel, netdev, jesse.brandeburg,
	donald.c.skidmore, e1000-devel, willemb, andi, hpa, eliezer

On Mon, 2013-05-20 at 00:54 -0700, David Miller wrote:
> From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
[]
> >  {
> >       struct socket *sock;
> > +     unsigned int poll_result;
> 
> Please order local variable declarations from longest line to
> shortest line.

reverse christmas tree doesn't seem especially valuable or
sensible for automatics.

I'd rather order by first use or even alphabetical, especially
with long lists of automatics.




^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-20  8:29       ` Joe Perches
  (?)
@ 2013-05-20  9:16       ` David Miller
  -1 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-05-20  9:16 UTC (permalink / raw)
  To: joe
  Cc: eliezer.tamir, linux-kernel, netdev, jesse.brandeburg,
	donald.c.skidmore, e1000-devel, willemb, andi, hpa, eliezer

From: Joe Perches <joe@perches.com>
Date: Mon, 20 May 2013 01:29:47 -0700

> On Mon, 2013-05-20 at 00:54 -0700, David Miller wrote:
>> From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> []
>> >  {
>> >       struct socket *sock;
>> > +     unsigned int poll_result;
>> 
>> Please order local variable declarations from longest line to
>> shortest line.
> 
> reverse christmas tree doesn't seem especially valuable or
> sensible for automatics.

It looks nicer to the eyes and provides a predictable pattern
to look at.
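
For readers unfamiliar with the netdev convention: "reverse Christmas
tree" orders local declarations from the longest line down to the
shortest. A small sketch with illustrative names (not taken from the
patch under review):

	static void example(void)
	{
		struct ixgbe_q_vector *q_vector;
		unsigned int poll_result;
		struct sock *sk;
		int rc;

		/* ... */
	}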

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-20  7:54   ` David Miller
@ 2013-05-20  9:39       ` Eliezer Tamir
  2013-05-20  9:39       ` Eliezer Tamir
  1 sibling, 0 replies; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-20  9:39 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, netdev, jesse.brandeburg, Skidmore, Donald C,
	e1000-devel, willemb, andi, hpa, eliezer

On 20/05/2013 10:54, David Miller wrote:
> From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
> Date: Sun, 19 May 2013 13:25:33 +0300
>
>> +#else /* CONFIG_INET_LL_RX_FLUSH */
>> +
>> +#define sk_valid_ll(sk) 0
>> +#define sk_poll_ll(sk, nonblock) do {} while (0)
>> +#define skb_mark_ll(napi, skb) do {} while (0)
>> +#define sk_mark_ll(sk, skb) do {} while (0)
>
> Make these inline functions too, so that even if
> CONFIG_INET_LL_RX_POLL is disabled, the arguments and return values
> are still properly type checked.

Is this what you had in mind?

static inline bool sk_valid_ll(struct sock *sk)
{
	return false;
}

static inline bool sk_poll_ll(struct sock *sk, int nonblock)
{
	return false;
}

static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi)
{
}

static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
{
}

Would you like me to resend the whole set or just this patch?

Thanks,
Eliezer

^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:56     ` Or Gerlitz
@ 2013-05-20 10:26       ` Eliezer Tamir
  0 siblings, 0 replies; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-20 10:26 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Eliezer Tamir

On 19/05/2013 22:56, Or Gerlitz wrote:
> On Sun, May 19, 2013 at 10:25 PM, Eliezer Tamir <
> eliezer.tamir@linux.intel.com> wrote:
>
>> On 19/05/2013 22:06, Or Gerlitz wrote:
>>
>>> Last time you placed a copy of the patchset in the rfc branch of
>>> git://github.com/jbrandeb/lls.git - can you repost v2 there too?
>>>
>>> Or.
>>
>> BTW did you try the last version on your HW?
>
> nope, you didn't provide an mlx4 patch :( we will look into that...

I would never risk compromising your job security like that.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-19 10:25   ` Eliezer Tamir
@ 2013-05-20 16:22     ` Andi Kleen
  -1 siblings, 0 replies; 29+ messages in thread
From: Andi Kleen @ 2013-05-20 16:22 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, Andi Kleen, HPA,
	Eliezer Tamir

> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index f98ca63..cfcf0ea 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -19,6 +19,11 @@ ip_no_pmtu_disc - BOOLEAN
>  	Disable Path MTU Discovery.
>  	default FALSE
>  
> +ip_low_latency_poll - INTEGER
> +	Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
> +	Approximate time in microseconds to spin waiting for packets on the device queue.
> +	default 0

Can you document the suggested value here?

Also I would add something like "may increase power usage"

The other thing I would add is a linuxmib statistic counter so that
we can see the polling happens.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
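
Folding these suggestions into the quoted hunk, the amended entry might
read roughly like this (a sketch only, assuming the knob is in
microseconds with a recommended value of about 50, per the cover letter):

	ip_low_latency_poll - INTEGER
		Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
		Approximate time in microseconds to spin waiting for packets
		on the device queue. Recommended value when enabled is
		around 50. May increase power usage.
		default 0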

^ permalink raw reply	[flat|nested] 29+ messages in thread


* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-20  9:39       ` Eliezer Tamir
  (?)
@ 2013-05-20 19:34       ` David Miller
  -1 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2013-05-20 19:34 UTC (permalink / raw)
  To: eliezer.tamir
  Cc: linux-kernel, netdev, jesse.brandeburg, donald.c.skidmore,
	e1000-devel, willemb, andi, hpa, eliezer

From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Date: Mon, 20 May 2013 12:39:59 +0300

> On 20/05/2013 10:54, David Miller wrote:
>> From: Eliezer Tamir <eliezer.tamir@linux.intel.com>
>> Date: Sun, 19 May 2013 13:25:33 +0300
>>
>>> +#else /* CONFIG_INET_LL_RX_FLUSH */
>>> +
>>> +#define sk_valid_ll(sk) 0
>>> +#define sk_poll_ll(sk, nonblock) do {} while (0)
>>> +#define skb_mark_ll(napi, skb) do {} while (0)
>>> +#define sk_mark_ll(sk, skb) do {} while (0)
>>
>> Make these inline functions too, so that even if
>> CONFIG_INET_LL_RX_POLL is disabled, the arguments and return values
>> are still properly type checked.
> 
> Is this what you had in mind?
> 
> static inline bool sk_valid_ll(struct sock *sk)

Yes.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [E1000-devel] [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:20     ` Eliezer Tamir
  (?)
@ 2013-05-20 20:09     ` Jeff Kirsher
  2013-05-20 22:08       ` Jesse Brandeburg
  -1 siblings, 1 reply; 29+ messages in thread
From: Jeff Kirsher @ 2013-05-20 20:09 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Or Gerlitz, Willem de Bruijn, e1000-devel, netdev,
	Jesse Brandeburg, linux-kernel, Andi Kleen, Don Skidmore, HPA,
	Eliezer Tamir, Dave Miller

On Sun, 2013-05-19 at 22:20 +0300, Eliezer Tamir wrote:
> On 19/05/2013 22:06, Or Gerlitz wrote:
> > On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
> > <eliezer.tamir@linux.intel.com> wrote:
> >> This is an updated version of the code we posted on February.
> >
> > Last time you placed a copy of the patchset in the rfc branch of
> > git://github.com/jbrandeb/lls.git - can you repost v2 there too?
> >
> > Or.
> >
> Yes, but it will have to wait for tomorrow.
> It's on Jesse's github account and for him it's still the middle of the 
> weekend.
> 
> -Eliezer

If Jesse has not done so already, I can put your series of patches on a
branch of my kernel net-next tree.  Let me know...


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-20 20:09     ` [E1000-devel] " Jeff Kirsher
@ 2013-05-20 22:08       ` Jesse Brandeburg
  0 siblings, 0 replies; 29+ messages in thread
From: Jesse Brandeburg @ 2013-05-20 22:08 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Willem de Bruijn, Eliezer Tamir, Or Gerlitz, e1000-devel,
	NetDEV list, Jesse Brandeburg, linux-kernel, Andi Kleen,
	Don Skidmore, HPA, Eliezer Tamir, Dave Miller


On Mon, May 20, 2013 at 1:09 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com>wrote:

> On Sun, 2013-05-19 at 22:20 +0300, Eliezer Tamir wrote:
> > On 19/05/2013 22:06, Or Gerlitz wrote:
> > > On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
> > > <eliezer.tamir@linux.intel.com> wrote:
> > >> This is an updated version of the code we posted on February.
> > >
> > > Last time you placed a copy of the patchset in the rfc branch of
> > > git://github.com/jbrandeb/lls.git - can you repost v2 there too?
>

The latest set (the v3 changes) was posted to the rfcv2 branch of
git://github.com/jbrandeb/lls.git


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 0/4] net: low latency Ethernet device polling
  2013-05-19 19:06   ` Or Gerlitz
                     ` (2 preceding siblings ...)
  (?)
@ 2013-05-20 22:44   ` Brandeburg, Jesse
  -1 siblings, 0 replies; 29+ messages in thread
From: Brandeburg, Jesse @ 2013-05-20 22:44 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eliezer Tamir, Dave Miller, linux-kernel, netdev,
	Jesse Brandeburg, Don Skidmore, e1000-devel, Willem de Bruijn,
	Andi Kleen, HPA, Eliezer Tamir

On Sun, 19 May 2013, Or Gerlitz wrote:
> On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
> <eliezer.tamir@linux.intel.com> wrote:
> > This is an updated version of the code we posted on February.
> 
> Last time you placed a copy of the patchset in the rfc branch of
> git://github.com/jbrandeb/lls.git - can you repost v2 there too?

Done, sorry for the dup; the first post got HTML-munged by the Gmail web
interface.

The latest set (the v3 changes) was posted to the rfcv2 branch of
git://github.com/jbrandeb/lls.git

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 net-next 1/4] net: implement support for low latency socket polling
  2013-05-20 16:22     ` Andi Kleen
@ 2013-05-21  6:16       ` Eliezer Tamir
  -1 siblings, 0 replies; 29+ messages in thread
From: Eliezer Tamir @ 2013-05-21  6:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Dave Miller, linux-kernel, netdev, Jesse Brandeburg,
	Don Skidmore, e1000-devel, Willem de Bruijn, HPA, Eliezer Tamir

On 20/05/2013 19:22, Andi Kleen wrote:
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index f98ca63..cfcf0ea 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -19,6 +19,11 @@ ip_no_pmtu_disc - BOOLEAN
>>   	Disable Path MTU Discovery.
>>   	default FALSE
>>
>> +ip_low_latency_poll - INTEGER
>> +	Low latency busy poll timeout. (needs CONFIG_INET_LL_RX_POLL)
>> +	Approximate time in microseconds to spin waiting for packets on the device queue.
>> +	default 0
>
> Can you document the suggested value here?

ok
> Also I would add something like "may increase power usage"

:)
> The other thing I would add is a linuxmib statistic counter so that
> we can see the polling happens.

Yes,
I will add a counter for packets that were received through this path.

Eliezer
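
A minimal sketch of the usual linuxmib wiring for such a counter; the
counter name LINUX_MIB_LOWLATENCYRXPACKETS and the increment site below
are illustrative assumptions, not taken from the patchset:

	/* include/uapi/linux/snmp.h: new enum entry */
	LINUX_MIB_LOWLATENCYRXPACKETS,		/* LowLatencyRxPackets */

	/* net/ipv4/proc.c: expose it in snmp4_net_list[] */
	SNMP_MIB_ITEM("LowLatencyRxPackets", LINUX_MIB_LOWLATENCYRXPACKETS),

	/* in the busy-poll receive path, once per packet harvested */
	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LOWLATENCYRXPACKETS);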

^ permalink raw reply	[flat|nested] 29+ messages in thread


Thread overview: 29+ messages
2013-05-19 10:25 [PATCH v2 net-next 0/4] net: low latency Ethernet device polling Eliezer Tamir
2013-05-19 10:25 ` [PATCH v2 net-next 1/4] net: implement support for low latency socket polling Eliezer Tamir
2013-05-19 10:25   ` Eliezer Tamir
2013-05-20  7:54   ` David Miller
2013-05-20  8:29     ` Joe Perches
2013-05-20  8:29       ` Joe Perches
2013-05-20  9:16       ` David Miller
2013-05-20  9:39     ` Eliezer Tamir
2013-05-20  9:39       ` Eliezer Tamir
2013-05-20 19:34       ` David Miller
2013-05-20 16:22   ` Andi Kleen
2013-05-20 16:22     ` Andi Kleen
2013-05-21  6:16     ` Eliezer Tamir
2013-05-21  6:16       ` Eliezer Tamir
2013-05-19 10:25 ` [PATCH v2 net-next 2/4] tcp: add TCP support for low latency receive poll Eliezer Tamir
2013-05-19 10:25   ` Eliezer Tamir
2013-05-19 10:25 ` [PATCH v2 net-next 3/4] ixgbe: Add support for ndo_ll_poll Eliezer Tamir
2013-05-19 10:25   ` Eliezer Tamir
2013-05-19 10:26 ` [PATCH v2 net-next 4/4] ixgbe: add extra stats " Eliezer Tamir
2013-05-19 19:06 ` [PATCH v2 net-next 0/4] net: low latency Ethernet device polling Or Gerlitz
2013-05-19 19:06   ` Or Gerlitz
2013-05-19 19:20   ` Eliezer Tamir
2013-05-19 19:20     ` Eliezer Tamir
2013-05-20 20:09     ` [E1000-devel] " Jeff Kirsher
2013-05-20 22:08       ` Jesse Brandeburg
2013-05-19 19:25   ` Eliezer Tamir
2013-05-19 19:56     ` Or Gerlitz
2013-05-20 10:26       ` Eliezer Tamir
2013-05-20 22:44   ` Brandeburg, Jesse
