From: Alexander Duyck <alexander.duyck@gmail.com>
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: sridhar.samudrala@intel.com, edumazet@google.com,
	davem@davemloft.net, linux-api@vger.kernel.org
Subject: [net-next PATCH v2 7/8] epoll: Add busy poll support to epoll with socket fds.
Date: Thu, 23 Mar 2017 14:37:55 -0700
Message-ID: <20170323213755.12615.6599.stgit@localhost.localdomain>
In-Reply-To: <20170323211820.12615.88907.stgit@localhost.localdomain>

From: Sridhar Samudrala <sridhar.samudrala@intel.com>

This patch adds busy poll support to epoll. The implementation is
opportunistic: it takes the NAPI ID from the most recent socket with a
valid NAPI ID that was added to the ready list, and busy polls on that ID
until the ready list goes empty. Once the ready list is empty, the NAPI ID
is reset and busy polling is disabled until a new socket is added to the
ready list.

In addition, when we insert a new socket into the epoll instance we record
its NAPI ID and assume we are going to receive events on it. If no events
arrive, it is evicted as the active NAPI ID and we resume normal
behavior.
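
The tracking described above can be modelled in user space. The sketch
below is illustrative only: the ep_model struct, the model_* helpers and
the MODEL_MIN_NAPI_ID value are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Placeholder threshold. In the kernel, MIN_NAPI_ID comes from
 * net/busy_poll.h; the value used here is an assumption for the model.
 */
#define MODEL_MIN_NAPI_ID 2u

struct ep_model {
	unsigned int napi_id;	/* 0 means "no busy poll target" */
};

/* Models ep_set_busy_poll_napi_id(): remember the latest valid ID. */
static inline void model_note_socket(struct ep_model *ep,
				     unsigned int sk_napi_id)
{
	if (sk_napi_id < MODEL_MIN_NAPI_ID || sk_napi_id == ep->napi_id)
		return;
	ep->napi_id = sk_napi_id;
}

/* Models the ID check in ep_busy_loop(): poll only with a valid ID. */
static inline bool model_should_busy_poll(const struct ep_model *ep)
{
	return ep->napi_id >= MODEL_MIN_NAPI_ID;
}

/* Models ep_reset_busy_poll_napi_id(): the ready list stayed empty. */
static inline void model_ready_list_empty(struct ep_model *ep)
{
	ep->napi_id = 0;
}
```

In this model a later socket with a different valid ID simply replaces
the recorded one, which is the "last socket wins" behavior described
above.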

An application can use the SO_INCOMING_CPU or
SO_ATTACH_REUSEPORT_CBPF/EBPF socket options to spread incoming
connections to specific worker threads based on the incoming queue. This
lets the epoll instance of each worker thread hold only sockets that
receive packets from a single queue. So when an application calls
epoll_wait() and there are no events available to report, busy polling is
done on the associated queue to pull the packets.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
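ep_busy_loop() in the diff below only polls when net_busy_loop_on() is
true, which reflects the net.core.busy_poll sysctl. A minimal user-space
sketch for checking that setting follows; read_busy_poll_usecs() is a
hypothetical helper name, not an existing API.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read net.core.busy_poll (microseconds).
 * Returns the value, or -1 if the file is missing or unreadable, for
 * example on a kernel built without CONFIG_NET_RX_BUSY_POLL.
 */
static inline long read_busy_poll_usecs(void)
{
	FILE *f = fopen("/proc/sys/net/core/busy_poll", "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A result of 0 or -1 means epoll_wait() will not busy poll with this
patch applied, regardless of which sockets are in the epoll set.
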
 fs/eventpoll.c |   93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 341251421ced..5420767c9b68 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -42,6 +42,7 @@
 #include <linux/seq_file.h>
 #include <linux/compat.h>
 #include <linux/rculist.h>
+#include <net/busy_poll.h>
 
 /*
  * LOCKING:
@@ -224,6 +225,11 @@ struct eventpoll {
 	/* used to optimize loop detection check */
 	int visited;
 	struct list_head visited_list_link;
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	/* used to track busy poll napi_id */
+	unsigned int napi_id;
+#endif
 };
 
 /* Wait structure used by the poll hooks */
@@ -384,6 +390,77 @@ static inline int ep_events_available(struct eventpoll *ep)
 	return !list_empty(&ep->rdllist) || ep->ovflist != EP_UNACTIVE_PTR;
 }
 
+#ifdef CONFIG_NET_RX_BUSY_POLL
+static bool ep_busy_loop_end(void *p, unsigned long start_time)
+{
+	struct eventpoll *ep = p;
+
+	return ep_events_available(ep) || busy_loop_timeout(start_time);
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
+/*
+ * Busy poll if globally on and supporting sockets found && no events,
+ * busy loop will return if need_resched or ep_events_available.
+ *
+ * we must do our busy polling with irqs enabled
+ */
+static void ep_busy_loop(struct eventpoll *ep, int nonblock)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	unsigned int napi_id = READ_ONCE(ep->napi_id);
+
+	if ((napi_id >= MIN_NAPI_ID) && net_busy_loop_on())
+		napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, ep);
+#endif
+}
+
+static inline void ep_reset_busy_poll_napi_id(struct eventpoll *ep)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	if (ep->napi_id)
+		ep->napi_id = 0;
+#endif
+}
+
+/*
+ * Set epoll busy poll NAPI ID from sk.
+ */
+static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
+{
+#ifdef CONFIG_NET_RX_BUSY_POLL
+	struct eventpoll *ep;
+	unsigned int napi_id;
+	struct socket *sock;
+	struct sock *sk;
+	int err;
+
+	if (!net_busy_loop_on())
+		return;
+
+	sock = sock_from_file(epi->ffd.file, &err);
+	if (!sock)
+		return;
+
+	sk = sock->sk;
+	if (!sk)
+		return;
+
+	napi_id = READ_ONCE(sk->sk_napi_id);
+	ep = epi->ep;
+
+	/* Reject IDs below MIN_NAPI_ID, which are not NAPI IDs.
+	 * There is also nothing to do if this ID is already
+	 * the one recorded for the epoll instance.
+	 */
+	if (napi_id < MIN_NAPI_ID || napi_id == ep->napi_id)
+		return;
+
+	/* record NAPI ID for use in next busy poll */
+	ep->napi_id = napi_id;
+#endif
+}
+
 /**
  * ep_call_nested - Perform a bound (possibly) nested call, by checking
  *                  that the recursion limit is not exceeded, and that
@@ -1022,6 +1099,8 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 
 	spin_lock_irqsave(&ep->lock, flags);
 
+	ep_set_busy_poll_napi_id(epi);
+
 	/*
 	 * If the event mask does not contain any poll(2) event, we consider the
 	 * descriptor to be disabled. This condition is likely the effect of the
@@ -1363,6 +1442,9 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	/* We have to drop the new item inside our item list to keep track of it */
 	spin_lock_irqsave(&ep->lock, flags);
 
+	/* record NAPI ID of new item if present */
+	ep_set_busy_poll_napi_id(epi);
+
 	/* If the file is already "ready" we drop it inside the ready list */
 	if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
 		list_add_tail(&epi->rdllink, &ep->rdllist);
@@ -1637,10 +1719,21 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 	}
 
 fetch_events:
+
+	if (!ep_events_available(ep))
+		ep_busy_loop(ep, timed_out);
+
 	spin_lock_irqsave(&ep->lock, flags);
 
 	if (!ep_events_available(ep)) {
 		/*
+		 * Busy poll timed out.  Drop NAPI ID for now, we can add
+		 * it back in when we have moved a socket with a valid NAPI
+		 * ID onto the ready list.
+		 */
+		ep_reset_busy_poll_napi_id(ep);
+
+		/*
 		 * We don't have any available event to return to the caller.
 		 * We need to sleep here, and we will be wake up by
 		 * ep_poll_callback() when events will become available.
