From: Jason Baron <jbaron@akamai.com>
To: akpm@linux-foundation.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>,
Salman Qazi <sqazi@google.com>
Subject: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake()
Date: Fri, 13 Oct 2017 14:48:53 -0400 [thread overview]
Message-ID: <1507920533-8812-1-git-send-email-jbaron@akamai.com> (raw)
The ep_poll_safewake() function is used to wakeup potentially nested epoll
file descriptors. The function uses ep_call_nested() to prevent entering
the same wake up queue more than once, and to prevent excessively deep
wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary
since we are already preventing these conditions during EPOLL_CTL_ADD. This
saves extra function calls, and avoids taking a global lock during the
ep_call_nested() calls.
I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
case, since ep_call_nested() keeps track of the nesting level, and this is
required by the call to spin_lock_irqsave_nested(). It would be nice to
remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as
well, however its not clear how to simply pass the nesting level through
multiple wake_up() levels without more surgery. In any case, I don't think
CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch, also
apparently fixes a workload at Google that Salman Qazi reported by
completely removing the poll_safewake_ncalls->lock from wakeup paths.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Salman Qazi <sqazi@google.com>
---
fs/eventpoll.c | 47 ++++++++++++++++++-----------------------------
1 file changed, 18 insertions(+), 29 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19..69acfab 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -276,9 +276,6 @@ static DEFINE_MUTEX(epmutex);
/* Used to check for epoll file descriptor inclusion loops */
static struct nested_calls poll_loop_ncalls;
-/* Used for safe wake up implementation */
-static struct nested_calls poll_safewake_ncalls;
-
/* Used to call file's f_op->poll() under the nested calls boundaries */
static struct nested_calls poll_readywalk_ncalls;
@@ -551,40 +548,21 @@ static int ep_call_nested(struct nested_calls *ncalls, int max_nests,
* this special case of epoll.
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
-static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
- unsigned long events, int subclass)
+
+static struct nested_calls poll_safewake_ncalls;
+
+static int ep_poll_wakeup_proc(void *priv, void *cookie, int call_nests)
{
unsigned long flags;
+ wait_queue_head_t *wqueue = (wait_queue_head_t *)cookie;
- spin_lock_irqsave_nested(&wqueue->lock, flags, subclass);
- wake_up_locked_poll(wqueue, events);
+ spin_lock_irqsave_nested(&wqueue->lock, flags, call_nests + 1);
+ wake_up_locked_poll(wqueue, POLLIN);
spin_unlock_irqrestore(&wqueue->lock, flags);
-}
-#else
-static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
- unsigned long events, int subclass)
-{
- wake_up_poll(wqueue, events);
-}
-#endif
-static int ep_poll_wakeup_proc(void *priv, void *cookie, int call_nests)
-{
- ep_wake_up_nested((wait_queue_head_t *) cookie, POLLIN,
- 1 + call_nests);
return 0;
}
-/*
- * Perform a safe wake up of the poll wait list. The problem is that
- * with the new callback'd wake up system, it is possible that the
- * poll callback is reentered from inside the call to wake_up() done
- * on the poll wait queue head. The rule is that we cannot reenter the
- * wake up code from the same task more than EP_MAX_NESTS times,
- * and we cannot reenter the same wait queue head at all. This will
- * enable to have a hierarchy of epoll file descriptor of no more than
- * EP_MAX_NESTS deep.
- */
static void ep_poll_safewake(wait_queue_head_t *wq)
{
int this_cpu = get_cpu();
@@ -595,6 +573,15 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
put_cpu();
}
+#else
+
+static void ep_poll_safewake(wait_queue_head_t *wq)
+{
+ wake_up_poll(wq, POLLIN);
+}
+
+#endif
+
static void ep_remove_wait_queue(struct eppoll_entry *pwq)
{
wait_queue_head_t *whead;
@@ -2315,8 +2302,10 @@ static int __init eventpoll_init(void)
*/
ep_nested_calls_init(&poll_loop_ncalls);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
/* Initialize the structure used to perform safe poll wait head wake ups */
ep_nested_calls_init(&poll_safewake_ncalls);
+#endif
/* Initialize the structure used to perform file's f_op->poll() calls */
ep_nested_calls_init(&poll_readywalk_ncalls);
--
2.6.1
next reply other threads:[~2017-10-13 18:49 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-13 18:48 Jason Baron [this message]
2017-10-17 15:37 ` [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() Davidlohr Bueso
2017-10-18 14:03 ` Jason Baron
2017-10-28 14:05 ` Davidlohr Bueso
2017-10-31 14:14 ` Jason Baron
2018-01-18 11:00 ` Hou Tao
2018-01-18 21:18 ` Jason Baron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1507920533-8812-1-git-send-email-jbaron@akamai.com \
--to=jbaron@akamai.com \
--cc=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sqazi@google.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).