linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case
@ 2017-10-28 12:58 Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Hi,

We are optimizing the Request-Per-Second of nginx http server, and we found
that acquiring epmutex in eventpoll_release_file() will become a bottleneck
under the one-request-per-connection scenario. The following are some details
of the scenario:

* HTTP server (nginx):
	* under ARM64 with 64 cores
	* 64 worker processes, each worker is binded to a specific CPU
	* keepalive_requests = 1 in nginx.conf: nginx will close the
	  connection fd after a reply is send
* HTTP benchmark tool (wrk):
	* under x86-64 with 48 cores
	* 16 threads, 64 connections per-thread

Before the patch, the RPS measured by wrk is ~220K, after applying
the patch the RPS is ~240K. We also measure the overhead of
eventpoll_release_file() and its children by perf: 29% before and
2% after.

In the following section I will explain the purposes of epmutex, and
the way of replacing it by using locks with a smaller granularity.

epmutex serves four purposes:
(1) serialize ep_loop_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of ep when clearing visited_list
	The acquisition of epmutex in ep_free() prevent the freeing of ep.

	It's fixed in patch 2: when freeing ep, remove it from visited_list.
	When there is no nested-epoll cast, ep will not been added to
	visited_list, so we check the condition first. If it has already been
	added to visited_list, we need to wait for the release of epmutex.

(2) serialize reverse_path_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of file in tfile_check_list
	epi->ffd.file was added to tfile_check_list under ep->mtx, but
	was accessed without ep->mtx. The acquisition of epmutex in
	eventpoll_release_file() prevent the freeing of file.

	It's fixed in patch 3: when releasing file, remove it from
	tfile_check_list. If it has been already added, we need to
	wait for the release of epmutex.

	(b) ensure the validity of epi->ep and epi->ep->file
	The epmutex will prevent the freeing of ep and its related file,
	so it's OK to access epi->ep under rcu read critical region.

	The change is done in patch 4: we free ep by rcu, so it's OK
	to access epi->ep->file under rcu read critical region. The file
	has already been freed by rcu, so it's also OK to access its fields.

(3) serialize the concurrent invocations of epoll_ctl(EPOLL_CTL_ADD)
    for the nested-epoll-fd case
	(a) protect tfile_check_list and visited_list

	There is nothing to do.

(4) serialize ep_free() and eventpoll_release_file()
	(a) protect file->f_ep_links
	eventpoll_release_file() will read the list through
	file->f_ep_links, and modify it through epi->fllink.
	ep_free() will modify it through epi->fllink.

	It's fixed in patch 5: using rcu and list_first_or_null_rcu() to
	iterate file->f_ep_links instead of epmutex.

	(b) ensure the validity of epi->ep
	When eventpoll_release_file() gets epi from file->f_ep_links,
	epi->ep should still be valid.

	It's fixed in patch 4 and 6: add an ref-counter to eventpoll and
	free eventpoll by rcu.

	(c) protect the removal of epi
	Both ep_free() and eventpoll_release_file() will try to remove
	the same epi, if one function has removed the epi, the other
	function should not remove it again.

	It's fixed in patch 7: check whether or not ep_free() has already
	removed the epi before the invocation of ep_remove() in
	eventpoll_release_file().

	(d) ensure the validity of epi->ffd.file
	When ep_remove() is invoked by ep_free(), epi->ffd.file should
	still be valid.

	Do not need to do anything: when ep_free() is invoking ep_remove()
	and access epi->ffd.file, if the file is freeing, the freeing will
	be blocked on ep->mtx, so it's OK to access the file in ep_remove().

Patch 1 just removes epmutex from ep_free() and eventpoll_release_file(),
and patch 8 enlarge the protected region of ep->mtx to protect against
the iteration of ep->rbr.

The patch set has passed the epoll related test cases in LTP, and we are
planing to run some torture or performance test cases for nested-epoll
cases.

Comments and questions are welcome.

Regards,

Tao
---
Hou Tao (8):
  epoll: remove epmutex from ep_free() & eventpoll_release_file()
  epoll: remove ep from visited_list when freeing ep
  epoll: remove file from tfile_check_list when releasing file
  epoll: free eventpoll by rcu to provide existence guarantee
  epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
  epoll: ensure the validity of ep when removing epi in
    eventpoll_release_file()
  epoll: prevent the double-free of epi in eventpoll_release_file()
  epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()

 fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 88 insertions(+), 14 deletions(-)

-- 
2.7.5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file()
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 13:58   ` Davidlohr Bueso
  2017-10-28 12:58 ` [RFC][PATCH 2/8] epoll: remove ep from visited_list when freeing ep Hou Tao
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Remove the global epmutex from ep_free() and eventpoll_release_file().
In the later patches, we will add locks with a smaller granularity
to serve the same purposes of epmutex.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19..26ab0c5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -835,7 +835,6 @@ static void ep_free(struct eventpoll *ep)
 	 * anymore. The only hit might come from eventpoll_release_file() but
 	 * holding "epmutex" is sufficient here.
 	 */
-	mutex_lock(&epmutex);
 
 	/*
 	 * Walks through the whole tree by unregistering poll callbacks.
@@ -863,7 +862,6 @@ static void ep_free(struct eventpoll *ep)
 	}
 	mutex_unlock(&ep->mtx);
 
-	mutex_unlock(&epmutex);
 	mutex_destroy(&ep->mtx);
 	free_uid(ep->user);
 	wakeup_source_unregister(ep->ws);
@@ -1013,14 +1011,12 @@ void eventpoll_release_file(struct file *file)
 	 *
 	 * Besides, ep_remove() acquires the lock, so we can't hold it here.
 	 */
-	mutex_lock(&epmutex);
 	list_for_each_entry_safe(epi, next, &file->f_ep_links, fllink) {
 		ep = epi->ep;
 		mutex_lock_nested(&ep->mtx, 0);
 		ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
 	}
-	mutex_unlock(&epmutex);
 }
 
 static int ep_alloc(struct eventpoll **pep)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 2/8] epoll: remove ep from visited_list when freeing ep
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 3/8] epoll: remove file from tfile_check_list when releasing file Hou Tao
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Before the removal of epmutex, the acquisition of epmutex in
ep_free() will prevent the freeing of ep, so it's OK to access
ep in visited_list in ep_loop_check().

To ensure the validity of ep when clearing visited_list, we need
to remove ep from visited_list when freeing ep. If the ep had been
added to the visited_list, we need to wait for its removal.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 26ab0c5..44ea587 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -862,6 +862,18 @@ static void ep_free(struct eventpoll *ep)
 	}
 	mutex_unlock(&ep->mtx);
 
+	/*
+	 * ep will not been added to visited_list, because ep_ctrl()
+	 * can not get its reference and can not reference it by the
+	 * corresponding epitem. The only possible operation is list_del_init,
+	 * so it's OK to use list_empty_careful() here.
+	 */
+	if (!list_empty_careful(&ep->visited_list_link)) {
+		mutex_lock(&epmutex);
+		list_del_init(&ep->visited_list_link);
+		mutex_unlock(&epmutex);
+	}
+
 	mutex_destroy(&ep->mtx);
 	free_uid(ep->user);
 	wakeup_source_unregister(ep->ws);
@@ -1039,6 +1051,7 @@ static int ep_alloc(struct eventpoll **pep)
 	ep->rbr = RB_ROOT_CACHED;
 	ep->ovflist = EP_UNACTIVE_PTR;
 	ep->user = user;
+	INIT_LIST_HEAD(&ep->visited_list_link);
 
 	*pep = ep;
 
@@ -1928,7 +1941,7 @@ static int ep_loop_check(struct eventpoll *ep, struct file *file)
 	list_for_each_entry_safe(ep_cur, ep_next, &visited_list,
 							visited_list_link) {
 		ep_cur->visited = 0;
-		list_del(&ep_cur->visited_list_link);
+		list_del_init(&ep_cur->visited_list_link);
 	}
 	return ret;
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 3/8] epoll: remove file from tfile_check_list when releasing file
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 2/8] epoll: remove ep from visited_list when freeing ep Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 4/8] epoll: free eventpoll by rcu to provide existence guarantee Hou Tao
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Before the removal of epmutex, the acquisition of epmutex in
eventpoll_release_file() will prevent the freeing of file, so
it's OK to iterate files in tfile_check_list.

And now epmutex is removed, so when releasing file, we need
to remove file from tfile_check_list to ensure the validity
of file.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 44ea587..998c635 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1029,6 +1029,18 @@ void eventpoll_release_file(struct file *file)
 		ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
 	}
+
+	/*
+	 * The file can not been added to tfile_check_list again, because
+	 * (1) its refcnt has been zero, so ep_ctrl() can no longer get its reference
+	 * (2) its related ep items have been removed, so ep_loop_check_proc()
+	 *     can not get the file by ep->rbr
+	 */
+	if (!list_empty_careful(&file->f_tfile_llink)) {
+		mutex_lock(&epmutex);
+		list_del_init(&file->f_tfile_llink);
+		mutex_unlock(&epmutex);
+	}
 }
 
 static int ep_alloc(struct eventpoll **pep)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 4/8] epoll: free eventpoll by rcu to provide existence guarantee
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (2 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 3/8] epoll: remove file from tfile_check_list when releasing file Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 5/8] epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu Hou Tao
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Before the removal of epmutex, it's OK to access epi->ep in
reverse_path_check_proc(), because the freeing of ep will be
blocked on ep_free().

After the removal of epmutex, when accessing epi->ep in
reverse_path_check_proc(), it's possible that it has been
release because this eventpoll struct belongs to an epoll fd
which also polls the target file.

So freeing eventpoll by rcu to ensure the accessed fields of
eventpoll are still valid when invoking reverse_path_check_proc().

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 998c635..18de596 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -230,6 +230,9 @@ struct eventpoll {
 	/* used to track busy poll napi_id */
 	unsigned int napi_id;
 #endif
+
+	/* used to free itself */
+	struct rcu_head rcu;
 };
 
 /* Wait structure used by the poll hooks */
@@ -818,6 +821,12 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	return 0;
 }
 
+static void ep_rcu_free(struct rcu_head *head)
+{
+	struct eventpoll *ep = container_of(head, struct eventpoll, rcu);
+	kfree(ep);
+}
+
 static void ep_free(struct eventpoll *ep)
 {
 	struct rb_node *rbp;
@@ -877,7 +886,8 @@ static void ep_free(struct eventpoll *ep)
 	mutex_destroy(&ep->mtx);
 	free_uid(ep->user);
 	wakeup_source_unregister(ep->ws);
-	kfree(ep);
+
+	call_rcu(&ep->rcu, ep_rcu_free);
 }
 
 static int ep_eventpoll_release(struct inode *inode, struct file *file)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 5/8] epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (3 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 4/8] epoll: free eventpoll by rcu to provide existence guarantee Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 6/8] epoll: ensure the validity of ep when removing epi in eventpoll_release_file() Hou Tao
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

When eventpoll_release_file() iterates epitem in file->f_ep_links,
the epitem may be removed by ep_free(). To protect again the concurrent
writer, iterate file->f_ep_links by using rcu_read_lock() and
list_first_or_null_rcu()

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 18de596..e1e4796 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1033,12 +1033,22 @@ void eventpoll_release_file(struct file *file)
 	 *
 	 * Besides, ep_remove() acquires the lock, so we can't hold it here.
 	 */
-	list_for_each_entry_safe(epi, next, &file->f_ep_links, fllink) {
+	rcu_read_lock();
+	while (true) {
+		epi = list_first_or_null_rcu(&file->f_ep_links, struct epitem, fllink);
+		if (!epi)
+			break;
+
 		ep = epi->ep;
+		rcu_read_unlock();
+
 		mutex_lock_nested(&ep->mtx, 0);
 		ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
+
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 
 	/*
 	 * The file can not been added to tfile_check_list again, because
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 6/8] epoll: ensure the validity of ep when removing epi in eventpoll_release_file()
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (4 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 5/8] epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 7/8] epoll: prevent the double-free of " Hou Tao
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Before the removal of epmutex, ep_free() will be blocked by epmutex when
invoking eventpoll_release_file(), so epi->ep will be valid.

After the removal of epmutex, epi->ep may be freed during the invocation
of eventpoll_release_file(). We can not use rcu_read_lock/unlock because
we needs to acquire ep->mtx which is a mutex, so we add a ref-counter to
eventpoll and increase it before leaving the rcu read critical region.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e1e4796..27b743b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -43,6 +43,7 @@
 #include <linux/compat.h>
 #include <linux/rculist.h>
 #include <net/busy_poll.h>
+#include <linux/refcount.h>
 
 /*
  * LOCKING:
@@ -231,6 +232,8 @@ struct eventpoll {
 	unsigned int napi_id;
 #endif
 
+	/* used to ensure the validity of eventpoll when release file */
+	refcount_t ref;
 	/* used to free itself */
 	struct rcu_head rcu;
 };
@@ -827,6 +830,25 @@ static void ep_rcu_free(struct rcu_head *head)
 	kfree(ep);
 }
 
+static void eventpoll_put_ep(struct eventpoll *ep)
+{
+	if (refcount_dec_and_test(&ep->ref)) {
+		mutex_destroy(&ep->mtx);
+		free_uid(ep->user);
+		wakeup_source_unregister(ep->ws);
+
+		call_rcu(&ep->rcu, ep_rcu_free);
+	}
+}
+
+static struct eventpoll *eventpoll_get_ep(struct eventpoll *ep)
+{
+	if (refcount_inc_not_zero(&ep->ref))
+		return ep;
+	else
+		return NULL;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
 	struct rb_node *rbp;
@@ -883,11 +905,7 @@ static void ep_free(struct eventpoll *ep)
 		mutex_unlock(&epmutex);
 	}
 
-	mutex_destroy(&ep->mtx);
-	free_uid(ep->user);
-	wakeup_source_unregister(ep->ws);
-
-	call_rcu(&ep->rcu, ep_rcu_free);
+	eventpoll_put_ep(ep);
 }
 
 static int ep_eventpoll_release(struct inode *inode, struct file *file)
@@ -1018,7 +1036,7 @@ static const struct file_operations eventpoll_fops = {
 void eventpoll_release_file(struct file *file)
 {
 	struct eventpoll *ep;
-	struct epitem *epi, *next;
+	struct epitem *epi;
 
 	/*
 	 * We don't want to get "file->f_lock" because it is not
@@ -1039,13 +1057,18 @@ void eventpoll_release_file(struct file *file)
 		if (!epi)
 			break;
 
-		ep = epi->ep;
+		ep = eventpoll_get_ep(epi->ep);
+		/* Current epi has been removed by ep_free() */
+		if (!ep)
+			continue;
 		rcu_read_unlock();
 
 		mutex_lock_nested(&ep->mtx, 0);
 		ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
 
+		eventpoll_put_ep(ep);
+
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
@@ -1084,6 +1107,7 @@ static int ep_alloc(struct eventpoll **pep)
 	ep->ovflist = EP_UNACTIVE_PTR;
 	ep->user = user;
 	INIT_LIST_HEAD(&ep->visited_list_link);
+	refcount_set(&ep->ref, 1);
 
 	*pep = ep;
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 7/8] epoll: prevent the double-free of epi in eventpoll_release_file()
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (5 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 6/8] epoll: ensure the validity of ep when removing epi in eventpoll_release_file() Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-28 12:58 ` [RFC][PATCH 8/8] epoll: protect the iteration of ep->rbr by ep->mtx in ep_free() Hou Tao
  2017-10-31 13:01 ` [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Davidlohr Bueso
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

After the removal of epmutex, it's possible eventpoll_release_file()
and ep_free() will try to remove the same epi. The removal of epi
is protected by ep->mtx, so there are two possible sequences:

(1) eventpoll_release_file() first, then ep_free()
    there is no double-free, because after the invocation of
    eventpoll_release_file(), the epi will be removed from ep->rbr
    and ep_free() will not be able to free it.
(2) ep_free() first, then eventpoll_release_file()
    double-free is possible because before the invocation of ep_remove()
    in ep_free() eventpoll_release_file may has already got the epi from
    file->f_ep_links. To protect against the double-free case, check
    rb_first_cached() in eventpoll_release_file() to ensure the epi
    has not been removed by ep_free()

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 27b743b..cd7a9f4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1058,13 +1058,22 @@ void eventpoll_release_file(struct file *file)
 			break;
 
 		ep = eventpoll_get_ep(epi->ep);
-		/* Current epi has been removed by ep_free() */
+		/* Current epi had been removed by ep_free() */
 		if (!ep)
 			continue;
 		rcu_read_unlock();
 
 		mutex_lock_nested(&ep->mtx, 0);
-		ep_remove(ep, epi);
+		/*
+		 * If rb_first_cached() returns NULL, it means that the current epi
+		 * had been removed by ep_free(). To prevent epi from double-freeing,
+		 * check the condition before invoking ep_remove().
+		 * If eventpoll_release_file() frees epi firstly, the epi will not
+		 * be freed again because the epi must have been removed from ep->rbr
+		 * when ep_free() is invoked.
+		 */
+		if (rb_first_cached(&ep->rbr))
+			ep_remove(ep, epi);
 		mutex_unlock(&ep->mtx);
 
 		eventpoll_put_ep(ep);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 8/8] epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (6 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 7/8] epoll: prevent the double-free of " Hou Tao
@ 2017-10-28 12:58 ` Hou Tao
  2017-10-31 13:01 ` [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Davidlohr Bueso
  8 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-28 12:58 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

When ep_free() iterates the epi in ep->rbr, the epi may be removed
by eventpoll_release_file(). To protect again the case, acquiring
ep->mtx before the iteration of ep->rbr.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/eventpoll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index cd7a9f4..7618fb5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -870,6 +870,7 @@ static void ep_free(struct eventpoll *ep)
 	/*
 	 * Walks through the whole tree by unregistering poll callbacks.
 	 */
+	mutex_lock(&ep->mtx);
 	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 
@@ -885,7 +886,6 @@ static void ep_free(struct eventpoll *ep)
 	 * We do not need to lock ep->mtx, either, we only do it to prevent
 	 * a lockdep warning.
 	 */
-	mutex_lock(&ep->mtx);
 	while ((rbp = rb_first_cached(&ep->rbr)) != NULL) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		ep_remove(ep, epi);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file()
  2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
@ 2017-10-28 13:58   ` Davidlohr Bueso
  2017-10-30  7:09     ` Hou Tao
  0 siblings, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2017-10-28 13:58 UTC (permalink / raw)
  To: Hou Tao; +Cc: linux-fsdevel, linux-kernel, viro, jbaron, oleg, koct9i

On Sat, 28 Oct 2017, Hou Tao wrote:

>Remove the global epmutex from ep_free() and eventpoll_release_file().
>In the later patches, we will add locks with a smaller granularity
>to serve the same purposes of epmutex.
>
>Signed-off-by: Hou Tao <houtao1@huawei.com>
>---
> fs/eventpoll.c | 4 ----
> 1 file changed, 4 deletions(-)
>
>diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>index 2fabd19..26ab0c5 100644
>--- a/fs/eventpoll.c
>+++ b/fs/eventpoll.c
>@@ -835,7 +835,6 @@ static void ep_free(struct eventpoll *ep)
> 	 * anymore. The only hit might come from eventpoll_release_file() but
> 	 * holding "epmutex" is sufficient here.
> 	 */
^^
What about this comment (and the equivalent one in eventpoll_release_file()?

>-	mutex_lock(&epmutex);

...even if you fix it in a later patch, this patch breaks bisection. Now
we just race between ep_free() and eventpoll_release_file(). This patch
should be folded in, no?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file()
  2017-10-28 13:58   ` Davidlohr Bueso
@ 2017-10-30  7:09     ` Hou Tao
  0 siblings, 0 replies; 12+ messages in thread
From: Hou Tao @ 2017-10-30  7:09 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: linux-fsdevel, linux-kernel, viro, jbaron, oleg, koct9i

Hi,

On 2017/10/28 21:58, Davidlohr Bueso wrote:
> On Sat, 28 Oct 2017, Hou Tao wrote:
> 
>> Remove the global epmutex from ep_free() and eventpoll_release_file().
>> In the later patches, we will add locks with a smaller granularity
>> to serve the same purposes of epmutex.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> fs/eventpoll.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index 2fabd19..26ab0c5 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -835,7 +835,6 @@ static void ep_free(struct eventpoll *ep)
>>      * anymore. The only hit might come from eventpoll_release_file() but
>>      * holding "epmutex" is sufficient here.
>>      */
> ^^
> What about this comment (and the equivalent one in eventpoll_release_file()?
> 
>> -    mutex_lock(&epmutex);
> 
Thanks for your reminder. I will fix the related comments in a v1 patchset.

> ...even if you fix it in a later patch, this patch breaks bisection. Now
> we just race between ep_free() and eventpoll_release_file(). This patch
> should be folded in, no?
Yes, the patchset should be squashed into one patch, however it will be
difficult to explain the purpose of these modifications, so I break them
into little pieces, and hoped that these little patches can explain the
reason why the modification is needed in a cleaner way. It also will
be fixed in v1 patch.

Regards,

Tao

> Thanks,
> Davidlohr
> 
> .
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case
  2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
                   ` (7 preceding siblings ...)
  2017-10-28 12:58 ` [RFC][PATCH 8/8] epoll: protect the iteration of ep->rbr by ep->mtx in ep_free() Hou Tao
@ 2017-10-31 13:01 ` Davidlohr Bueso
  8 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2017-10-31 13:01 UTC (permalink / raw)
  To: Hou Tao; +Cc: linux-fsdevel, linux-kernel, viro, jbaron, oleg, koct9i

On Sat, 28 Oct 2017, Hou Tao wrote:

>The patch set has passed the epoll related test cases in LTP, and we are
>planing to run some torture or performance test cases for nested-epoll
>cases.

fyi this series survived running a 2 level nested epoll_wait() benchmark
on a 40-core box, with 80 concurrent threads:

http://linux-scalability.org/epoll/epoll-test.c

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-31 13:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
2017-10-28 13:58   ` Davidlohr Bueso
2017-10-30  7:09     ` Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 2/8] epoll: remove ep from visited_list when freeing ep Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 3/8] epoll: remove file from tfile_check_list when releasing file Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 4/8] epoll: free eventpoll by rcu to provide existence guarantee Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 5/8] epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 6/8] epoll: ensure the validity of ep when removing epi in eventpoll_release_file() Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 7/8] epoll: prevent the double-free of " Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 8/8] epoll: protect the iteration of ep->rbr by ep->mtx in ep_free() Hou Tao
2017-10-31 13:01 ` [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).