linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case
@ 2017-10-28 12:58 Hou Tao
From: Hou Tao @ 2017-10-28 12:58 UTC
  To: linux-fsdevel; +Cc: linux-kernel, viro, jbaron, oleg, dave, koct9i

Hi,

We are optimizing the requests per second (RPS) of the nginx HTTP server,
and we found that acquiring epmutex in eventpoll_release_file() becomes a
bottleneck in the one-request-per-connection scenario. Details of the
scenario:

* HTTP server (nginx):
	* under ARM64 with 64 cores
	* 64 worker processes, each worker bound to a specific CPU
	* keepalive_requests = 1 in nginx.conf: nginx closes the
	  connection fd after a reply is sent
* HTTP benchmark tool (wrk):
	* under x86-64 with 48 cores
	* 16 threads, 64 connections per-thread

Before the patch set, the RPS measured by wrk was ~220K; with the patch
set applied, it is ~240K. We also measured the overhead of
eventpoll_release_file() and its children with perf: 29% before and
2% after.

In the following sections I explain the purposes of epmutex and how it
can be replaced with finer-grained locks.

epmutex serves four purposes:
(1) serialize ep_loop_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of ep when clearing visited_list
	Acquiring epmutex in ep_free() prevents ep from being freed.

	It's fixed in patch 2: when freeing ep, remove it from visited_list.
	In the non-nested epoll case, ep is never added to visited_list, so
	we check that condition first. Only if ep has already been added do
	we need to wait for epmutex to be released.
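
	A minimal userspace sketch of the fast-path check described for
	patch 2, assuming an intrusive doubly linked list where a self-linked
	node is on no list, and modeling epmutex with a pthread mutex. All
	names (struct ep_model, ep_free_model(), visited_list_add_model())
	are hypothetical, not the patch's code. The sketch is
	single-threaded at heart. A concurrent implementation must also
	guarantee that ep cannot be added to visited_list after the unlocked
	check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list; a self-linked node is on no list. */
struct node { struct node *prev, *next; };

static void node_init(struct node *n) { n->prev = n->next = n; }
static bool node_on_list(const struct node *n) { return n->next != n; }

static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void node_del_init(struct node *n)
{
    assert(node_on_list(n)); /* caller must only unlink linked nodes */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Hypothetical stand-ins for epmutex and the global visited_list. */
static pthread_mutex_t epmutex_model = PTHREAD_MUTEX_INITIALIZER;
static struct node visited_list_model = { &visited_list_model, &visited_list_model };

struct ep_model { struct node visited; };

/* Test hook: how many frees had to take the slow path. */
static int slow_path_count;

struct ep_model *ep_alloc_model(void)
{
    struct ep_model *ep = malloc(sizeof(*ep));
    if (ep)
        node_init(&ep->visited);
    return ep;
}

/* Nested case only: the loop check marks ep as visited under epmutex. */
void visited_list_add_model(struct ep_model *ep)
{
    pthread_mutex_lock(&epmutex_model);
    node_add_tail(&ep->visited, &visited_list_model);
    pthread_mutex_unlock(&epmutex_model);
}

void ep_free_model(struct ep_model *ep)
{
    /* Fast path: a non-nested ep was never put on visited_list,
     * so epmutex is not touched at all. */
    if (node_on_list(&ep->visited)) {
        /* Slow path: wait for epmutex, then unlink ep. */
        pthread_mutex_lock(&epmutex_model);
        node_del_init(&ep->visited);
        pthread_mutex_unlock(&epmutex_model);
        slow_path_count++;
    }
    free(ep);
}
```

	In the common non-nested case, ep_free_model() never touches the
	global mutex. That is the property that removes the contention.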

(2) serialize reverse_path_check() and ep_free()/eventpoll_release_file()
	(a) ensure the validity of file in tfile_check_list
	epi->ffd.file is added to tfile_check_list under ep->mtx, but is
	later accessed without holding ep->mtx. Acquiring epmutex in
	eventpoll_release_file() prevents the file from being freed.

	It's fixed in patch 3: when releasing the file, remove it from
	tfile_check_list. Only if it has already been added do we need
	to wait for epmutex to be released.

	(b) ensure the validity of epi->ep and epi->ep->file
	epmutex prevents ep and its associated file from being freed, so
	it is safe to access epi->ep inside an RCU read-side critical section.

	The change is done in patch 4: ep is freed via RCU, so it is safe
	to access epi->ep->file inside an RCU read-side critical section.
	Files are already freed via RCU, so it is also safe to access the
	file's fields.

(3) serialize the concurrent invocations of epoll_ctl(EPOLL_CTL_ADD)
    for the nested-epoll-fd case
	(a) protect tfile_check_list and visited_list

	Nothing needs to change here; epmutex still serializes this case.

(4) serialize ep_free() and eventpoll_release_file()
	(a) protect file->f_ep_links
	eventpoll_release_file() reads the list through
	file->f_ep_links and modifies it through epi->fllink;
	ep_free() also modifies it through epi->fllink.

	It's fixed in patch 5: iterate file->f_ep_links with RCU and
	list_first_or_null_rcu() instead of holding epmutex.
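
	The reason to iterate with a "first entry or NULL" helper instead of
	a walk with a saved next pointer can be sketched in userspace,
	without RCU. The loop re-reads the list head on every pass, so an
	entry unlinked by another path is never followed through a stale
	pointer. The list helpers, container_of_model(), struct file_model
	and release_file_model() are hypothetical stand-ins. A kernel
	version would use list_first_or_null_rcu() under rcu_read_lock().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list in the style of <linux/list.h>. */
struct node { struct node *prev, *next; };

#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void node_init(struct node *n) { n->prev = n->next = n; }

static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void node_del_init(struct node *n)
{
    assert(n->next != n); /* must currently be on a list */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Userspace analogue of list_first_or_null(): NULL if the list is empty. */
static struct node *first_or_null(struct node *head)
{
    return head->next == head ? NULL : head->next;
}

struct epi_model  { int id; struct node fllink; };
struct file_model { struct node f_ep_links; };

/* Unlinks every epitem from the file; returns how many were unlinked. */
int release_file_model(struct file_model *file)
{
    struct node *n;
    int removed = 0;

    /* Always restart from the head: no next pointer is cached across
     * iterations, so concurrent unlinking cannot leave us on a dead entry. */
    while ((n = first_or_null(&file->f_ep_links)) != NULL) {
        struct epi_model *epi = container_of_model(n, struct epi_model, fllink);
        /* The real code would remove epi from its ep at this point. */
        node_del_init(&epi->fllink);
        removed++;
    }
    return removed;
}
```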

	(b) ensure the validity of epi->ep
	When eventpoll_release_file() gets epi from file->f_ep_links,
	epi->ep should still be valid.

	It's fixed in patches 4 and 6: add a reference counter to
	eventpoll and free eventpoll via RCU.
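
	A minimal sketch of the reference-counting half of this idea, using
	C11 atomics in userspace. A reader that found ep through epi->ep
	takes a reference only if the count is not already zero, the same
	pattern as the kernel's refcount_inc_not_zero(). The last put frees
	the object. struct ep_model, ep_tryget_model() and ep_put_model()
	are hypothetical names, not the patch's API. The sketch frees
	immediately, whereas the patch set describes deferring the free via
	RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted stand-in for struct eventpoll. */
struct ep_model {
    atomic_int refcount;
};

/* Test hook: number of ep_model objects actually freed. */
static int ep_freed_count;

struct ep_model *ep_create_model(void)
{
    struct ep_model *ep = malloc(sizeof(*ep));
    if (ep)
        atomic_init(&ep->refcount, 1); /* reference held by the epoll file */
    return ep;
}

/* Take a reference only if ep is not already on its way out. A reader that
 * reached ep via epi->ep must not blindly increment, because the count may
 * have just dropped to zero. */
bool ep_tryget_model(struct ep_model *ep)
{
    int old = atomic_load_explicit(&ep->refcount, memory_order_relaxed);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(&ep->refcount, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Drop a reference; the last one frees ep. In the kernel the free would be
 * deferred past an RCU grace period (e.g. with kfree_rcu()) so that
 * concurrent RCU readers never dereference freed memory. */
void ep_put_model(struct ep_model *ep)
{
    int old = atomic_fetch_sub_explicit(&ep->refcount, 1, memory_order_acq_rel);

    assert(old > 0); /* a put without a matching get */
    if (old == 1) {
        ep_freed_count++;
        free(ep);
    }
}
```

	With this scheme, eventpoll_release_file() can pin ep before
	dropping the lock that found it. ep_free() then only drops the
	file's reference instead of freeing ep out from under the release
	path.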

	(c) protect the removal of epi
	Both ep_free() and eventpoll_release_file() may try to remove
	the same epi; once one function has removed it, the other
	must not remove it again.

	It's fixed in patch 7: eventpoll_release_file() checks whether
	ep_free() has already removed the epi before invoking
	ep_remove().
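
	A minimal sketch of a "remove at most once" check, assuming both
	paths serialize on a per-ep mutex (modeling ep->mtx). The hypothetical
	`linked` flag records whether the epi is still attached. The patch
	may use a different marker to tell that ep_free() got there first.
	Only the caller that observes linked == true under the mutex
	performs the removal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct eventpoll and struct epitem. */
struct ep_model  { pthread_mutex_t mtx; int nr_items; };
struct epi_model { struct ep_model *ep; bool linked; };

/* Called from both the ep_free()-like and the eventpoll_release_file()-like
 * path. Returns true only for the caller that actually removed epi. */
bool ep_remove_once_model(struct epi_model *epi)
{
    struct ep_model *ep = epi->ep;
    bool did_remove = false;

    pthread_mutex_lock(&ep->mtx);
    if (epi->linked) {            /* not yet removed by the other path */
        assert(ep->nr_items > 0);
        epi->linked = false;
        ep->nr_items--;
        did_remove = true;
    }
    pthread_mutex_unlock(&ep->mtx);
    return did_remove;
}
```

	Because the check and the update happen under the same mutex, two
	racing callers cannot both see linked == true.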

	(d) ensure the validity of epi->ffd.file
	When ep_remove() is invoked by ep_free(), epi->ffd.file should
	still be valid.

	Nothing needs to be done: if the file is being freed while
	ep_free() is invoking ep_remove() and accessing epi->ffd.file,
	the freeing blocks on ep->mtx, so it is safe to access the file
	in ep_remove().

Patch 1 simply removes epmutex from ep_free() and eventpoll_release_file(),
and patch 8 enlarges the region protected by ep->mtx in ep_free() so that
it also covers the iteration of ep->rbr.

The patch set has passed the epoll-related test cases in LTP, and we are
planning to run some torture and performance tests for the nested-epoll
cases.

Comments and questions are welcome.

Regards,

Tao
---
Hou Tao (8):
  epoll: remove epmutex from ep_free() & eventpoll_release_file()
  epoll: remove ep from visited_list when freeing ep
  epoll: remove file from tfile_check_list when releasing file
  epoll: free eventpoll by rcu to provide existence guarantee
  epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
  epoll: ensure the validity of ep when removing epi in
    eventpoll_release_file()
  epoll: prevent the double-free of epi in eventpoll_release_file()
  epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()

 fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 88 insertions(+), 14 deletions(-)

-- 
2.7.5



Thread overview: 12+ messages
2017-10-28 12:58 [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 1/8] epoll: remove epmutex from ep_free() & eventpoll_release_file() Hou Tao
2017-10-28 13:58   ` Davidlohr Bueso
2017-10-30  7:09     ` Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 2/8] epoll: remove ep from visited_list when freeing ep Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 3/8] epoll: remove file from tfile_check_list when releasing file Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 4/8] epoll: free eventpoll by rcu to provide existence guarantee Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 5/8] epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 6/8] epoll: ensure the validity of ep when removing epi in eventpoll_release_file() Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 7/8] epoll: prevent the double-free of epi in eventpoll_release_file() Hou Tao
2017-10-28 12:58 ` [RFC][PATCH 8/8] epoll: protect the iteration of ep->rbr by ep->mtx in ep_free() Hou Tao
2017-10-31 13:01 ` [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Davidlohr Bueso
