From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751594AbdJ1Mwu (ORCPT ); Sat, 28 Oct 2017 08:52:50 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:9538 "EHLO szxga05-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751170AbdJ1Mwb (ORCPT ); Sat, 28 Oct 2017 08:52:31 -0400 From: Hou Tao To: CC: , , , , , Subject: [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case Date: Sat, 28 Oct 2017 20:58:19 +0800 Message-ID: <1509195507-29037-1-git-send-email-houtao1@huawei.com> X-Mailer: git-send-email 2.7.5 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.175.124.28] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A090204.59F47D81.002E,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2014-11-16 11:51:01, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: d227566debd7b737efffd57e2e6ac0e4 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, We are optimizing the Request-Per-Second of nginx http server, and we found that acquiring epmutex in eventpoll_release_file() will become a bottleneck under the one-request-per-connection scenario. The following are some details of the scenario: * HTTP server (nginx): * under ARM64 with 64 cores * 64 worker processes, each worker is binded to a specific CPU * keepalive_requests = 1 in nginx.conf: nginx will close the connection fd after a reply is send * HTTP benchmark tool (wrk): * under x86-64 with 48 cores * 16 threads, 64 connections per-thread Before the patch, the RPS measured by wrk is ~220K, after applying the patch the RPS is ~240K. We also measure the overhead of eventpoll_release_file() and its children by perf: 29% before and 2% after. In the following section I will explain the purposes of epmutex, and the way of replacing it by using locks with a smaller granularity. epmutex serves four purposes: (1) serialize ep_loop_check() and ep_free()/eventpoll_release_file() (a) ensure the validity of ep when clearing visited_list The acquisition of epmutex in ep_free() prevent the freeing of ep. It's fixed in patch 2: when freeing ep, remove it from visited_list. When there is no nested-epoll cast, ep will not been added to visited_list, so we check the condition first. If it has already been added to visited_list, we need to wait for the release of epmutex. (2) serialize reverse_path_check() and ep_free()/eventpoll_release_file() (a) ensure the validity of file in tfile_check_list epi->ffd.file was added to tfile_check_list under ep->mtx, but was accessed without ep->mtx. The acquisition of epmutex in eventpoll_release_file() prevent the freeing of file. It's fixed in patch 3: when releasing file, remove it from tfile_check_list. If it has been already added, we need to wait for the release of epmutex. (b) ensure the validity of epi->ep and epi->ep->file The epmutex will prevent the freeing of ep and its related file, so it's OK to access epi->ep under rcu read critical region. The change is done in patch 4: we free ep by rcu, so it's OK to access epi->ep->file under rcu read critical region. The file has already been freed by rcu, so it's also OK to access its fields. (3) serialize the concurrent invocations of epoll_ctl(EPOLL_CTL_ADD) for the nested-epoll-fd case (a) protect tfile_check_list and visited_list There is nothing to do. (4) serialize ep_free() and eventpoll_release_file() (a) protect file->f_ep_links eventpoll_release_file() will read the list through file->f_ep_links, and modify it through epi->fllink. ep_free() will modify it through epi->fllink. It's fixed in patch 5: using rcu and list_first_or_null_rcu() to iterate file->f_ep_links instead of epmutex. (b) ensure the validity of epi->ep When eventpoll_release_file() gets epi from file->f_ep_links, epi->ep should still be valid. It's fixed in patch 4 and 6: add an ref-counter to eventpoll and free eventpoll by rcu. (c) protect the removal of epi Both ep_free() and eventpoll_release_file() will try to remove the same epi, if one function has removed the epi, the other function should not remove it again. It's fixed in patch 7: check whether or not ep_free() has already removed the epi before the invocation of ep_remove() in eventpoll_release_file(). (d) ensure the validity of epi->ffd.file When ep_remove() is invoked by ep_free(), epi->ffd.file should still be valid. Do not need to do anything: when ep_free() is invoking ep_remove() and access epi->ffd.file, if the file is freeing, the freeing will be blocked on ep->mtx, so it's OK to access the file in ep_remove(). Patch 1 just removes epmutex from ep_free() and eventpoll_release_file(), and patch 8 enlarge the protected region of ep->mtx to protect against the iteration of ep->rbr. The patch set has passed the epoll related test cases in LTP, and we are planing to run some torture or performance test cases for nested-epoll cases. Comments and questions are welcome. Regards, Tao --- Hou Tao (8): epoll: remove epmutex from ep_free() & eventpoll_release_file() epoll: remove ep from visited_list when freeing ep epoll: remove file from tfile_check_list when releasing file epoll: free eventpoll by rcu to provide existence guarantee epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu epoll: ensure the validity of ep when removing epi in eventpoll_release_file() epoll: prevent the double-free of epi in eventpoll_release_file() epoll: protect the iteration of ep->rbr by ep->mtx in ep_free() fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 88 insertions(+), 14 deletions(-) -- 2.7.5