From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752457AbdJROD1 (ORCPT ); Wed, 18 Oct 2017 10:03:27 -0400 Received: from mx0a-00190b01.pphosted.com ([67.231.149.131]:44154 "EHLO mx0a-00190b01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751792AbdJRODX (ORCPT ); Wed, 18 Oct 2017 10:03:23 -0400 Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from ep_poll_safewake() To: Davidlohr Bueso Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Alexander Viro , Salman Qazi References: <1507920533-8812-1-git-send-email-jbaron@akamai.com> <20171017153737.kvrzzhvy55ocgb7u@linux-n805> From: Jason Baron Message-ID: <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com> Date: Wed, 18 Oct 2017 10:03:10 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <20171017153737.kvrzzhvy55ocgb7u@linux-n805> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-10-18_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1710180197 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-10-18_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1710180197 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/17/2017 11:37 AM, Davidlohr Bueso wrote: > On Fri, 13 Oct 2017, Jason Baron wrote: > >> The ep_poll_safewake() function is used to wakeup potentially nested >> epoll >> file descriptors. The function uses ep_call_nested() to prevent entering >> the same wake up queue more than once, and to prevent excessively deep >> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary >> since we are already preventing these conditions during EPOLL_CTL_ADD. >> This >> saves extra function calls, and avoids taking a global lock during the >> ep_call_nested() calls. > > This makes sense. > >> >> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC >> case, since ep_call_nested() keeps track of the nesting level, and >> this is >> required by the call to spin_lock_irqsave_nested(). It would be nice to >> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as >> well, however its not clear how to simply pass the nesting level through >> multiple wake_up() levels without more surgery. In any case, I don't >> think >> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch, >> also >> apparently fixes a workload at Google that Salman Qazi reported by >> completely removing the poll_safewake_ncalls->lock from wakeup paths. > > I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as > I was tackling the nested epoll scaling issue with loop and readywalk lists > in mind. >> I'm not sure the details of the workload - perhaps Salman can elaborate further about it. It would seem that the safewake would potentially be the most contended in general in the nested case, because generally you have a few epoll fds attached to lots of sources doing wakeups. So those sources are all going to conflict on the safewake lock. The readywalk is used when performing a 'nested' poll and in general this is likely going to be called on a few epoll fds. That said, we should remove it too. I will post a patch to remove it. The loop lock is used during EPOLL_CTL_ADD to check for loops and deep wakeup paths and so I would expect this to be less common, but I wouldn't doubt there are workloads impacted by it. We can potentially, I think remove this one too - and the global 'epmutex'. I posted some ideas a while ago on it: http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html We can work through these ideas or others... Thanks, -Jason >> Signed-off-by: Jason Baron >> Cc: Alexander Viro >> Cc: Andrew Morton >> Cc: Salman Qazi > > Acked-by: Davidlohr Bueso > >> --- >> fs/eventpoll.c | 47 ++++++++++++++++++----------------------------- >> 1 file changed, 18 insertions(+), 29 deletions(-) > > Yay for getting rid of some of the callback hell. > > Thanks, > Davidlohr