From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-14.4 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8521C00A89 for ; Mon, 2 Nov 2020 17:49:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 61B2B21D91 for ; Mon, 2 Nov 2020 17:49:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Ldc/5Z9q" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725806AbgKBRtc (ORCPT ); Mon, 2 Nov 2020 12:49:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725789AbgKBRtb (ORCPT ); Mon, 2 Nov 2020 12:49:31 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C12CC0617A6 for ; Mon, 2 Nov 2020 09:49:31 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id h22so10355897wmb.0 for ; Mon, 02 Nov 2020 09:49:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=yMpsM0M3AdGW4q+fUp9nAkmdebCdEbbt+FqsNBDtnyU=; b=Ldc/5Z9q44Ih+zCEpxPC0bp2t3nP3laWvJVgM+C/rUcraM5h63WYWsToTDPIhrrfHO pAmf5kAc6A2VrP3OMVKSGro1hZ5g5QV2Uf4u7T7Wu+wEB+JSOlJ5ee+hFumPjeropUJT aeE4Ya0L8zXaZ49zKjLCKMO1GqTLLHfDiUiLr3xA8678yso9/u4Ya5JgBppWslnlIGEh 
 aGFkV25zC7lo9HOg8OGy+FhFtFV13595RPyAN+JdtXISCgkgewEJc6dEMGdkTV7iX9+y
 DBOH5B7+WcHGIij6No8mFdGkkhXr1wP+PmmJuyl+EFINvs2UcRB0xDMj/aS7f+rJXrs4
 027A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date:message-id:subject:to:cc;
        bh=yMpsM0M3AdGW4q+fUp9nAkmdebCdEbbt+FqsNBDtnyU=;
        b=BGjfj+MI+F6dxBH8cpZs/Fz6HzYqA9LMjTQ6wol82t/AeFwwATYYFPlZoGGaa9Fqu/
        zgQm319uhN85nrL+W60yHHpewibIsoeQ2xW1Mdu5/XtcG5DAE2ugbTfs7gkWfmsX2ZNJ
        +4dv8W5xC0G16CUGtC06SrSJTzUra+WZ2/Hrqj1CsBhaZdApAnzau7/Le0IzttsFGob6
        PlpN/HIlf0R3kj0G97S8guGf7nkhu/Dwsl5woNIa/kTlkpp/gOM34k9/tFmQPcV4nVQe
        QGLZn0mQJY5GaEZb1yjcfS93LgfToqZ2eSsgB/FYdopuOFQwuuKn4aAC9D0RkF0DvEku
        84pw==
X-Gm-Message-State: AOAM533azTWCZtyiO/VIE6KCLQKbI7gjnWs1JGrGOZl4n/u18fQaAQe+
        9GcAbfevwc0AJtd8gFkESOJxI0nm27aTgfsomtE1eg==
X-Google-Smtp-Source: ABdhPJxPk5bF9lRHtWcHQmKc47iH0s0bLRz71yugVR7jasX8OG3MGmuWd5PJb+n0AlN1GK2ENXi7XFtVBwk5fThRntc=
X-Received: by 2002:a1c:4944:: with SMTP id w65mr19504307wma.50.1604339369807;
        Mon, 02 Nov 2020 09:49:29 -0800 (PST)
MIME-Version: 1.0
References: <20201101170656.48abbd5e88375219f868af5e@linux-foundation.org> <20201102010804.uENOQsZO9%akpm@linux-foundation.org>
In-Reply-To: 
From: Soheil Hassas Yeganeh 
Date: Mon, 2 Nov 2020 12:48:53 -0500
Message-ID: 
Subject: Re: [patch 13/15] epoll: check ep_events_available() upon timeout
To: Linus Torvalds 
Cc: Andrew Morton , Davidlohr Bueso , Eric Dumazet , Guantao Liu ,
        Khazhismel Kumykov , Linux-MM , mm-commits@vger.kernel.org,
        Al Viro , Willem de Bruijn 
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
List-ID: 
X-Mailing-List: mm-commits@vger.kernel.org

On Mon, Nov 2, 2020 at 12:08 PM Linus Torvalds wrote:
>
> On Sun, Nov 1, 2020 at 5:08 PM Andrew Morton wrote:
> >
> > After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
> > timeout"), we break out of the ep_poll loop upon timeout, without checking
> > whether there is any new events available. Prior to that patch-series we
> > always called ep_events_available() after exiting the loop.
>
> This patch looks overly complicated to me.
>
> It does the exact same thing as the "break" does, except:
>
>  - it does it the non-optimized way without the "avoid spinlock"
>
>  - it sets eavail if there was
>
> It would seem like the *much* simpler patch is to just do this oneliner instead:
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4df61129566d..29fa770ce1e3 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1907,6 +1907,7 @@ static int ep_poll(struct eventpoll *ep,
> struct epoll_event __user *events,
>
>         if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
>                 timed_out = 1;
> +               eavail = 1;
>                 break;
>         }
>
> and *boom* you're done. That will mean that after a timeout we'll try
> one more time to just do that ep_events_available() thing.

Thank you for the suggestion. That was the first version I tried, and
I can confirm it fixes the race because we call ep_send_events() once
more before returning. Though, I believe, due to timed_out=1, we won't
goto fetch_events to call ep_events_available():

        if (!res && eavail &&
            !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
                goto fetch_events;

> I can see no downside to just setting eavail unconditionally and
> keeping the code much simpler. Hmm?

You're spot on that the patch is more complicated than your
suggestion. However, the downside I observed was a performance
regression for the non-racy case: suppose there are a few threads
with a similar non-zero timeout and no ready event. They will all
experience a noticeable contention in ep_scan_ready_list, by
unconditionally calling ep_send_events(). The contention was large
because there will be 2 write locks on ep->lock and one mutex lock on
ep->mtx with a large critical section.
FWIW, I also tried the following idea to eliminate that contention but
I couldn't prove that it's correct, because of the gap between calling
ep_events_available and removing this thread from the wait queue. That
was why I resorted to calling __remove_wait_queue() under the lock.

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..29fa770ce1e3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1907,6 +1907,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,

        if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
                timed_out = 1;
+               eavail = ep_events_available(ep);
                break;
        }

Also, please note that the non-optimized way without the "avoid
spinlock" is not problematic in any of our benchmarks because, when
the thread times out, it's likely on the wait queue except for this
particular racy scenario.

Thanks!
Soheil

> Linus