From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BBF76C4338F for ; Wed, 25 Aug 2021 17:25:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 91AAD610A0 for ; Wed, 25 Aug 2021 17:25:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242321AbhHYR0Q (ORCPT ); Wed, 25 Aug 2021 13:26:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233612AbhHYR0O (ORCPT ); Wed, 25 Aug 2021 13:26:14 -0400 Received: from mail-lf1-x136.google.com (mail-lf1-x136.google.com [IPv6:2a00:1450:4864:20::136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 653FBC061757 for ; Wed, 25 Aug 2021 10:25:28 -0700 (PDT) Received: by mail-lf1-x136.google.com with SMTP id j4so475758lfg.9 for ; Wed, 25 Aug 2021 10:25:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=MNIM3WW8Ze0x+S3sN3sIiBTuhvgRcNb/CPfWfCAvio0=; b=GmKeqIU2Gudp2kzLna3RtlddFJNw1mWlpvF91VBf+XJ07q1njknT5faN+RJUZN1dgX XnC/W25NfdVpfGc1TNVWNK6vz9gdO6XkyZh8GfdejwMBLBn4wEh7G25lzX50HyIl2bzx 40QBc9QshjOo80HOfie6UE8OyWhZyv5UGWLJ0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=MNIM3WW8Ze0x+S3sN3sIiBTuhvgRcNb/CPfWfCAvio0=; b=V019Dlcs6RoArTAtRJMUrpiqw4nn7oPB6JyLdtNTn9m2umKb3nxu9vUhCPAQ0Wf9Re tvjyI7UtHEAabdN7yIFFnkZFABx72y3ZTQIE1BEqDi6UfZyIEpqP+QeSX1nEmIPIXLOt jeW1EMMt6mG5SRUQUoa73PBe5x5vNRHaQkrobgXhGq49VG8pqnzREXAu4OiCSvsDZtmb q8dH8DeLgM0DWeYFuZ8By7ym3ist+q7esat7cIxkHOBcM1Kfqq57gAq9bhMM1BXiLHHE DjzenLNnRh4+jCVZkkPkBu8OsUuPbQ+uvjMGv75hWWaNk0sniiH6DmQQhIv7WUUQdoPM Cfwg== X-Gm-Message-State: AOAM530UpFPr3jWw5K0XG+O7WpUB5ZNWaB+jG/X2XYJJDg/44eXEYu2M 2Q+og7SrU5H3S53E4zk5hXzoYEo/wMh6/ozp X-Google-Smtp-Source: ABdhPJwjI68/93XSVt7Dlc75Zqn80fOvjJbeyQaCI1/IAgt40F4OT8ikDI5/h7YBZLsPof9x31jmrg== X-Received: by 2002:ac2:4c56:: with SMTP id o22mr9971118lfk.240.1629912326560; Wed, 25 Aug 2021 10:25:26 -0700 (PDT) Received: from mail-lj1-f173.google.com (mail-lj1-f173.google.com. [209.85.208.173]) by smtp.gmail.com with ESMTPSA id d24sm66711lfs.80.2021.08.25.10.25.25 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 25 Aug 2021 10:25:25 -0700 (PDT) Received: by mail-lj1-f173.google.com with SMTP id m4so10107601ljq.8 for ; Wed, 25 Aug 2021 10:25:25 -0700 (PDT) X-Received: by 2002:a05:651c:908:: with SMTP id e8mr8505608ljq.507.1629912324881; Wed, 25 Aug 2021 10:25:24 -0700 (PDT) MIME-Version: 1.0 References: <20210824151337.GC27667@xsang-OptiPlex-9020> <875yvtpqbc.fsf@disp2133> In-Reply-To: <875yvtpqbc.fsf@disp2133> From: Linus Torvalds Date: Wed, 25 Aug 2021 10:25:08 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression To: "Eric W. Biederman" Cc: kernel test robot , Colin Ian King , Sandeep Patil , Mel Gorman , LKML , lkp@lists.01.org, kernel test robot , "Huang, Ying" , Feng Tang , Zhengjun Xing Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 25, 2021 at 7:11 AM Eric W. Biederman wrote: > > We have two things going on, a pipe wake up and signal wake up. > > Does their order matter? It feels weird that it is possible that > the data can be read from the pipe and the reader woken up to write > more when the signal that notifies the reader of that state has > not even been queued for delivery. I don't think the order matters. The only thing that matters is that the signal (and the regular wakeup, for that matter) is donme *after* the operation that triggers them is complete. It would be problematic if we sent the signal for "you can read more now" before we had actually done the write, so that the recipient would then try to read and get a "nothing there". But if you have both a pending reader, and something that asked for SIGIO, they'll both get notified. Not in any particular orfder between those two, but both will be notified after the write (or read) that triggered it has been done. Of course, the pending reader/writer that gets notified might be the _same_ one that also gets the SIGIO, so you could then have the SIGIO mean that the reader/writer gets EINTR and needs to read/write again. If you asked for both, you'll get it. The way our pipe code is organized, the _likely_ case is that you'll do the read/write and take the signal handler without ever getting -EAGAIN. Because we'll test for "do we have more data" before we test for "do we have signals pending" - but that's not really relevant for correctness (it's more a greedy "let's try to read while possible" and could be good for avoiding extra system calls). And I think it was actually a historical mistake to tie the "send SIGIO" together so tightly with "wake up other side". So my untested patch (well, it's now tested by Oliver) is likely the right thing to do regardless. Basically, the "wake up readers/writers" thing is the one where we *know* the other side - it's just the other end, and it's the kernel code in the same fs/pipe.c file. So we can - and should - optimize against doing unnecessary wakeups. But as has now been shown several times, we shouldn't optimize the case where the wakups are sent to code we don't control - ie user space. Whether that be as a result of epoll, or as a signal delivery, that "other side" is not under our control and clearly doesn't want the optimal minimal wakeups for just state transition cases. Of course, I could always wish that the receiving side always did the right thing, and worked with the minimal state transition data, but that clearly simply isn't the case. It wasn't the case for EPOLLET, and it wasn't the case for SIGIO. "If wishes were horses ..." So I'll commit that SIGIO fix, even if Colin has already changed stress-ng. No real harm in just doing both belt and suspenders, and I was wrong last time when I thought the EPOLLET thing was purely a test-suite issue. Let nobody say that I can't learn from my mistakes.. Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1656021933770701734==" MIME-Version: 1.0 From: Linus Torvalds To: lkp@lists.01.org Subject: Re: [pipe] 3b844826b6: stress-ng.sigio.ops_per_sec -99.3% regression Date: Wed, 25 Aug 2021 10:25:08 -0700 Message-ID: In-Reply-To: <875yvtpqbc.fsf@disp2133> List-Id: --===============1656021933770701734== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Wed, Aug 25, 2021 at 7:11 AM Eric W. Biederman = wrote: > > We have two things going on, a pipe wake up and signal wake up. > > Does their order matter? It feels weird that it is possible that > the data can be read from the pipe and the reader woken up to write > more when the signal that notifies the reader of that state has > not even been queued for delivery. I don't think the order matters. The only thing that matters is that the signal (and the regular wakeup, for that matter) is donme *after* the operation that triggers them is complete. It would be problematic if we sent the signal for "you can read more now" before we had actually done the write, so that the recipient would then try to read and get a "nothing there". But if you have both a pending reader, and something that asked for SIGIO, they'll both get notified. Not in any particular orfder between those two, but both will be notified after the write (or read) that triggered it has been done. Of course, the pending reader/writer that gets notified might be the _same_ one that also gets the SIGIO, so you could then have the SIGIO mean that the reader/writer gets EINTR and needs to read/write again. If you asked for both, you'll get it. The way our pipe code is organized, the _likely_ case is that you'll do the read/write and take the signal handler without ever getting -EAGAIN. Because we'll test for "do we have more data" before we test for "do we have signals pending" - but that's not really relevant for correctness (it's more a greedy "let's try to read while possible" and could be good for avoiding extra system calls). And I think it was actually a historical mistake to tie the "send SIGIO" together so tightly with "wake up other side". So my untested patch (well, it's now tested by Oliver) is likely the right thing to do regardless. Basically, the "wake up readers/writers" thing is the one where we *know* the other side - it's just the other end, and it's the kernel code in the same fs/pipe.c file. So we can - and should - optimize against doing unnecessary wakeups. But as has now been shown several times, we shouldn't optimize the case where the wakups are sent to code we don't control - ie user space. Whether that be as a result of epoll, or as a signal delivery, that "other side" is not under our control and clearly doesn't want the optimal minimal wakeups for just state transition cases. Of course, I could always wish that the receiving side always did the right thing, and worked with the minimal state transition data, but that clearly simply isn't the case. It wasn't the case for EPOLLET, and it wasn't the case for SIGIO. "If wishes were horses ..." So I'll commit that SIGIO fix, even if Colin has already changed stress-ng. No real harm in just doing both belt and suspenders, and I was wrong last time when I thought the EPOLLET thing was purely a test-suite issue. Let nobody say that I can't learn from my mistakes.. Linus --===============1656021933770701734==--