From: Vincent Guittot
Date: Mon, 9 Dec 2019 10:53:46 +0100
Subject: Re: [PATCH 0/2] pipe: Fixes [ver #2]
To: Linus Torvalds
Cc: David Sterba, David Howells, Eric Biggers, Al Viro, linux-fsdevel,
    Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar

On Sat, 7 Dec 2019 at 23:48, Linus Torvalds wrote:
>
> On Fri, Dec 6, 2019 at 7:50 PM Linus Torvalds wrote:
> >
> > The "make goes slow" problem bisects down to b667b8673443 ("pipe:
> > Advance tail pointer inside of wait spinlock in pipe_read()").
>
> I'm not entirely sure that ends up being 100% true. It did bisect to
> that, but the behavior wasn't entirely stable. There definitely is
> some nasty timing trigger.
>
> But I did finally figure out what seems to have been going on with
> at least the biggest part of the build performance regression. It
> seems to be a nasty interaction with the scheduler and the GNU make
> jobserver; in particular, the pipe wakeups really _really_ do seem
> to want to be synchronous for both the readers and the writers.
>
> When a writer wakes up a reader, we want the reader to react quickly
> and vice versa. The most obvious case was the GNU make jobserver,
> where sub-makes do a single-byte write to the jobserver pipe, and we
> want to wake up the reader *immediately*, because the reader is
> actually a lot more important than the writer. The reader is what
> gets the next job going; the writer just got done with the last one.
>
> And when a reader empties a full pipe, it's because the writer is
> generating data, and you want to just get the writer going again
> asap.
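
For context, the jobserver protocol described above is essentially a
counting semaphore implemented over a pipe: "make -jN" preloads N-1
token bytes, and every sub-make blocks in read() until it can take
one. A minimal userspace sketch of the two operations; the fd
variables and helper names here are made up for illustration (real
GNU make inherits the actual fds through MAKEFLAGS):

    #include <unistd.h>

    static int jobserver_rfd;   /* read end of the shared token pipe */
    static int jobserver_wfd;   /* write end of the shared token pipe */

    /* Block until a job token is available. Every idle sub-make
     * sleeps in this read(); the single-byte write() below is what
     * wakes one of them up so the next job can start. */
    static int acquire_token(char *token)
    {
            return read(jobserver_rfd, token, 1) == 1 ? 0 : -1;
    }

    /* Return the token when the job finishes: the one-byte write
     * whose reader-side wakeup latency this whole thread is about. */
    static int release_token(char token)
    {
            return write(jobserver_wfd, &token, 1) == 1 ? 0 : -1;
    }

A sub-make wraps each job in acquire_token()/release_token(), which is
why reader wakeup latency on that pipe directly gates build parallelism.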
> Anyway, I've spent way too much time looking at this and wondering
> about odd performance patterns. It seems to be mostly back up to
> normal.
>
> I say "mostly", because I still see times of "not as many concurrent
> compiles going as I'd expect". It might be a kbuild problem, it
> might be an issue with GNU make (I've seen problems before with the
> make jobserver wanting many more tokens than expected with the
> kernel makefiles; it might be about deep subdirectories etc), and it
> might be some remaining pipe issue. But my allmodconfig builds
> aren't _enormously_ slower than they used to be.
>
> But there's definitely some unhappy interaction with the jobserver.
> I have 16 threads (8 cores with HT), and I generally use "make -j32"
> to keep them busy because the jobserver isn't great. The pipe rework
> made even that 2x slop not work all that well. Something held on to
> tokens too long, and there was definitely some interaction with the
> pipe wakeup code. Using "-j64" hid the problem, but it was a problem.
>
> It might be the new scheduler balancing changes that are interacting
> with the pipe thing. I'm adding PeterZ, Ingo and Vincent to the cc,
> because I hadn't realized just how important the sync wakeup seems
> to be for pipe performance even at a big level.

Which version of make should I use to reproduce the problem? My setup
is not the same and my make is a bit old, but I haven't been able to
reproduce the problem described above on my arm64 octa-core system
with v5.5-rc1. All cores are busy with -j16, and even -j8 keeps the
cores almost always busy.

> I've pushed out my pipe changes. I really didn't want to do that
> kind of stuff at the end of the merge window, but I spent a lot more
> time than I wanted looking at this code, because I was getting to
> the point where the alternative was to just revert it all.
>
> DavidH, give these a look:
>
>   85190d15f4ea pipe: don't use 'pipe_wait() for basic pipe IO
>   a28c8b9db8a1 pipe: remove 'waiting_writers' merging logic
>   f467a6a66419 pipe: fix and clarify pipe read wakeup logic
>   1b6b26ae7053 pipe: fix and clarify pipe write wakeup logic
>   ad910e36da4c pipe: fix poll/select race introduced by the pipe rework
>
> the top two of which are purely "I'm fed up looking at this code,
> this needs to go" kind of changes.
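
The "sync" wakeup mentioned above is a scheduler hint: the waker
declares it is about to block, so the wakee can be run hot on the
same CPU instead of being load-balanced away. Roughly the shape of
the writer-side path after the two "wakeup logic" commits listed
above (a paraphrase for illustration, not the literal fs/pipe.c
diff):

    /* In pipe_write(): only a write into a previously empty pipe
     * needs to wake readers, and it wakes them with the _sync
     * variant because the writer is typically done and about to
     * sleep or exit, so the reader should run right away. */
    if (was_empty) {
            wake_up_interruptible_sync_poll(&pipe->wait,
                                            EPOLLIN | EPOLLRDNORM);
            kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
    }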
> In particular, that last change is because I think the GNU jobserver
> problem is partly a thundering-herd issue: when a job token becomes
> free (ie somebody does a one-byte write to an empty jobserver pipe),
> it wakes up *everybody* who is waiting for a token. One of them will
> get it, and the others will go to sleep again. And then it repeats
> all over. I didn't fix it, but it _could_ be fixed with exclusive
> waits for readers/writers, though that means more smarts than
> pipe_wait() can do. And because the jobserver isn't great at keeping
> everybody happy, I'm using a much bigger "make -jX" value than the
> number of CPUs I have, which makes the herd bigger. And I suspect
> none of this helps the scheduler pick the _right_ process to run,
> which just makes scheduling an even bigger problem.
>
>                 Linus
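
As a footnote on the exclusive-wait idea: queuing a waiter with
prepare_to_wait_exclusive() sets WQ_FLAG_EXCLUSIVE, so a normal
wake_up() stops after waking one such waiter instead of the whole
herd. A rough, hypothetical sketch of what a smarter pipe_wait()
could look like; pipe_wait() in v5.5-rc1 queues everyone with plain
prepare_to_wait(), which is exactly why every waiting sub-make wakes
for a single token:

    /* Hypothetical exclusive-wait variant of pipe_wait(). Only one
     * waiter wins each wakeup; readers and writers would need
     * separate wait queues for this to be correct, which is the
     * "more smarts than pipe_wait()" part. */
    void pipe_wait_exclusive(struct pipe_inode_info *pipe)
    {
            DEFINE_WAIT(wait);

            prepare_to_wait_exclusive(&pipe->wait, &wait,
                                      TASK_INTERRUPTIBLE);
            pipe_unlock(pipe);
            schedule();
            finish_wait(&pipe->wait, &wait);
            pipe_lock(pipe);
    }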