From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F4C6C4320E for ; Wed, 4 Aug 2021 16:39:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 45EFC61008 for ; Wed, 4 Aug 2021 16:39:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239495AbhHDQjo (ORCPT ); Wed, 4 Aug 2021 12:39:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57260 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239438AbhHDQj3 (ORCPT ); Wed, 4 Aug 2021 12:39:29 -0400 Received: from mail-lj1-x233.google.com (mail-lj1-x233.google.com [IPv6:2a00:1450:4864:20::233]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AE14C0613D5 for ; Wed, 4 Aug 2021 09:39:15 -0700 (PDT) Received: by mail-lj1-x233.google.com with SMTP id e5so3278496ljp.6 for ; Wed, 04 Aug 2021 09:39:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Mbke9nvjw7+ZeekiQRXLpz0LVFhUcu96Nb3OTfXE7BU=; b=PZ1SUNe5srVpzwzOaifNxeDnFelZQ09VVBF0S+czoEleI8H7NaXtf6BgmLYsKuLNiC LTcU8OYPWLuF/4NphS6i+l1ci/XnoMb+i+upOTt69bQAT3JgDhcod/J5a5/zqAWUQ5CE n8TsxAIOFru99StbsoS1slJejG91rgTWWoOlM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Mbke9nvjw7+ZeekiQRXLpz0LVFhUcu96Nb3OTfXE7BU=; b=SmOrEW7bj3oPvipHf2NTw+miSDkn0b2t00t3hPjZxuSU1IVls40rFo0AHWdAABZreG xFFaZsyJr01Kxjajjyu3ZshODFlJaGTemXfy4h/ThaTAm557FcftGpXhdmbl+Y0yOO4q 8ynkhGNicNemnVAYh3HfrKP/sz5GmmH6UtkLORbumLjlvCU6oxNxoWDEuQcyWOX6KQU7 gY6BYzsWri0rv44+OXQfNBr2JiIL4jkydLj4knVzc4ajTTw7r5wzq2aKS3LbOyT0VOXv P03fqTjgFuv4j4Bj3NUFGj2mqW/h760d9kLW4MQE04WND6dIiPlAR/N2TPWfHerj73q9 hvyQ== X-Gm-Message-State: AOAM531/J5ctcHWPPglhyj4JT3JIVdOqyZub1u98LxRBvV5aKKhhRQGf yCsApur6rReMv3rBQ7UQ4VGciZGSTGezj5wI X-Google-Smtp-Source: ABdhPJw/wVBgfMtdSMGeXnhpJiZbP96qvmTrqJEEndD2SXHc/fbnhOD+kQfXA89U0iJ0Hwp/V5/v1Q== X-Received: by 2002:a2e:9ccf:: with SMTP id g15mr223911ljj.408.1628095153650; Wed, 04 Aug 2021 09:39:13 -0700 (PDT) Received: from mail-lf1-f46.google.com (mail-lf1-f46.google.com. [209.85.167.46]) by smtp.gmail.com with ESMTPSA id s13sm242736lfd.53.2021.08.04.09.39.13 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 04 Aug 2021 09:39:13 -0700 (PDT) Received: by mail-lf1-f46.google.com with SMTP id g30so1242149lfv.4 for ; Wed, 04 Aug 2021 09:39:13 -0700 (PDT) X-Received: by 2002:a19:fc06:: with SMTP id a6mr62531lfi.377.1628094677755; Wed, 04 Aug 2021 09:31:17 -0700 (PDT) MIME-Version: 1.0 References: <1628086770.5rn8p04n6j.none.ref@localhost> <1628086770.5rn8p04n6j.none@localhost> In-Reply-To: <1628086770.5rn8p04n6j.none@localhost> From: Linus Torvalds Date: Wed, 4 Aug 2021 09:31:01 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang To: "Alex Xu (Hello71)" Cc: Linux Kernel Mailing List , David Howells , acrichton@mozilla.com, Rasmus Villemoes , Greg Kroah-Hartman , Peter Zijlstra , Nicolas Dichtel , Ian Kent , Christian Brauner , keyrings@vger.kernel.org, linux-usb@vger.kernel.org, linux-block , LSM List , linux-fsdevel , Linux API Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org Your program is buggy. On Wed, Aug 4, 2021 at 8:37 AM Alex Xu (Hello71) wrote: > > pipe(pipefd); > printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ)); > printf("new buffer: %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0)); Yeah, what did you expect this to do? You said you want a minimal buffer, you get a really small buffer. Then you try to write multiple messages to the pipe that you just said should have a minimum size. Don't do that then. > /proc/x/stack shows that the remaining thread is hanging at pipe.c:560. > It looks like not only there needs to be space in the pipe, but also > slots. Correct. The fullness of a pipe is not about whether it has the possibility of merging more bytes into an existing not-full slot, but about whether it has empty slots left. Part of that is simply the POSIX pipe guarantees - a write of size PIPE_BUF or less is guaranteed to be atomic, so it mustn't be split among buffers. So a pipe must not be "writable" unless it has space for at least that much (think select/poll, which don't know the size of the write). The fact that we might be able to reuse a partially filled buffer for smaller writes is simply not relevant to that issue. And yes, we could have different measures of "could write" for different writes, but we just don't have or want that complexity. Please don't mess with F_SETPIPE_SZ unless you have a really good reason to do so, and actually understand what you are doing. Doing a F_SETPIPE_SZ, 0 basically means "I want the mimimum pipe size possible". And that one accepts exactly one write at a time. Of course, the exact semantics are much more complicated than that "exactly one write". The pipe write code will optimistically merge writes into a previous buffer, which means that depending on the pattern of your writes, the exact number of bytes you can write will be very different. But that "merge writes into a previous buffer" only appends to the buffer - not _reuse_ it - so when each buffer is one page in size, what happens is that you can merge up to 4096 bytes worth of writes, but then after that the pipe write will want a new buffer - even if the old buffer is now empty because of old reads. That's why your test program won't block immediately: both writers will actually start out happily doing writes into that one buffer that is allocated, but at some point that buffer ends, and it wants to allocate a new buffer. But you told it not to allocate more buffers, and the old buffer is never completely empty because your readers never read _everythign_, so it will hang, waiting for you to empty the one minimal buffer it allocated. And that will never happen. There's a very real reason why we do *not* by default say "pipes can only ever use only one buffer". I don't think this is a regression, but if you have an actual application - not a test program - that does crazy things like this and used to work (I'm not sure it has ever worked, though), we can look into making it work again. That said, I suspect the way to make it work is to just say "the minimum pipe size is two slots" rather than change the "we want at least one empty slot". Exactly because of that whole "look, we must not consider a pipe that doesn't have a slot writable". Because clearly people don't understand how subtle F_SETPIPE_SZ is. It's not really a "byte count", even though that is how it's expressed. Linus