From: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
To: linux-kernel@vger.kernel.org, dhowells@redhat.com, acrichton@mozilla.com
Cc: torvalds@linux-foundation.org,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Peter Zijlstra <peterz@infradead.org>,
nicolas.dichtel@6wind.com, raven@themaw.net,
Christian Brauner <christian@brauner.io>,
keyrings@vger.kernel.org, linux-usb@vger.kernel.org,
linux-block@vger.kernel.org,
linux-security-module@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: [REGRESSION?] Simultaneous writes to a reader-less, non-full pipe can hang
Date: Wed, 04 Aug 2021 11:37:36 -0400
Message-ID: <1628086770.5rn8p04n6j.none@localhost>
In-Reply-To: 1628086770.5rn8p04n6j.none.ref@localhost
Hi,
An issue "Jobserver hangs due to full pipe" was recently reported
against Cargo, the Rust package manager. This was diagnosed as an issue
with pipe writes hanging in certain circumstances.
Specifically, if two or more threads simultaneously write to a pipe, it
is possible for all the writers to hang despite there being significant
space available in the pipe.
I have translated the Rust example to C with some small adjustments:
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int pipefd[2];

/* Each thread repeatedly takes one byte out of the pipe and puts it back. */
void *thread_start(void *arg) {
    char buf[1];
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        read(pipefd[0], buf, sizeof(buf));
        write(pipefd[1], buf, sizeof(buf));
    }
    puts("done");
    return NULL;
}

int main(void) {
    pipe(pipefd);
    printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
    /* Requesting size 0 shrinks the pipe to its minimum of one page. */
    printf("new buffer: %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));
    /* Seed the pipe with one byte per thread. */
    write(pipefd[1], "aa", 2);
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread_start, NULL);
    pthread_create(&thread2, NULL, thread_start, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
}
The expected behavior of this program is to print:
init buffer: 65536
new buffer: 4096
done
done
and then exit.
On Linux 5.14-rc4, compiling this program and running it will print the
following about half the time:
init buffer: 65536
new buffer: 4096
done
and then hang. This is unexpected behavior, since the pipe is at most
two bytes full at any given time.
/proc/x/stack shows that the remaining thread is hanging at pipe.c:560.
It looks like a write needs not only space in the pipe but also a free
slot. At pipe.c:1306, a one-page pipe has only one slot. This led me to
test nthreads=2, which also hangs. Blame on the pipe_write comment shows
that it was added in a194dfe, whose commit message says, among other
things:
> We just abandon the preallocated slot if we get a copy error. Future
> writes may continue it and a future read will eventually recycle it.
This matches the observed behavior: in this case, there are no readers
on the pipe, so the abandoned slot is lost.
In my opinion (as expressed on the issue), the pipe is being misused
here. As explained in the pipe(7) manual page:
> Applications should not rely on a particular capacity: an application
> should be designed so that a reading process consumes data as soon as
> it is available, so that a writing process does not remain blocked.
Despite the misuse, I am reporting this for the following reasons:
1. I am reasonably confident that this is a regression in the kernel,
which has a standard of making reasonable efforts to maintain
backwards compatibility even with broken programs.
2. Even if this is not a regression, it seems like this situation could
be handled somewhat more gracefully. In this case, we are not writing
4095 bytes and then expecting a one-byte write to succeed; the pipe
is actually almost entirely empty.
3. Pipe sizes dynamically shrink in Linux, so despite the fact that this
case is unlikely to occur with two or more slots available, even a
program which does not explicitly allocate a one-page pipe buffer may
wind up with one if the user has 1024 or more pipes already open.
This significantly exacerbates the next point:
4. GNU make's jobserver uses pipes in a similar manner. By my reading of
the paper, it is theoretically possible for N simultaneous writes to
occur without any readers, where N is the maximum number of concurrent
jobs permitted.
Consider the following example with make -j2: two compile jobs are to
be performed: one at the top level, and one in a sub-directory. The
top-level make invokes one make and one cc, costing two tokens. The
sub-make invokes one cc with its free token. The pipe is now empty.
Now, suppose the two compilers return at exactly the same time. Both
copies of make will attempt to simultaneously write a token to the
pipe. This does not yet trigger deadlock: at least one write will
always succeed on an empty pipe. Suppose the sub-make's write goes
through. It then exits. The top-level make, however, is still blocked
on its original write, since it was not successfully merged with the
other write. The build is now deadlocked.
I think this does not happen in practice only because of a coincidental
design decision: when the sub-make exits, the top-level make receives a
SIGCHLD. GNU make registers a SA_RESTART handler for SIGCHLD, so the
blocked write will be interrupted and restarted. This is only a
coincidence, however: the program does not actually expect writing to
the control pipe to ever block; it could just as well de-register the
signal handler while performing the write and still be fully correct.
Regards,
Alex.