From: Linus Torvalds <torvalds@linux-foundation.org> To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Alexander Viro <aviro@redhat.com>, David Howells <dhowells@redhat.com>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Peter Zijlstra <peterz@infradead.org>, Nicolas Dichtel <nicolas.dichtel@6wind.com>, Ian Kent <raven@themaw.net>, Christian Brauner <christian@brauner.io>, keyrings@vger.kernel.org, "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>, Linux API <linux-api@vger.kernel.org>, lkml <linux-kernel@vger.kernel.org>, Davide Libenzi <davidel@xmailserver.org> Subject: Re: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs Date: Mon, 12 Oct 2020 13:52:15 -0700 [thread overview] Message-ID: <CAHk-=wgjR7Nd4CyDoi3SH9kPJp_Td9S-hhFJZMqvp6GS1Ww8eg@mail.gmail.com> (raw) In-Reply-To: <81229415-fb97-51f7-332c-d5e468bcbf2a@gmail.com> [-- Attachment #1: Type: text/plain, Size: 1458 bytes --] On Mon, Oct 12, 2020 at 1:30 PM Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > > [CC += Davide] I'm not sure how active Davide is any more.. > I don't think this is correct. The epoll(7) manual page > sill carries the text written long ago by Davide Libenzi, > the creator of epoll: > > Since even with edge-triggered epoll, multiple events can be gen‐ > erated upon receipt of multiple chunks of data, the caller has the > option to specify the EPOLLONESHOT flag, to tell epoll to disable > the associated file descriptor after the receipt of an event with > epoll_wait(2). > > My reading of that text is that in the scenario that I describe a > readiness notification should be generated at step 3 (and indeed > should be generated whenever additional data bleeds into the channel). Hmm. That is unfortunate, because it basically exposes an internal wait queue implementation decision, not actual real semantics. I suspect it's easy enough to "fix" the regression with the attached patch. It's pretty nonsensical, but I guess there's not a lot of downside - if the pipe wasn't empty, there normally shouldn't be any non-epoll readers anyway. I'm busy merging, mind testing this odd patch out? It is _entirely_ untested, but from the symptoms I think it's the obvious fix. I did the same thing for the "reader starting out from a full pipe" case too. Linus [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 1619 bytes --] fs/pipe.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 0ac197658a2d..e60565bb8125 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -228,14 +228,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) __pipe_lock(pipe); /* - * We only wake up writers if the pipe was full when we started - * reading in order to avoid unnecessary wakeups. + * Epoll wants us to wake things up whether it was full or not. * * But when we do wake up writers, we do so using a sync wakeup * (WF_SYNC), because we want them to get going and generate more * data for us. */ - was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage); + was_full = true; for (;;) { unsigned int head = pipe->head; unsigned int tail = pipe->tail; @@ -429,8 +428,8 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) #endif /* - * Only wake up if the pipe started out empty, since - * otherwise there should be no readers waiting. + * Epoll nonsensically wants a wakeup wheher the pipe was + * already empty or not. * * If it wasn't empty we try to merge new data into * the last buffer. @@ -440,9 +439,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) * spanning multiple pages. */ head = pipe->head; - was_empty = pipe_empty(head, pipe->tail); + was_empty = true; chars = total_len & (PAGE_SIZE-1); - if (chars && !was_empty) { + if (chars && !pipe_empty(head, pipe->tail)) { unsigned int mask = pipe->ring_size - 1; struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask]; int offset = buf->offset + buf->len;
next prev parent reply other threads:[~2020-10-12 20:52 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-12 18:39 Michael Kerrisk (man-pages) 2020-10-12 19:25 ` Linus Torvalds 2020-10-12 19:28 ` Linus Torvalds 2020-10-12 20:30 ` Michael Kerrisk (man-pages) 2020-10-12 20:52 ` Linus Torvalds [this message] 2020-10-12 21:43 ` Randy Dunlap 2020-10-12 21:51 ` Michael Kerrisk (man-pages) 2020-10-12 22:30 ` Linus Torvalds 2020-10-13 9:47 ` Michael Kerrisk (man-pages)
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAHk-=wgjR7Nd4CyDoi3SH9kPJp_Td9S-hhFJZMqvp6GS1Ww8eg@mail.gmail.com' \ --to=torvalds@linux-foundation.org \ --cc=aviro@redhat.com \ --cc=christian@brauner.io \ --cc=davidel@xmailserver.org \ --cc=dhowells@redhat.com \ --cc=gregkh@linuxfoundation.org \ --cc=keyrings@vger.kernel.org \ --cc=linux-api@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux@rasmusvillemoes.dk \ --cc=mtk.manpages@gmail.com \ --cc=nicolas.dichtel@6wind.com \ --cc=peterz@infradead.org \ --cc=raven@themaw.net \ --subject='Re: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).