From: Linus Torvalds <torvalds@linux-foundation.org>
To: Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: David Howells <dhowells@redhat.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Nicolas Dichtel <nicolas.dichtel@6wind.com>,
	raven@themaw.net, Christian Brauner <christian@brauner.io>,
	keyrings@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-block <linux-block@vger.kernel.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 04/10] pipe: Use head and tail pointers for the ring, not cursor and length [ver #2]
Date: Fri, 6 Dec 2019 22:47:11 -0800
Message-ID: <CAHk-=wgkmG9xu_tvMbFTUyn3f2knr7POHjiwMtEmNxXzdPN8wg@mail.gmail.com>
In-Reply-To: <20191207000015.GA1757@latitude>

On Fri, Dec 6, 2019 at 4:00 PM Johannes Hirte
<johannes.hirte@datenkhaos.de> wrote:
>
> Tested with 5.4.0-11505-g347f56fb3890 and still the same wrong behavior.
> Reliable testcase is facebook, where timeline isn't updated with firefox.

Hmm. I'm not on FB, so that's not a great test for me.

But I've been staring at the code for a long time, and I did find another issue.

poll() and select() were subtly racy and broken. The code did

        unsigned int head = READ_ONCE(pipe->head);
        unsigned int tail = READ_ONCE(pipe->tail);

which is ok in theory - select and poll can be racy, and doing racy
reads is ok and we do it in other places too.

But when you don't do proper locking and do racy poll/select, you need
to make sure that *if* you were wrong, and said "there's nothing
pending", you have already added yourself to the wait-queue so that
any later change causes poll to update.

And the new pipe code did that wrong. It added itself to the poll wait
queues *after* it had read that racy data, so you could get into a
race where

 - poll reads stale data

      - data changes, wakeup happens

 - poll adds itself to the poll wait queue after the wakeup

 - poll returns "nothing to read/write" based on stale data, and never
saw the wakeup event that told it otherwise.
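
The interleaving above can be sketched as a small single-threaded toy
model. This is not kernel code and not part of the patch: struct
toy_pipe, toy_write() and toy_poll_sees_event() are hypothetical names
invented for illustration, and the "wait queue" is reduced to two
flags. It only shows why the order of the two poll steps matters.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, nothing is kernel API. */
struct toy_pipe {
	unsigned int head;	/* advanced by the writer */
	unsigned int tail;	/* advanced by the reader */
	bool on_wait_queue;	/* has the poller registered itself yet? */
	bool woken;		/* did a wakeup reach the poller? */
};

/* Writer: publish one slot, then wake whoever is on the queue right now. */
static void toy_write(struct toy_pipe *p)
{
	p->head++;
	if (p->on_wait_queue)
		p->woken = true;
}

enum toy_order { CHECK_THEN_REGISTER, REGISTER_THEN_CHECK };

/*
 * Run one poll with the writer injected at a chosen point:
 *   writer_at == 0: before both poll steps
 *   writer_at == 1: between the two poll steps
 *   writer_at == 2: after both poll steps
 * Returns true if the poller learns about the data, either from its
 * own snapshot of head/tail or from a wakeup.
 */
static bool toy_poll_sees_event(enum toy_order order, int writer_at)
{
	struct toy_pipe p = {0};
	bool snapshot_nonempty = false;

	for (int step = 0; step < 2; step++) {
		bool do_register;

		if (writer_at == step)
			toy_write(&p);

		do_register = (order == REGISTER_THEN_CHECK) ? (step == 0)
							     : (step == 1);
		if (do_register)
			p.on_wait_queue = true;
		else
			snapshot_nonempty = (p.head != p.tail);
	}
	if (writer_at == 2)
		toy_write(&p);

	return snapshot_nonempty || p.woken;
}
```

In this model toy_poll_sees_event(CHECK_THEN_REGISTER, 1) is the only
combination that returns false: the snapshot is stale and the wakeup
fires before the poller is on the queue. With REGISTER_THEN_CHECK all
three writer positions return true.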

So a patch something like the appended (whitespace-damaged once again,
because it's untested and I've only been _looking_ at the code) might
solve that issue.

That said, the race here is quite small. Since that firefox problem is
apparently repeatable for you, the timing is either _very_ unlucky, or
there is something else going on too.

                  Linus

--- snip snip ---

diff --git a/fs/pipe.c b/fs/pipe.c
index c561f7f5e902..4c39ea9b3419 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -557,12 +557,24 @@ pipe_poll(struct file *filp, poll_table *wait)
 {
        __poll_t mask;
        struct pipe_inode_info *pipe = filp->private_data;
-       unsigned int head = READ_ONCE(pipe->head);
-       unsigned int tail = READ_ONCE(pipe->tail);
+       unsigned int head, tail;

+       /*
+        * Reading only -- no need for acquiring the semaphore.
+        *
+        * But because this is racy, the code has to add the
+        * entry to the poll table _first_ ..
+        */
        poll_wait(filp, &pipe->wait, wait);

-       /* Reading only -- no need for acquiring the semaphore.  */
+       /*
+        * .. and only then can you do the racy tests. That way,
+        * if something changes and you got it wrong, the poll
+        * table entry will wake you up and fix it.
+        */
+       head = READ_ONCE(pipe->head);
+       tail = READ_ONCE(pipe->tail);
+
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
                if (!pipe_empty(head, tail))
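
For context, a minimal userspace sketch of what a caller of poll() on
a pipe observes. fd_readable_now() is a hypothetical helper, not
something from the patch, and with a zero timeout it never sleeps, so
it does not reproduce the race described above. It only shows the
readiness check whose result the kernel side has to get right.

```c
#include <poll.h>
#include <stdbool.h>

/* Hypothetical helper: non-blocking "is there data right now?" check. */
static bool fd_readable_now(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	/* Timeout 0: poll() reports the current state and returns at once. */
	return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}
```

On a freshly created pipe this should report false for the read end,
and true once a byte has been written to the write end. A caller that
blocks in poll() instead relies on the wakeup, which is where the
ordering in pipe_poll() matters.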
