All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	raven@themaw.net, Miklos Szeredi <mszeredi@redhat.com>,
	Christian Brauner <christian@brauner.io>,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 15/19] vfs: Add superblock notifications [ver #16]
Date: Fri, 21 Feb 2020 16:44:42 +0100	[thread overview]
Message-ID: <CAG48ez2nFks+yN1Kp4TZisso+rjvv_4UW0FTo8iFUd4Qyq1qDw@mail.gmail.com> (raw)
In-Reply-To: <1897788.1582295034@warthog.procyon.org.uk>

On Fri, Feb 21, 2020 at 3:24 PM David Howells <dhowells@redhat.com> wrote:
>
> Jann Horn <jannh@google.com> wrote:
>
> > > +               if (!s->s_watchers) {
> >
> > READ_ONCE() ?
>
> I'm not sure it matters.  It can only be set once, and the next time we read
> it we're inside the lock.  And at this point, I don't actually dereference it,
> and if it's non-NULL, it's not going to change.

I'd really like these READ_ONCE() things to be *anywhere* the value
can concurrently change, for two reasons:

First, it tells the reader "keep in mind that this value may
concurrently change in some way, don't just assume that it'll stay the
same".

But also, it tells the compiler that if it generates multiple loads
here and assumes that they return the same value, *really* bad stuff
may happen. GCC has some really fun behavior when compiling a switch()
on a value that might change concurrently without using READ_ONCE():
It sometimes generates multiple loads, where the first load is used to
test whether the value is in a specific range and then the second load
is used for actually indexing into a table of jump destinations. If
the value is concurrently mutated from an in-bounds value to an
out-of-bounds value, this code will load a jump destination from
random out-of-bounds memory.

An example:

$ cat gcc-jump.c
int blah(int *x, int y) {
  switch (*x) {
    case 0: return y+1;
    case 1: return y*2;
    case 2: return y-3;
    case 3: return y^1;
    case 4: return y+6;
    case 5: return y-5;
    case 6: return y|1;
    case 7: return y&4;
    case 8: return y|5;
    case 9: return y-3;
    case 10: return y&8;
    case 11: return y|9;
    default: return y;
  }
}
$ gcc-9 -O2 -c -o gcc-jump.o gcc-jump.c
$ objdump -dr gcc-jump.o
[...]
0000000000000000 <blah>:
   0: 83 3f 0b              cmpl   $0xb,(%rdi)
   3: 0f 87 00 00 00 00    ja     9 <blah+0x9>
5: R_X86_64_PC32 .text.unlikely-0x4
   9: 8b 07                mov    (%rdi),%eax
   b: 48 8d 15 00 00 00 00 lea    0x0(%rip),%rdx        # 12 <blah+0x12>
e: R_X86_64_PC32 .rodata-0x4
  12: 48 63 04 82          movslq (%rdx,%rax,4),%rax
  16: 48 01 d0              add    %rdx,%rax
  19: ff e0                jmpq   *%rax
[...]


Or if you want to see a full example that actually crashes:

$ cat gcc-jump-crash.c
#include <pthread.h>

int mutating_number;

__attribute__((noinline)) int blah(int *x, int y) {
  switch (*x) {
    case 0: return y+1;
    case 1: return y*2;
    case 2: return y-3;
    case 3: return y^1;
    case 4: return y+6;
    case 5: return y-5;
    case 6: return y|1;
    case 7: return y&4;
    case 8: return y|5;
    case 9: return y-3;
    case 10: return y&8;
    case 11: return y|9;
    default: return y;
  }
}

int blah_num;
void *thread_fn(void *dummy) {
  while (1) {
    blah_num = blah(&mutating_number, blah_num);
  }
}

int main(void) {
  pthread_t thread;
  pthread_create(&thread, NULL, thread_fn, NULL);
  while (1) {
    *(volatile int *)&mutating_number = 1;
    *(volatile int *)&mutating_number = 100000000;
  }
}
$ gcc-9 -O2 -pthread -o gcc-jump-crash gcc-jump-crash.c -ggdb -Wall
$ gdb ./gcc-jump-crash
[...]
(gdb) run
[...]
Thread 2 "gcc-jump-crash" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7db6700 (LWP 33237)]
0x00005555555551a2 in blah (x=0x555555558034 <mutating_number>, y=0)
at gcc-jump-crash.c:6
6   switch (*x) {
(gdb) x/10i blah
   0x555555555190 <blah>: cmp    DWORD PTR [rdi],0xb
   0x555555555193 <blah+3>: ja     0x555555555050 <blah+4294966976>
   0x555555555199 <blah+9>: mov    eax,DWORD PTR [rdi]
   0x55555555519b <blah+11>: lea    rdx,[rip+0xe62]        # 0x555555556004
=> 0x5555555551a2 <blah+18>: movsxd rax,DWORD PTR [rdx+rax*4]
   0x5555555551a6 <blah+22>: add    rax,rdx
   0x5555555551a9 <blah+25>: jmp    rax
   0x5555555551ab <blah+27>: nop    DWORD PTR [rax+rax*1+0x0]
   0x5555555551b0 <blah+32>: lea    eax,[rsi-0x3]
   0x5555555551b3 <blah+35>: ret
(gdb)


Here's a presentation from Felix Wilhelm, a security researcher who
managed to find a case in the Xen hypervisor where a switch() on a
value in shared memory was exploitable to compromise the hypervisor
from inside a guest (see slides 35 and following):
<https://www.blackhat.com/docs/us-16/materials/us-16-Wilhelm-Xenpwn-Breaking-Paravirtualized-Devices.pdf>

I realize that a compiler is extremely unlikely to make such an
optimization decision for a simple "if (!a->b)" branch; but still, I
would prefer to have READ_ONCE() everywhere where it is semantically
required, not just everywhere where you can think of a concrete
compiler optimization that will break stuff.

> > > +                       ret = add_watch_to_object(watch, s->s_watchers);
> > > +                       if (ret == 0) {
> > > +                               spin_lock(&sb_lock);
> > > +                               s->s_count++;
> > > +                               spin_unlock(&sb_lock);
> >
> > Where is the corresponding decrement of s->s_count? I'm guessing that
> > it should be in the ->release_watch() handler, except that there isn't
> > one...
>
> Um.  Good question.  I think this should do the job:
>
>         static void sb_release_watch(struct watch *watch)
>         {
>                 put_super(watch->private);
>         }
>
> And this then has to be set later:
>
>         init_watch_list(wlist, sb_release_watch);

(And as in the other case, the s->s_count increment will probably have
to be moved above the add_watch_to_object(), unless you hold the
sb_lock around it?)

  reply	other threads:[~2020-02-21 15:45 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-18 17:04 [PATCH 00/19] VFS: Filesystem information and notifications [ver #16] David Howells
2020-02-18 17:05 ` [PATCH 01/19] vfs: syscall: Add fsinfo() to query filesystem information " David Howells
2020-02-19 16:31   ` Darrick J. Wong
2020-02-19 20:07   ` Jann Horn
2020-02-20 10:34   ` David Howells
2020-02-20 15:48     ` Darrick J. Wong
2020-02-20 11:03   ` David Howells
2020-02-20 14:54     ` Jann Horn
2020-02-20 15:31       ` Darrick J. Wong
2020-02-18 17:05 ` [PATCH 02/19] fsinfo: Add syscalls to other arches " David Howells
2020-02-21 14:51   ` Christian Brauner
2020-02-21 18:10     ` Geert Uytterhoeven
2020-02-18 17:05 ` [PATCH 03/19] fsinfo: Provide a bitmap of supported features " David Howells
2020-02-19 16:37   ` Darrick J. Wong
2020-02-20 12:22   ` David Howells
2020-02-18 17:05 ` [PATCH 04/19] vfs: Add mount change counter " David Howells
2020-02-21 14:48   ` Christian Brauner
2020-02-18 17:05 ` [PATCH 05/19] vfs: Introduce a non-repeating system-unique superblock ID " David Howells
2020-02-19 16:53   ` Darrick J. Wong
2020-02-20 12:45   ` David Howells
2020-02-18 17:05 ` [PATCH 06/19] vfs: Allow fsinfo() to look up a mount object by " David Howells
2020-02-21 15:09   ` Christian Brauner
2020-02-18 17:05 ` [PATCH 07/19] vfs: Allow mount information to be queried by fsinfo() " David Howells
2020-02-18 17:05 ` [PATCH 08/19] vfs: fsinfo sample: Mount listing program " David Howells
2020-02-18 17:06 ` [PATCH 09/19] fsinfo: Allow the mount topology propogation flags to be retrieved " David Howells
2020-02-18 17:06 ` [PATCH 10/19] fsinfo: Add API documentation " David Howells
2020-02-18 17:06 ` [PATCH 11/19] afs: Support fsinfo() " David Howells
2020-02-19 21:01   ` Jann Horn
2020-02-20 12:58   ` David Howells
2020-02-20 14:58     ` Jann Horn
2020-02-21 13:26     ` David Howells
2020-02-18 17:06 ` [PATCH 12/19] security: Add hooks to rule on setting a superblock or mount watch " David Howells
2020-02-18 17:06 ` [PATCH 13/19] vfs: Add a mount-notification facility " David Howells
2020-02-19 22:40   ` Jann Horn
2020-02-19 22:55     ` Jann Horn
2020-02-21 12:24   ` David Howells
2020-02-21 15:49     ` Jann Horn
2020-02-21 17:06     ` David Howells
2020-02-21 17:36       ` seq_lock and lockdep_is_held() assertions Jann Horn
2020-02-21 18:02         ` John Stultz
2020-02-18 17:06 ` [PATCH 14/19] notifications: sample: Display mount tree change notifications [ver #16] David Howells
2020-02-18 17:06 ` [PATCH 15/19] vfs: Add superblock " David Howells
2020-02-19 23:08   ` Jann Horn
2020-02-21 14:23   ` David Howells
2020-02-21 15:44     ` Jann Horn [this message]
2020-02-21 16:33     ` David Howells
2020-02-21 16:41       ` Jann Horn
2020-02-21 17:11       ` David Howells
2020-02-18 17:06 ` [PATCH 16/19] fsinfo: Provide superblock notification counter " David Howells
2020-02-18 17:07 ` [PATCH 17/19] notifications: sample: Display superblock notifications " David Howells
2020-02-18 17:07 ` [PATCH 18/19] ext4: Add example fsinfo information " David Howells
2020-02-19 17:04   ` Darrick J. Wong
2020-02-20  0:53   ` kbuild test robot
2020-02-20  0:53     ` kbuild test robot
2020-02-21 14:43   ` David Howells
2020-02-21 16:26     ` Darrick J. Wong
2020-02-18 17:07 ` [PATCH 19/19] nfs: Add example filesystem " David Howells
2020-02-20  2:13   ` kbuild test robot
2020-02-20  2:13     ` kbuild test robot
2020-02-20  2:20   ` kbuild test robot
2020-02-20  2:20     ` kbuild test robot
2020-02-18 18:12 ` David Howells
2020-02-19 10:23 ` [PATCH 00/19] VFS: Filesystem information and notifications " Stefan Metzmacher
2020-02-19 14:46 ` Christian Brauner
2020-02-19 15:50   ` Darrick J. Wong
2020-02-20  4:42   ` Ian Kent
2020-02-20  9:09     ` Christian Brauner
2020-02-20 11:30       ` Ian Kent
2020-02-19 16:16 ` David Howells
2020-02-21 12:57 ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG48ez2nFks+yN1Kp4TZisso+rjvv_4UW0FTo8iFUd4Qyq1qDw@mail.gmail.com \
    --to=jannh@google.com \
    --cc=christian@brauner.io \
    --cc=dhowells@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mszeredi@redhat.com \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.