linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* general protection fault in do_move_mount (2)
@ 2019-06-18 10:47 syzbot
  2019-06-18 14:02 ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: syzbot @ 2019-06-18 10:47 UTC (permalink / raw)
  To: arnd, axboe, bp, catalin.marinas, christian, dhowells, geert,
	hare, heiko.carstens, hpa, linux-fsdevel, linux-kernel, luto,
	mingo, syzkaller-bugs, tglx, viro, x86

Hello,

syzbot found the following crash on:

HEAD commit:    9e0babf2 Linux 5.2-rc5
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=138b310aa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d16883d6c7f0d717
dashboard link: https://syzkaller.appspot.com/bug?extid=6004acbaa1893ad013f0
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000

The bug was bisected to:

commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
Author: David Howells <dhowells@redhat.com>
Date:   Thu May 16 11:52:27 2019 +0000

     uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14fa8411a00000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=16fa8411a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=12fa8411a00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com
Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API  
syscalls [ver #2]")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9133 Comm: syz-executor.0 Not tainted 5.2.0-rc5 #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:do_move_mount.isra.0+0x5fe/0xe10 fs/namespace.c:2602
Code: ff ff 00 0f 84 7a fb ff ff e8 de a4 b5 ff 48 8b 85 50 ff ff ff 48 8d  
78 48 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f  
85 6d 07 00 00 48 8b 85 50 ff ff ff 31 ff 4c 8b 78
RSP: 0018:ffff888097117d58 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff888097117ea8 RCX: 1ffff11015330137
RDX: 0000000000000006 RSI: ffffffff81bb1c82 RDI: 0000000000000032
RBP: ffff888097117e38 R08: ffff88808971c5c0 R09: ffffed1015d06be0
R10: ffffed1015d06bdf R11: ffff8880ae835efb R12: ffff88809f377f00
R13: ffff88821baa2420 R14: ffff888097117e90 R15: ffff8880a99809a0
FS:  00007f0eb2bcd700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa3553c2000 CR3: 00000000a8a61000 CR4: 00000000001406f0
Call Trace:
  __do_sys_move_mount fs/namespace.c:3524 [inline]
  __se_sys_move_mount fs/namespace.c:3483 [inline]
  __x64_sys_move_mount+0x355/0x440 fs/namespace.c:3483
  do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f0eb2bccc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001ad
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004592c9
RDX: ffffffffffffff9c RSI: 0000000020000040 RDI: 0000000000000003
RBP: 000000000075c070 R08: 0000000000000066 R09: 0000000000000000
R10: 0000000020000100 R11: 0000000000000246 R12: 00007f0eb2bcd6d4
R13: 00000000004c5706 R14: 00000000004d9ba8 R15: 00000000ffffffff
Modules linked in:
---[ end trace 9c2a9754ccc962c7 ]---
RIP: 0010:do_move_mount.isra.0+0x5fe/0xe10 fs/namespace.c:2602
Code: ff ff 00 0f 84 7a fb ff ff e8 de a4 b5 ff 48 8b 85 50 ff ff ff 48 8d  
78 48 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f  
85 6d 07 00 00 48 8b 85 50 ff ff ff 31 ff 4c 8b 78
RSP: 0018:ffff888097117d58 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: ffff888097117ea8 RCX: 1ffff11015330137
RDX: 0000000000000006 RSI: ffffffff81bb1c82 RDI: 0000000000000032
RBP: ffff888097117e38 R08: ffff88808971c5c0 R09: ffffed1015d06be0
R10: ffffed1015d06bdf R11: ffff8880ae835efb R12: ffff88809f377f00
R13: ffff88821baa2420 R14: ffff888097117e90 R15: ffff8880a99809a0
FS:  00007f0eb2bcd700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa3553c2000 CR3: 00000000a8a61000 CR4: 00000000001406f0


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-06-18 10:47 general protection fault in do_move_mount (2) syzbot
@ 2019-06-18 14:02 ` Al Viro
  2019-06-24  9:28   ` Dmitry Vyukov
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2019-06-18 14:02 UTC (permalink / raw)
  To: syzbot
  Cc: arnd, axboe, bp, catalin.marinas, christian, dhowells, geert,
	hare, heiko.carstens, hpa, linux-fsdevel, linux-kernel, luto,
	mingo, syzkaller-bugs, tglx, x86

On Tue, Jun 18, 2019 at 03:47:10AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    9e0babf2 Linux 5.2-rc5
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=138b310aa00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d16883d6c7f0d717
> dashboard link: https://syzkaller.appspot.com/bug?extid=6004acbaa1893ad013f0
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000

IDGI...

mkdir(&(0x7f0000632000)='./file0\x00', 0x0)
mount(0x0, 0x0, 0x0, 0x0, 0x0)
syz_open_procfs(0x0, 0x0)
r0 = open(&(0x7f0000032ff8)='./file0\x00', 0x0, 0x0)
r1 = memfd_create(&(0x7f00000001c0)='\xb3', 0x0)
write$FUSE_DIRENT(r1, &(0x7f0000000080)=ANY=[], 0x29)
move_mount(r0, &(0x7f0000000040)='./file0\x00', 0xffffffffffffff9c, &(0x7f0000000100)='./file0\x00', 0x66)

reads as if we'd done mkdir ./file0, opened it and then tried
to feed move_mount(2) "./file0" relative to that descriptor.
How the hell has that avoided an instant -ENOENT?  On the first
pair, that is - the second one (AT_FDCWD, "./file0") is fine...

Confused...  Incidentally, what the hell is
	mount(0x0, 0x0, 0x0, 0x0, 0x0)
about?  *IF* that really refers to mount(2) with
such arguments, all you'll get is -EFAULT.  Way before
it gets to actually doing anything - it won't get past
        /* ... and get the mountpoint */
        retval = user_path(dir_name, &path);
        if (retval)
                return retval;
in do_mount(2)...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-06-18 14:02 ` Al Viro
@ 2019-06-24  9:28   ` Dmitry Vyukov
  2019-06-29 20:27     ` [PATCH] vfs: move_mount: reject moving kernel internal mounts Eric Biggers
  2019-06-29 20:39     ` general protection fault in do_move_mount (2) Eric Biggers
  0 siblings, 2 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2019-06-24  9:28 UTC (permalink / raw)
  To: Al Viro
  Cc: syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Tue, Jun 18, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Jun 18, 2019 at 03:47:10AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    9e0babf2 Linux 5.2-rc5
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=138b310aa00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d16883d6c7f0d717
> > dashboard link: https://syzkaller.appspot.com/bug?extid=6004acbaa1893ad013f0
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
>
> IDGI...
>
> mkdir(&(0x7f0000632000)='./file0\x00', 0x0)
> mount(0x0, 0x0, 0x0, 0x0, 0x0)
> syz_open_procfs(0x0, 0x0)
> r0 = open(&(0x7f0000032ff8)='./file0\x00', 0x0, 0x0)
> r1 = memfd_create(&(0x7f00000001c0)='\xb3', 0x0)
> write$FUSE_DIRENT(r1, &(0x7f0000000080)=ANY=[], 0x29)
> move_mount(r0, &(0x7f0000000040)='./file0\x00', 0xffffffffffffff9c, &(0x7f0000000100)='./file0\x00', 0x66)
>
> reads as if we'd done mkdir ./file0, opened it and then tried
> to feed move_mount(2) "./file0" relative to that descriptor.
> How the hell has that avoided an instant -ENOENT?  On the first
> pair, that is - the second one (AT_FDCWD, "./file0") is fine...
>
> Confused...  Incidentally, what the hell is
>         mount(0x0, 0x0, 0x0, 0x0, 0x0)
> about?  *IF* that really refers to mount(2) with
> such arguments, all you'll get is -EFAULT.  Way before
> it gets to actually doing anything - it won't get past
>         /* ... and get the mountpoint */
>         retval = user_path(dir_name, &path);
>         if (retval)
>                 return retval;
> in do_mount(2)...

Yes, mount(0x0, 0x0, 0x0, 0x0, 0x0) is mount with 0 arguments. Most
likely it returns EFAULT.
Since the reproducer have "threaded":true,"collide":true and no C
repro, most likely this is a subtle race. So it attempted to remove
mount(0x0, 0x0, 0x0, 0x0, 0x0) but it did not crash, so the conclusion
was that it's somehow needed. You can actually see that other
reproducers for this bug do not have this mount, but are otherwise
similar.

With "threaded":true,"collide":true the execution mode is not just
"execute each syscall once one after another".
The syscalls are executed in separate threads and actually twice. You
can see the approximate execution mode in this C program:
https://gist.githubusercontent.com/dvyukov/c3a52f012e7cff9bdebf3935d35245cf/raw/b5587824111a1d982c985b00137ae8609572335b/gistfile1.txt
Yet using the C program did not trigger the crash somehow (maybe just
slightly different timings).

Since syzkaller was able to reproduce it multiple times, it looks like
a real bug rather than an induced memory corruption or something.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-06-24  9:28   ` Dmitry Vyukov
@ 2019-06-29 20:27     ` Eric Biggers
  2019-06-29 20:39       ` Al Viro
  2019-07-01 16:45       ` Eric Biggers
  2019-06-29 20:39     ` general protection fault in do_move_mount (2) Eric Biggers
  1 sibling, 2 replies; 22+ messages in thread
From: Eric Biggers @ 2019-06-29 20:27 UTC (permalink / raw)
  To: linux-fsdevel, Alexander Viro; +Cc: David Howells, linux-kernel, syzkaller-bugs

From: Eric Biggers <ebiggers@google.com>

sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL,
a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a
kernel object with an internal mount, such as a pipe or memfd.

Fix it by checking for this case and returning -EINVAL.

Reproducer:

    #include <unistd.h>

    #define __NR_move_mount         429
    #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

    int main()
    {
    	int fds[2];

    	pipe(fds);
        syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
    }

Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com
Fixes: 2db154b3ea8e ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7660c2749c96..a7e5a44770a7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2600,7 +2600,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	if (attached && !check_mnt(old))
 		goto out;
 
-	if (!attached && !(ns && is_anon_ns(ns)))
+	if (!attached && !(ns && ns != MNT_NS_INTERNAL && is_anon_ns(ns)))
 		goto out;
 
 	if (old->mnt.mnt_flags & MNT_LOCKED)
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-06-29 20:27     ` [PATCH] vfs: move_mount: reject moving kernel internal mounts Eric Biggers
@ 2019-06-29 20:39       ` Al Viro
  2019-07-01  1:08         ` Al Viro
  2019-07-01  7:38         ` David Howells
  2019-07-01 16:45       ` Eric Biggers
  1 sibling, 2 replies; 22+ messages in thread
From: Al Viro @ 2019-06-29 20:39 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-fsdevel, David Howells, linux-kernel, syzkaller-bugs

On Sat, Jun 29, 2019 at 01:27:44PM -0700, Eric Biggers wrote:

> @@ -2600,7 +2600,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
>  	if (attached && !check_mnt(old))
>  		goto out;
>  
> -	if (!attached && !(ns && is_anon_ns(ns)))
> +	if (!attached && !(ns && ns != MNT_NS_INTERNAL && is_anon_ns(ns)))
>  		goto out;
>  
>  	if (old->mnt.mnt_flags & MNT_LOCKED)

*UGH*

Applied, but that code is getting really ugly ;-/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-06-24  9:28   ` Dmitry Vyukov
  2019-06-29 20:27     ` [PATCH] vfs: move_mount: reject moving kernel internal mounts Eric Biggers
@ 2019-06-29 20:39     ` Eric Biggers
  2019-07-01 14:59       ` Dmitry Vyukov
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2019-06-29 20:39 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Al Viro, syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Mon, Jun 24, 2019 at 11:28:18AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> On Tue, Jun 18, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Jun 18, 2019 at 03:47:10AM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    9e0babf2 Linux 5.2-rc5
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=138b310aa00000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=d16883d6c7f0d717
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=6004acbaa1893ad013f0
> > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
> >
> > IDGI...
> >
> > mkdir(&(0x7f0000632000)='./file0\x00', 0x0)
> > mount(0x0, 0x0, 0x0, 0x0, 0x0)
> > syz_open_procfs(0x0, 0x0)
> > r0 = open(&(0x7f0000032ff8)='./file0\x00', 0x0, 0x0)
> > r1 = memfd_create(&(0x7f00000001c0)='\xb3', 0x0)
> > write$FUSE_DIRENT(r1, &(0x7f0000000080)=ANY=[], 0x29)
> > move_mount(r0, &(0x7f0000000040)='./file0\x00', 0xffffffffffffff9c, &(0x7f0000000100)='./file0\x00', 0x66)
> >
> > reads as if we'd done mkdir ./file0, opened it and then tried
> > to feed move_mount(2) "./file0" relative to that descriptor.
> > How the hell has that avoided an instant -ENOENT?  On the first
> > pair, that is - the second one (AT_FDCWD, "./file0") is fine...
> >
> > Confused...  Incidentally, what the hell is
> >         mount(0x0, 0x0, 0x0, 0x0, 0x0)
> > about?  *IF* that really refers to mount(2) with
> > such arguments, all you'll get is -EFAULT.  Way before
> > it gets to actually doing anything - it won't get past
> >         /* ... and get the mountpoint */
> >         retval = user_path(dir_name, &path);
> >         if (retval)
> >                 return retval;
> > in do_mount(2)...
> 
> Yes, mount(0x0, 0x0, 0x0, 0x0, 0x0) is mount with 0 arguments. Most
> likely it returns EFAULT.
> Since the reproducer have "threaded":true,"collide":true and no C
> repro, most likely this is a subtle race. So it attempted to remove
> mount(0x0, 0x0, 0x0, 0x0, 0x0) but it did not crash, so the conclusion
> was that it's somehow needed. You can actually see that other
> reproducers for this bug do not have this mount, but are otherwise
> similar.
> 
> With "threaded":true,"collide":true the execution mode is not just
> "execute each syscall once one after another".
> The syscalls are executed in separate threads and actually twice. You
> can see the approximate execution mode in this C program:
> https://gist.githubusercontent.com/dvyukov/c3a52f012e7cff9bdebf3935d35245cf/raw/b5587824111a1d982c985b00137ae8609572335b/gistfile1.txt
> Yet using the C program did not trigger the crash somehow (maybe just
> slightly different timings).
> 
> Since syzkaller was able to reproduce it multiple times, it looks like
> a real bug rather than an induced memory corruption or something.
> 

I sent a patch to fix this bug (https://lore.kernel.org/linux-fsdevel/20190629202744.12396-1-ebiggers@kernel.org/T/#u)

Dmitry, any idea why syzbot found such a bizarre reproducer for this?
This is actually reproducible by a simple single threaded program:

    #include <unistd.h>

    #define __NR_move_mount         429
    #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

    int main()
    {
    	int fds[2];

	pipe(fds);
	syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
    }

FYI, it also isn't really appropriate for syzbot to bisect all bugs in new
syscalls to wiring them up to x86, and then blame all the x86 maintainers.
Normally such bugs will be in the syscall itself, regardless of architecture.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-06-29 20:39       ` Al Viro
@ 2019-07-01  1:08         ` Al Viro
  2019-07-01 15:43           ` Eric Biggers
  2019-07-01  7:38         ` David Howells
  1 sibling, 1 reply; 22+ messages in thread
From: Al Viro @ 2019-07-01  1:08 UTC (permalink / raw)
  To: Eric Biggers; +Cc: linux-fsdevel, David Howells, linux-kernel, syzkaller-bugs

On Sat, Jun 29, 2019 at 09:39:16PM +0100, Al Viro wrote:
> On Sat, Jun 29, 2019 at 01:27:44PM -0700, Eric Biggers wrote:
> 
> > @@ -2600,7 +2600,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
> >  	if (attached && !check_mnt(old))
> >  		goto out;
> >  
> > -	if (!attached && !(ns && is_anon_ns(ns)))
> > +	if (!attached && !(ns && ns != MNT_NS_INTERNAL && is_anon_ns(ns)))
> >  		goto out;
> >  
> >  	if (old->mnt.mnt_flags & MNT_LOCKED)
> 
> *UGH*
> 
> Applied, but that code is getting really ugly ;-/

FWIW, it's too ugly and confusing.  Look:
        /* The mountpoint must be in our namespace. */
        if (!check_mnt(p))
                goto out;

        /* The thing moved should be either ours or completely unattached. */
        if (attached && !check_mnt(old))
                goto out;

        if (!attached && !(ns && ns != MNT_NS_INTERNAL && is_anon_ns(ns)))
                goto out;

OK, the first check is sane and understandable.  But let's look at what's
coming after it.  We have two cases:
	1) attached.  IOW, old->mnt_parent != old.  In that case we
require old->mnt_ns == current mnt_ns.  Anything else is rejected.
	2) !attached.  old->mnt_parent == old.  In that case we
require old->mnt_ns to be an anon namespace.

Let's reorder that a bit:
        /* The mountpoint must be in our namespace. */
        if (!check_mnt(p))
                goto out;

	/* The thing moved must be mounted... */
	if (!is_mounted(old_path->mnt))
		goto out;

        /* ... and either ours or the root of anon namespace */
	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
		goto out;

IMO that looks saner and all it costs us is a redundant check
in attached case.  Objections?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-06-29 20:39       ` Al Viro
  2019-07-01  1:08         ` Al Viro
@ 2019-07-01  7:38         ` David Howells
  2019-07-01 11:19           ` Al Viro
  1 sibling, 1 reply; 22+ messages in thread
From: David Howells @ 2019-07-01  7:38 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Eric Biggers, linux-fsdevel, linux-kernel, syzkaller-bugs

Al Viro <viro@zeniv.linux.org.uk> wrote:

> 	/* The thing moved must be mounted... */
> 	if (!is_mounted(old_path->mnt))
> 		goto out;

Um...  Doesn't that stuff up fsmount()?

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-01  7:38         ` David Howells
@ 2019-07-01 11:19           ` Al Viro
  0 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2019-07-01 11:19 UTC (permalink / raw)
  To: David Howells; +Cc: Eric Biggers, linux-fsdevel, linux-kernel, syzkaller-bugs

On Mon, Jul 01, 2019 at 08:38:10AM +0100, David Howells wrote:
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > 	/* The thing moved must be mounted... */
> > 	if (!is_mounted(old_path->mnt))
> > 		goto out;
> 
> Um...  Doesn't that stuff up fsmount()?

Nope - check is_mounted() definition.  Stuff in anon namespace
*is* mounted there, so that's not a problem.

FWIW, is_mounted() would've been better off spelled as
ns != NULL && ns != MNT_NS_INTERNAL; the use of IS_ERR_OR_NULL
in there works, but is unidiomatic and I don't think it yields
better code...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-06-29 20:39     ` general protection fault in do_move_mount (2) Eric Biggers
@ 2019-07-01 14:59       ` Dmitry Vyukov
  2019-07-01 15:18         ` Eric Biggers
  0 siblings, 1 reply; 22+ messages in thread
From: Dmitry Vyukov @ 2019-07-01 14:59 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Al Viro, syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Sat, Jun 29, 2019 at 10:39 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Mon, Jun 24, 2019 at 11:28:18AM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > On Tue, Jun 18, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Tue, Jun 18, 2019 at 03:47:10AM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    9e0babf2 Linux 5.2-rc5
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=138b310aa00000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=d16883d6c7f0d717
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=6004acbaa1893ad013f0
> > > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
> > >
> > > IDGI...
> > >
> > > mkdir(&(0x7f0000632000)='./file0\x00', 0x0)
> > > mount(0x0, 0x0, 0x0, 0x0, 0x0)
> > > syz_open_procfs(0x0, 0x0)
> > > r0 = open(&(0x7f0000032ff8)='./file0\x00', 0x0, 0x0)
> > > r1 = memfd_create(&(0x7f00000001c0)='\xb3', 0x0)
> > > write$FUSE_DIRENT(r1, &(0x7f0000000080)=ANY=[], 0x29)
> > > move_mount(r0, &(0x7f0000000040)='./file0\x00', 0xffffffffffffff9c, &(0x7f0000000100)='./file0\x00', 0x66)
> > >
> > > reads as if we'd done mkdir ./file0, opened it and then tried
> > > to feed move_mount(2) "./file0" relative to that descriptor.
> > > How the hell has that avoided an instant -ENOENT?  On the first
> > > pair, that is - the second one (AT_FDCWD, "./file0") is fine...
> > >
> > > Confused...  Incidentally, what the hell is
> > >         mount(0x0, 0x0, 0x0, 0x0, 0x0)
> > > about?  *IF* that really refers to mount(2) with
> > > such arguments, all you'll get is -EFAULT.  Way before
> > > it gets to actually doing anything - it won't get past
> > >         /* ... and get the mountpoint */
> > >         retval = user_path(dir_name, &path);
> > >         if (retval)
> > >                 return retval;
> > > in do_mount(2)...
> >
> > Yes, mount(0x0, 0x0, 0x0, 0x0, 0x0) is mount with 0 arguments. Most
> > likely it returns EFAULT.
> > Since the reproducer have "threaded":true,"collide":true and no C
> > repro, most likely this is a subtle race. So it attempted to remove
> > mount(0x0, 0x0, 0x0, 0x0, 0x0) but it did not crash, so the conclusion
> > was that it's somehow needed. You can actually see that other
> > reproducers for this bug do not have this mount, but are otherwise
> > similar.
> >
> > With "threaded":true,"collide":true the execution mode is not just
> > "execute each syscall once one after another".
> > The syscalls are executed in separate threads and actually twice. You
> > can see the approximate execution mode in this C program:
> > https://gist.githubusercontent.com/dvyukov/c3a52f012e7cff9bdebf3935d35245cf/raw/b5587824111a1d982c985b00137ae8609572335b/gistfile1.txt
> > Yet using the C program did not trigger the crash somehow (maybe just
> > slightly different timings).
> >
> > Since syzkaller was able to reproduce it multiple times, it looks like
> > a real bug rather than an induced memory corruption or something.
> >
>
> I sent a patch to fix this bug (https://lore.kernel.org/linux-fsdevel/20190629202744.12396-1-ebiggers@kernel.org/T/#u)
>
> Dmitry, any idea why syzbot found such a bizarre reproducer for this?
> This is actually reproducible by a simple single threaded program:
>
>     #include <unistd.h>
>
>     #define __NR_move_mount         429
>     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
>
>     int main()
>     {
>         int fds[2];
>
>         pipe(fds);
>         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
>     }


There is no pipe in the reproducer, so it could not theoretically come
up with the reproducer with the pipe. During minimization syzkaller
only tries to remove syscalls and simplify arguments and execution
mode.
What would be the simplest reproducer expressed as further
minimization of this reproducer?
https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
I assume one of the syscalls is still move_mount, but what is the
other one? If it's memfd_create, or open of the procfs file, then it
seems that [ab]used heavy threading and syscall colliding as way to do
an arbitrary mutation of the program. Per se results of
memfd_create/procfs are not passed to move_mount. But by abusing races
it probably managed to do so in small percent of cases. It would also
explain why it's hard to reproduce.


> FYI, it also isn't really appropriate for syzbot to bisect all bugs in new
> syscalls to wiring them up to x86, and then blame all the x86 maintainers.
> Normally such bugs will be in the syscall itself, regardless of architecture.

Agree. Do you think it's something worth handling automatically
(stands out of the long tail of other inappropriate cases)? If so, how
could we detect such cases? It seems that some of these predicates are
quite hard to program. Similar things happen with introduction of new
bug detection tools and checks, wiring any functionality to new access
points and similar things.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-07-01 14:59       ` Dmitry Vyukov
@ 2019-07-01 15:18         ` Eric Biggers
  2019-07-05 12:17           ` Dmitry Vyukov
  2019-07-05 13:02           ` Dmitry Vyukov
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Biggers @ 2019-07-01 15:18 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Al Viro, syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Mon, Jul 01, 2019 at 04:59:04PM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >
> > Dmitry, any idea why syzbot found such a bizarre reproducer for this?
> > This is actually reproducible by a simple single threaded program:
> >
> >     #include <unistd.h>
> >
> >     #define __NR_move_mount         429
> >     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
> >
> >     int main()
> >     {
> >         int fds[2];
> >
> >         pipe(fds);
> >         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
> >     }
> 
> 
> There is no pipe in the reproducer, so it could not theoretically come
> up with the reproducer with the pipe. During minimization syzkaller
> only tries to remove syscalls and simplify arguments and execution
> mode.
> What would be the simplest reproducer expressed as further
> minimization of this reproducer?
> https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
> I assume one of the syscalls is still move_mount, but what is the
> other one? If it's memfd_create, or open of the procfs file, then it
> seems that [ab]used heavy threading and syscall colliding as way to do
> an arbitrary mutation of the program. Per se results of
> memfd_create/procfs are not passed to move_mount. But by abusing races
> it probably managed to do so in small percent of cases. It would also
> explain why it's hard to reproduce.

To be clear, memfd_create() works just as well:

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <unistd.h>

	#define __NR_move_mount         429
	#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

	int main()
	{
		int fd = memfd_create("foo", 0);

		syscall(__NR_move_mount, fd, "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
	}

I just changed it to pipe() in my example, because pipe() is less obscure.

> 
> 
> > FYI, it also isn't really appropriate for syzbot to bisect all bugs in new
> > syscalls to wiring them up to x86, and then blame all the x86 maintainers.
> > Normally such bugs will be in the syscall itself, regardless of architecture.
> 
> Agree. Do you think it's something worth handling automatically
> (stands out of the long tail of other inappropriate cases)? If so, how
> could we detect such cases? It seems that some of these predicates are
> quite hard to program. Similar things happen with introduction of new
> bug detection tools and checks, wiring any functionality to new access
> points and similar things.
> 

Yes, this case could easily be automatically detected (most of the time) by
listing the filenames changed in the commit, and checking whether they all match
the pattern syscall.*\.tbl.  Sure, it's not common, but it could be alongside
other similar straightforward checks like checking for merge commits and
checking for commits that only modify Documentation/.

I'm not even asking for more correct bisection results at this point, I'm just
asking for fewer bad bisection results.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-01  1:08         ` Al Viro
@ 2019-07-01 15:43           ` Eric Biggers
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2019-07-01 15:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, David Howells, linux-kernel, syzkaller-bugs

On Mon, Jul 01, 2019 at 02:08:48AM +0100, Al Viro wrote:
> 
> Let's reorder that a bit:
>         /* The mountpoint must be in our namespace. */
>         if (!check_mnt(p))
>                 goto out;
> 
> 	/* The thing moved must be mounted... */
> 	if (!is_mounted(old_path->mnt))
> 		goto out;
> 
>         /* ... and either ours or the root of anon namespace */
> 	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
> 		goto out;
> 
> IMO that looks saner and all it costs us is a redundant check
> in attached case.  Objections?

Looks good to me.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-06-29 20:27     ` [PATCH] vfs: move_mount: reject moving kernel internal mounts Eric Biggers
  2019-06-29 20:39       ` Al Viro
@ 2019-07-01 16:45       ` Eric Biggers
  2019-07-01 18:22         ` Al Viro
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2019-07-01 16:45 UTC (permalink / raw)
  To: David Howells; +Cc: linux-fsdevel, Alexander Viro, linux-kernel, syzkaller-bugs

On Sat, Jun 29, 2019 at 01:27:44PM -0700, Eric Biggers wrote:
> 
> Reproducer:
> 
>     #include <unistd.h>
> 
>     #define __NR_move_mount         429
>     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
> 
>     int main()
>     {
>     	  int fds[2];
> 
>     	  pipe(fds);
>         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
>     }

David, I'd like to add this as a regression test somewhere.

Can you point me to the tests for the new mount syscalls?

I checked LTP, kselftests, and xfstests, but nothing to be found.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-01 16:45       ` Eric Biggers
@ 2019-07-01 18:22         ` Al Viro
  2019-07-01 19:20           ` Al Viro
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Al Viro @ 2019-07-01 18:22 UTC (permalink / raw)
  To: Eric Biggers; +Cc: David Howells, linux-fsdevel, linux-kernel, syzkaller-bugs

On Mon, Jul 01, 2019 at 09:45:37AM -0700, Eric Biggers wrote:
> On Sat, Jun 29, 2019 at 01:27:44PM -0700, Eric Biggers wrote:
> > 
> > Reproducer:
> > 
> >     #include <unistd.h>
> > 
> >     #define __NR_move_mount         429
> >     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
> > 
> >     int main()
> >     {
> >     	  int fds[2];
> > 
> >     	  pipe(fds);
> >         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
> >     }
> 
> David, I'd like to add this as a regression test somewhere.
> 
> Can you point me to the tests for the new mount syscalls?
> 
> I checked LTP, kselftests, and xfstests, but nothing to be found.

FWIW, it's not just move_mount(2) - I'd expect

	int fds[2];
	char s[80];

	pipe(fds);
	sprintf(s, "/dev/fd/%d", fds[0]);
	mount(s, "/dev/null", NULL, MS_MOVE, 0);

to step into exactly the same thing.  mount(2) does follow symlinks -
always had...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-01 18:22         ` Al Viro
@ 2019-07-01 19:20           ` Al Viro
  2019-07-02 18:22           ` Eric Biggers
  2019-07-05  9:01           ` move_mount.2 David Howells
  2 siblings, 0 replies; 22+ messages in thread
From: Al Viro @ 2019-07-01 19:20 UTC (permalink / raw)
  To: Eric Biggers; +Cc: David Howells, linux-fsdevel, linux-kernel, syzkaller-bugs

On Mon, Jul 01, 2019 at 07:22:39PM +0100, Al Viro wrote:

> FWIW, it's not just move_mount(2) - I'd expect
> 
> 	int fds[2];
> 	char s[80];
> 
> 	pipe(fds);
> 	sprintf(s, "/dev/fd/%d", fds[0]);
> 	mount(s, "/dev/null", NULL, MS_MOVE, 0);
> 
> to step into exactly the same thing.  mount(2) does follow symlinks -
> always had...

The same goes for e.g.

#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>
#include <stdio.h>
#include <sys/epoll.h>

main()
{
	char s[80];
	unshare(CLONE_NEWNS);	// so nobody else gets confused
 	sprintf(s, "/dev/fd/%d", epoll_create1(0));
 	mount(s, "/dev/null", NULL, MS_MOVE, 0);	// see if it oopses
}

modulo error-checking, etc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-01 18:22         ` Al Viro
  2019-07-01 19:20           ` Al Viro
@ 2019-07-02 18:22           ` Eric Biggers
  2019-07-09 19:40             ` Eric Biggers
  2019-07-05  9:01           ` move_mount.2 David Howells
  2 siblings, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2019-07-02 18:22 UTC (permalink / raw)
  To: Al Viro, David Howells; +Cc: linux-fsdevel, linux-kernel, syzkaller-bugs

On Mon, Jul 01, 2019 at 07:22:39PM +0100, Al Viro wrote:
> On Mon, Jul 01, 2019 at 09:45:37AM -0700, Eric Biggers wrote:
> > On Sat, Jun 29, 2019 at 01:27:44PM -0700, Eric Biggers wrote:
> > > 
> > > Reproducer:
> > > 
> > >     #include <unistd.h>
> > > 
> > >     #define __NR_move_mount         429
> > >     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
> > > 
> > >     int main()
> > >     {
> > >     	  int fds[2];
> > > 
> > >     	  pipe(fds);
> > >         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
> > >     }
> > 
> > David, I'd like to add this as a regression test somewhere.
> > 
> > Can you point me to the tests for the new mount syscalls?
> > 
> > I checked LTP, kselftests, and xfstests, but nothing to be found.
> 
> FWIW, it's not just move_mount(2) - I'd expect
> 
> 	int fds[2];
> 	char s[80];
> 
> 	pipe(fds);
> 	sprintf(s, "/dev/fd/%d", fds[0]);
> 	mount(s, "/dev/null", NULL, MS_MOVE, 0);
> 
> to step into exactly the same thing.  mount(2) does follow symlinks -
> always had...

Sure, but the new mount syscalls still need tests.  Where are the tests?

Also, since the case of a fd with an internal mount was overlooked, probably the
man page needs to be updated clarify that move_mount(2) fails with EINVAL in
this case.  Where is the man page?

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* move_mount.2
  2019-07-01 18:22         ` Al Viro
  2019-07-01 19:20           ` Al Viro
  2019-07-02 18:22           ` Eric Biggers
@ 2019-07-05  9:01           ` David Howells
  2 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2019-07-05  9:01 UTC (permalink / raw)
  To: Eric Biggers
  Cc: dhowells, Al Viro, linux-fsdevel, linux-kernel, syzkaller-bugs

Eric Biggers <ebiggers@kernel.org> wrote:

> Also, since the case of a fd with an internal mount was overlooked, probably
> the man page needs to be updated clarify that move_mount(2) fails with
> EINVAL in this case.  Where is the man page?

See below.  I'm in the middle of updating the manpages I need to push.

David
---
'\" t
.\" Copyright (c) 2018 David Howells <dhowells@redhat.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date.  The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.  The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH MOVE_MOUNT 2 2018-06-08 "Linux" "Linux Programmer's Manual"
.SH NAME
move_mount \- Move mount objects around the filesystem topology
.SH SYNOPSIS
.nf
.B #include <sys/types.h>
.br
.B #include <sys/mount.h>
.br
.B #include <unistd.h>
.br
.BR "#include <fcntl.h>           " "/* Definition of AT_* constants */"
.PP
.BI "int move_mount(int " from_dirfd ", const char *" from_pathname ","
.BI "               int " to_dirfd ", const char *" to_pathname ","
.BI "               unsigned int " flags );
.fi
.PP
.IR Note :
There are no glibc wrappers for these system calls.
.SH DESCRIPTION
The
.BR move_mount ()
call moves a mount from one place to another; it can also be used to attach an
unattached mount created by
.BR fsmount "() or " open_tree "() with " OPEN_TREE_CLONE .
.PP
If
.BR move_mount ()
is called repeatedly with a file descriptor that refers to a mount object,
then the object will be attached/moved the first time and then moved again and
again and again, detaching it from the previous mountpoint each time.
.PP
To access the source mount object or the destination mountpoint, no
permissions are required on the object itself, but if either pathname is
supplied, execute (search) permission is required on all of the directories
specified in
.IR from_pathname " or " to_pathname .
.PP
The caller does, however, require the appropriate capabilities or permission
to effect a mount.
.PP
.BR move_mount ()
uses
.IR from_pathname ", " from_dirfd " and some " flags
to locate the mount object to be moved and
.IR to_pathname ", " to_dirfd " and some other " flags
to locate the destination mountpoint.  Each lookup can be done in one of a
variety of ways:
.TP
[*] By absolute path.
The pathname points to an absolute path and the dirfd is ignored.  The file is
looked up by name, starting from the root of the filesystem as seen by the
calling process.
.TP
[*] By cwd-relative path.
The pathname points to a relative path and the dirfd is
.IR AT_FDCWD .
The file is looked up by name, starting from the current working directory.
.TP
[*] By dir-relative path.
The pathname points to relative path and the dirfd indicates a file descriptor
pointing to a directory.  The file is looked up by name, starting from the
directory specified by
.IR dirfd .
.TP
[*] By file descriptor.
The pathname points to "", the dirfd points directly to the mount object to
move or the destination mount point and the appropriate
.B *_EMPTY_PATH
flag is set.
.PP
.I flags
can be used to influence a path-based lookup.  A value for
.I flags
is constructed by OR'ing together zero or more of the following constants:
.TP
.BR MOVE_MOUNT_F_EMPTY_PATH
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
If
.I from_pathname
is an empty string, operate on the file referred to by
.IR from_dirfd
(which may have been obtained using the
.BR open (2)
.B O_PATH
flag or
.BR open_tree ())
If
.I from_dirfd
is
.BR AT_FDCWD ,
the call operates on the current working directory.
In this case,
.I from_dirfd
can refer to any type of file, not just a directory.
This flag is Linux-specific; define
.B _GNU_SOURCE
.\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
to obtain its definition.
.TP
.B MOVE_MOUNT_T_EMPTY_PATH
As above, but operating on
.IR to_pathname " and " to_dirfd .
.TP
.B MOVE_MOUNT_F_NO_AUTOMOUNT
Don't automount the terminal ("basename") component of
.I from_pathname
if it is a directory that is an automount point.  This allows a mount object
that has an automount point at its root to be moved and prevents unintended
triggering of an automount point.
The
.B MOVE_MOUNT_F_NO_AUTOMOUNT
flag has no effect if the automount point has already been mounted over.  This
flag is Linux-specific; define
.B _GNU_SOURCE
.\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed
to obtain its definition.
.TP
.B MOVE_MOUNT_T_NO_AUTOMOUNT
As above, but operating on
.IR to_pathname " and " to_dirfd .
This allows an automount point to be manually mounted over.
.TP
.B MOVE_MOUNT_F_SYMLINKS
If
.I from_pathname
is a symbolic link, then dereference it.  The default for
.BR move_mount ()
is to not follow symlinks.
.TP
.B MOVE_MOUNT_T_SYMLINKS
As above, but operating on
.IR to_pathname " and " to_dirfd .

.SH EXAMPLES
The
.BR move_mount ()
function can be used like the following:
.PP
.RS
.nf
move_mount(AT_FDCWD, "/a", AT_FDCWD, "/b", 0);
.fi
.RE
.PP
This would move the object mounted on "/a" to "/b".  It can also be used in
conjunction with
.BR open_tree "(2) or " open "(2) with " O_PATH :
.PP
.RS
.nf
fd = open_tree(AT_FDCWD, "/mnt", 0);
move_mount(fd, "", AT_FDCWD, "/mnt2", MOVE_MOUNT_F_EMPTY_PATH);
move_mount(fd, "", AT_FDCWD, "/mnt3", MOVE_MOUNT_F_EMPTY_PATH);
move_mount(fd, "", AT_FDCWD, "/mnt4", MOVE_MOUNT_F_EMPTY_PATH);
.fi
.RE
.PP
This would attach the path point for "/mnt" to fd, then it would move the
mount to "/mnt2", then move it to "/mnt3" and finally to "/mnt4".
.PP
It can also be used to attach new mounts:
.PP
.RS
.nf
sfd = fsopen("ext4", FSOPEN_CLOEXEC);
write(sfd, "s /dev/sda1");
write(sfd, "o user_xattr");
mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_NODEV);
move_mount(mfd, "", AT_FDCWD, "/home", MOVE_MOUNT_F_EMPTY_PATH);
.fi
.RE
.PP
Which would open the Ext4 filesystem mounted on "/dev/sda1", turn on user
extended attribute support and create a mount object for it.  Finally, the new
mount object would be attached with
.BR move_mount ()
to "/home".


.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RETURN VALUE
On success, 0 is returned.  On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EACCES
Search permission is denied for one of the directories
in the path prefix of
.IR pathname .
(See also
.BR path_resolution (7).)
.TP
.B EBADF
.IR from_dirfd " or " to_dirfd
is not a valid open file descriptor.
.TP
.B EFAULT
.IR from_pathname " or " to_pathname
is NULL or either one point to a location outside the process's accessible
address space.
.TP
.B EINVAL
Reserved flag specified in
.IR flags .
.TP
.B ELOOP
Too many symbolic links encountered while traversing the pathname.
.TP
.B ENAMETOOLONG
.IR from_pathname " or " to_pathname
is too long.
.TP
.B ENOENT
A component of
.IR from_pathname " or " to_pathname
does not exist, or one is an empty string and the appropriate
.B *_EMPTY_PATH
was not specified in
.IR flags .
.TP
.B ENOMEM
Out of memory (i.e., kernel memory).
.TP
.B ENOTDIR
A component of the path prefix of
.IR from_pathname " or " to_pathname
is not a directory or one or the other is relative and the appropriate
.I *_dirfd
is a file descriptor referring to a file other than a directory.
.SH VERSIONS
.BR move_mount ()
was added to Linux in kernel 4.18.
.SH CONFORMING TO
.BR move_mount ()
is Linux-specific.
.SH NOTES
Glibc does not (yet) provide a wrapper for the
.BR move_mount ()
system call; call it using
.BR syscall (2).
.SH SEE ALSO
.BR fsmount (2),
.BR fsopen (2),
.BR open_tree (2)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-07-01 15:18         ` Eric Biggers
@ 2019-07-05 12:17           ` Dmitry Vyukov
  2019-07-05 13:02           ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2019-07-05 12:17 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Al Viro, syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Mon, Jul 1, 2019 at 5:18 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Mon, Jul 01, 2019 at 04:59:04PM +0200, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > >
> > > Dmitry, any idea why syzbot found such a bizarre reproducer for this?
> > > This is actually reproducible by a simple single threaded program:
> > >
> > >     #include <unistd.h>
> > >
> > >     #define __NR_move_mount         429
> > >     #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
> > >
> > >     int main()
> > >     {
> > >         int fds[2];
> > >
> > >         pipe(fds);
> > >         syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
> > >     }
> >
> >
> > There is no pipe in the reproducer, so it could not theoretically come
> > up with the reproducer with the pipe. During minimization syzkaller
> > only tries to remove syscalls and simplify arguments and execution
> > mode.
> > What would be the simplest reproducer expressed as further
> > minimization of this reproducer?
> > https://syzkaller.appspot.com/x/repro.syz?x=154e8c2aa00000
> > I assume one of the syscalls is still move_mount, but what is the
> > other one? If it's memfd_create, or open of the procfs file, then it
> > seems that [ab]used heavy threading and syscall colliding as way to do
> > an arbitrary mutation of the program. Per se results of
> > memfd_create/procfs are not passed to move_mount. But by abusing races
> > it probably managed to do so in small percent of cases. It would also
> > explain why it's hard to reproduce.
>
> To be clear, memfd_create() works just as well:
>
>         #define _GNU_SOURCE
>         #include <sys/mman.h>
>         #include <unistd.h>
>
>         #define __NR_move_mount         429
>         #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
>
>         int main()
>         {
>                 int fd = memfd_create("foo", 0);
>
>                 syscall(__NR_move_mount, fd, "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
>         }
>
> I just changed it to pipe() in my example, because pipe() is less obscure.

Then I think the reason for the bizarre reproducer is what I described above.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: general protection fault in do_move_mount (2)
  2019-07-01 15:18         ` Eric Biggers
  2019-07-05 12:17           ` Dmitry Vyukov
@ 2019-07-05 13:02           ` Dmitry Vyukov
  1 sibling, 0 replies; 22+ messages in thread
From: Dmitry Vyukov @ 2019-07-05 13:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Al Viro, syzbot, Arnd Bergmann, Jens Axboe, Borislav Petkov,
	Catalin Marinas, Christian Brauner, David Howells,
	Geert Uytterhoeven, Hannes Reinecke, Heiko Carstens,
	H. Peter Anvin, linux-fsdevel, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Mon, Jul 1, 2019 at 5:18 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > > FYI, it also isn't really appropriate for syzbot to bisect all bugs in new
> > > syscalls to wiring them up to x86, and then blame all the x86 maintainers.
> > > Normally such bugs will be in the syscall itself, regardless of architecture.
> >
> > Agree. Do you think it's something worth handling automatically
> > (stands out of the long tail of other inappropriate cases)? If so, how
> > could we detect such cases? It seems that some of these predicates are
> > quite hard to program. Similar things happen with introduction of new
> > bug detection tools and checks, wiring any functionality to new access
> > points and similar things.
> >
>
> Yes, this case could easily be automatically detected (most of the time) by
> listing the filenames changed in the commit, and checking whether they all match
> the pattern syscall.*\.tbl.  Sure, it's not common, but it could be alongside
> other similar straightforward checks like checking for merge commits and
> checking for commits that only modify Documentation/.
>
> I'm not even asking for more correct bisection results at this point, I'm just
> asking for fewer bad bisection results.


Agree, if we implement a common framework for doing this type of
checks and affecting reporting in some fixed set of ways, adding more
rules can make sense even if they don't affect lots of cases. I filed
https://github.com/google/syzkaller/issues/1271 for this.

There are several open questions, though.
1. The syscall.*\.tbl change is formally the right bisection result
and it communicates a bit of potentially useful information. Do we
want to handle them differently from, say, Documentation/* changes
which are significantly a different type "incorrect".
2. You mentioned merges. It seems that they can be just anything:
completely incorrect results; formally correct, but not the change
that introduced the bug; as well as the totally right commit to blame.
Are you sure we should mark all of them as completely incorrect?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-02 18:22           ` Eric Biggers
@ 2019-07-09 19:40             ` Eric Biggers
  2019-07-09 20:54               ` Al Viro
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Biggers @ 2019-07-09 19:40 UTC (permalink / raw)
  To: Al Viro, David Howells; +Cc: linux-fsdevel, linux-kernel, syzkaller-bugs

On Tue, Jul 02, 2019 at 11:22:59AM -0700, Eric Biggers wrote:
> 
> Sure, but the new mount syscalls still need tests.  Where are the tests?
> 

Still waiting for an answer to this question.

Did we just release 6 new syscalls with no tests?

I don't understand how that is even remotely acceptable.

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] vfs: move_mount: reject moving kernel internal mounts
  2019-07-09 19:40             ` Eric Biggers
@ 2019-07-09 20:54               ` Al Viro
  2019-07-10  3:23                 ` 6 new syscalls without tests (was: [PATCH] vfs: move_mount: reject moving kernel internal mounts) Eric Biggers
  0 siblings, 1 reply; 22+ messages in thread
From: Al Viro @ 2019-07-09 20:54 UTC (permalink / raw)
  To: David Howells, linux-fsdevel, linux-kernel, syzkaller-bugs

On Tue, Jul 09, 2019 at 12:40:01PM -0700, Eric Biggers wrote:
> On Tue, Jul 02, 2019 at 11:22:59AM -0700, Eric Biggers wrote:
> > 
> > Sure, but the new mount syscalls still need tests.  Where are the tests?
> > 
> 
> Still waiting for an answer to this question.

In samples/vfs/fsmount.c, IIRC, and that's not much of a test.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* 6 new syscalls without tests (was: [PATCH] vfs: move_mount: reject moving kernel internal mounts)
  2019-07-09 20:54               ` Al Viro
@ 2019-07-10  3:23                 ` Eric Biggers
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Biggers @ 2019-07-10  3:23 UTC (permalink / raw)
  To: Al Viro; +Cc: David Howells, linux-fsdevel, linux-kernel, syzkaller-bugs

On Tue, Jul 09, 2019 at 09:54:24PM +0100, Al Viro wrote:
> On Tue, Jul 09, 2019 at 12:40:01PM -0700, Eric Biggers wrote:
> > On Tue, Jul 02, 2019 at 11:22:59AM -0700, Eric Biggers wrote:
> > > 
> > > Sure, but the new mount syscalls still need tests.  Where are the tests?
> > > 
> > 
> > Still waiting for an answer to this question.
> 
> In samples/vfs/fsmount.c, IIRC, and that's not much of a test.

A sample program doesn't count.  There need to be tests that can be run
automatically as part of a well known test suite, such as LTP, kselftests, or
xfstests.  Why is this not mandatory for new syscalls to be accepted?  What if
they are broken and we don't know?  See what happened with copy_file_range():
https://lwn.net/Articles/774114/

- Eric

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-07-10  3:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-18 10:47 general protection fault in do_move_mount (2) syzbot
2019-06-18 14:02 ` Al Viro
2019-06-24  9:28   ` Dmitry Vyukov
2019-06-29 20:27     ` [PATCH] vfs: move_mount: reject moving kernel internal mounts Eric Biggers
2019-06-29 20:39       ` Al Viro
2019-07-01  1:08         ` Al Viro
2019-07-01 15:43           ` Eric Biggers
2019-07-01  7:38         ` David Howells
2019-07-01 11:19           ` Al Viro
2019-07-01 16:45       ` Eric Biggers
2019-07-01 18:22         ` Al Viro
2019-07-01 19:20           ` Al Viro
2019-07-02 18:22           ` Eric Biggers
2019-07-09 19:40             ` Eric Biggers
2019-07-09 20:54               ` Al Viro
2019-07-10  3:23                 ` 6 new syscalls without tests (was: [PATCH] vfs: move_mount: reject moving kernel internal mounts) Eric Biggers
2019-07-05  9:01           ` move_mount.2 David Howells
2019-06-29 20:39     ` general protection fault in do_move_mount (2) Eric Biggers
2019-07-01 14:59       ` Dmitry Vyukov
2019-07-01 15:18         ` Eric Biggers
2019-07-05 12:17           ` Dmitry Vyukov
2019-07-05 13:02           ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).