linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* kernel BUG at mm/hugetlb.c:LINE!
@ 2020-04-06  3:06 syzbot
  2020-04-06 22:05 ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: syzbot @ 2020-04-06  3:06 UTC (permalink / raw)
  To: akpm, linux-fsdevel, linux-kernel, linux-mm, mike.kravetz,
	mszeredi, syzkaller-bugs, viro

Hello,

syzbot found the following crash on:

HEAD commit:    1a323ea5 x86: get rid of 'errret' argument to __get_user_x..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=172e940be00000

The bug was bisected to:

commit e950564b97fd0f541b02eb207685d0746f5ecf29
Author: Miklos Szeredi <mszeredi@redhat.com>
Date:   Tue Jul 24 13:01:55 2018 +0000

    vfs: don't evict uninitialized inode

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000
console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
Fixes: e950564b97fd ("vfs: don't evict uninitialized inode")

overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.
------------[ cut here ]------------
kernel BUG at mm/hugetlb.c:3416!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416
Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00
RSP: 0018:ffffc900017779b0 EFLAGS: 00010293
RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4
RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007
RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111
R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000
R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000
FS:  00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507
 unmap_single_vma+0x238/0x300 mm/memory.c:1296
 unmap_vmas+0x16f/0x2f0 mm/memory.c:1332
 exit_mmap+0x2aa/0x510 mm/mmap.c:3126
 __mmput kernel/fork.c:1082 [inline]
 mmput+0x168/0x4b0 kernel/fork.c:1103
 exit_mm kernel/exit.c:477 [inline]
 do_exit+0xa51/0x2dd0 kernel/exit.c:780
 do_group_exit+0x125/0x340 kernel/exit.c:891
 __do_sys_exit_group kernel/exit.c:902 [inline]
 __se_sys_exit_group kernel/exit.c:900 [inline]
 __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900
 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x43efe8
Code: Bad RIP value.
RSP: 002b:00007ffdfe6c00f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043efe8
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004be7e8 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000040000000011 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d0180 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace 2d36245d65cb52f7 ]---
RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416
Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00
RSP: 0018:ffffc900017779b0 EFLAGS: 00010293
RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4
RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007
RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111
R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000
R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000
FS:  00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8cc24dd000 CR3: 0000000093554000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-04-06  3:06 kernel BUG at mm/hugetlb.c:LINE! syzbot
@ 2020-04-06 22:05 ` Mike Kravetz
  2020-05-12 15:04   ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-04-06 22:05 UTC (permalink / raw)
  To: syzbot, akpm, linux-fsdevel, linux-kernel, linux-mm, mszeredi,
	syzkaller-bugs, viro

On 4/5/20 8:06 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    1a323ea5 x86: get rid of 'errret' argument to __get_user_x..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
> dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=172e940be00000
> 
> The bug was bisected to:
> 
> commit e950564b97fd0f541b02eb207685d0746f5ecf29
> Author: Miklos Szeredi <mszeredi@redhat.com>
> Date:   Tue Jul 24 13:01:55 2018 +0000
> 
>     vfs: don't evict uninitialized inode
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
> Fixes: e950564b97fd ("vfs: don't evict uninitialized inode")
> 
> overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.
> ------------[ cut here ]------------
> kernel BUG at mm/hugetlb.c:3416!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416
> Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00
> RSP: 0018:ffffc900017779b0 EFLAGS: 00010293
> RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4
> RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007
> RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111
> R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000
> R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000
> FS:  00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507
>  unmap_single_vma+0x238/0x300 mm/memory.c:1296
>  unmap_vmas+0x16f/0x2f0 mm/memory.c:1332
>  exit_mmap+0x2aa/0x510 mm/mmap.c:3126
>  __mmput kernel/fork.c:1082 [inline]
>  mmput+0x168/0x4b0 kernel/fork.c:1103
>  exit_mm kernel/exit.c:477 [inline]
>  do_exit+0xa51/0x2dd0 kernel/exit.c:780
>  do_group_exit+0x125/0x340 kernel/exit.c:891
>  __do_sys_exit_group kernel/exit.c:902 [inline]
>  __se_sys_exit_group kernel/exit.c:900 [inline]
>  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900
>  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3

This is not new and certainly not caused by commit e950564b97fd.

hugetlbf only operates on huge page aligned and sized files/mappings.
To make sure this happens, the mmap code contians the following to 'round
up' length to huge page size:

	if (!(flags & MAP_ANONYMOUS)) {
		audit_mmap_fd(fd, flags);
		file = fget(fd);
		if (!file)
			return -EBADF;
		if (is_file_hugepages(file))
			len = ALIGN(len, huge_page_size(hstate_file(file)));
		retval = -EINVAL;
		if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
			goto out_fput;
	} else if (flags & MAP_HUGETLB) {
		struct user_struct *user = NULL;
		struct hstate *hs;

		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
		if (!hs)
			return -EINVAL;

		len = ALIGN(len, huge_page_size(hs));

However, in this syzbot test case the 'file' is in an overlayfs filesystem
created as follows:

mkdir("./file0", 000)                   = 0
mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
chdir("./file0")                        = 0
mkdir("./file1", 000)                   = 0
mkdir("./bus", 000)                     = 0
mkdir("./file0", 000)                   = 0
mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0

The routine is_file_hugepages() is just comparing the file ops to huegtlbfs:

	if (file->f_op == &hugetlbfs_file_operations)
		return true;

Since the file is in an overlayfs, file->f_op == ovl_file_operations.
Therefore, length will not be rounded up to huge page size and we create a
mapping with incorrect size which leads to the BUG.

Because of the code in mmap, the hugetlbfs mmap() routine assumes length is
rounded to a huge page size.  I can easily add a check to hugetlbfs mmap
to validate length and return -EINVAL.  However, I think we really want to
do the 'round up' earlier in mmap.  This is because the man page says:

   Huge page (Huge TLB) mappings
       For mappings that employ huge pages, the requirements for the arguments
       of  mmap()  and munmap() differ somewhat from the requirements for map‐
       pings that use the native system page size.

       For mmap(), offset must be a multiple of the underlying huge page size.
       The system automatically aligns length to be a multiple of the underly‐
       ing huge page size.

Since the location for the mapping is chosen BEFORE getting to the hugetlbfs
mmap routine, we can not wait until then to round up the length.  Is there a
defined way to go from a struct file * to the underlying filesystem so we
can continue to do the 'round up' in early mmap code?

One other thing I noticed with overlayfs is that it does not contain a
specific get_unmapped_area file_operations routine.  I would expect it to at
least check for and use the get_unmapped_area of the underlying filesystem?
Can someone comment if this is by design?
In the case of hugetlbfs, get_unmapped_area is even provided by most
architectures.  So, it seems like we would like/need to be calling the correct
routine.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-04-06 22:05 ` Mike Kravetz
@ 2020-05-12 15:04   ` Miklos Szeredi
  2020-05-12 18:11     ` Mike Kravetz
  2020-05-18 23:41     ` Colin Walters
  0 siblings, 2 replies; 21+ messages in thread
From: Miklos Szeredi @ 2020-05-12 15:04 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 4/5/20 8:06 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    1a323ea5 x86: get rid of 'errret' argument to __get_user_x..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=172e940be00000
> >
> > The bug was bisected to:
> >
> > commit e950564b97fd0f541b02eb207685d0746f5ecf29
> > Author: Miklos Szeredi <mszeredi@redhat.com>
> > Date:   Tue Jul 24 13:01:55 2018 +0000
> >
> >     vfs: don't evict uninitialized inode
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
> > Fixes: e950564b97fd ("vfs: don't evict uninitialized inode")
> >
> > overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off.
> > ------------[ cut here ]------------
> > kernel BUG at mm/hugetlb.c:3416!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416
> > Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00
> > RSP: 0018:ffffc900017779b0 EFLAGS: 00010293
> > RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4
> > RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007
> > RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111
> > R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000
> > R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000
> > FS:  00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> >  __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507
> >  unmap_single_vma+0x238/0x300 mm/memory.c:1296
> >  unmap_vmas+0x16f/0x2f0 mm/memory.c:1332
> >  exit_mmap+0x2aa/0x510 mm/mmap.c:3126
> >  __mmput kernel/fork.c:1082 [inline]
> >  mmput+0x168/0x4b0 kernel/fork.c:1103
> >  exit_mm kernel/exit.c:477 [inline]
> >  do_exit+0xa51/0x2dd0 kernel/exit.c:780
> >  do_group_exit+0x125/0x340 kernel/exit.c:891
> >  __do_sys_exit_group kernel/exit.c:902 [inline]
> >  __se_sys_exit_group kernel/exit.c:900 [inline]
> >  __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900
> >  do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
> >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> This is not new and certainly not caused by commit e950564b97fd.

Sorry for replying late...

> hugetlbf only operates on huge page aligned and sized files/mappings.
> To make sure this happens, the mmap code contians the following to 'round
> up' length to huge page size:
>
>         if (!(flags & MAP_ANONYMOUS)) {
>                 audit_mmap_fd(fd, flags);
>                 file = fget(fd);
>                 if (!file)
>                         return -EBADF;
>                 if (is_file_hugepages(file))
>                         len = ALIGN(len, huge_page_size(hstate_file(file)));
>                 retval = -EINVAL;
>                 if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
>                         goto out_fput;
>         } else if (flags & MAP_HUGETLB) {
>                 struct user_struct *user = NULL;
>                 struct hstate *hs;
>
>                 hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
>                 if (!hs)
>                         return -EINVAL;
>
>                 len = ALIGN(len, huge_page_size(hs));
>
> However, in this syzbot test case the 'file' is in an overlayfs filesystem
> created as follows:
>
> mkdir("./file0", 000)                   = 0
> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
> chdir("./file0")                        = 0
> mkdir("./file1", 000)                   = 0
> mkdir("./bus", 000)                     = 0
> mkdir("./file0", 000)                   = 0
> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0
>
> The routine is_file_hugepages() is just comparing the file ops to huegtlbfs:
>
>         if (file->f_op == &hugetlbfs_file_operations)
>                 return true;
>
> Since the file is in an overlayfs, file->f_op == ovl_file_operations.
> Therefore, length will not be rounded up to huge page size and we create a
> mapping with incorrect size which leads to the BUG.
>
> Because of the code in mmap, the hugetlbfs mmap() routine assumes length is
> rounded to a huge page size.  I can easily add a check to hugetlbfs mmap
> to validate length and return -EINVAL.  However, I think we really want to
> do the 'round up' earlier in mmap.  This is because the man page says:
>
>    Huge page (Huge TLB) mappings
>        For mappings that employ huge pages, the requirements for the arguments
>        of  mmap()  and munmap() differ somewhat from the requirements for map‐
>        pings that use the native system page size.
>
>        For mmap(), offset must be a multiple of the underlying huge page size.
>        The system automatically aligns length to be a multiple of the underly‐
>        ing huge page size.
>
> Since the location for the mapping is chosen BEFORE getting to the hugetlbfs
> mmap routine, we can not wait until then to round up the length.  Is there a
> defined way to go from a struct file * to the underlying filesystem so we
> can continue to do the 'round up' in early mmap code?

That's easy enough:

static inline struct file *real_file(struct file *file)
{
    return file->f_op != ovl_file_operations ? file : file->private_data;
}

But adding more filesystem specific code to generic code does not
sound like the cleanest way to solve this...

> One other thing I noticed with overlayfs is that it does not contain a
> specific get_unmapped_area file_operations routine.  I would expect it to at
> least check for and use the get_unmapped_area of the underlying filesystem?
> Can someone comment if this is by design?

Not sure.  What exactly is f_op->get_unmapped_area supposed to do?

> In the case of hugetlbfs, get_unmapped_area is even provided by most
> architectures.  So, it seems like we would like/need to be calling the correct
> routine.

Okay.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-12 15:04   ` Miklos Szeredi
@ 2020-05-12 18:11     ` Mike Kravetz
  2020-05-15 22:15       ` Mike Kravetz
  2020-05-18 23:41     ` Colin Walters
  1 sibling, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-12 18:11 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On 5/12/20 8:04 AM, Miklos Szeredi wrote:
> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> On 4/5/20 8:06 PM, syzbot wrote:
>>
>> The routine is_file_hugepages() is just comparing the file ops to huegtlbfs:
>>
>>         if (file->f_op == &hugetlbfs_file_operations)
>>                 return true;
>>
>> Since the file is in an overlayfs, file->f_op == ovl_file_operations.
>> Therefore, length will not be rounded up to huge page size and we create a
>> mapping with incorrect size which leads to the BUG.
>>
>> Because of the code in mmap, the hugetlbfs mmap() routine assumes length is
>> rounded to a huge page size.  I can easily add a check to hugetlbfs mmap
>> to validate length and return -EINVAL.  However, I think we really want to
>> do the 'round up' earlier in mmap.  This is because the man page says:
>>
>>    Huge page (Huge TLB) mappings
>>        For mappings that employ huge pages, the requirements for the arguments
>>        of  mmap()  and munmap() differ somewhat from the requirements for map‐
>>        pings that use the native system page size.
>>
>>        For mmap(), offset must be a multiple of the underlying huge page size.
>>        The system automatically aligns length to be a multiple of the underly‐
>>        ing huge page size.
>>
>> Since the location for the mapping is chosen BEFORE getting to the hugetlbfs
>> mmap routine, we can not wait until then to round up the length.  Is there a
>> defined way to go from a struct file * to the underlying filesystem so we
>> can continue to do the 'round up' in early mmap code?
> 
> That's easy enough:
> 
> static inline struct file *real_file(struct file *file)
> {
>     return file->f_op != ovl_file_operations ? file : file->private_data;
> }
> 
> But adding more filesystem specific code to generic code does not
> sound like the cleanest way to solve this...

We can incorporate the above 'real_file' functionality in the filesystem
specific routine is_file_hugepages(), and I think that would address this
specific issue.  I'll code that up.

>> One other thing I noticed with overlayfs is that it does not contain a
>> specific get_unmapped_area file_operations routine.  I would expect it to at
>> least check for and use the get_unmapped_area of the underlying filesystem?
>> Can someone comment if this is by design?
> 
> Not sure.  What exactly is f_op->get_unmapped_area supposed to do?
> 

IIUC, filesystems can define their own routines to get addresses for mmap
operations.  Quite a few filesystems define get_unmapped_area.

The generic mmap code does the following,

	get_area = current->mm->get_unmapped_area;
	if (file) {
		if (file->f_op->get_unmapped_area)
			get_area = file->f_op->get_unmapped_area;
	} else if (flags & MAP_SHARED) {
		/*
		 * mmap_region() will call shmem_zero_setup() to create a file,
		 * so use shmem's get_unmapped_area in case it can be huge.
		 * do_mmap_pgoff() will clear pgoff, so match alignment.
		 */
		pgoff = 0;
		get_area = shmem_get_unmapped_area;
	}

	addr = get_area(file, addr, len, pgoff, flags);

If the filesystem provides a get_unmapped_area, it will use it.  I beleive
overlayfs prevents this from happening for the underlying filesystem.

Perhaps we do need to add something like a call 'real_file' to this generic
code?  I can't think of any other way to get to the underlying filesystem
get_unmapped_area here.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-12 18:11     ` Mike Kravetz
@ 2020-05-15 22:15       ` Mike Kravetz
  2020-05-18 11:12         ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-15 22:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On 5/12/20 11:11 AM, Mike Kravetz wrote:
> On 5/12/20 8:04 AM, Miklos Szeredi wrote:
>> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>> On 4/5/20 8:06 PM, syzbot wrote:
>>>
>>> The routine is_file_hugepages() is just comparing the file ops to huegtlbfs:
>>>
>>>         if (file->f_op == &hugetlbfs_file_operations)
>>>                 return true;
>>>
>>> Since the file is in an overlayfs, file->f_op == ovl_file_operations.
>>> Therefore, length will not be rounded up to huge page size and we create a
>>> mapping with incorrect size which leads to the BUG.
>>>
>>> Because of the code in mmap, the hugetlbfs mmap() routine assumes length is
>>> rounded to a huge page size.  I can easily add a check to hugetlbfs mmap
>>> to validate length and return -EINVAL.  However, I think we really want to
>>> do the 'round up' earlier in mmap.  This is because the man page says:
>>>
>>>    Huge page (Huge TLB) mappings
>>>        For mappings that employ huge pages, the requirements for the arguments
>>>        of  mmap()  and munmap() differ somewhat from the requirements for map‐
>>>        pings that use the native system page size.
>>>
>>>        For mmap(), offset must be a multiple of the underlying huge page size.
>>>        The system automatically aligns length to be a multiple of the underly‐
>>>        ing huge page size.
>>>
>>> Since the location for the mapping is chosen BEFORE getting to the hugetlbfs
>>> mmap routine, we can not wait until then to round up the length.  Is there a
>>> defined way to go from a struct file * to the underlying filesystem so we
>>> can continue to do the 'round up' in early mmap code?
>>
>> That's easy enough:
>>
>> static inline struct file *real_file(struct file *file)
>> {
>>     return file->f_op != ovl_file_operations ? file : file->private_data;
>> }
>>
>> But adding more filesystem specific code to generic code does not
>> sound like the cleanest way to solve this...
> 
> We can incorporate the above 'real_file' functionality in the filesystem
> specific routine is_file_hugepages(), and I think that would address this
> specific issue.  I'll code that up.
> 
>>> One other thing I noticed with overlayfs is that it does not contain a
>>> specific get_unmapped_area file_operations routine.  I would expect it to at
>>> least check for and use the get_unmapped_area of the underlying filesystem?
>>> Can someone comment if this is by design?
>>
>> Not sure.  What exactly is f_op->get_unmapped_area supposed to do?
>>
> 
> IIUC, filesystems can define their own routines to get addresses for mmap
> operations.  Quite a few filesystems define get_unmapped_area.
> 
> The generic mmap code does the following,
> 
> 	get_area = current->mm->get_unmapped_area;
> 	if (file) {
> 		if (file->f_op->get_unmapped_area)
> 			get_area = file->f_op->get_unmapped_area;
> 	} else if (flags & MAP_SHARED) {
> 		/*
> 		 * mmap_region() will call shmem_zero_setup() to create a file,
> 		 * so use shmem's get_unmapped_area in case it can be huge.
> 		 * do_mmap_pgoff() will clear pgoff, so match alignment.
> 		 */
> 		pgoff = 0;
> 		get_area = shmem_get_unmapped_area;
> 	}
> 
> 	addr = get_area(file, addr, len, pgoff, flags);
> 
> If the filesystem provides a get_unmapped_area, it will use it.  I beleive
> overlayfs prevents this from happening for the underlying filesystem.
> 
> Perhaps we do need to add something like a call 'real_file' to this generic
> code?  I can't think of any other way to get to the underlying filesystem
> get_unmapped_area here.

I started going down the path of creating a get_unmapped_area f_op for
overlayfs.  That is pretty straight forward and works well.  But that
did not take care of the is_file_hugepages() routine.  Recall that
is_file_hugepages simply does if (file->f_op == &hugetlbfs_file_operations).

I suppose I could add a specific overlayfs check like real_file here.  But,
that does not seem like a clean solution.

I also discovered other routines doing comparisons like
if (file->f_op == <expected_fops>), they are:
is_dma_buf_file()
is_file_shm_hugepages()
get_pipe_info()
is_file_epoll()

So, it seems that these routines are also impacted if operating on files
in an overlayfs? 

Any suggestions on how to move forward?  It seems like there may be the
need for a real_file() routine?  I see a d_real dentry_op was added to
deal with this issue for dentries.  Might we need something similiar for
files (f_real)?

Looking for suggestions as I do not normally work with this code.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-15 22:15       ` Mike Kravetz
@ 2020-05-18 11:12         ` Miklos Szeredi
  2020-05-18 23:22           ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2020-05-18 11:12 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On Sat, May 16, 2020 at 12:15 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:

> I started going down the path of creating a get_unmapped_area f_op for
> overlayfs.  That is pretty straight forward and works well.  But that
> did not take care of the is_file_hugepages() routine.  Recall that
> is_file_hugepages simply does if (file->f_op == &hugetlbfs_file_operations).
>
> I suppose I could add a specific overlayfs check like real_file here.  But,
> that does not seem like a clean solution.
>
> I also discovered other routines doing comparisons like
> if (file->f_op == <expected_fops>), they are:
> is_dma_buf_file()
> is_file_shm_hugepages()
> get_pipe_info()
> is_file_epoll()
>
> So, it seems that these routines are also impacted if operating on files
> in an overlayfs?

Those are non-filesystems, with the exception of
is_file_shm_hugepages(), the only caller of which is
is_file_hugepages().

> Any suggestions on how to move forward?  It seems like there may be the
> need for a real_file() routine?  I see a d_real dentry_op was added to
> deal with this issue for dentries.  Might we need something similiar for
> files (f_real)?
>
> Looking for suggestions as I do not normally work with this code.

And I'm not so familiar with hugepages code.  I'd suggest moving
length alignment into f_op->get_unmapped_area() and cleaning up other
special casing of hugetlb mappings, but it's probably far from
trivial...

So yeah, that leaves a real_file() helper or something similar.
Unlike the example I gave first it actually needs to be recursive:

static inline struct file *real_file(struct file *file)
{
    whole (unlikely(file->f_op == ovl_file_operations))
        file = file->private_data;
    return file;
}

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-18 11:12         ` Miklos Szeredi
@ 2020-05-18 23:22           ` Mike Kravetz
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Kravetz @ 2020-05-18 23:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On 5/18/20 4:12 AM, Miklos Szeredi wrote:
> On Sat, May 16, 2020 at 12:15 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>> Any suggestions on how to move forward?  It seems like there may be the
>> need for a real_file() routine?  I see a d_real dentry_op was added to
>> deal with this issue for dentries.  Might we need something similiar for
>> files (f_real)?
>>
>> Looking for suggestions as I do not normally work with this code.
> 
> And I'm not so familiar with hugepages code.  I'd suggest moving
> length alignment into f_op->get_unmapped_area() and cleaning up other
> special casing of hugetlb mappings, but it's probably far from
> trivial...
> 
> So yeah, that leaves a real_file() helper or something similar.
> Unlike the example I gave first it actually needs to be recursive:
> 
> static inline struct file *real_file(struct file *file)
> {
>     whole (unlikely(file->f_op == ovl_file_operations))
>         file = file->private_data;
>     return file;
> }

If we add real_file(), then I think it only needs to be called in two
places: is_file_hugepages() and core mmap code.  However, I could not
think of a good place to put real_file().  Below is a patch which creates
a new file <linux/overlayfs.h> for the routine.  It does solve this BUG
and should fix any other issues with callers of is_file_hugepages().
Let me know what you think.

I add a 'Suggested-by:' for real_file, but am happy to change that to
a 'Signed-off-by:' if you prefer.

From ea6a96aa3f5365df39f7cf213f87abe336b43e71 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 18 May 2020 15:29:12 -0700
Subject: [PATCH] ovl: provide real_file() for use by hugetlb and mmap

If a file is on a union/overlay, then the 'struct file *' will have
overlayfs file operations.  The routine is_file_hugepages() compares
f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs
file.  If a hugetlbfs file is on a union/overlay, this comparison is
false and is_file_hugepages() incorrectly indicates the underlying
file is not hugetlbfs.  One result of this is a BUG as shown in [1].

mmap uses is_file_hugepages() because hugetlbfs files have different
alignment restrictions.  In addition, mmap code would like to use the
filesystem specific get_unmapped_area() routine if one is defined.

To address this issue, add a new routine real_file() which will return
the underlying file.  Update is_file_hugepages and mmap code to get the
real file.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h   |  3 +++
 include/linux/overlayfs.h | 27 +++++++++++++++++++++++++++
 mm/mmap.c                 |  2 ++
 3 files changed, 32 insertions(+)
 create mode 100644 include/linux/overlayfs.h

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 43a1cef8f0f1..fb22c0a7474a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -9,6 +9,7 @@
 #include <linux/cgroup.h>
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/overlayfs.h>
 #include <asm/pgtable.h>
 
 struct ctl_table;
@@ -437,6 +438,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 
 static inline bool is_file_hugepages(struct file *file)
 {
+	file = real_file(file);
+
 	if (file->f_op == &hugetlbfs_file_operations)
 		return true;
 
diff --git a/include/linux/overlayfs.h b/include/linux/overlayfs.h
new file mode 100644
index 000000000000..eecdfda0286f
--- /dev/null
+++ b/include/linux/overlayfs.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_OVERLAYFS_H
+#define _LINUX_OVERLAYFS_H
+
+#include <linux/fs.h>
+
+extern const struct file_operations ovl_file_operations;
+
+#ifdef CONFIG_OVERLAY_FS
+/*
+ * If file is on a union/overlay, then return the underlying real file.
+ * Otherwise return the file itself.
+ */
+static inline struct file *real_file(struct file *file)
+{
+	while (unlikely(file->f_op == &ovl_file_operations))
+		file = file->private_data;
+	return file;
+}
+#else
+static inline struct file *real_file(struct file *file)
+{
+	return file;
+}
+#endif
+
+#endif /* _LINUX_OVERLAYFS_H */
diff --git a/mm/mmap.c b/mm/mmap.c
index f609e9ec4a25..7f45a4057a15 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
 #include <linux/pkeys.h>
 #include <linux/oom.h>
 #include <linux/sched/mm.h>
+#include <linux/overlayfs.h>
 
 #include <linux/uaccess.h>
 #include <asm/cacheflush.h>
@@ -2203,6 +2204,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 
 	get_area = current->mm->get_unmapped_area;
 	if (file) {
+		file = real_file(file);
 		if (file->f_op->get_unmapped_area)
 			get_area = file->f_op->get_unmapped_area;
 	} else if (flags & MAP_SHARED) {
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-12 15:04   ` Miklos Szeredi
  2020-05-12 18:11     ` Mike Kravetz
@ 2020-05-18 23:41     ` Colin Walters
  2020-05-19  0:35       ` Mike Kravetz
  1 sibling, 1 reply; 21+ messages in thread
From: Colin Walters @ 2020-05-18 23:41 UTC (permalink / raw)
  To: Miklos Szeredi, Mike Kravetz
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro



On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:

> > However, in this syzbot test case the 'file' is in an overlayfs filesystem
> > created as follows:
> >
> > mkdir("./file0", 000)                   = 0
> > mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
> > chdir("./file0")                        = 0
> > mkdir("./file1", 000)                   = 0
> > mkdir("./bus", 000)                     = 0
> > mkdir("./file0", 000)                   = 0
> > mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0

Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs?  I can't think of one.  Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-18 23:41     ` Colin Walters
@ 2020-05-19  0:35       ` Mike Kravetz
  2020-05-20 11:20         ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-19  0:35 UTC (permalink / raw)
  To: Colin Walters, Miklos Szeredi
  Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm,
	Miklos Szeredi, syzkaller-bugs, Al Viro

On 5/18/20 4:41 PM, Colin Walters wrote:
> 
> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:
> 
>>> However, in this syzbot test case the 'file' is in an overlayfs filesystem
>>> created as follows:
>>>
>>> mkdir("./file0", 000)                   = 0
>>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
>>> chdir("./file0")                        = 0
>>> mkdir("./file1", 000)                   = 0
>>> mkdir("./bus", 000)                     = 0
>>> mkdir("./file0", 000)                   = 0
>>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0
> 
> Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs?  I can't think of one.  Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems?
> 

I can not think of a use case.  I'll let Miklos comment on adding whitelist
capability to overlayfs.

IMO - This BUG/report revealed two issues.  First is the BUG by mmap'ing
a hugetlbfs file on overlayfs.  The other is that core mmap code will skip
any filesystem specific get_unmapped area routine if on a union/overlay.
My patch fixes both, but if we go with a whitelist approach and don't allow
hugetlbfs I think we still need to address the filesystem specific
get_unmapped area issue.  That is easy enough to do by adding a routine to
overlayfs which calls the routine for the underlying fs.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-19  0:35       ` Mike Kravetz
@ 2020-05-20 11:20         ` Miklos Szeredi
  2020-05-20 17:27           ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2020-05-20 11:20 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro

On Tue, May 19, 2020 at 2:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 5/18/20 4:41 PM, Colin Walters wrote:
> >
> > On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:
> >
> >>> However, in this syzbot test case the 'file' is in an overlayfs filesystem
> >>> created as follows:
> >>>
> >>> mkdir("./file0", 000)                   = 0
> >>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
> >>> chdir("./file0")                        = 0
> >>> mkdir("./file1", 000)                   = 0
> >>> mkdir("./bus", 000)                     = 0
> >>> mkdir("./file0", 000)                   = 0
> >>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0
> >
> > Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs?  I can't think of one.  Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems?
> >
>
> I can not think of a use case.  I'll let Miklos comment on adding whitelist
> capability to overlayfs.

I've not heard of overlayfs being used over hugetlbfs.

Overlayfs on tmpfs is definitely used, I guess without hugepages.
But if we'd want to allow tmpfs _without_ hugepages but not tmpfs
_with_ hugepages, then we can't just whitelist based on filesystem
type, but need to look at mount options as well.  Which isn't really a
clean solution either.

> IMO - This BUG/report revealed two issues.  First is the BUG by mmap'ing
> a hugetlbfs file on overlayfs.  The other is that core mmap code will skip
> any filesystem specific get_unmapped area routine if on a union/overlay.
> My patch fixes both, but if we go with a whitelist approach and don't allow
> hugetlbfs I think we still need to address the filesystem specific
> get_unmapped area issue.  That is easy enough to do by adding a routine to
> overlayfs which calls the routine for the underlying fs.

I think the two are strongly related:  get_unmapped_area() adjusts the
address alignment, and the is_file_hugepages() call in
ksys_mmap_pgoff() adjusts the length alignment.

Is there any other purpose for which  f_op->get_unmapped_area() is
used by a filesystem?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-20 11:20         ` Miklos Szeredi
@ 2020-05-20 17:27           ` Mike Kravetz
  2020-05-22 10:05             ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-20 17:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro

On 5/20/20 4:20 AM, Miklos Szeredi wrote:
> On Tue, May 19, 2020 at 2:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 5/18/20 4:41 PM, Colin Walters wrote:
>>>
>>> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote:
>>>
>>>>> However, in this syzbot test case the 'file' is in an overlayfs filesystem
>>>>> created as follows:
>>>>>
>>>>> mkdir("./file0", 000)                   = 0
>>>>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0
>>>>> chdir("./file0")                        = 0
>>>>> mkdir("./file1", 000)                   = 0
>>>>> mkdir("./bus", 000)                     = 0
>>>>> mkdir("./file0", 000)                   = 0
>>>>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0
>>>
>>> Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs?  I can't think of one.  Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems?
>>>
>>
>> I can not think of a use case.  I'll let Miklos comment on adding whitelist
>> capability to overlayfs.
> 
> I've not heard of overlayfs being used over hugetlbfs.
> 
> Overlayfs on tmpfs is definitely used, I guess without hugepages.
> But if we'd want to allow tmpfs _without_ hugepages but not tmpfs
> _with_ hugepages, then we can't just whitelist based on filesystem
> type, but need to look at mount options as well.  Which isn't really a
> clean solution either.
> 
>> IMO - This BUG/report revealed two issues.  First is the BUG by mmap'ing
>> a hugetlbfs file on overlayfs.  The other is that core mmap code will skip
>> any filesystem specific get_unmapped area routine if on a union/overlay.
>> My patch fixes both, but if we go with a whitelist approach and don't allow
>> hugetlbfs I think we still need to address the filesystem specific
>> get_unmapped area issue.  That is easy enough to do by adding a routine to
>> overlayfs which calls the routine for the underlying fs.
> 
> I think the two are strongly related:  get_unmapped_area() adjusts the
> address alignment, and the is_file_hugepages() call in
> ksys_mmap_pgoff() adjusts the length alignment.
> 
> Is there any other purpose for which  f_op->get_unmapped_area() is
> used by a filesystem?
> 

I am fairly confident it is all about checking limits and alignment.  The
filesystem knows if it can/should align to base or huge page size. DAX has
some interesting additional restrictions, and several 'traditional' filesystems
check if they are 'on DAX'.

In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the
length adjustment in hugetlb_get_unmapped_area (generic and arch specific).
I agree, although there may be the need to add length overflow checks in
these routines (after round up) as this is done in core code now.  However,
this can be done as a separate cleanup patch.

In any case, we need to get the core mmap code to call filesystem specific
get_unmapped_area if on a union/overlay.  The patch I suggested does this
by simply calling real_file to determine if there is a filesystem specific
get_unmapped_area.  The other approach would be to provide an overlayfs
get_unmapped_area that calls the underlying filesystem get_unmapped_area.

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-20 17:27           ` Mike Kravetz
@ 2020-05-22 10:05             ` Miklos Szeredi
  2020-05-28  0:01               ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2020-05-22 10:05 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro,
	linux-unionfs

On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote:

> I am fairly confident it is all about checking limits and alignment.  The
> filesystem knows if it can/should align to base or huge page size. DAX has
> some interesting additional restrictions, and several 'traditional' filesystems
> check if they are 'on DAX'.


Okay, I haven't looked at DAX vs. overlay.  I'm sure it's going to come up at
some point, if it hasn't already.

> 
> In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the
> length adjustment in hugetlb_get_unmapped_area (generic and arch specific).
> I agree, although there may be the need to add length overflow checks in
> these routines (after round up) as this is done in core code now.  However,
> this can be done as a separate cleanup patch.
> 
> In any case, we need to get the core mmap code to call filesystem specific
> get_unmapped_area if on a union/overlay.  The patch I suggested does this
> by simply calling real_file to determine if there is a filesystem specific
> get_unmapped_area.  The other approach would be to provide an overlayfs
> get_unmapped_area that calls the underlying filesystem get_unmapped_area.

That latter is what's done for all other stacked operations in overlayfs.

Untested patch below.

Thanks,
Miklos

---
 fs/overlayfs/file.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -757,6 +757,17 @@ static loff_t ovl_remap_file_range(struc
 			    remap_flags, op);
 }
 
+static unsigned long ovl_get_unmapped_area(struct file *file,
+				unsigned long uaddr, unsigned long len,
+				unsigned long pgoff, unsigned long flags)
+{
+	struct file *realfile = file->private_data;
+
+	return (realfile->f_op->get_unmapped_area ?:
+		current->mm->get_unmapped_area)(realfile,
+						uaddr, len, pgoff, flags);
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -774,6 +785,7 @@ const struct file_operations ovl_file_op
 
 	.copy_file_range	= ovl_copy_file_range,
 	.remap_file_range	= ovl_remap_file_range,
+	.get_unmapped_area	= ovl_get_unmapped_area,
 };
 
 int __init ovl_aio_request_cache_init(void)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: kernel BUG at mm/hugetlb.c:LINE!
  2020-05-22 10:05             ` Miklos Szeredi
@ 2020-05-28  0:01               ` Mike Kravetz
  2020-05-28  8:37                 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-28  0:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel,
	linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro,
	linux-unionfs

On 5/22/20 3:05 AM, Miklos Szeredi wrote:
> On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote:
> 
>> I am fairly confident it is all about checking limits and alignment.  The
>> filesystem knows if it can/should align to base or huge page size. DAX has
>> some interesting additional restrictions, and several 'traditional' filesystems
>> check if they are 'on DAX'.
> 
> 
> Okay, I haven't looked at DAX vs. overlay.  I'm sure it's going to come up at
> some point, if it hasn't already.
> 
>>
>> In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the
>> length adjustment in hugetlb_get_unmapped_area (generic and arch specific).
>> I agree, although there may be the need to add length overflow checks in
>> these routines (after round up) as this is done in core code now.  However,
>> this can be done as a separate cleanup patch.
>>
>> In any case, we need to get the core mmap code to call filesystem specific
>> get_unmapped_area if on a union/overlay.  The patch I suggested does this
>> by simply calling real_file to determine if there is a filesystem specific
>> get_unmapped_area.  The other approach would be to provide an overlayfs
>> get_unmapped_area that calls the underlying filesystem get_unmapped_area.
> 
> That latter is what's done for all other stacked operations in overlayfs.
> 
> Untested patch below.
> 

Thanks Miklos!

We still need the 'real_file()' routine for is_file_hugepages.  So combining
these, I propose the following patch.  It addresses the known issue as well
as potential issues with is_file_hugepages returning incorrect information.
I don't really like a separate header file for real_file, but I can not
think of any good place to put it.

Let me know what you think,

From 33f6bbd19406108b61a4113b1ec8e4e6888cd482 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Wed, 27 May 2020 16:58:58 -0700
Subject: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()

If a file is on a union/overlay, then the 'struct file *' will have
overlayfs file operations.  The routine is_file_hugepages() compares
f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs
file.  If a hugetlbfs file is on a union/overlay, this comparison is
false and is_file_hugepages() incorrectly indicates the underlying
file is not hugetlbfs.  One result of this is a BUG as shown in [1].

mmap uses is_file_hugepages() because hugetlbfs files have different
alignment restrictions.  In addition, mmap code would like to use the
filesystem specific get_unmapped_area() routine if one is defined.

To address this issue,
- Add a new routine real_file() which will return the underlying file.
- Update is_file_hugepages to get the real file.
- Add get_unmapped_area f_op to oerrlayfs to call underlying routine.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/overlayfs/file.c       | 13 +++++++++++++
 include/linux/hugetlb.h   |  3 +++
 include/linux/overlayfs.h | 27 +++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)
 create mode 100644 include/linux/overlayfs.h

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 87c362f65448..cc020e1c72d5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -12,6 +12,7 @@
 #include <linux/splice.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
+#include <linux/overlayfs.h>
 #include "overlayfs.h"
 
 struct ovl_aio_req {
@@ -757,6 +758,17 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 			    remap_flags, op);
 }
 
+static unsigned long ovl_get_unmapped_area(struct file *file,
+				unsigned long uaddr, unsigned long len,
+				unsigned long pgoff, unsigned long flags)
+{
+	struct file *realfile = real_file(file);
+
+	return (realfile->f_op->get_unmapped_area ?:
+		current->mm->get_unmapped_area)(realfile,
+						uaddr, len, pgoff, flags);
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -774,6 +786,7 @@ const struct file_operations ovl_file_operations = {
 
 	.copy_file_range	= ovl_copy_file_range,
 	.remap_file_range	= ovl_remap_file_range,
+	.get_unmapped_area	= ovl_get_unmapped_area,
 };
 
 int __init ovl_aio_request_cache_init(void)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 43a1cef8f0f1..fb22c0a7474a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -9,6 +9,7 @@
 #include <linux/cgroup.h>
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/overlayfs.h>
 #include <asm/pgtable.h>
 
 struct ctl_table;
@@ -437,6 +438,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 
 static inline bool is_file_hugepages(struct file *file)
 {
+	file = real_file(file);
+
 	if (file->f_op == &hugetlbfs_file_operations)
 		return true;
 
diff --git a/include/linux/overlayfs.h b/include/linux/overlayfs.h
new file mode 100644
index 000000000000..eecdfda0286f
--- /dev/null
+++ b/include/linux/overlayfs.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_OVERLAYFS_H
+#define _LINUX_OVERLAYFS_H
+
+#include <linux/fs.h>
+
+extern const struct file_operations ovl_file_operations;
+
+#ifdef CONFIG_OVERLAY_FS
+/*
+ * If file is on a union/overlay, then return the underlying real file.
+ * Otherwise return the file itself.
+ */
+static inline struct file *real_file(struct file *file)
+{
+	while (unlikely(file->f_op == &ovl_file_operations))
+		file = file->private_data;
+	return file;
+}
+#else
+static inline struct file *real_file(struct file *file)
+{
+	return file;
+}
+#endif
+
+#endif /* _LINUX_OVERLAYFS_H */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-05-28  0:01               ` Mike Kravetz
@ 2020-05-28  8:37                 ` kbuild test robot
  2020-05-28 21:01                   ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: kbuild test robot @ 2020-05-28  8:37 UTC (permalink / raw)
  To: Mike Kravetz, Miklos Szeredi
  Cc: kbuild-all, Colin Walters, syzbot, Andrew Morton,
	Linux Memory Management List, linux-fsdevel, linux-kernel,
	syzkaller-bugs, Al Viro, linux-unionfs

[-- Attachment #1: Type: text/plain, Size: 2140 bytes --]

Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on miklos-vfs/overlayfs-next]
[also build test ERROR on linus/master v5.7-rc7]
[cannot apply to linux/master next-20200526]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
config: h8300-randconfig-r036-20200528 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>, old ones prefixed by <<):

fs/overlayfs/file.c: In function 'ovl_get_unmapped_area':
>> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area'
768 |   current->mm->get_unmapped_area)(realfile,
|              ^~
>> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type]
770 | }
| ^

vim +768 fs/overlayfs/file.c

   760	
   761	static unsigned long ovl_get_unmapped_area(struct file *file,
   762					unsigned long uaddr, unsigned long len,
   763					unsigned long pgoff, unsigned long flags)
   764	{
   765		struct file *realfile = real_file(file);
   766	
   767		return (realfile->f_op->get_unmapped_area ?:
 > 768			current->mm->get_unmapped_area)(realfile,
   769							uaddr, len, pgoff, flags);
 > 770	}
   771	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31095 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-05-28  8:37                 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot
@ 2020-05-28 21:01                   ` Mike Kravetz
  2020-06-04  9:16                     ` Miklos Szeredi
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-05-28 21:01 UTC (permalink / raw)
  To: kbuild test robot, Miklos Szeredi
  Cc: kbuild-all, Colin Walters, syzbot, Andrew Morton,
	Linux Memory Management List, linux-fsdevel, linux-kernel,
	syzkaller-bugs, Al Viro, linux-unionfs

On 5/28/20 1:37 AM, kbuild test robot wrote:
> Hi Mike,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on miklos-vfs/overlayfs-next]
> [also build test ERROR on linus/master v5.7-rc7]
> [cannot apply to linux/master next-20200526]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system. BTW, we also suggest to use '--base' option to specify the
> base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> 
> url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
> config: h8300-randconfig-r036-20200528 (attached as .config)
> compiler: h8300-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kbuild test robot <lkp@intel.com>
> 
> All error/warnings (new ones prefixed by >>, old ones prefixed by <<):
> 
> fs/overlayfs/file.c: In function 'ovl_get_unmapped_area':
>>> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area'
> 768 |   current->mm->get_unmapped_area)(realfile,
> |              ^~
>>> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type]
> 770 | }
> | ^
> 
> vim +768 fs/overlayfs/file.c
> 
>    760	
>    761	static unsigned long ovl_get_unmapped_area(struct file *file,
>    762					unsigned long uaddr, unsigned long len,
>    763					unsigned long pgoff, unsigned long flags)
>    764	{
>    765		struct file *realfile = real_file(file);
>    766	
>    767		return (realfile->f_op->get_unmapped_area ?:
>  > 768			current->mm->get_unmapped_area)(realfile,
>    769							uaddr, len, pgoff, flags);
>  > 770	}
>    771	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 

Well yuck!  get_unmapped_area is not part of mm_struct if !CONFIG_MMU.

Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too
ugly for you?  Another option is to use real_file() in the mmap code as
done in [1].

Sorry for all the questions.  However, I want to make sure you are happy
with any overlayfs changes.

[1] https://lore.kernel.org/linux-mm/04a00e3b-539c-236f-e43b-0024ef94b7cb@oracle.com/
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-05-28 21:01                   ` Mike Kravetz
@ 2020-06-04  9:16                     ` Miklos Szeredi
  2020-06-11  0:13                       ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Miklos Szeredi @ 2020-06-04  9:16 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: kbuild test robot, kbuild-all, Colin Walters, syzbot,
	Andrew Morton, Linux Memory Management List, linux-fsdevel,
	linux-kernel, syzkaller-bugs, Al Viro, overlayfs

On Thu, May 28, 2020 at 11:01 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 5/28/20 1:37 AM, kbuild test robot wrote:
> > Hi Mike,
> >
> > I love your patch! Yet something to improve:
> >
> > [auto build test ERROR on miklos-vfs/overlayfs-next]
> > [also build test ERROR on linus/master v5.7-rc7]
> > [cannot apply to linux/master next-20200526]
> > [if your patch is applied to the wrong git tree, please drop us a note to help
> > improve the system. BTW, we also suggest to use '--base' option to specify the
> > base tree in git format-patch, please see https://stackoverflow.com/a/37406982]
> >
> > url:    https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next
> > config: h8300-randconfig-r036-20200528 (attached as .config)
> > compiler: h8300-linux-gcc (GCC) 9.3.0
> > reproduce (this is a W=1 build):
> >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> >         chmod +x ~/bin/make.cross
> >         # save the attached .config to linux build tree
> >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kbuild test robot <lkp@intel.com>
> >
> > All error/warnings (new ones prefixed by >>, old ones prefixed by <<):
> >
> > fs/overlayfs/file.c: In function 'ovl_get_unmapped_area':
> >>> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area'
> > 768 |   current->mm->get_unmapped_area)(realfile,
> > |              ^~
> >>> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type]
> > 770 | }
> > | ^
> >
> > vim +768 fs/overlayfs/file.c
> >
> >    760
> >    761        static unsigned long ovl_get_unmapped_area(struct file *file,
> >    762                                        unsigned long uaddr, unsigned long len,
> >    763                                        unsigned long pgoff, unsigned long flags)
> >    764        {
> >    765                struct file *realfile = real_file(file);
> >    766
> >    767                return (realfile->f_op->get_unmapped_area ?:
> >  > 768                        current->mm->get_unmapped_area)(realfile,
> >    769                                                        uaddr, len, pgoff, flags);
> >  > 770        }
> >    771
> >
> > ---
> > 0-DAY CI Kernel Test Service, Intel Corporation
> > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> >
>
> Well yuck!  get_unmapped_area is not part of mm_struct if !CONFIG_MMU.
>
> Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too
> ugly for you?  Another option is to use real_file() in the mmap code as
> done in [1].

I think the proper fix is to add an inline helper
(call_get_unmapped_area()?) in linux/mm.h, and make that work properly
for the NOMMU case as well.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-06-04  9:16                     ` Miklos Szeredi
@ 2020-06-11  0:13                       ` Mike Kravetz
  2020-06-11  0:37                         ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Kravetz @ 2020-06-11  0:13 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: kbuild test robot, kbuild-all, Colin Walters, syzbot,
	Andrew Morton, Linux Memory Management List, linux-fsdevel,
	linux-kernel, syzkaller-bugs, Al Viro, overlayfs

On 6/4/20 2:16 AM, Miklos Szeredi wrote:
> On Thu, May 28, 2020 at 11:01 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> Well yuck!  get_unmapped_area is not part of mm_struct if !CONFIG_MMU.
>>
>> Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too
>> ugly for you?  Another option is to use real_file() in the mmap code as
>> done in [1].
> 
> I think the proper fix is to add an inline helper
> (call_get_unmapped_area()?) in linux/mm.h, and make that work properly
> for the NOMMU case as well.
> 

I'm ignoring the above issue for now.  There may be a more fundamental issue
that I do not know how to solve without adding another memeber to
file_operations.  Why?

In order to go from file -> realfile, one needs to do something like the
code you provided.

	while (file->f_op == &ovl_file_operations)
		file = file->private_data;
	return file;

The problem is that this needs to be called from core kernel code
(is_file_hugepages in mmap).  ovl_file_operations is obviously only defined
in the overlayfs code.  Since overlayfs can be built as a module, I do not
know of a way to reference ovl_file_operations which will only become available
when/if overlayfs is loaded.  Is there a way to do that?

If there is no other way to do this, then I think we need to add another
member to file_operations as is done in the following patch.  I hope there
is another way, because adding another file_op seems like overkill.

From 6d22c93284263b5ddcc2199adc1bcceb233f1499 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Wed, 10 Jun 2020 16:23:46 -0700
Subject: [PATCH v3] ovl: add f_real file_operation and supporting code for
 overlayfs

If a file is on a union/overlay, then the 'struct file *' will have
overlayfs file operations.  The routine is_file_hugepages() compares
f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs
file.  If a hugetlbfs file is on a union/overlay, this comparison is
false and is_file_hugepages() incorrectly indicates the underlying
file is not hugetlbfs.  One result of this is a BUG as shown in [1].

mmap uses is_file_hugepages() to identify hugetlbfs file and potentially
round up length because hugetlbfs files have different alignment
restrictions.  This is defined/expected behavior.  In addition, mmap
code would like to use the filesystem specific get_unmapped_area()
routine if one is defined.

To address this issue,
- Add a new file operation f_real while will return the underlying file.
  Only overlayfs provides a function for this operation.
- Add a new routine real_file() which can be used by core code get an
  underlying file.
- Update is_file_hugepages to get the real file.
- Add get_unmapped_area f_op to oerrlayfs to call underlying routine.

[1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/

Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 fs/overlayfs/file.c     | 23 +++++++++++++++++++++++
 include/linux/fs.h      |  8 ++++++++
 include/linux/hugetlb.h |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 87c362f65448..eb870fc3912f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -757,6 +757,27 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
 			    remap_flags, op);
 }
 
+#ifdef CONFIG_MMU
+static unsigned long ovl_get_unmapped_area(struct file *file,
+				unsigned long uaddr, unsigned long len,
+				unsigned long pgoff, unsigned long flags)
+{
+	file = real_file(file);
+
+	return (file->f_op->get_unmapped_area ?:
+		current->mm->get_unmapped_area)(file, uaddr, len, pgoff, flags);
+}
+#else
+#define ovl_get_unmapped_area NULL
+#endif
+
+static struct file *ovl_f_real(struct file *file)
+{
+	while (file->f_op == &ovl_file_operations)
+		file = file->private_data;
+	return file;
+}
+
 const struct file_operations ovl_file_operations = {
 	.open		= ovl_open,
 	.release	= ovl_release,
@@ -771,9 +792,11 @@ const struct file_operations ovl_file_operations = {
 	.compat_ioctl	= ovl_compat_ioctl,
 	.splice_read    = ovl_splice_read,
 	.splice_write   = ovl_splice_write,
+	.f_real		= ovl_f_real,
 
 	.copy_file_range	= ovl_copy_file_range,
 	.remap_file_range	= ovl_remap_file_range,
+	.get_unmapped_area	= ovl_get_unmapped_area,
 };
 
 int __init ovl_aio_request_cache_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 45cc10cdf6dd..59a969549aec 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1863,6 +1863,7 @@ struct file_operations {
 				   struct file *file_out, loff_t pos_out,
 				   loff_t len, unsigned int remap_flags);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
+	struct file * (*f_real)(struct file *file);
 } __randomize_layout;
 
 struct inode_operations {
@@ -1895,6 +1896,13 @@ struct inode_operations {
 	int (*set_acl)(struct inode *, struct posix_acl *, int);
 } ____cacheline_aligned;
 
+static inline struct file *real_file(struct file *file)
+{
+	if (unlikely(file->f_op->f_real))
+		return file->f_op->f_real(file);
+	return file;
+}
+
 static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
 				     struct iov_iter *iter)
 {
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 43a1cef8f0f1..528b07145414 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -437,6 +437,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct,
 
 static inline bool is_file_hugepages(struct file *file)
 {
+	file = real_file(file);
+
 	if (file->f_op == &hugetlbfs_file_operations)
 		return true;
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-06-11  0:13                       ` Mike Kravetz
@ 2020-06-11  0:37                         ` Al Viro
  2020-06-11  1:36                           ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2020-06-11  0:37 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters,
	syzbot, Andrew Morton, Linux Memory Management List,
	linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs

On Wed, Jun 10, 2020 at 05:13:52PM -0700, Mike Kravetz wrote:

> To address this issue,
> - Add a new file operation f_real while will return the underlying file.
>   Only overlayfs provides a function for this operation.
> - Add a new routine real_file() which can be used by core code get an
>   underlying file.
> - Update is_file_hugepages to get the real file.

Egads...  So to find out whether it's a hugetlb you would
	* check if a method is NULL
	* if not, call it
	* ... and check if the method table of the result is hugetlbfs one?

Here's a radical suggestion: FMODE_HUGEPAGES.  Just have it set by
->open() and let is_file_hugepages() check it.  In ->f_mode.  And
make the bloody hugetlbfs_file_operations static, while we are at it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-06-11  0:37                         ` Al Viro
@ 2020-06-11  1:36                           ` Matthew Wilcox
  2020-06-11  2:17                             ` Al Viro
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2020-06-11  1:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Mike Kravetz, Miklos Szeredi, kbuild test robot, kbuild-all,
	Colin Walters, syzbot, Andrew Morton,
	Linux Memory Management List, linux-fsdevel, linux-kernel,
	syzkaller-bugs, overlayfs

On Thu, Jun 11, 2020 at 01:37:26AM +0100, Al Viro wrote:
> On Wed, Jun 10, 2020 at 05:13:52PM -0700, Mike Kravetz wrote:
> 
> > To address this issue,
> > - Add a new file operation f_real while will return the underlying file.
> >   Only overlayfs provides a function for this operation.
> > - Add a new routine real_file() which can be used by core code get an
> >   underlying file.
> > - Update is_file_hugepages to get the real file.
> 
> Egads...  So to find out whether it's a hugetlb you would
> 	* check if a method is NULL
> 	* if not, call it
> 	* ... and check if the method table of the result is hugetlbfs one?
> 
> Here's a radical suggestion: FMODE_HUGEPAGES.  Just have it set by
> ->open() and let is_file_hugepages() check it.  In ->f_mode.  And
> make the bloody hugetlbfs_file_operations static, while we are at it.

ITYM FMODE_OVL_UPPER.  To quote Mike:

>         while (file->f_op == &ovl_file_operations)
>                 file = file->private_data;
>         return file;

which would be transformed into:

	while (file->f_mode & FMODE_OVL_UPPER)
		file = file->private_data;
	return file;

Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the
underlying fs to the overlaying fs?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-06-11  1:36                           ` Matthew Wilcox
@ 2020-06-11  2:17                             ` Al Viro
  2020-06-11  2:31                               ` Mike Kravetz
  0 siblings, 1 reply; 21+ messages in thread
From: Al Viro @ 2020-06-11  2:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mike Kravetz, Miklos Szeredi, kbuild test robot, kbuild-all,
	Colin Walters, syzbot, Andrew Morton,
	Linux Memory Management List, linux-fsdevel, linux-kernel,
	syzkaller-bugs, overlayfs

On Wed, Jun 10, 2020 at 06:36:16PM -0700, Matthew Wilcox wrote:

> 	while (file->f_mode & FMODE_OVL_UPPER)
> 		file = file->private_data;
> 	return file;
> 
> Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the
> underlying fs to the overlaying fs?

The latter - that way nobody outside of overlayfs needs to know what
its ->private_data points to, for one thing.  And it's cheaper that
way, obviously.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area()
  2020-06-11  2:17                             ` Al Viro
@ 2020-06-11  2:31                               ` Mike Kravetz
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Kravetz @ 2020-06-11  2:31 UTC (permalink / raw)
  To: Al Viro, Matthew Wilcox
  Cc: Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters,
	syzbot, Andrew Morton, Linux Memory Management List,
	linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs

On 6/10/20 7:17 PM, Al Viro wrote:
> On Wed, Jun 10, 2020 at 06:36:16PM -0700, Matthew Wilcox wrote:
> 
>> 	while (file->f_mode & FMODE_OVL_UPPER)
>> 		file = file->private_data;
>> 	return file;
>>
>> Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the
>> underlying fs to the overlaying fs?
> 
> The latter - that way nobody outside of overlayfs needs to know what
> its ->private_data points to, for one thing.  And it's cheaper that
> way, obviously.
> 

Thanks Al and Matthew!

I knew adding a file op for this was overkill and was looking for other
suggestions.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-06-11  2:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-06  3:06 kernel BUG at mm/hugetlb.c:LINE! syzbot
2020-04-06 22:05 ` Mike Kravetz
2020-05-12 15:04   ` Miklos Szeredi
2020-05-12 18:11     ` Mike Kravetz
2020-05-15 22:15       ` Mike Kravetz
2020-05-18 11:12         ` Miklos Szeredi
2020-05-18 23:22           ` Mike Kravetz
2020-05-18 23:41     ` Colin Walters
2020-05-19  0:35       ` Mike Kravetz
2020-05-20 11:20         ` Miklos Szeredi
2020-05-20 17:27           ` Mike Kravetz
2020-05-22 10:05             ` Miklos Szeredi
2020-05-28  0:01               ` Mike Kravetz
2020-05-28  8:37                 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot
2020-05-28 21:01                   ` Mike Kravetz
2020-06-04  9:16                     ` Miklos Szeredi
2020-06-11  0:13                       ` Mike Kravetz
2020-06-11  0:37                         ` Al Viro
2020-06-11  1:36                           ` Matthew Wilcox
2020-06-11  2:17                             ` Al Viro
2020-06-11  2:31                               ` Mike Kravetz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).