* kernel BUG at mm/hugetlb.c:LINE! @ 2020-04-06 3:06 syzbot 2020-04-06 22:05 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: syzbot @ 2020-04-06 3:06 UTC (permalink / raw) To: akpm, linux-fsdevel, linux-kernel, linux-mm, mike.kravetz, mszeredi, syzkaller-bugs, viro Hello, syzbot found the following crash on: HEAD commit: 1a323ea5 x86: get rid of 'errret' argument to __get_user_x.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000 kernel config: https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1 dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172e940be00000 The bug was bisected to: commit e950564b97fd0f541b02eb207685d0746f5ecf29 Author: Miklos Szeredi <mszeredi@redhat.com> Date: Tue Jul 24 13:01:55 2018 +0000 vfs: don't evict uninitialized inode bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000 final crash: https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000 console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com Fixes: e950564b97fd ("vfs: don't evict uninitialized inode") overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off. ------------[ cut here ]------------ kernel BUG at mm/hugetlb.c:3416! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416 Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00 RSP: 0018:ffffc900017779b0 EFLAGS: 00010293 RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4 RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007 RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111 R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000 R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000 FS: 00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507 unmap_single_vma+0x238/0x300 mm/memory.c:1296 unmap_vmas+0x16f/0x2f0 mm/memory.c:1332 exit_mmap+0x2aa/0x510 mm/mmap.c:3126 __mmput kernel/fork.c:1082 [inline] mmput+0x168/0x4b0 kernel/fork.c:1103 exit_mm kernel/exit.c:477 [inline] do_exit+0xa51/0x2dd0 kernel/exit.c:780 do_group_exit+0x125/0x340 kernel/exit.c:891 __do_sys_exit_group kernel/exit.c:902 [inline] __se_sys_exit_group kernel/exit.c:900 [inline] __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x43efe8 Code: Bad RIP value. RSP: 002b:00007ffdfe6c00f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043efe8 RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 RBP: 00000000004be7e8 R08: 00000000000000e7 R09: ffffffffffffffd0 R10: 0000040000000011 R11: 0000000000000246 R12: 0000000000000001 R13: 00000000006d0180 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace 2d36245d65cb52f7 ]--- RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416 Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00 RSP: 0018:ffffc900017779b0 EFLAGS: 00010293 RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4 RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007 RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111 R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000 R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000 FS: 00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8cc24dd000 CR3: 0000000093554000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-04-06 3:06 kernel BUG at mm/hugetlb.c:LINE! syzbot @ 2020-04-06 22:05 ` Mike Kravetz 2020-05-12 15:04 ` Miklos Szeredi 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-04-06 22:05 UTC (permalink / raw) To: syzbot, akpm, linux-fsdevel, linux-kernel, linux-mm, mszeredi, syzkaller-bugs, viro On 4/5/20 8:06 PM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: 1a323ea5 x86: get rid of 'errret' argument to __get_user_x.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000 > kernel config: https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1 > dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172e940be00000 > > The bug was bisected to: > > commit e950564b97fd0f541b02eb207685d0746f5ecf29 > Author: Miklos Szeredi <mszeredi@redhat.com> > Date: Tue Jul 24 13:01:55 2018 +0000 > > vfs: don't evict uninitialized inode > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000 > final crash: https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000 > console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com > Fixes: e950564b97fd ("vfs: don't evict uninitialized inode") > > overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off. > ------------[ cut here ]------------ > kernel BUG at mm/hugetlb.c:3416! > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416 > Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00 > RSP: 0018:ffffc900017779b0 EFLAGS: 00010293 > RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4 > RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007 > RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111 > R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000 > R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000 > FS: 00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507 > unmap_single_vma+0x238/0x300 mm/memory.c:1296 > unmap_vmas+0x16f/0x2f0 mm/memory.c:1332 > exit_mmap+0x2aa/0x510 mm/mmap.c:3126 > __mmput kernel/fork.c:1082 [inline] > mmput+0x168/0x4b0 kernel/fork.c:1103 > exit_mm kernel/exit.c:477 [inline] > do_exit+0xa51/0x2dd0 kernel/exit.c:780 > do_group_exit+0x125/0x340 kernel/exit.c:891 > __do_sys_exit_group kernel/exit.c:902 [inline] > __se_sys_exit_group kernel/exit.c:900 [inline] > __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900 > do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 > entry_SYSCALL_64_after_hwframe+0x49/0xb3 This is not new and certainly not caused by commit e950564b97fd. hugetlbf only operates on huge page aligned and sized files/mappings. To make sure this happens, the mmap code contians the following to 'round up' length to huge page size: if (!(flags & MAP_ANONYMOUS)) { audit_mmap_fd(fd, flags); file = fget(fd); if (!file) return -EBADF; if (is_file_hugepages(file)) len = ALIGN(len, huge_page_size(hstate_file(file))); retval = -EINVAL; if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) goto out_fput; } else if (flags & MAP_HUGETLB) { struct user_struct *user = NULL; struct hstate *hs; hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); if (!hs) return -EINVAL; len = ALIGN(len, huge_page_size(hs)); However, in this syzbot test case the 'file' is in an overlayfs filesystem created as follows: mkdir("./file0", 000) = 0 mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 chdir("./file0") = 0 mkdir("./file1", 000) = 0 mkdir("./bus", 000) = 0 mkdir("./file0", 000) = 0 mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 The routine is_file_hugepages() is just comparing the file ops to huegtlbfs: if (file->f_op == &hugetlbfs_file_operations) return true; Since the file is in an overlayfs, file->f_op == ovl_file_operations. Therefore, length will not be rounded up to huge page size and we create a mapping with incorrect size which leads to the BUG. Because of the code in mmap, the hugetlbfs mmap() routine assumes length is rounded to a huge page size. I can easily add a check to hugetlbfs mmap to validate length and return -EINVAL. However, I think we really want to do the 'round up' earlier in mmap. This is because the man page says: Huge page (Huge TLB) mappings For mappings that employ huge pages, the requirements for the arguments of mmap() and munmap() differ somewhat from the requirements for map‐ pings that use the native system page size. For mmap(), offset must be a multiple of the underlying huge page size. The system automatically aligns length to be a multiple of the underly‐ ing huge page size. Since the location for the mapping is chosen BEFORE getting to the hugetlbfs mmap routine, we can not wait until then to round up the length. Is there a defined way to go from a struct file * to the underlying filesystem so we can continue to do the 'round up' in early mmap code? One other thing I noticed with overlayfs is that it does not contain a specific get_unmapped_area file_operations routine. I would expect it to at least check for and use the get_unmapped_area of the underlying filesystem? Can someone comment if this is by design? In the case of hugetlbfs, get_unmapped_area is even provided by most architectures. So, it seems like we would like/need to be calling the correct routine. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-04-06 22:05 ` Mike Kravetz @ 2020-05-12 15:04 ` Miklos Szeredi 2020-05-12 18:11 ` Mike Kravetz 2020-05-18 23:41 ` Colin Walters 0 siblings, 2 replies; 21+ messages in thread From: Miklos Szeredi @ 2020-05-12 15:04 UTC (permalink / raw) To: Mike Kravetz Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 4/5/20 8:06 PM, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit: 1a323ea5 x86: get rid of 'errret' argument to __get_user_x.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=132e940be00000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=8c1e98458335a7d1 > > dashboard link: https://syzkaller.appspot.com/bug?extid=d6ec23007e951dadf3de > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12921933e00000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=172e940be00000 > > > > The bug was bisected to: > > > > commit e950564b97fd0f541b02eb207685d0746f5ecf29 > > Author: Miklos Szeredi <mszeredi@redhat.com> > > Date: Tue Jul 24 13:01:55 2018 +0000 > > > > vfs: don't evict uninitialized inode > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=115cad33e00000 > > final crash: https://syzkaller.appspot.com/x/report.txt?x=135cad33e00000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=155cad33e00000 > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com > > Fixes: e950564b97fd ("vfs: don't evict uninitialized inode") > > > > overlayfs: upper fs does not support xattr, falling back to index=off and metacopy=off. > > ------------[ cut here ]------------ > > kernel BUG at mm/hugetlb.c:3416! > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN > > CPU: 0 PID: 7036 Comm: syz-executor110 Not tainted 5.6.0-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > RIP: 0010:__unmap_hugepage_range+0xa26/0xbc0 mm/hugetlb.c:3416 > > Code: 00 48 c7 c7 60 37 35 88 e8 57 b4 a2 ff e9 b3 fd ff ff e8 cd 90 c6 ff 0f 0b e9 c4 f7 ff ff e8 c1 90 c6 ff 0f 0b e8 ba 90 c6 ff <0f> 0b e8 b3 90 c6 ff 83 8c 24 c0 00 00 00 01 48 8d bc 24 a0 00 00 > > RSP: 0018:ffffc900017779b0 EFLAGS: 00010293 > > RAX: ffff88808cf5c2c0 RBX: ffffffff8c641c08 RCX: ffffffff81ac50b4 > > RDX: 0000000000000000 RSI: ffffffff81ac58a6 RDI: 0000000000000007 > > RBP: 0000000020000000 R08: ffff88808cf5c2c0 R09: ffffed10129d8111 > > R10: ffffed10129d8110 R11: ffff888094ec0887 R12: 0000000000003000 > > R13: 0000000000000000 R14: 0000000020003000 R15: 0000000000200000 > > FS: 00000000013c0880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000000020000140 CR3: 0000000093554000 CR4: 00000000001406f0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > __unmap_hugepage_range_final+0x30/0x70 mm/hugetlb.c:3507 > > unmap_single_vma+0x238/0x300 mm/memory.c:1296 > > unmap_vmas+0x16f/0x2f0 mm/memory.c:1332 > > exit_mmap+0x2aa/0x510 mm/mmap.c:3126 > > __mmput kernel/fork.c:1082 [inline] > > mmput+0x168/0x4b0 kernel/fork.c:1103 > > exit_mm kernel/exit.c:477 [inline] > > do_exit+0xa51/0x2dd0 kernel/exit.c:780 > > do_group_exit+0x125/0x340 kernel/exit.c:891 > > __do_sys_exit_group kernel/exit.c:902 [inline] > > __se_sys_exit_group kernel/exit.c:900 [inline] > > __x64_sys_exit_group+0x3a/0x50 kernel/exit.c:900 > > do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 > > entry_SYSCALL_64_after_hwframe+0x49/0xb3 > > This is not new and certainly not caused by commit e950564b97fd. Sorry for replying late... > hugetlbf only operates on huge page aligned and sized files/mappings. > To make sure this happens, the mmap code contians the following to 'round > up' length to huge page size: > > if (!(flags & MAP_ANONYMOUS)) { > audit_mmap_fd(fd, flags); > file = fget(fd); > if (!file) > return -EBADF; > if (is_file_hugepages(file)) > len = ALIGN(len, huge_page_size(hstate_file(file))); > retval = -EINVAL; > if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file))) > goto out_fput; > } else if (flags & MAP_HUGETLB) { > struct user_struct *user = NULL; > struct hstate *hs; > > hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); > if (!hs) > return -EINVAL; > > len = ALIGN(len, huge_page_size(hs)); > > However, in this syzbot test case the 'file' is in an overlayfs filesystem > created as follows: > > mkdir("./file0", 000) = 0 > mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 > chdir("./file0") = 0 > mkdir("./file1", 000) = 0 > mkdir("./bus", 000) = 0 > mkdir("./file0", 000) = 0 > mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 > > The routine is_file_hugepages() is just comparing the file ops to huegtlbfs: > > if (file->f_op == &hugetlbfs_file_operations) > return true; > > Since the file is in an overlayfs, file->f_op == ovl_file_operations. > Therefore, length will not be rounded up to huge page size and we create a > mapping with incorrect size which leads to the BUG. > > Because of the code in mmap, the hugetlbfs mmap() routine assumes length is > rounded to a huge page size. I can easily add a check to hugetlbfs mmap > to validate length and return -EINVAL. However, I think we really want to > do the 'round up' earlier in mmap. This is because the man page says: > > Huge page (Huge TLB) mappings > For mappings that employ huge pages, the requirements for the arguments > of mmap() and munmap() differ somewhat from the requirements for map‐ > pings that use the native system page size. > > For mmap(), offset must be a multiple of the underlying huge page size. > The system automatically aligns length to be a multiple of the underly‐ > ing huge page size. > > Since the location for the mapping is chosen BEFORE getting to the hugetlbfs > mmap routine, we can not wait until then to round up the length. Is there a > defined way to go from a struct file * to the underlying filesystem so we > can continue to do the 'round up' in early mmap code? That's easy enough: static inline struct file *real_file(struct file *file) { return file->f_op != ovl_file_operations ? file : file->private_data; } But adding more filesystem specific code to generic code does not sound like the cleanest way to solve this... > One other thing I noticed with overlayfs is that it does not contain a > specific get_unmapped_area file_operations routine. I would expect it to at > least check for and use the get_unmapped_area of the underlying filesystem? > Can someone comment if this is by design? Not sure. What exactly is f_op->get_unmapped_area supposed to do? > In the case of hugetlbfs, get_unmapped_area is even provided by most > architectures. So, it seems like we would like/need to be calling the correct > routine. Okay. Thanks, Miklos ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-12 15:04 ` Miklos Szeredi @ 2020-05-12 18:11 ` Mike Kravetz 2020-05-15 22:15 ` Mike Kravetz 2020-05-18 23:41 ` Colin Walters 1 sibling, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-12 18:11 UTC (permalink / raw) To: Miklos Szeredi Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On 5/12/20 8:04 AM, Miklos Szeredi wrote: > On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> On 4/5/20 8:06 PM, syzbot wrote: >> >> The routine is_file_hugepages() is just comparing the file ops to huegtlbfs: >> >> if (file->f_op == &hugetlbfs_file_operations) >> return true; >> >> Since the file is in an overlayfs, file->f_op == ovl_file_operations. >> Therefore, length will not be rounded up to huge page size and we create a >> mapping with incorrect size which leads to the BUG. >> >> Because of the code in mmap, the hugetlbfs mmap() routine assumes length is >> rounded to a huge page size. I can easily add a check to hugetlbfs mmap >> to validate length and return -EINVAL. However, I think we really want to >> do the 'round up' earlier in mmap. This is because the man page says: >> >> Huge page (Huge TLB) mappings >> For mappings that employ huge pages, the requirements for the arguments >> of mmap() and munmap() differ somewhat from the requirements for map‐ >> pings that use the native system page size. >> >> For mmap(), offset must be a multiple of the underlying huge page size. >> The system automatically aligns length to be a multiple of the underly‐ >> ing huge page size. >> >> Since the location for the mapping is chosen BEFORE getting to the hugetlbfs >> mmap routine, we can not wait until then to round up the length. Is there a >> defined way to go from a struct file * to the underlying filesystem so we >> can continue to do the 'round up' in early mmap code? > > That's easy enough: > > static inline struct file *real_file(struct file *file) > { > return file->f_op != ovl_file_operations ? file : file->private_data; > } > > But adding more filesystem specific code to generic code does not > sound like the cleanest way to solve this... We can incorporate the above 'real_file' functionality in the filesystem specific routine is_file_hugepages(), and I think that would address this specific issue. I'll code that up. >> One other thing I noticed with overlayfs is that it does not contain a >> specific get_unmapped_area file_operations routine. I would expect it to at >> least check for and use the get_unmapped_area of the underlying filesystem? >> Can someone comment if this is by design? > > Not sure. What exactly is f_op->get_unmapped_area supposed to do? > IIUC, filesystems can define their own routines to get addresses for mmap operations. Quite a few filesystems define get_unmapped_area. The generic mmap code does the following, get_area = current->mm->get_unmapped_area; if (file) { if (file->f_op->get_unmapped_area) get_area = file->f_op->get_unmapped_area; } else if (flags & MAP_SHARED) { /* * mmap_region() will call shmem_zero_setup() to create a file, * so use shmem's get_unmapped_area in case it can be huge. * do_mmap_pgoff() will clear pgoff, so match alignment. */ pgoff = 0; get_area = shmem_get_unmapped_area; } addr = get_area(file, addr, len, pgoff, flags); If the filesystem provides a get_unmapped_area, it will use it. I beleive overlayfs prevents this from happening for the underlying filesystem. Perhaps we do need to add something like a call 'real_file' to this generic code? I can't think of any other way to get to the underlying filesystem get_unmapped_area here. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-12 18:11 ` Mike Kravetz @ 2020-05-15 22:15 ` Mike Kravetz 2020-05-18 11:12 ` Miklos Szeredi 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-15 22:15 UTC (permalink / raw) To: Miklos Szeredi Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On 5/12/20 11:11 AM, Mike Kravetz wrote: > On 5/12/20 8:04 AM, Miklos Szeredi wrote: >> On Tue, Apr 7, 2020 at 12:06 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >>> On 4/5/20 8:06 PM, syzbot wrote: >>> >>> The routine is_file_hugepages() is just comparing the file ops to huegtlbfs: >>> >>> if (file->f_op == &hugetlbfs_file_operations) >>> return true; >>> >>> Since the file is in an overlayfs, file->f_op == ovl_file_operations. >>> Therefore, length will not be rounded up to huge page size and we create a >>> mapping with incorrect size which leads to the BUG. >>> >>> Because of the code in mmap, the hugetlbfs mmap() routine assumes length is >>> rounded to a huge page size. I can easily add a check to hugetlbfs mmap >>> to validate length and return -EINVAL. However, I think we really want to >>> do the 'round up' earlier in mmap. This is because the man page says: >>> >>> Huge page (Huge TLB) mappings >>> For mappings that employ huge pages, the requirements for the arguments >>> of mmap() and munmap() differ somewhat from the requirements for map‐ >>> pings that use the native system page size. >>> >>> For mmap(), offset must be a multiple of the underlying huge page size. >>> The system automatically aligns length to be a multiple of the underly‐ >>> ing huge page size. >>> >>> Since the location for the mapping is chosen BEFORE getting to the hugetlbfs >>> mmap routine, we can not wait until then to round up the length. Is there a >>> defined way to go from a struct file * to the underlying filesystem so we >>> can continue to do the 'round up' in early mmap code? >> >> That's easy enough: >> >> static inline struct file *real_file(struct file *file) >> { >> return file->f_op != ovl_file_operations ? file : file->private_data; >> } >> >> But adding more filesystem specific code to generic code does not >> sound like the cleanest way to solve this... > > We can incorporate the above 'real_file' functionality in the filesystem > specific routine is_file_hugepages(), and I think that would address this > specific issue. I'll code that up. > >>> One other thing I noticed with overlayfs is that it does not contain a >>> specific get_unmapped_area file_operations routine. I would expect it to at >>> least check for and use the get_unmapped_area of the underlying filesystem? >>> Can someone comment if this is by design? >> >> Not sure. What exactly is f_op->get_unmapped_area supposed to do? >> > > IIUC, filesystems can define their own routines to get addresses for mmap > operations. Quite a few filesystems define get_unmapped_area. > > The generic mmap code does the following, > > get_area = current->mm->get_unmapped_area; > if (file) { > if (file->f_op->get_unmapped_area) > get_area = file->f_op->get_unmapped_area; > } else if (flags & MAP_SHARED) { > /* > * mmap_region() will call shmem_zero_setup() to create a file, > * so use shmem's get_unmapped_area in case it can be huge. > * do_mmap_pgoff() will clear pgoff, so match alignment. > */ > pgoff = 0; > get_area = shmem_get_unmapped_area; > } > > addr = get_area(file, addr, len, pgoff, flags); > > If the filesystem provides a get_unmapped_area, it will use it. I beleive > overlayfs prevents this from happening for the underlying filesystem. > > Perhaps we do need to add something like a call 'real_file' to this generic > code? I can't think of any other way to get to the underlying filesystem > get_unmapped_area here. I started going down the path of creating a get_unmapped_area f_op for overlayfs. That is pretty straight forward and works well. But that did not take care of the is_file_hugepages() routine. Recall that is_file_hugepages simply does if (file->f_op == &hugetlbfs_file_operations). I suppose I could add a specific overlayfs check like real_file here. But, that does not seem like a clean solution. I also discovered other routines doing comparisons like if (file->f_op == <expected_fops>), they are: is_dma_buf_file() is_file_shm_hugepages() get_pipe_info() is_file_epoll() So, it seems that these routines are also impacted if operating on files in an overlayfs? Any suggestions on how to move forward? It seems like there may be the need for a real_file() routine? I see a d_real dentry_op was added to deal with this issue for dentries. Might we need something similiar for files (f_real)? Looking for suggestions as I do not normally work with this code. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-15 22:15 ` Mike Kravetz @ 2020-05-18 11:12 ` Miklos Szeredi 2020-05-18 23:22 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: Miklos Szeredi @ 2020-05-18 11:12 UTC (permalink / raw) To: Mike Kravetz Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On Sat, May 16, 2020 at 12:15 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > I started going down the path of creating a get_unmapped_area f_op for > overlayfs. That is pretty straight forward and works well. But that > did not take care of the is_file_hugepages() routine. Recall that > is_file_hugepages simply does if (file->f_op == &hugetlbfs_file_operations). > > I suppose I could add a specific overlayfs check like real_file here. But, > that does not seem like a clean solution. > > I also discovered other routines doing comparisons like > if (file->f_op == <expected_fops>), they are: > is_dma_buf_file() > is_file_shm_hugepages() > get_pipe_info() > is_file_epoll() > > So, it seems that these routines are also impacted if operating on files > in an overlayfs? Those are non-filesystems, with the exception of is_file_shm_hugepages(), the only caller of which is is_file_hugepages(). > Any suggestions on how to move forward? It seems like there may be the > need for a real_file() routine? I see a d_real dentry_op was added to > deal with this issue for dentries. Might we need something similiar for > files (f_real)? > > Looking for suggestions as I do not normally work with this code. And I'm not so familiar with hugepages code. I'd suggest moving length alignment into f_op->get_unmapped_area() and cleaning up other special casing of hugetlb mappings, but it's probably far from trivial... So yeah, that leaves a real_file() helper or something similar. Unlike the example I gave first it actually needs to be recursive: static inline struct file *real_file(struct file *file) { whole (unlikely(file->f_op == ovl_file_operations)) file = file->private_data; return file; } Thanks, Miklos ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-18 11:12 ` Miklos Szeredi @ 2020-05-18 23:22 ` Mike Kravetz 0 siblings, 0 replies; 21+ messages in thread From: Mike Kravetz @ 2020-05-18 23:22 UTC (permalink / raw) To: Miklos Szeredi Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On 5/18/20 4:12 AM, Miklos Szeredi wrote: > On Sat, May 16, 2020 at 12:15 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> Any suggestions on how to move forward? It seems like there may be the >> need for a real_file() routine? I see a d_real dentry_op was added to >> deal with this issue for dentries. Might we need something similiar for >> files (f_real)? >> >> Looking for suggestions as I do not normally work with this code. > > And I'm not so familiar with hugepages code. I'd suggest moving > length alignment into f_op->get_unmapped_area() and cleaning up other > special casing of hugetlb mappings, but it's probably far from > trivial... > > So yeah, that leaves a real_file() helper or something similar. > Unlike the example I gave first it actually needs to be recursive: > > static inline struct file *real_file(struct file *file) > { > whole (unlikely(file->f_op == ovl_file_operations)) > file = file->private_data; > return file; > } If we add real_file(), then I think it only needs to be called in two places: is_file_hugepages() and core mmap code. However, I could not think of a good place to put real_file(). Below is a patch which creates a new file <linux/overlayfs.h> for the routine. It does solve this BUG and should fix any other issues with callers of is_file_hugepages(). Let me know what you think. I add a 'Suggested-by:' for real_file, but am happy to change that to a 'Signed-off-by:' if you prefer. From ea6a96aa3f5365df39f7cf213f87abe336b43e71 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz@oracle.com> Date: Mon, 18 May 2020 15:29:12 -0700 Subject: [PATCH] ovl: provide real_file() for use by hugetlb and mmap If a file is on a union/overlay, then the 'struct file *' will have overlayfs file operations. The routine is_file_hugepages() compares f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs file. If a hugetlbfs file is on a union/overlay, this comparison is false and is_file_hugepages() incorrectly indicates the underlying file is not hugetlbfs. One result of this is a BUG as shown in [1]. mmap uses is_file_hugepages() because hugetlbfs files have different alignment restrictions. In addition, mmap code would like to use the filesystem specific get_unmapped_area() routine if one is defined. To address this issue, add a new routine real_file() which will return the underlying file. Update is_file_hugepages and mmap code to get the real file. [1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/ Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com Suggested-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- include/linux/hugetlb.h | 3 +++ include/linux/overlayfs.h | 27 +++++++++++++++++++++++++++ mm/mmap.c | 2 ++ 3 files changed, 32 insertions(+) create mode 100644 include/linux/overlayfs.h diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 43a1cef8f0f1..fb22c0a7474a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -9,6 +9,7 @@ #include <linux/cgroup.h> #include <linux/list.h> #include <linux/kref.h> +#include <linux/overlayfs.h> #include <asm/pgtable.h> struct ctl_table; @@ -437,6 +438,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct, static inline bool is_file_hugepages(struct file *file) { + file = real_file(file); + if (file->f_op == &hugetlbfs_file_operations) return true; diff --git a/include/linux/overlayfs.h b/include/linux/overlayfs.h new file mode 100644 index 000000000000..eecdfda0286f --- /dev/null +++ b/include/linux/overlayfs.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_OVERLAYFS_H +#define _LINUX_OVERLAYFS_H + +#include <linux/fs.h> + +extern const struct file_operations ovl_file_operations; + +#ifdef CONFIG_OVERLAY_FS +/* + * If file is on a union/overlay, then return the underlying real file. + * Otherwise return the file itself. + */ +static inline struct file *real_file(struct file *file) +{ + while (unlikely(file->f_op == &ovl_file_operations)) + file = file->private_data; + return file; +} +#else +static inline struct file *real_file(struct file *file) +{ + return file; +} +#endif + +#endif /* _LINUX_OVERLAYFS_H */ diff --git a/mm/mmap.c b/mm/mmap.c index f609e9ec4a25..7f45a4057a15 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -47,6 +47,7 @@ #include <linux/pkeys.h> #include <linux/oom.h> #include <linux/sched/mm.h> +#include <linux/overlayfs.h> #include <linux/uaccess.h> #include <asm/cacheflush.h> @@ -2203,6 +2204,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, get_area = current->mm->get_unmapped_area; if (file) { + file = real_file(file); if (file->f_op->get_unmapped_area) get_area = file->f_op->get_unmapped_area; } else if (flags & MAP_SHARED) { -- 2.25.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-12 15:04 ` Miklos Szeredi 2020-05-12 18:11 ` Mike Kravetz @ 2020-05-18 23:41 ` Colin Walters 2020-05-19 0:35 ` Mike Kravetz 1 sibling, 1 reply; 21+ messages in thread From: Colin Walters @ 2020-05-18 23:41 UTC (permalink / raw) To: Miklos Szeredi, Mike Kravetz Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote: > > However, in this syzbot test case the 'file' is in an overlayfs filesystem > > created as follows: > > > > mkdir("./file0", 000) = 0 > > mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 > > chdir("./file0") = 0 > > mkdir("./file1", 000) = 0 > > mkdir("./bus", 000) = 0 > > mkdir("./file0", 000) = 0 > > mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs? I can't think of one. Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-18 23:41 ` Colin Walters @ 2020-05-19 0:35 ` Mike Kravetz 2020-05-20 11:20 ` Miklos Szeredi 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-19 0:35 UTC (permalink / raw) To: Colin Walters, Miklos Szeredi Cc: syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On 5/18/20 4:41 PM, Colin Walters wrote: > > On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote: > >>> However, in this syzbot test case the 'file' is in an overlayfs filesystem >>> created as follows: >>> >>> mkdir("./file0", 000) = 0 >>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 >>> chdir("./file0") = 0 >>> mkdir("./file1", 000) = 0 >>> mkdir("./bus", 000) = 0 >>> mkdir("./file0", 000) = 0 >>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 > > Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs? I can't think of one. Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems? > I can not think of a use case. I'll let Miklos comment on adding whitelist capability to overlayfs. IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing a hugetlbfs file on overlayfs. The other is that core mmap code will skip any filesystem specific get_unmapped area routine if on a union/overlay. My patch fixes both, but if we go with a whitelist approach and don't allow hugetlbfs I think we still need to address the filesystem specific get_unmapped area issue. That is easy enough to do by adding a routine to overlayfs which calls the routine for the underlying fs. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-19 0:35 ` Mike Kravetz @ 2020-05-20 11:20 ` Miklos Szeredi 2020-05-20 17:27 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: Miklos Szeredi @ 2020-05-20 11:20 UTC (permalink / raw) To: Mike Kravetz Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On Tue, May 19, 2020 at 2:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 5/18/20 4:41 PM, Colin Walters wrote: > > > > On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote: > > > >>> However, in this syzbot test case the 'file' is in an overlayfs filesystem > >>> created as follows: > >>> > >>> mkdir("./file0", 000) = 0 > >>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 > >>> chdir("./file0") = 0 > >>> mkdir("./file1", 000) = 0 > >>> mkdir("./bus", 000) = 0 > >>> mkdir("./file0", 000) = 0 > >>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 > > > > Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs? I can't think of one. Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems? > > > > I can not think of a use case. I'll let Miklos comment on adding whitelist > capability to overlayfs. I've not heard of overlayfs being used over hugetlbfs. Overlayfs on tmpfs is definitely used, I guess without hugepages. But if we'd want to allow tmpfs _without_ hugepages but not tmpfs _with_ hugepages, then we can't just whitelist based on filesystem type, but need to look at mount options as well. Which isn't really a clean solution either. > IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing > a hugetlbfs file on overlayfs. The other is that core mmap code will skip > any filesystem specific get_unmapped area routine if on a union/overlay. > My patch fixes both, but if we go with a whitelist approach and don't allow > hugetlbfs I think we still need to address the filesystem specific > get_unmapped area issue. That is easy enough to do by adding a routine to > overlayfs which calls the routine for the underlying fs. I think the two are strongly related: get_unmapped_area() adjusts the address alignment, and the is_file_hugepages() call in ksys_mmap_pgoff() adjusts the length alignment. Is there any other purpose for which f_op->get_unmapped_area() is used by a filesystem? Thanks, Miklos ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-20 11:20 ` Miklos Szeredi @ 2020-05-20 17:27 ` Mike Kravetz 2020-05-22 10:05 ` Miklos Szeredi 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-20 17:27 UTC (permalink / raw) To: Miklos Szeredi Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro On 5/20/20 4:20 AM, Miklos Szeredi wrote: > On Tue, May 19, 2020 at 2:35 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> >> On 5/18/20 4:41 PM, Colin Walters wrote: >>> >>> On Tue, May 12, 2020, at 11:04 AM, Miklos Szeredi wrote: >>> >>>>> However, in this syzbot test case the 'file' is in an overlayfs filesystem >>>>> created as follows: >>>>> >>>>> mkdir("./file0", 000) = 0 >>>>> mount(NULL, "./file0", "hugetlbfs", MS_MANDLOCK|MS_POSIXACL, NULL) = 0 >>>>> chdir("./file0") = 0 >>>>> mkdir("./file1", 000) = 0 >>>>> mkdir("./bus", 000) = 0 >>>>> mkdir("./file0", 000) = 0 >>>>> mount("\177ELF\2\1\1", "./bus", "overlay", 0, "lowerdir=./bus,workdir=./file1,u"...) = 0 >>> >>> Is there any actual valid use case for mounting an overlayfs on top of hugetlbfs? I can't think of one. Why isn't the response to this to instead only allow mounting overlayfs on top of basically a set of whitelisted filesystems? >>> >> >> I can not think of a use case. I'll let Miklos comment on adding whitelist >> capability to overlayfs. > > I've not heard of overlayfs being used over hugetlbfs. > > Overlayfs on tmpfs is definitely used, I guess without hugepages. > But if we'd want to allow tmpfs _without_ hugepages but not tmpfs > _with_ hugepages, then we can't just whitelist based on filesystem > type, but need to look at mount options as well. Which isn't really a > clean solution either. > >> IMO - This BUG/report revealed two issues. First is the BUG by mmap'ing >> a hugetlbfs file on overlayfs. The other is that core mmap code will skip >> any filesystem specific get_unmapped area routine if on a union/overlay. >> My patch fixes both, but if we go with a whitelist approach and don't allow >> hugetlbfs I think we still need to address the filesystem specific >> get_unmapped area issue. That is easy enough to do by adding a routine to >> overlayfs which calls the routine for the underlying fs. > > I think the two are strongly related: get_unmapped_area() adjusts the > address alignment, and the is_file_hugepages() call in > ksys_mmap_pgoff() adjusts the length alignment. > > Is there any other purpose for which f_op->get_unmapped_area() is > used by a filesystem? > I am fairly confident it is all about checking limits and alignment. The filesystem knows if it can/should align to base or huge page size. DAX has some interesting additional restrictions, and several 'traditional' filesystems check if they are 'on DAX'. In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the length adjustment in hugetlb_get_unmapped_area (generic and arch specific). I agree, although there may be the need to add length overflow checks in these routines (after round up) as this is done in core code now. However, this can be done as a separate cleanup patch. In any case, we need to get the core mmap code to call filesystem specific get_unmapped_area if on a union/overlay. The patch I suggested does this by simply calling real_file to determine if there is a filesystem specific get_unmapped_area. The other approach would be to provide an overlayfs get_unmapped_area that calls the underlying filesystem get_unmapped_area. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-20 17:27 ` Mike Kravetz @ 2020-05-22 10:05 ` Miklos Szeredi 2020-05-28 0:01 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: Miklos Szeredi @ 2020-05-22 10:05 UTC (permalink / raw) To: Mike Kravetz Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro, linux-unionfs On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote: > I am fairly confident it is all about checking limits and alignment. The > filesystem knows if it can/should align to base or huge page size. DAX has > some interesting additional restrictions, and several 'traditional' filesystems > check if they are 'on DAX'. Okay, I haven't looked at DAX vs. overlay. I'm sure it's going to come up at some point, if it hasn't already. > > In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the > length adjustment in hugetlb_get_unmapped_area (generic and arch specific). > I agree, although there may be the need to add length overflow checks in > these routines (after round up) as this is done in core code now. However, > this can be done as a separate cleanup patch. > > In any case, we need to get the core mmap code to call filesystem specific > get_unmapped_area if on a union/overlay. The patch I suggested does this > by simply calling real_file to determine if there is a filesystem specific > get_unmapped_area. The other approach would be to provide an overlayfs > get_unmapped_area that calls the underlying filesystem get_unmapped_area. That latter is what's done for all other stacked operations in overlayfs. Untested patch below. Thanks, Miklos --- fs/overlayfs/file.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -757,6 +757,17 @@ static loff_t ovl_remap_file_range(struc remap_flags, op); } +static unsigned long ovl_get_unmapped_area(struct file *file, + unsigned long uaddr, unsigned long len, + unsigned long pgoff, unsigned long flags) +{ + struct file *realfile = file->private_data; + + return (realfile->f_op->get_unmapped_area ?: + current->mm->get_unmapped_area)(realfile, + uaddr, len, pgoff, flags); +} + const struct file_operations ovl_file_operations = { .open = ovl_open, .release = ovl_release, @@ -774,6 +785,7 @@ const struct file_operations ovl_file_op .copy_file_range = ovl_copy_file_range, .remap_file_range = ovl_remap_file_range, + .get_unmapped_area = ovl_get_unmapped_area, }; int __init ovl_aio_request_cache_init(void) ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: kernel BUG at mm/hugetlb.c:LINE! 2020-05-22 10:05 ` Miklos Szeredi @ 2020-05-28 0:01 ` Mike Kravetz 2020-05-28 8:37 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-28 0:01 UTC (permalink / raw) To: Miklos Szeredi Cc: Colin Walters, syzbot, Andrew Morton, linux-fsdevel, linux-kernel, linux-mm, Miklos Szeredi, syzkaller-bugs, Al Viro, linux-unionfs On 5/22/20 3:05 AM, Miklos Szeredi wrote: > On Wed, May 20, 2020 at 10:27:15AM -0700, Mike Kravetz wrote: > >> I am fairly confident it is all about checking limits and alignment. The >> filesystem knows if it can/should align to base or huge page size. DAX has >> some interesting additional restrictions, and several 'traditional' filesystems >> check if they are 'on DAX'. > > > Okay, I haven't looked at DAX vs. overlay. I'm sure it's going to come up at > some point, if it hasn't already. > >> >> In a previous e-mail, you suggested hugetlb_get_unmapped_area could do the >> length adjustment in hugetlb_get_unmapped_area (generic and arch specific). >> I agree, although there may be the need to add length overflow checks in >> these routines (after round up) as this is done in core code now. However, >> this can be done as a separate cleanup patch. >> >> In any case, we need to get the core mmap code to call filesystem specific >> get_unmapped_area if on a union/overlay. The patch I suggested does this >> by simply calling real_file to determine if there is a filesystem specific >> get_unmapped_area. The other approach would be to provide an overlayfs >> get_unmapped_area that calls the underlying filesystem get_unmapped_area. > > That latter is what's done for all other stacked operations in overlayfs. > > Untested patch below. > Thanks Miklos! We still need the 'real_file()' routine for is_file_hugepages. So combining these, I propose the following patch. It addresses the known issue as well as potential issues with is_file_hugepages returning incorrect information. I don't really like a separate header file for real_file, but I can not think of any good place to put it. Let me know what you think, From 33f6bbd19406108b61a4113b1ec8e4e6888cd482 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz@oracle.com> Date: Wed, 27 May 2020 16:58:58 -0700 Subject: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() If a file is on a union/overlay, then the 'struct file *' will have overlayfs file operations. The routine is_file_hugepages() compares f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs file. If a hugetlbfs file is on a union/overlay, this comparison is false and is_file_hugepages() incorrectly indicates the underlying file is not hugetlbfs. One result of this is a BUG as shown in [1]. mmap uses is_file_hugepages() because hugetlbfs files have different alignment restrictions. In addition, mmap code would like to use the filesystem specific get_unmapped_area() routine if one is defined. To address this issue, - Add a new routine real_file() which will return the underlying file. - Update is_file_hugepages to get the real file. - Add get_unmapped_area f_op to oerrlayfs to call underlying routine. [1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/ Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- fs/overlayfs/file.c | 13 +++++++++++++ include/linux/hugetlb.h | 3 +++ include/linux/overlayfs.h | 27 +++++++++++++++++++++++++++ 3 files changed, 43 insertions(+) create mode 100644 include/linux/overlayfs.h diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 87c362f65448..cc020e1c72d5 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -12,6 +12,7 @@ #include <linux/splice.h> #include <linux/mm.h> #include <linux/fs.h> +#include <linux/overlayfs.h> #include "overlayfs.h" struct ovl_aio_req { @@ -757,6 +758,17 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in, remap_flags, op); } +static unsigned long ovl_get_unmapped_area(struct file *file, + unsigned long uaddr, unsigned long len, + unsigned long pgoff, unsigned long flags) +{ + struct file *realfile = real_file(file); + + return (realfile->f_op->get_unmapped_area ?: + current->mm->get_unmapped_area)(realfile, + uaddr, len, pgoff, flags); +} + const struct file_operations ovl_file_operations = { .open = ovl_open, .release = ovl_release, @@ -774,6 +786,7 @@ const struct file_operations ovl_file_operations = { .copy_file_range = ovl_copy_file_range, .remap_file_range = ovl_remap_file_range, + .get_unmapped_area = ovl_get_unmapped_area, }; int __init ovl_aio_request_cache_init(void) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 43a1cef8f0f1..fb22c0a7474a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -9,6 +9,7 @@ #include <linux/cgroup.h> #include <linux/list.h> #include <linux/kref.h> +#include <linux/overlayfs.h> #include <asm/pgtable.h> struct ctl_table; @@ -437,6 +438,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct, static inline bool is_file_hugepages(struct file *file) { + file = real_file(file); + if (file->f_op == &hugetlbfs_file_operations) return true; diff --git a/include/linux/overlayfs.h b/include/linux/overlayfs.h new file mode 100644 index 000000000000..eecdfda0286f --- /dev/null +++ b/include/linux/overlayfs.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_OVERLAYFS_H +#define _LINUX_OVERLAYFS_H + +#include <linux/fs.h> + +extern const struct file_operations ovl_file_operations; + +#ifdef CONFIG_OVERLAY_FS +/* + * If file is on a union/overlay, then return the underlying real file. + * Otherwise return the file itself. + */ +static inline struct file *real_file(struct file *file) +{ + while (unlikely(file->f_op == &ovl_file_operations)) + file = file->private_data; + return file; +} +#else +static inline struct file *real_file(struct file *file) +{ + return file; +} +#endif + +#endif /* _LINUX_OVERLAYFS_H */ -- 2.25.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-05-28 0:01 ` Mike Kravetz @ 2020-05-28 8:37 ` kbuild test robot 2020-05-28 21:01 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: kbuild test robot @ 2020-05-28 8:37 UTC (permalink / raw) To: Mike Kravetz, Miklos Szeredi Cc: kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, Al Viro, linux-unionfs [-- Attachment #1: Type: text/plain, Size: 2140 bytes --] Hi Mike, I love your patch! Yet something to improve: [auto build test ERROR on miklos-vfs/overlayfs-next] [also build test ERROR on linus/master v5.7-rc7] [cannot apply to linux/master next-20200526] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533 base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next config: h8300-randconfig-r036-20200528 (attached as .config) compiler: h8300-linux-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 If you fix the issue, kindly add following tag as appropriate Reported-by: kbuild test robot <lkp@intel.com> All error/warnings (new ones prefixed by >>, old ones prefixed by <<): fs/overlayfs/file.c: In function 'ovl_get_unmapped_area': >> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area' 768 | current->mm->get_unmapped_area)(realfile, | ^~ >> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type] 770 | } | ^ vim +768 fs/overlayfs/file.c 760 761 static unsigned long ovl_get_unmapped_area(struct file *file, 762 unsigned long uaddr, unsigned long len, 763 unsigned long pgoff, unsigned long flags) 764 { 765 struct file *realfile = real_file(file); 766 767 return (realfile->f_op->get_unmapped_area ?: > 768 current->mm->get_unmapped_area)(realfile, 769 uaddr, len, pgoff, flags); > 770 } 771 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 31095 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-05-28 8:37 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot @ 2020-05-28 21:01 ` Mike Kravetz 2020-06-04 9:16 ` Miklos Szeredi 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-05-28 21:01 UTC (permalink / raw) To: kbuild test robot, Miklos Szeredi Cc: kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, Al Viro, linux-unionfs On 5/28/20 1:37 AM, kbuild test robot wrote: > Hi Mike, > > I love your patch! Yet something to improve: > > [auto build test ERROR on miklos-vfs/overlayfs-next] > [also build test ERROR on linus/master v5.7-rc7] > [cannot apply to linux/master next-20200526] > [if your patch is applied to the wrong git tree, please drop us a note to help > improve the system. BTW, we also suggest to use '--base' option to specify the > base tree in git format-patch, please see https://stackoverflow.com/a/37406982] > > url: https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533 > base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next > config: h8300-randconfig-r036-20200528 (attached as .config) > compiler: h8300-linux-gcc (GCC) 9.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kbuild test robot <lkp@intel.com> > > All error/warnings (new ones prefixed by >>, old ones prefixed by <<): > > fs/overlayfs/file.c: In function 'ovl_get_unmapped_area': >>> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area' > 768 | current->mm->get_unmapped_area)(realfile, > | ^~ >>> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type] > 770 | } > | ^ > > vim +768 fs/overlayfs/file.c > > 760 > 761 static unsigned long ovl_get_unmapped_area(struct file *file, > 762 unsigned long uaddr, unsigned long len, > 763 unsigned long pgoff, unsigned long flags) > 764 { > 765 struct file *realfile = real_file(file); > 766 > 767 return (realfile->f_op->get_unmapped_area ?: > > 768 current->mm->get_unmapped_area)(realfile, > 769 uaddr, len, pgoff, flags); > > 770 } > 771 > > --- > 0-DAY CI Kernel Test Service, Intel Corporation > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org > Well yuck! get_unmapped_area is not part of mm_struct if !CONFIG_MMU. Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too ugly for you? Another option is to use real_file() in the mmap code as done in [1]. Sorry for all the questions. However, I want to make sure you are happy with any overlayfs changes. [1] https://lore.kernel.org/linux-mm/04a00e3b-539c-236f-e43b-0024ef94b7cb@oracle.com/ -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-05-28 21:01 ` Mike Kravetz @ 2020-06-04 9:16 ` Miklos Szeredi 2020-06-11 0:13 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: Miklos Szeredi @ 2020-06-04 9:16 UTC (permalink / raw) To: Mike Kravetz Cc: kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, Al Viro, overlayfs On Thu, May 28, 2020 at 11:01 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 5/28/20 1:37 AM, kbuild test robot wrote: > > Hi Mike, > > > > I love your patch! Yet something to improve: > > > > [auto build test ERROR on miklos-vfs/overlayfs-next] > > [also build test ERROR on linus/master v5.7-rc7] > > [cannot apply to linux/master next-20200526] > > [if your patch is applied to the wrong git tree, please drop us a note to help > > improve the system. BTW, we also suggest to use '--base' option to specify the > > base tree in git format-patch, please see https://stackoverflow.com/a/37406982] > > > > url: https://github.com/0day-ci/linux/commits/Mike-Kravetz/ovl-provide-real_file-and-overlayfs-get_unmapped_area/20200528-080533 > > base: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs-next > > config: h8300-randconfig-r036-20200528 (attached as .config) > > compiler: h8300-linux-gcc (GCC) 9.3.0 > > reproduce (this is a W=1 build): > > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > > chmod +x ~/bin/make.cross > > # save the attached .config to linux build tree > > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=h8300 > > > > If you fix the issue, kindly add following tag as appropriate > > Reported-by: kbuild test robot <lkp@intel.com> > > > > All error/warnings (new ones prefixed by >>, old ones prefixed by <<): > > > > fs/overlayfs/file.c: In function 'ovl_get_unmapped_area': > >>> fs/overlayfs/file.c:768:14: error: 'struct mm_struct' has no member named 'get_unmapped_area' > > 768 | current->mm->get_unmapped_area)(realfile, > > | ^~ > >>> fs/overlayfs/file.c:770:1: warning: control reaches end of non-void function [-Wreturn-type] > > 770 | } > > | ^ > > > > vim +768 fs/overlayfs/file.c > > > > 760 > > 761 static unsigned long ovl_get_unmapped_area(struct file *file, > > 762 unsigned long uaddr, unsigned long len, > > 763 unsigned long pgoff, unsigned long flags) > > 764 { > > 765 struct file *realfile = real_file(file); > > 766 > > 767 return (realfile->f_op->get_unmapped_area ?: > > > 768 current->mm->get_unmapped_area)(realfile, > > 769 uaddr, len, pgoff, flags); > > > 770 } > > 771 > > > > --- > > 0-DAY CI Kernel Test Service, Intel Corporation > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org > > > > Well yuck! get_unmapped_area is not part of mm_struct if !CONFIG_MMU. > > Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too > ugly for you? Another option is to use real_file() in the mmap code as > done in [1]. I think the proper fix is to add an inline helper (call_get_unmapped_area()?) in linux/mm.h, and make that work properly for the NOMMU case as well. Thanks, Miklos ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-06-04 9:16 ` Miklos Szeredi @ 2020-06-11 0:13 ` Mike Kravetz 2020-06-11 0:37 ` Al Viro 0 siblings, 1 reply; 21+ messages in thread From: Mike Kravetz @ 2020-06-11 0:13 UTC (permalink / raw) To: Miklos Szeredi Cc: kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, Al Viro, overlayfs On 6/4/20 2:16 AM, Miklos Szeredi wrote: > On Thu, May 28, 2020 at 11:01 PM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> >> Well yuck! get_unmapped_area is not part of mm_struct if !CONFIG_MMU. >> >> Miklos, would adding '#ifdef CONFIG_MMU' around the overlayfs code be too >> ugly for you? Another option is to use real_file() in the mmap code as >> done in [1]. > > I think the proper fix is to add an inline helper > (call_get_unmapped_area()?) in linux/mm.h, and make that work properly > for the NOMMU case as well. > I'm ignoring the above issue for now. There may be a more fundamental issue that I do not know how to solve without adding another memeber to file_operations. Why? In order to go from file -> realfile, one needs to do something like the code you provided. while (file->f_op == &ovl_file_operations) file = file->private_data; return file; The problem is that this needs to be called from core kernel code (is_file_hugepages in mmap). ovl_file_operations is obviously only defined in the overlayfs code. Since overlayfs can be built as a module, I do not know of a way to reference ovl_file_operations which will only become available when/if overlayfs is loaded. Is there a way to do that? If there is no other way to do this, then I think we need to add another member to file_operations as is done in the following patch. I hope there is another way, because adding another file_op seems like overkill. From 6d22c93284263b5ddcc2199adc1bcceb233f1499 Mon Sep 17 00:00:00 2001 From: Mike Kravetz <mike.kravetz@oracle.com> Date: Wed, 10 Jun 2020 16:23:46 -0700 Subject: [PATCH v3] ovl: add f_real file_operation and supporting code for overlayfs If a file is on a union/overlay, then the 'struct file *' will have overlayfs file operations. The routine is_file_hugepages() compares f->f_op to hugetlbfs_file_operations to determine if it is a hugetlbfs file. If a hugetlbfs file is on a union/overlay, this comparison is false and is_file_hugepages() incorrectly indicates the underlying file is not hugetlbfs. One result of this is a BUG as shown in [1]. mmap uses is_file_hugepages() to identify hugetlbfs file and potentially round up length because hugetlbfs files have different alignment restrictions. This is defined/expected behavior. In addition, mmap code would like to use the filesystem specific get_unmapped_area() routine if one is defined. To address this issue, - Add a new file operation f_real while will return the underlying file. Only overlayfs provides a function for this operation. - Add a new routine real_file() which can be used by core code get an underlying file. - Update is_file_hugepages to get the real file. - Add get_unmapped_area f_op to oerrlayfs to call underlying routine. [1] https://lore.kernel.org/linux-mm/000000000000b4684e05a2968ca6@google.com/ Reported-by: syzbot+d6ec23007e951dadf3de@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- fs/overlayfs/file.c | 23 +++++++++++++++++++++++ include/linux/fs.h | 8 ++++++++ include/linux/hugetlb.h | 2 ++ 3 files changed, 33 insertions(+) diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index 87c362f65448..eb870fc3912f 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -757,6 +757,27 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in, remap_flags, op); } +#ifdef CONFIG_MMU +static unsigned long ovl_get_unmapped_area(struct file *file, + unsigned long uaddr, unsigned long len, + unsigned long pgoff, unsigned long flags) +{ + file = real_file(file); + + return (file->f_op->get_unmapped_area ?: + current->mm->get_unmapped_area)(file, uaddr, len, pgoff, flags); +} +#else +#define ovl_get_unmapped_area NULL +#endif + +static struct file *ovl_f_real(struct file *file) +{ + while (file->f_op == &ovl_file_operations) + file = file->private_data; + return file; +} + const struct file_operations ovl_file_operations = { .open = ovl_open, .release = ovl_release, @@ -771,9 +792,11 @@ const struct file_operations ovl_file_operations = { .compat_ioctl = ovl_compat_ioctl, .splice_read = ovl_splice_read, .splice_write = ovl_splice_write, + .f_real = ovl_f_real, .copy_file_range = ovl_copy_file_range, .remap_file_range = ovl_remap_file_range, + .get_unmapped_area = ovl_get_unmapped_area, }; int __init ovl_aio_request_cache_init(void) diff --git a/include/linux/fs.h b/include/linux/fs.h index 45cc10cdf6dd..59a969549aec 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1863,6 +1863,7 @@ struct file_operations { struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); int (*fadvise)(struct file *, loff_t, loff_t, int); + struct file * (*f_real)(struct file *file); } __randomize_layout; struct inode_operations { @@ -1895,6 +1896,13 @@ struct inode_operations { int (*set_acl)(struct inode *, struct posix_acl *, int); } ____cacheline_aligned; +static inline struct file *real_file(struct file *file) +{ + if (unlikely(file->f_op->f_real)) + return file->f_op->f_real(file); + return file; +} + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 43a1cef8f0f1..528b07145414 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -437,6 +437,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_t acct, static inline bool is_file_hugepages(struct file *file) { + file = real_file(file); + if (file->f_op == &hugetlbfs_file_operations) return true; -- 2.25.4 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-06-11 0:13 ` Mike Kravetz @ 2020-06-11 0:37 ` Al Viro 2020-06-11 1:36 ` Matthew Wilcox 0 siblings, 1 reply; 21+ messages in thread From: Al Viro @ 2020-06-11 0:37 UTC (permalink / raw) To: Mike Kravetz Cc: Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs On Wed, Jun 10, 2020 at 05:13:52PM -0700, Mike Kravetz wrote: > To address this issue, > - Add a new file operation f_real while will return the underlying file. > Only overlayfs provides a function for this operation. > - Add a new routine real_file() which can be used by core code get an > underlying file. > - Update is_file_hugepages to get the real file. Egads... So to find out whether it's a hugetlb you would * check if a method is NULL * if not, call it * ... and check if the method table of the result is hugetlbfs one? Here's a radical suggestion: FMODE_HUGEPAGES. Just have it set by ->open() and let is_file_hugepages() check it. In ->f_mode. And make the bloody hugetlbfs_file_operations static, while we are at it. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-06-11 0:37 ` Al Viro @ 2020-06-11 1:36 ` Matthew Wilcox 2020-06-11 2:17 ` Al Viro 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2020-06-11 1:36 UTC (permalink / raw) To: Al Viro Cc: Mike Kravetz, Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs On Thu, Jun 11, 2020 at 01:37:26AM +0100, Al Viro wrote: > On Wed, Jun 10, 2020 at 05:13:52PM -0700, Mike Kravetz wrote: > > > To address this issue, > > - Add a new file operation f_real while will return the underlying file. > > Only overlayfs provides a function for this operation. > > - Add a new routine real_file() which can be used by core code get an > > underlying file. > > - Update is_file_hugepages to get the real file. > > Egads... So to find out whether it's a hugetlb you would > * check if a method is NULL > * if not, call it > * ... and check if the method table of the result is hugetlbfs one? > > Here's a radical suggestion: FMODE_HUGEPAGES. Just have it set by > ->open() and let is_file_hugepages() check it. In ->f_mode. And > make the bloody hugetlbfs_file_operations static, while we are at it. ITYM FMODE_OVL_UPPER. To quote Mike: > while (file->f_op == &ovl_file_operations) > file = file->private_data; > return file; which would be transformed into: while (file->f_mode & FMODE_OVL_UPPER) file = file->private_data; return file; Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the underlying fs to the overlaying fs? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-06-11 1:36 ` Matthew Wilcox @ 2020-06-11 2:17 ` Al Viro 2020-06-11 2:31 ` Mike Kravetz 0 siblings, 1 reply; 21+ messages in thread From: Al Viro @ 2020-06-11 2:17 UTC (permalink / raw) To: Matthew Wilcox Cc: Mike Kravetz, Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs On Wed, Jun 10, 2020 at 06:36:16PM -0700, Matthew Wilcox wrote: > while (file->f_mode & FMODE_OVL_UPPER) > file = file->private_data; > return file; > > Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the > underlying fs to the overlaying fs? The latter - that way nobody outside of overlayfs needs to know what its ->private_data points to, for one thing. And it's cheaper that way, obviously. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() 2020-06-11 2:17 ` Al Viro @ 2020-06-11 2:31 ` Mike Kravetz 0 siblings, 0 replies; 21+ messages in thread From: Mike Kravetz @ 2020-06-11 2:31 UTC (permalink / raw) To: Al Viro, Matthew Wilcox Cc: Miklos Szeredi, kbuild test robot, kbuild-all, Colin Walters, syzbot, Andrew Morton, Linux Memory Management List, linux-fsdevel, linux-kernel, syzkaller-bugs, overlayfs On 6/10/20 7:17 PM, Al Viro wrote: > On Wed, Jun 10, 2020 at 06:36:16PM -0700, Matthew Wilcox wrote: > >> while (file->f_mode & FMODE_OVL_UPPER) >> file = file->private_data; >> return file; >> >> Or are you proposing that overlayfs copy FMODE_HUGEPAGES from the >> underlying fs to the overlaying fs? > > The latter - that way nobody outside of overlayfs needs to know what > its ->private_data points to, for one thing. And it's cheaper that > way, obviously. > Thanks Al and Matthew! I knew adding a file op for this was overkill and was looking for other suggestions. -- Mike Kravetz ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2020-06-11 2:32 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-06 3:06 kernel BUG at mm/hugetlb.c:LINE! syzbot 2020-04-06 22:05 ` Mike Kravetz 2020-05-12 15:04 ` Miklos Szeredi 2020-05-12 18:11 ` Mike Kravetz 2020-05-15 22:15 ` Mike Kravetz 2020-05-18 11:12 ` Miklos Szeredi 2020-05-18 23:22 ` Mike Kravetz 2020-05-18 23:41 ` Colin Walters 2020-05-19 0:35 ` Mike Kravetz 2020-05-20 11:20 ` Miklos Szeredi 2020-05-20 17:27 ` Mike Kravetz 2020-05-22 10:05 ` Miklos Szeredi 2020-05-28 0:01 ` Mike Kravetz 2020-05-28 8:37 ` [PATCH v2] ovl: provide real_file() and overlayfs get_unmapped_area() kbuild test robot 2020-05-28 21:01 ` Mike Kravetz 2020-06-04 9:16 ` Miklos Szeredi 2020-06-11 0:13 ` Mike Kravetz 2020-06-11 0:37 ` Al Viro 2020-06-11 1:36 ` Matthew Wilcox 2020-06-11 2:17 ` Al Viro 2020-06-11 2:31 ` Mike Kravetz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).