* [syzbot] [mm?] WARNING in __page_table_check_ptes_set
@ 2024-04-21 20:16 syzbot
  2024-04-22 10:07 ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2024-04-21 20:16 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
git tree:       linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
Modules linked in:
CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
 set_ptes include/linux/pgtable.h:267 [inline]
 __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
 ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
 change_pte_range mm/mprotect.c:194 [inline]
 change_pmd_range mm/mprotect.c:424 [inline]
 change_pud_range mm/mprotect.c:457 [inline]
 change_p4d_range mm/mprotect.c:480 [inline]
 change_protection_range mm/mprotect.c:508 [inline]
 change_protection+0x2770/0x3cc0 mm/mprotect.c:542
 mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
 do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
 __do_sys_mprotect mm/mprotect.c:841 [inline]
 __se_sys_mprotect mm/mprotect.c:838 [inline]
 __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f45514bf429
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
  2024-04-21 20:16 [syzbot] [mm?] WARNING in __page_table_check_ptes_set syzbot
@ 2024-04-22 10:07 ` David Hildenbrand
  2024-04-22 10:38   ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-04-22 10:07 UTC (permalink / raw)
  To: syzbot, akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

On 21.04.24 22:16, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
> git tree:       linux-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
> dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213

I think this is

if (pte_present(pte) && pte_uffd_wp(pte))
	WARN_ON_ONCE(pte_write(pte));

> Modules linked in:
> CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
> RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
> RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
> Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
> RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
> RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
> RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
> R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
> R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
> FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   <TASK>
>   page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
>   set_ptes include/linux/pgtable.h:267 [inline]
>   __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
>   ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
>   change_pte_range mm/mprotect.c:194 [inline]
>   change_pmd_range mm/mprotect.c:424 [inline]
>   change_pud_range mm/mprotect.c:457 [inline]
>   change_p4d_range mm/mprotect.c:480 [inline]
>   change_protection_range mm/mprotect.c:508 [inline]
>   change_protection+0x2770/0x3cc0 mm/mprotect.c:542
>   mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
>   do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
>   __do_sys_mprotect mm/mprotect.c:841 [inline]
>   __se_sys_mprotect mm/mprotect.c:838 [inline]
>   __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
>   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>   do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>   entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f45514bf429
> Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
> RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
> RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
> RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
> R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
> R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
>   </TASK>

Did we find a real issue that involves mprotect()?

At least can_change_pte_writable() should always return "false" for 
userfaultfd_pte_wp().

Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?

Or was the PTE already writable, and we only detect it now that we call 
mprotect()? (Did we fail to detect it earlier?)

> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> 
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
> 
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
> 
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
> 
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
> 
> If you want to undo deduplication, reply with:
> #syz undup
> 

-- 
Cheers,

David / dhildenb



* Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
  2024-04-22 10:07 ` David Hildenbrand
@ 2024-04-22 10:38   ` David Hildenbrand
  2024-04-22 11:46     ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-04-22 10:38 UTC (permalink / raw)
  To: syzbot, akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

On 22.04.24 12:07, David Hildenbrand wrote:
> On 21.04.24 22:16, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
>> git tree:       linux-next
>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
>> dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
>> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
> 
> I think this is
> 
> if (pte_present(pte) && pte_uffd_wp(pte))
> 	WARN_ON_ONCE(pte_write(pte));
> 
>> Modules linked in:
>> CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
>> RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
>> RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
>> Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
>> RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
>> RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
>> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
>> RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
>> R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
>> R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
>> FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>    <TASK>
>>    page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
>>    set_ptes include/linux/pgtable.h:267 [inline]
>>    __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
>>    ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
>>    change_pte_range mm/mprotect.c:194 [inline]
>>    change_pmd_range mm/mprotect.c:424 [inline]
>>    change_pud_range mm/mprotect.c:457 [inline]
>>    change_p4d_range mm/mprotect.c:480 [inline]
>>    change_protection_range mm/mprotect.c:508 [inline]
>>    change_protection+0x2770/0x3cc0 mm/mprotect.c:542
>>    mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
>>    do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
>>    __do_sys_mprotect mm/mprotect.c:841 [inline]
>>    __se_sys_mprotect mm/mprotect.c:838 [inline]
>>    __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
>>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>    do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7f45514bf429
>> Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>> RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
>> RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
>> RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
>> RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
>> R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
>> R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
>>    </TASK>
> 
> Did we find a real issue that involves mprotect()?
> 
> At least can_change_pte_writable() should always return "false" for
> userfaultfd_pte_wp().
> 
> Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?
> 
> Or was the PTE already writable, and we only detect it now that we call
> mprotect()? (Did we fail to detect it earlier?)

Staring at the reproducer, we do


   syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
           /*offset=*/0ul);
   syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
           /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
           /*offset=*/0ul);

-> Writable anonymous memory

   syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
           /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
           /*offset=*/0ul);
   intptr_t res = 0;
   res = syscall(__NR_userfaultfd,
                 /*flags=UFFD_USER_MODE_ONLY|O_NONBLOCK*/ 0x801ul);
   if (res != -1)
     r[0] = res;
   *(uint64_t*)0x200004c0 = 0xaa;
   *(uint64_t*)0x200004c8 = 0;
   *(uint64_t*)0x200004d0 = 0;
   syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x200004c0ul);

-> _UFFDIO_API handshake?

   syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x3000ul,
           /*prot=PROT_SEM|PROT_EXEC*/ 0xcul);

-> Protect target range R/O. I assume: no page populated yet?
-> 3 pages starting at 0x20ffc000ul;

   *(uint64_t*)0x20000180 = 0x20ffc000;
   *(uint64_t*)0x20000188 = 0x3000;
   *(uint64_t*)0x20000190 = 3;
   *(uint64_t*)0x20000198 = 0;
   syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);

-> _UFFDIO_REGISTER (aa00)
-> _range = 3 pages starting at 0x20ffc000ul
-> _mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP (0x3)

   *(uint64_t*)0x20000000 = 0x20ffd000;
   *(uint64_t*)0x20000008 = 0x20ffb000;
   *(uint64_t*)0x20000010 = 0x1000;
   *(uint64_t*)0x20000018 = 3;
   *(uint64_t*)0x20000020 = 0;
   syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc028aa03, /*arg=*/0x20000000ul);

-> _UFFDIO_COPY (aa03)
-> dst = 0x20ffd000
-> src = 0x20ffb000
-> len = 0x1000 (single page)
-> mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP

-> We are copying into the R/O range. src should be R/W and trigger a page fault
    on access where we get a fresh page.

   *(uint16_t*)0x200000c0 = 1;
   *(uint64_t*)0x200000c8 = 0x20000040;
   *(uint16_t*)0x20000040 = 6;
   *(uint8_t*)0x20000042 = 0;
   *(uint8_t*)0x20000043 = 0;
   *(uint32_t*)0x20000044 = 0x7fffffff;
   res = syscall(__NR_seccomp, /*op=*/1ul, /*flags=*/0ul, /*arg=*/0x200000c0ul);
   if (res != -1)
     r[1] = res;
   syscall(__NR_open_tree, /*dfd=*/-1, /*filename=*/0ul, /*flags=*/0ul);

-> No idea what happens here or whether it is relevant. If __NR_seccomp failed, we would
    not set r[1].

   syscall(__NR_close_range, /*fd=*/r[1], /*max_fd=*/-1, /*flags=*/0ul);

-> Is that closing uffd as well, especially if __NR_seccomp failed?

   syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x4000ul,
           /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful);

-> Restore write permissions. I assume this is what fires the uffd-wp page table check.

-- 
Cheers,

David / dhildenb



* Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
  2024-04-22 10:38   ` David Hildenbrand
@ 2024-04-22 11:46     ` David Hildenbrand
  2024-04-22 13:28       ` Peter Xu
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-04-22 11:46 UTC (permalink / raw)
  To: syzbot, akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

On 22.04.24 12:38, David Hildenbrand wrote:
> On 22.04.24 12:07, David Hildenbrand wrote:
>> On 21.04.24 22:16, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
>>> git tree:       linux-next
>>> console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
>>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000
>>>
>>> Downloadable assets:
>>> disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
>>> vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
>>> kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
>>> WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
>>
>> I think this is
>>
>> if (pte_present(pte) && pte_uffd_wp(pte))
>> 	WARN_ON_ONCE(pte_write(pte));
>>
>>> Modules linked in:
>>> CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
>>> RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
>>> RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
>>> Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
>>> RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
>>> RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
>>> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
>>> RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
>>> R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
>>> R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
>>> FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>>     <TASK>
>>>     page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
>>>     set_ptes include/linux/pgtable.h:267 [inline]
>>>     __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
>>>     ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
>>>     change_pte_range mm/mprotect.c:194 [inline]
>>>     change_pmd_range mm/mprotect.c:424 [inline]
>>>     change_pud_range mm/mprotect.c:457 [inline]
>>>     change_p4d_range mm/mprotect.c:480 [inline]
>>>     change_protection_range mm/mprotect.c:508 [inline]
>>>     change_protection+0x2770/0x3cc0 mm/mprotect.c:542
>>>     mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
>>>     do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
>>>     __do_sys_mprotect mm/mprotect.c:841 [inline]
>>>     __se_sys_mprotect mm/mprotect.c:838 [inline]
>>>     __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
>>>     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>>     do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>>>     entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>> RIP: 0033:0x7f45514bf429
>>> Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>> RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
>>> RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
>>> RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
>>> RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
>>> R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
>>> R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
>>>     </TASK>
>>
>> Did we find a real issue that involves mprotect()?
>>
>> At least can_change_pte_writable() should always return "false" for
>> userfaultfd_pte_wp().
>>
>> Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?
>>
>> Or was the PTE already writable, and we only detect it now that we call
>> mprotect()? (Did we fail to detect it earlier?)
> 
> Staring at the reproducer, we do
> 
> 
>     syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>             /*offset=*/0ul);
>     syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
>             /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>             /*offset=*/0ul);
> 
> -> Writable anonymous memory
> 
>     syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
>             /*offset=*/0ul);
>     intptr_t res = 0;
>     res = syscall(__NR_userfaultfd,
>                   /*flags=UFFD_USER_MODE_ONLY|O_NONBLOCK*/ 0x801ul);
>     if (res != -1)
>       r[0] = res;
>     *(uint64_t*)0x200004c0 = 0xaa;
>     *(uint64_t*)0x200004c8 = 0;
>     *(uint64_t*)0x200004d0 = 0;
>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x200004c0ul);
> 
> -> _UFFDIO_API handshake?
> 
>     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x3000ul,
>             /*prot=PROT_SEM|PROT_EXEC*/ 0xcul);
> 
> -> Protect target range R/O. I assume: no page populated yet?
> -> 3 pages starting at 0x20ffc000ul;
> 
>     *(uint64_t*)0x20000180 = 0x20ffc000;
>     *(uint64_t*)0x20000188 = 0x3000;
>     *(uint64_t*)0x20000190 = 3;
>     *(uint64_t*)0x20000198 = 0;
>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);
> 
> -> _UFFDIO_REGISTER (aa00)
> -> _range = 3 pages starting at 0x20ffc000ul
> -> _mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP (0x3)
> 
>     *(uint64_t*)0x20000000 = 0x20ffd000;
>     *(uint64_t*)0x20000008 = 0x20ffb000;
>     *(uint64_t*)0x20000010 = 0x1000;
>     *(uint64_t*)0x20000018 = 3;
>     *(uint64_t*)0x20000020 = 0;
>     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc028aa03, /*arg=*/0x20000000ul);
> 
> -> _UFFDIO_COPY (aa03)
> -> dst = 0x20ffd000
> -> src = 0x20ffb000
> -> len = 0x1000 (single page)
> -> mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP
> 
> -> We are copying into the R/O range. src should be R/W and trigger a page fault
>      on access where we get a fresh page.
> 
>     *(uint16_t*)0x200000c0 = 1;
>     *(uint64_t*)0x200000c8 = 0x20000040;
>     *(uint16_t*)0x20000040 = 6;
>     *(uint8_t*)0x20000042 = 0;
>     *(uint8_t*)0x20000043 = 0;
>     *(uint32_t*)0x20000044 = 0x7fffffff;
>     res = syscall(__NR_seccomp, /*op=*/1ul, /*flags=*/0ul, /*arg=*/0x200000c0ul);
>     if (res != -1)
>       r[1] = res;
>     syscall(__NR_open_tree, /*dfd=*/-1, /*filename=*/0ul, /*flags=*/0ul);
> 
> -> No idea what happens here or whether it is relevant. If __NR_seccomp failed, we would
>      not set r[1].
> 
>     syscall(__NR_close_range, /*fd=*/r[1], /*max_fd=*/-1, /*flags=*/0ul);
> 
> -> Is that closing uffd as well, especially if __NR_seccomp failed?
> 
>     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x4000ul,
>             /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful);
> 
> -> Restore write permissions. I assume this is what fires the uffd-wp page table check.

I think the issue is that userfaultfd_release() will clear the VMA UFFD_WP flag,
but it will not clear PTE uffd-wp bits. So we have leftover PTE uffd-wp bits at
the time we wr-unprotect.

I thought we removed that lazy handling, but it looks like we didn't consider the
"close uffd" case in:

commit f369b07c861435bd812a9d14493f71b34132ed6f
Author: Peter Xu <peterx@redhat.com>
Date:   Thu Aug 11 16:13:40 2022 -0400

     mm/uffd: reset write protection when unregister with wp-mode


close should behave just like unregister.


Simplified+readable reproducer:

#define _GNU_SOURCE

#include <stdint.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>
#include <unistd.h>

int main(void)
{
         void *src = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
         void *dst = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
         struct uffdio_register uffdio_register = {};
         struct uffdio_copy uffdio_copy = {};
         struct uffdio_api uffdio_api = {};
         int uffd;

         uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
         uffdio_api.api = UFFD_API;
         ioctl(uffd, UFFDIO_API, &uffdio_api);

         uffdio_register.range.start = (uintptr_t)dst;
         uffdio_register.range.len = 4096;
         uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
         ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);

         uffdio_copy.dst = (uintptr_t)dst;
         uffdio_copy.src = (uintptr_t)src;
         uffdio_copy.len = 4096;
         uffdio_copy.mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP;
         ioctl(uffd, UFFDIO_COPY, &uffdio_copy);

         close(uffd);

         mprotect(dst, 4096, PROT_READ|PROT_WRITE);
         return 0;
}


-- 
Cheers,

David / dhildenb



* Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
  2024-04-22 11:46     ` David Hildenbrand
@ 2024-04-22 13:28       ` Peter Xu
  2024-04-22 15:10         ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Xu @ 2024-04-22 13:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: syzbot, akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

On Mon, Apr 22, 2024 at 01:46:20PM +0200, David Hildenbrand wrote:
> On 22.04.24 12:38, David Hildenbrand wrote:
> > On 22.04.24 12:07, David Hildenbrand wrote:
> > > On 21.04.24 22:16, syzbot wrote:
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    4eab35893071 Add linux-next specific files for 20240417
> > > > git tree:       linux-next
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=1727a61b180000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=27920e47287645ff
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=d8426b591c36b21c750e
> > > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=156da22d180000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=163dfec7180000
> > > > 
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/9f7d6c097fb4/disk-4eab3589.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/287b16352982/vmlinux-4eab3589.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/23839c65c573/bzImage-4eab3589.xz
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+d8426b591c36b21c750e@syzkaller.appspotmail.com
> > > > 
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_pte mm/page_table_check.c:199 [inline]
> > > > WARNING: CPU: 0 PID: 5084 at mm/page_table_check.c:199 __page_table_check_ptes_set+0x1db/0x420
> > > 
> > > I think this is
> > > 
> > > if (pte_present(pte) && pte_uffd_wp(pte))
> > > 	WARN_ON_ONCE(pte_write(pte));
> > > 
> > > mm/page_table_check.c:213
> > > > Modules linked in:
> > > > CPU: 0 PID: 5084 Comm: syz-executor382 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
> > > > RIP: 0010:__page_table_check_pte mm/page_table_check.c:199 [inline]
> > > > RIP: 0010:__page_table_check_ptes_set+0x1db/0x420 mm/page_table_check.c:213
> > > > Code: 48 8b 7c 24 40 48 c7 c6 80 19 46 8e e8 ee df 8e ff 41 83 fc 1d 74 18 41 83 fc 1a 75 1d e8 5d da 8e ff eb 10 e8 56 da 8e ff 90 <0f> 0b 90 eb 10 e8 4b da 8e ff 90 0f 0b 90 eb 05 e8 40 da 8e ff 48
> > > > RSP: 0018:ffffc9000366f740 EFLAGS: 00010293
> > > > RAX: ffffffff8207833a RBX: ffffc9000366f7c0 RCX: ffff888022af3c00
> > > > RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
> > > > RBP: ffffc9000366f830 R08: ffffffff820782af R09: 1ffffd40000a6a10
> > > > R10: dffffc0000000000 R11: fffff940000a6a11 R12: 0000000000000000
> > > > R13: 0000000014d42c67 R14: 0000000000000001 R15: 0000000000000000
> > > > FS:  0000555567f79380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
> > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 000000000066c7e0 CR3: 0000000078cb0000 CR4: 00000000003506f0
> > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > Call Trace:
> > > >     <TASK>
> > > >     page_table_check_ptes_set include/linux/page_table_check.h:74 [inline]
> > > >     set_ptes include/linux/pgtable.h:267 [inline]
> > > >     __ptep_modify_prot_commit include/linux/pgtable.h:1269 [inline]
> > > >     ptep_modify_prot_commit include/linux/pgtable.h:1302 [inline]
> > > >     change_pte_range mm/mprotect.c:194 [inline]
> > > >     change_pmd_range mm/mprotect.c:424 [inline]
> > > >     change_pud_range mm/mprotect.c:457 [inline]
> > > >     change_p4d_range mm/mprotect.c:480 [inline]
> > > >     change_protection_range mm/mprotect.c:508 [inline]
> > > >     change_protection+0x2770/0x3cc0 mm/mprotect.c:542
> > > >     mprotect_fixup+0x740/0xa90 mm/mprotect.c:655
> > > >     do_mprotect_pkey+0x90d/0xe00 mm/mprotect.c:820
> > > >     __do_sys_mprotect mm/mprotect.c:841 [inline]
> > > >     __se_sys_mprotect mm/mprotect.c:838 [inline]
> > > >     __x64_sys_mprotect+0x80/0x90 mm/mprotect.c:838
> > > >     do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > >     do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
> > > >     entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > > > RIP: 0033:0x7f45514bf429
> > > > Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > RSP: 002b:00007ffe52191598 EFLAGS: 00000246 ORIG_RAX: 000000000000000a
> > > > RAX: ffffffffffffffda RBX: 00007ffe52191768 RCX: 00007f45514bf429
> > > > RDX: 000000000000000f RSI: 0000000000004000 RDI: 0000000020ffc000
> > > > RBP: 00007f4551532610 R08: 00007ffe52191768 R09: 00007ffe52191768
> > > > R10: 00007ffe52191768 R11: 0000000000000246 R12: 0000000000000001
> > > > R13: 00007ffe52191758 R14: 0000000000000001 R15: 0000000000000001
> > > >     </TASK>
> > > 
> > > Did we find a real issue that involves mprotect()?
> > > 
> > > At least can_change_pte_writable() should always return "false" for
> > > userfaultfd_pte_wp().
> > > 
> > > Do we maybe have a uffd-wp PTE outside of a UFFD_WP VMA?
> > > 
> > > Or was the PTE already writable and we only detect it now as we call
> > > mprotect()? (Did we fail to detect it earlier?)
> > 
> > Staring at the reproducer, we do
> > 
> > 
> >     syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> >     syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul,
> >             /*prot=PROT_WRITE|PROT_READ|PROT_EXEC*/ 7ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> > 
> > -> Writable anonymous memory
> > 
> >     syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
> >             /*flags=MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE*/ 0x32ul, /*fd=*/-1,
> >             /*offset=*/0ul);
> >     intptr_t res = 0;
> >     res = syscall(__NR_userfaultfd,
> >                   /*flags=UFFD_USER_MODE_ONLY|O_NONBLOCK*/ 0x801ul);
> >     if (res != -1)
> >       r[0] = res;
> >     *(uint64_t*)0x200004c0 = 0xaa;
> >     *(uint64_t*)0x200004c8 = 0;
> >     *(uint64_t*)0x200004d0 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc018aa3f, /*arg=*/0x200004c0ul);
> > 
> > -> _UFFDIO_API handshake?
> > 
> >     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x3000ul,
> >             /*prot=PROT_SEM|PROT_EXEC*/ 0xcul);
> > 
> > -> Protect target range R/O. I assume: no page populated yet?
> > -> 3 pages starting at 0x20ffc000ul;
> > 
> >     *(uint64_t*)0x20000180 = 0x20ffc000;
> >     *(uint64_t*)0x20000188 = 0x3000;
> >     *(uint64_t*)0x20000190 = 3;
> >     *(uint64_t*)0x20000198 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc020aa00, /*arg=*/0x20000180ul);
> > 
> > -> _UFFDIO_REGISTER (aa00)
> > -> _range = 3 pages starting at 0x20ffc000ul
> > -> _mode = UFFDIO_REGISTER_MODE_WP | UFFDIO_REGISTER_MODE_MINOR
> > 
> >     *(uint64_t*)0x20000000 = 0x20ffd000;
> >     *(uint64_t*)0x20000008 = 0x20ffb000;
> >     *(uint64_t*)0x20000010 = 0x1000;
> >     *(uint64_t*)0x20000018 = 3;
> >     *(uint64_t*)0x20000020 = 0;
> >     syscall(__NR_ioctl, /*fd=*/r[0], /*cmd=*/0xc028aa03, /*arg=*/0x20000000ul);
> > 
> > -> _UFFDIO_COPY (aa03)
> > -> dst = 0x20ffd000
> > -> src = 0x20ffb000
> > -> len = 0x1000 (single page)
> > -> mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP
> > 
> > -> We are copying into the R/O range. src should be R/W and trigger a page fault
> >      on access where we get a fresh page.
> > 
> >     *(uint16_t*)0x200000c0 = 1;
> >     *(uint64_t*)0x200000c8 = 0x20000040;
> >     *(uint16_t*)0x20000040 = 6;
> >     *(uint8_t*)0x20000042 = 0;
> >     *(uint8_t*)0x20000043 = 0;
> >     *(uint32_t*)0x20000044 = 0x7fffffff;
> >     res = syscall(__NR_seccomp, /*op=*/1ul, /*flags=*/0ul, /*arg=*/0x200000c0ul);
> >     if (res != -1)
> >       r[1] = res;
> >     syscall(__NR_open_tree, /*dfd=*/-1, /*filename=*/0ul, /*flags=*/0ul);
> > 
> > -> No idea what happens here and if it is relevant. If __NR_seccomp failed, we would
> >      not set r[1].
> > 
> >     syscall(__NR_close_range, /*fd=*/r[1], /*max_fd=*/-1, /*flags=*/0ul);
> > 
> > -> Is that closing uffd as well, especially if __NR_seccomp failed?
> > 
> >     syscall(__NR_mprotect, /*addr=*/0x20ffc000ul, /*len=*/0x4000ul,
> >             /*prot=PROT_SEM|PROT_WRITE|PROT_READ|PROT_EXEC*/ 0xful);
> > 
> > -> Restore write permissions. This seems to fire the uffd-wp page table check I assume.
> 
> I think the issue is that userfaultfd_release() will clear the VMA UFFD_WP flag,
> but it will not clear PTE uffd-wp bits. So we have leftover PTE uffd-wp bits at
> the time we wr-unprotect.
> 
> I thought we removed that lazy handling, but looks like we didn't consider the
> "close uffd" case in:
> 
> commit f369b07c861435bd812a9d14493f71b34132ed6f
> Author: Peter Xu <peterx@redhat.com>
> Date:   Thu Aug 11 16:13:40 2022 -0400
> 
>     mm/uffd: reset write protection when unregister with wp-mode
> 
> 
> close should behave just like unregister.
> 
> 
> Simplified+readable reproducer:
> 
> #define _GNU_SOURCE
> 
> #include <stdint.h>
> #include <fcntl.h>
> #include <sys/syscall.h>
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/ioctl.h>
> #include <linux/userfaultfd.h>
> #include <unistd.h>
> 
> int main(void)
> {
>         void *src = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         void *dst = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>         struct uffdio_register uffdio_register = {};
>         struct uffdio_copy uffdio_copy = {};
>         struct uffdio_api uffdio_api = {};
>         int uffd;
> 
>         uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
>         uffdio_api.api = UFFD_API;
>         ioctl(uffd, UFFDIO_API, &uffdio_api);
> 
>         uffdio_register.range.start = (uintptr_t)dst;
>         uffdio_register.range.len = 4096;
>         uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
>         ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
> 
>         uffdio_copy.dst = (uintptr_t)dst;
>         uffdio_copy.src = (uintptr_t)src;
>         uffdio_copy.len = 4096;
>         uffdio_copy.mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP;
>         ioctl(uffd, UFFDIO_COPY, &uffdio_copy);
> 
>         close(uffd);
> 
>         mprotect(dst, 4096, PROT_READ|PROT_WRITE);
>         return 0;
> }

Thanks, I'll post a patch.

PS: next time feel free to try "strace ./reproducer"; it'll do the
translations, and I found it handy when working with syzbot reproducers.
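
For anyone reading the raw syzbot reproducer without strace at hand, the
ioctl numbers can also be decoded manually. A minimal sketch, assuming the
generic Linux ioctl encoding used on x86 (8-bit nr, 8-bit type, 14-bit
size, 2-bit direction); struct ioc_fields, ioc_decode() and ioc_print() are
names made up for this example:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the fields packed into an ioctl number. */
struct ioc_fields {
        unsigned int dir;   /* 1 = write, 2 = read, 3 = read+write */
        unsigned int size;  /* sizeof() of the argument struct */
        unsigned int type;  /* "magic" byte, 0xaa for userfaultfd */
        unsigned int nr;    /* command number within that type */
};

/* Split cmd assuming the asm-generic layout: nr:8, type:8, size:14, dir:2. */
static struct ioc_fields ioc_decode(uint32_t cmd)
{
        struct ioc_fields f = {
                .dir  = (cmd >> 30) & 0x3,
                .size = (cmd >> 16) & 0x3fff,
                .type = (cmd >> 8) & 0xff,
                .nr   = cmd & 0xff,
        };
        return f;
}

/* Print one decoded command, e.g. for 0xc018aa3f from the reproducer. */
static void ioc_print(uint32_t cmd)
{
        struct ioc_fields f = ioc_decode(cmd);

        printf("0x%08x: dir=%u size=%u type=0x%02x nr=0x%02x\n",
               (unsigned int)cmd, f.dir, f.size, f.type, f.nr);
}
```

Feeding it 0xc018aa3f, 0xc020aa00 and 0xc028aa03 gives type 0xaa with nr
0x3f, 0x00 and 0x03, which matches the UFFDIO_API, UFFDIO_REGISTER and
UFFDIO_COPY annotations quoted above.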

-- 
Peter Xu


* Re: [syzbot] [mm?] WARNING in __page_table_check_ptes_set
  2024-04-22 13:28       ` Peter Xu
@ 2024-04-22 15:10         ` David Hildenbrand
  0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-04-22 15:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: syzbot, akpm, linux-kernel, linux-mm, pasha.tatashin, syzkaller-bugs

>> commit f369b07c861435bd812a9d14493f71b34132ed6f
>> Author: Peter Xu <peterx@redhat.com>
>> Date:   Thu Aug 11 16:13:40 2022 -0400
>>
>>      mm/uffd: reset write protection when unregister with wp-mode
>>
>>
>> close should behave just like unregister.
>>
>>
>> Simplified+readable reproducer:
>>
>> #define _GNU_SOURCE
>>
>> #include <stdint.h>
>> #include <fcntl.h>
>> #include <sys/syscall.h>
>> #include <sys/mman.h>
>> #include <sys/types.h>
>> #include <sys/ioctl.h>
>> #include <linux/userfaultfd.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>          void *src = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>>          void *dst = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>>          struct uffdio_register uffdio_register = {};
>>          struct uffdio_copy uffdio_copy = {};
>>          struct uffdio_api uffdio_api = {};
>>          int uffd;
>>
>>          uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
>>          uffdio_api.api = UFFD_API;
>>          ioctl(uffd, UFFDIO_API, &uffdio_api);
>>
>>          uffdio_register.range.start = (uintptr_t)dst;
>>          uffdio_register.range.len = 4096;
>>          uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
>>          ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);
>>
>>          uffdio_copy.dst = (uintptr_t)dst;
>>          uffdio_copy.src = (uintptr_t)src;
>>          uffdio_copy.len = 4096;
>>          uffdio_copy.mode = UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP;
>>          ioctl(uffd, UFFDIO_COPY, &uffdio_copy);
>>
>>          close(uffd);
>>
>>          mprotect(dst, 4096, PROT_READ|PROT_WRITE);
>>          return 0;
>> }
> 
> Thanks, I'll post a patch.
> 
> PS: next time feel free to try "strace ./reproducer", it'll do the
> translations and I found it handy to work with syzbot.

Cool, was not aware that it would do that amount of translation!

-- 
Cheers,

David / dhildenb


Thread overview: 6+ messages
2024-04-21 20:16 [syzbot] [mm?] WARNING in __page_table_check_ptes_set syzbot
2024-04-22 10:07 ` David Hildenbrand
2024-04-22 10:38   ` David Hildenbrand
2024-04-22 11:46     ` David Hildenbrand
2024-04-22 13:28       ` Peter Xu
2024-04-22 15:10         ` David Hildenbrand
