linux-kernel.vger.kernel.org archive mirror
* [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
@ 2023-02-02  6:54 syzbot
  2023-02-13 15:56 ` syzbot
  2023-03-03 21:43 ` syzbot
  0 siblings, 2 replies; 20+ messages in thread
From: syzbot @ 2023-02-02  6:54 UTC
  To: adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso

Hello,

syzbot found the following issue on:

HEAD commit:    c96618275234 Fix up more non-executable files marked execu..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14287dc1480000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8d5c2ee6c2bd4b8
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler:       Debian clang version 13.0.1-6~deb11u1, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/a829cd39e940/disk-c9661827.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/abbc86f52a98/vmlinux-c9661827.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ab0970dd4f84/bzImage-c9661827.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 crc16+0x206/0x280 lib/crc16.c:58
 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
 ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
 ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
 ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
 ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
 notify_change+0xe50/0x1100 fs/attr.c:482
 do_truncate+0x200/0x2f0 fs/open.c:65
 handle_truncate fs/namei.c:3216 [inline]
 do_open fs/namei.c:3561 [inline]
 path_openat+0x272b/0x2dd0 fs/namei.c:3714
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_creat fs/open.c:1402 [inline]
 __se_sys_creat fs/open.c:1396 [inline]
 __x64_sys_creat+0x11f/0x160 fs/open.c:1396
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000
 </TASK>

Allocated by task 5119:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
 __kasan_slab_alloc+0x65/0x70 mm/kasan/common.c:325
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slab.h:761 [inline]
 slab_alloc_node mm/slub.c:3452 [inline]
 slab_alloc mm/slub.c:3460 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3467 [inline]
 kmem_cache_alloc+0x1b3/0x350 mm/slub.c:3476
 kmem_cache_zalloc include/linux/slab.h:710 [inline]
 __kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
 kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
 __kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
 sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
 create_files fs/sysfs/group.c:64 [inline]
 internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
 internal_create_groups fs/sysfs/group.c:188 [inline]
 sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
 create_dir lib/kobject.c:68 [inline]
 kobject_add_internal+0x723/0xd10 lib/kobject.c:223
 kobject_add_varg lib/kobject.c:358 [inline]
 kobject_init_and_add+0x104/0x160 lib/kobject.c:441
 netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
 netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
 register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
 netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
 register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
 bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
 rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3624 [inline]
 rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
 rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
 netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
 netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365
 netlink_sendmsg+0x9b3/0xcd0 net/netlink/af_netlink.c:1942
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 __sys_sendto+0x46e/0x5f0 net/socket.c:2117
 __do_sys_sendto net/socket.c:2129 [inline]
 __se_sys_sendto net/socket.c:2125 [inline]
 __x64_sys_sendto+0xda/0xf0 net/socket.c:2125
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888075f5c000
 which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
 168-byte region [ffff888075f5c000, ffff888075f5c0a8)

The buggy address belongs to the physical page:
page:ffffea0001d7d700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x75f5c
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffff8880129ebc80 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5119, tgid 5119 (syz-executor.3), ts 232703738304, free_ts 232703424583
 prep_new_page mm/page_alloc.c:2531 [inline]
 get_page_from_freelist+0x742/0x7c0 mm/page_alloc.c:4283
 __alloc_pages+0x259/0x560 mm/page_alloc.c:5549
 alloc_slab_page+0xbd/0x190 mm/slub.c:1851
 allocate_slab+0x5e/0x3c0 mm/slub.c:1998
 new_slab mm/slub.c:2051 [inline]
 ___slab_alloc+0x782/0xe20 mm/slub.c:3193
 __slab_alloc mm/slub.c:3292 [inline]
 __slab_alloc_node mm/slub.c:3345 [inline]
 slab_alloc_node mm/slub.c:3442 [inline]
 slab_alloc mm/slub.c:3460 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3467 [inline]
 kmem_cache_alloc+0x268/0x350 mm/slub.c:3476
 kmem_cache_zalloc include/linux/slab.h:710 [inline]
 __kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
 kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
 __kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
 sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
 create_files fs/sysfs/group.c:64 [inline]
 internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
 internal_create_groups fs/sysfs/group.c:188 [inline]
 sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
 create_dir lib/kobject.c:68 [inline]
 kobject_add_internal+0x723/0xd10 lib/kobject.c:223
 kobject_add_varg lib/kobject.c:358 [inline]
 kobject_init_and_add+0x104/0x160 lib/kobject.c:441
 netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
 netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
 register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
 netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1446 [inline]
 free_pcp_prepare+0x751/0x780 mm/page_alloc.c:1496
 free_unref_page_prepare mm/page_alloc.c:3369 [inline]
 free_unref_page+0x19/0x4c0 mm/page_alloc.c:3464
 qlist_free_all+0x2b/0x70 mm/kasan/quarantine.c:187
 kasan_quarantine_reduce+0x156/0x170 mm/kasan/quarantine.c:294
 __kasan_slab_alloc+0x1f/0x70 mm/kasan/common.c:302
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slab.h:761 [inline]
 slab_alloc_node mm/slub.c:3452 [inline]
 __kmem_cache_alloc_node+0x1e0/0x340 mm/slub.c:3491
 kmalloc_trace+0x26/0x60 mm/slab_common.c:1062
 kmalloc include/linux/slab.h:580 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 ref_tracker_alloc+0x128/0x440 lib/ref_tracker.c:85
 __netdev_tracker_alloc include/linux/netdevice.h:4020 [inline]
 netdev_hold include/linux/netdevice.h:4049 [inline]
 rx_queue_add_kobject net/core/net-sysfs.c:1060 [inline]
 net_rx_queue_update_kobjects+0x15d/0x4c0 net/core/net-sysfs.c:1114
 register_queue_kobjects net/core/net-sysfs.c:1774 [inline]
 netdev_register_kobject+0x222/0x310 net/core/net-sysfs.c:2019
 register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
 bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
 rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3624 [inline]
 rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
 rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
 netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
 netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365

Memory state around the buggy address:
 ffff888075f5bf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888075f5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888075f5c080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
                                  ^
 ffff888075f5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888075f5c180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
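For readers decoding the report above, here is a minimal sketch of the
address arithmetic, assuming generic KASAN's granularity of one shadow
byte per 8 bytes of memory. The helper names are hypothetical and are
not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: generic KASAN, where one shadow byte covers 8 bytes. */
#define KASAN_GRANULE_SKETCH 8u

/* Offset of the faulting address from the start of its slab object. */
static uint64_t offset_in_object(uint64_t addr, uint64_t obj_start)
{
    assert(addr >= obj_start);
    return addr - obj_start;
}

/*
 * Index of the shadow byte, within one row of the "Memory state" dump,
 * that covers addr. row_base is the address printed at the start of
 * that row.
 */
static size_t shadow_index_in_row(uint64_t addr, uint64_t row_base)
{
    assert(addr >= row_base);
    return (size_t)((addr - row_base) / KASAN_GRANULE_SKETCH);
}
```

With the addresses from this report, offset_in_object(0xffff888075f5c0a8,
0xffff888075f5c000) is 168, i.e. the first byte past the 168-byte
kernfs_node_cache object, and shadow_index_in_row(0xffff888075f5c0a8,
0xffff888075f5c080) is 5, which is the first "fc" (redzone) byte in the
row marked with ">".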


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-02-02  6:54 [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum syzbot
@ 2023-02-13 15:56 ` syzbot
  2023-03-01 12:13   ` Tudor Ambarus
  2023-03-03 21:43 ` syzbot
  1 sibling, 1 reply; 20+ messages in thread
From: syzbot @ 2023-02-13 15:56 UTC
  To: adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso

syzbot has found a reproducer for the following issue on:

HEAD commit:    ceaa837f96ad Linux 6.2-rc8
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
kernel config:  https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339

CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:306 [inline]
 print_report+0x163/0x4f0 mm/kasan/report.c:417
 kasan_report+0x13a/0x170 mm/kasan/report.c:517
 crc16+0x1fb/0x280 lib/crc16.c:58
 ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
 ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
 ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
 ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
 ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
 ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
 ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
 ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
 evict+0x2a4/0x620 fs/inode.c:664
 do_unlinkat+0x4f1/0x930 fs/namei.c:4327
 __do_sys_unlink fs/namei.c:4368 [inline]
 __se_sys_unlink fs/namei.c:4366 [inline]
 __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fbc85a8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
 </TASK>

The buggy address belongs to the physical page:
page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
 prep_new_page mm/page_alloc.c:2531 [inline]
 get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
 __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
 alloc_slab_page+0x6a/0x160 mm/slub.c:1851
 allocate_slab mm/slub.c:1998 [inline]
 new_slab+0x84/0x2f0 mm/slub.c:2051
 ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
 __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
 kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
 mt_alloc_bulk lib/maple_tree.c:157 [inline]
 mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
 mas_node_count_gfp lib/maple_tree.c:1316 [inline]
 mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
 vma_expand+0x277/0x850 mm/mmap.c:541
 mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
 do_mmap+0x8c9/0xf70 mm/mmap.c:1411
 vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1446 [inline]
 free_pcp_prepare mm/page_alloc.c:1496 [inline]
 free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
 free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
 qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
 kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
 slab_alloc_node mm/slub.c:3452 [inline]
 kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
 __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
 alloc_skb include/linux/skbuff.h:1270 [inline]
 alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
 sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
 unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 __sys_sendto+0x475/0x5f0 net/socket.c:2117
 __do_sys_sendto net/socket.c:2129 [inline]
 __se_sys_sendto net/socket.c:2125 [inline]
 __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
 ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================



* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-02-13 15:56 ` syzbot
@ 2023-03-01 12:13   ` Tudor Ambarus
  2023-03-07 10:39     ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Tudor Ambarus @ 2023-03-01 12:13 UTC
  To: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones

Hi!

On 2/13/23 15:56, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
> 
> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> 
> ==================================================================
> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> 
> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
> Call Trace:
>   <TASK>
>   __dump_stack lib/dump_stack.c:88 [inline]
>   dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>   print_address_description mm/kasan/report.c:306 [inline]
>   print_report+0x163/0x4f0 mm/kasan/report.c:417
>   kasan_report+0x13a/0x170 mm/kasan/report.c:517
>   crc16+0x1fb/0x280 lib/crc16.c:58
>   ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>   ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>   ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>   ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>   ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>   ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>   ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>   ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>   ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>   ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>   evict+0x2a4/0x620 fs/inode.c:664
>   do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>   __do_sys_unlink fs/namei.c:4368 [inline]
>   __se_sys_unlink fs/namei.c:4366 [inline]
>   __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fbc85a8c0f9
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>   </TASK>
> 
> The buggy address belongs to the physical page:
> page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
> raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as freed
> page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>   prep_new_page mm/page_alloc.c:2531 [inline]
>   get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>   __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>   alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>   allocate_slab mm/slub.c:1998 [inline]
>   new_slab+0x84/0x2f0 mm/slub.c:2051
>   ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>   __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>   kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>   mt_alloc_bulk lib/maple_tree.c:157 [inline]
>   mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>   mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>   mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>   vma_expand+0x277/0x850 mm/mmap.c:541
>   mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>   do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>   vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> page last free stack trace:
>   reset_page_owner include/linux/page_owner.h:24 [inline]
>   free_pages_prepare mm/page_alloc.c:1446 [inline]
>   free_pcp_prepare mm/page_alloc.c:1496 [inline]
>   free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>   free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>   qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>   kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>   __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>   kasan_slab_alloc include/linux/kasan.h:201 [inline]
>   slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>   slab_alloc_node mm/slub.c:3452 [inline]
>   kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>   __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>   alloc_skb include/linux/skbuff.h:1270 [inline]
>   alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>   sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>   unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>   sock_sendmsg_nosec net/socket.c:714 [inline]
>   sock_sendmsg net/socket.c:734 [inline]
>   __sys_sendto+0x475/0x5f0 net/socket.c:2117
>   __do_sys_sendto net/socket.c:2129 [inline]
>   __se_sys_sendto net/socket.c:2125 [inline]
>   __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Memory state around the buggy address:
>   ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>   ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>                     ^
>   ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>   ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
> 


I think the patch below should fix it.

I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
superblock in the buffer get corrupted sometime after .get_tree
(which eventually calls __ext4_fill_super()) is called. So instead of
relying on the contents of the buffer, we should rely on the
s_desc_size initialized at __ext4_fill_super() time.
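
To illustrate the reasoning above, here is a minimal user-space sketch
of the difference between re-reading a length from a buffer that can
change after mount and using a copy validated once at mount time. All
struct, function, and macro names are hypothetical stand-ins, the size
bounds are assumptions of the sketch, and endianness conversion is
omitted; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds assumed for this sketch; the kernel defines its own limits. */
#define SKETCH_MIN_DESC_SIZE 32u
#define SKETCH_MAX_DESC_SIZE 1024u

/* Stand-in for the on-disk superblock held in a buffer that may change. */
struct raw_super_sketch {
    uint16_t s_desc_size;
};

/* Stand-in for the in-memory superblock info. */
struct sb_info_sketch {
    struct raw_super_sketch *es; /* points into the mutable buffer */
    uint16_t desc_size;          /* validated copy taken at mount time */
};

/* Validate the on-disk value once and cache it; 0 on success, -1 if out of range. */
static int fill_super_sketch(struct sb_info_sketch *sbi,
                             struct raw_super_sketch *es)
{
    assert(sbi != NULL && es != NULL);
    if (es->s_desc_size < SKETCH_MIN_DESC_SIZE ||
        es->s_desc_size > SKETCH_MAX_DESC_SIZE)
        return -1;
    sbi->es = es;
    sbi->desc_size = es->s_desc_size;
    return 0;
}

/* Length taken from the buffer: follows whatever the buffer holds now. */
static size_t csum_span_from_buffer(const struct sb_info_sketch *sbi)
{
    return sbi->es->s_desc_size;
}

/* Length taken from the cached copy: unaffected by later buffer changes. */
static size_t csum_span_from_cache(const struct sb_info_sketch *sbi)
{
    return sbi->desc_size;
}
```

If the buffer is overwritten after fill_super_sketch() succeeds,
csum_span_from_buffer() returns the new, unvalidated value, while
csum_span_from_cache() still returns the value that passed the check.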

If someone finds this good (or bad), or has a more in-depth explanation,
please let me know; it will help me better understand the subsystem. In
the meantime I'll continue to investigate this and prepare a patch for
it.

Cheers,
ta

index 260c1b3e3ef2..91d41e84da32 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,
         crc = crc16(crc, (__u8 *)gdp, offset);
         offset += sizeof(gdp->bg_checksum); /* skip checksum */
         /* for checksum of struct ext4_group_desc do the rest...*/
-       if (ext4_has_feature_64bit(sb) &&
-           offset < le16_to_cpu(sbi->s_es->s_desc_size))
+       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
                 crc = crc16(crc, (__u8 *)gdp + offset,
-                           le16_to_cpu(sbi->s_es->s_desc_size) -
-                               offset);
+                           sbi->s_desc_size - offset);

  out:
         return cpu_to_le16(crc);
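
As a rough user-space illustration of the two-part checksum touched by
the hunk above (checksum the bytes before the checksum field, skip the
field, then checksum the remainder up to the descriptor size), a sketch
follows. The function names are hypothetical, the checksum field offset
is passed in as a parameter rather than taken from the real structure,
and the bitwise CRC-16 (reflected polynomial 0xA001) is assumed to
behave like the table-driven routine in lib/crc16.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, reflected polynomial 0xA001 (x^16 + x^15 + x^2 + 1). */
static uint16_t crc16_sketch(uint16_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}

/*
 * Checksum a descriptor of desc_size bytes while skipping the 2-byte
 * checksum field at csum_off. The caller must guarantee that desc_size
 * bytes are really readable; an oversized desc_size reads past the
 * descriptor, which is the kind of access KASAN flagged.
 */
static uint16_t desc_csum_sketch(uint16_t seed, const uint8_t *desc,
                                 size_t csum_off, size_t desc_size)
{
    size_t offset = csum_off;
    uint16_t crc = crc16_sketch(seed, desc, offset);

    offset += sizeof(uint16_t); /* skip the checksum field itself */
    if (offset < desc_size)
        crc = crc16_sketch(crc, desc + offset, desc_size - offset);
    return crc;
}
```

The length passed to the second crc16_sketch() call is desc_size minus
the current offset, so the correctness of the read depends entirely on
desc_size being a trusted value.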


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-02-02  6:54 [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum syzbot
  2023-02-13 15:56 ` syzbot
@ 2023-03-03 21:43 ` syzbot
  1 sibling, 0 replies; 20+ messages in thread
From: syzbot @ 2023-03-03 21:43 UTC
  To: adilger.kernel, joneslee, linux-ext4, linux-fsdevel,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	tudor.ambarus, tytso

syzbot has found a reproducer for the following issue on:

HEAD commit:    596b6b709632 Merge branch 'for-next/core' into for-kernelci
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=1151054cc80000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3519974f3f27816d
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16ce3de4c80000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16b02598c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/06e2210b88a3/disk-596b6b70.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/79e6930ab577/vmlinux-596b6b70.xz
kernel image: https://storage.googleapis.com/syzbot-assets/56b95e6bcb5c/Image-596b6b70.gz.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/a765d6554060/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0xc0/0x104 lib/crc16.c:58
Read of size 1 at addr ffff0000d5eff0a8 by task syz-executor175/8245

CPU: 1 PID: 8245 Comm: syz-executor175 Not tainted 6.2.0-syzkaller-18302-g596b6b709632 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call trace:
 dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:158
 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:165
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:306 [inline]
 print_report+0x174/0x4c0 mm/kasan/report.c:417
 kasan_report+0xd4/0x130 mm/kasan/report.c:517
 __asan_report_load1_noabort+0x2c/0x38 mm/kasan/report_generic.c:348
 crc16+0xc0/0x104 lib/crc16.c:58
 ext4_group_desc_csum+0x6a8/0x99c fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x17c/0x210 fs/ext4/super.c:3210
 __ext4_new_inode+0x20dc/0x3acc fs/ext4/ialloc.c:1227
 ext4_create+0x234/0x480 fs/ext4/namei.c:2809
 lookup_open fs/namei.c:3413 [inline]
 open_last_lookups fs/namei.c:3481 [inline]
 path_openat+0xe6c/0x2578 fs/namei.c:3711
 do_filp_open+0x1bc/0x3cc fs/namei.c:3741
 do_sys_openat2+0x128/0x3d8 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_openat fs/open.c:1342 [inline]
 __se_sys_openat fs/open.c:1337 [inline]
 __arm64_sys_openat+0x1f0/0x240 fs/open.c:1337
 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
 invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
 el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
 el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

Allocated by task 5961:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4c/0x80 mm/kasan/common.c:52
 kasan_save_alloc_info+0x24/0x30 mm/kasan/generic.c:512
 __kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:328
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook+0x80/0x478 mm/slab.h:761
 slab_alloc_node mm/slub.c:3452 [inline]
 slab_alloc mm/slub.c:3460 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3467 [inline]
 kmem_cache_alloc+0x288/0x37c mm/slub.c:3476
 kmem_cache_zalloc include/linux/slab.h:710 [inline]
 __kernfs_new_node+0xe4/0x66c fs/kernfs/dir.c:614
 kernfs_new_node+0x98/0x184 fs/kernfs/dir.c:676
 __kernfs_create_file+0x60/0x2d4 fs/kernfs/file.c:1047
 sysfs_add_file_mode_ns+0x1dc/0x298 fs/sysfs/file.c:294
 create_files fs/sysfs/group.c:64 [inline]
 internal_create_group+0x428/0xbec fs/sysfs/group.c:148
 internal_create_groups fs/sysfs/group.c:188 [inline]
 sysfs_create_groups+0x60/0x130 fs/sysfs/group.c:214
 create_dir lib/kobject.c:68 [inline]
 kobject_add_internal+0x5d4/0xb14 lib/kobject.c:223
 kobject_add_varg lib/kobject.c:358 [inline]
 kobject_init_and_add+0x130/0x1a0 lib/kobject.c:441
 netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
 netdev_queue_update_kobjects+0x1d8/0x470 net/core/net-sysfs.c:1718
 register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
 netdev_register_kobject+0x22c/0x2d8 net/core/net-sysfs.c:2019
 register_netdevice+0xcb8/0x1270 net/core/dev.c:10037
 bond_newlink+0x50/0xa8 drivers/net/bonding/bond_netlink.c:560
 rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
 __rtnl_newlink net/core/rtnetlink.c:3624 [inline]
 rtnl_newlink+0x1174/0x1b1c net/core/rtnetlink.c:3637
 rtnetlink_rcv_msg+0x6ec/0xc8c net/core/rtnetlink.c:6141
 netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
 rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6159
 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
 netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
 netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 __sys_sendto+0x3b4/0x504 net/socket.c:2120
 __do_sys_sendto net/socket.c:2132 [inline]
 __se_sys_sendto net/socket.c:2128 [inline]
 __arm64_sys_sendto+0xd8/0xf8 net/socket.c:2128
 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
 invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
 el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
 el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

The buggy address belongs to the object at ffff0000d5eff000
 which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
 168-byte region [ffff0000d5eff000, ffff0000d5eff0a8)

The buggy address belongs to the physical page:
page:0000000016584f53 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x115eff
flags: 0x5ffc00000000200(slab|node=0|zone=2|lastcpupid=0x7ff)
raw: 05ffc00000000200 ffff0000c0844c00 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff0000d5efef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff0000d5eff000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff0000d5eff080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
                                  ^
 ffff0000d5eff100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff0000d5eff180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): ext4_evict_inode:279: inode #18: comm syz-executor175: mark_inode_dirty error
EXT4-fs warning (device loop3): ext4_evict_inode:282: couldn't mark inode dirty (err -117)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-01 12:13   ` Tudor Ambarus
@ 2023-03-07 10:39     ` Jan Kara
  2023-03-07 11:02       ` Tudor Ambarus
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Kara @ 2023-03-07 10:39 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones

Hi!

On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> On 2/13/23 15:56, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> > 
> > HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > 
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > 
> > ==================================================================
> > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > 
> > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
> > Call Trace:
> >   <TASK>
> >   __dump_stack lib/dump_stack.c:88 [inline]
> >   dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> >   print_address_description mm/kasan/report.c:306 [inline]
> >   print_report+0x163/0x4f0 mm/kasan/report.c:417
> >   kasan_report+0x13a/0x170 mm/kasan/report.c:517
> >   crc16+0x1fb/0x280 lib/crc16.c:58
> >   ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> >   ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> >   ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> >   ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> >   ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> >   ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> >   ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> >   ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> >   ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> >   ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> >   evict+0x2a4/0x620 fs/inode.c:664
> >   do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> >   __do_sys_unlink fs/namei.c:4368 [inline]
> >   __se_sys_unlink fs/namei.c:4366 [inline]
> >   __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> >   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7fbc85a8c0f9
> > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> >   </TASK>
> > 
> > The buggy address belongs to the physical page:
> > page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
> > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
> > raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> > page dumped because: kasan: bad access detected
> > page_owner tracks the page as freed
> > page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> >   prep_new_page mm/page_alloc.c:2531 [inline]
> >   get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> >   __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> >   alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> >   allocate_slab mm/slub.c:1998 [inline]
> >   new_slab+0x84/0x2f0 mm/slub.c:2051
> >   ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> >   __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> >   kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> >   mt_alloc_bulk lib/maple_tree.c:157 [inline]
> >   mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> >   mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> >   mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> >   vma_expand+0x277/0x850 mm/mmap.c:541
> >   mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> >   do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> >   vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> >   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > page last free stack trace:
> >   reset_page_owner include/linux/page_owner.h:24 [inline]
> >   free_pages_prepare mm/page_alloc.c:1446 [inline]
> >   free_pcp_prepare mm/page_alloc.c:1496 [inline]
> >   free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> >   free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> >   qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> >   kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> >   __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> >   kasan_slab_alloc include/linux/kasan.h:201 [inline]
> >   slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> >   slab_alloc_node mm/slub.c:3452 [inline]
> >   kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> >   __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> >   alloc_skb include/linux/skbuff.h:1270 [inline]
> >   alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> >   sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> >   unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> >   sock_sendmsg_nosec net/socket.c:714 [inline]
> >   sock_sendmsg net/socket.c:734 [inline]
> >   __sys_sendto+0x475/0x5f0 net/socket.c:2117
> >   __do_sys_sendto net/socket.c:2129 [inline]
> >   __se_sys_sendto net/socket.c:2125 [inline]
> >   __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> >   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> > Memory state around the buggy address:
> >   ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >   ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >                     ^
> >   ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >   ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ==================================================================
> > 
> 
> 
> I think the patch from below should fix it.
> 
> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> super block in the buffer get corrupted sometime after the .get_tree
> (which eventually calls __ext4_fill_super()) is called. So instead of
> relying on the contents of the buffer, we should instead rely on the
> s_desc_size initialized at the __ext4_fill_super() time.
> 
> If someone finds this good (or bad), or has a more in depth explanation,
> please let me know, it will help me better understand the subsystem. In
> the meantime I'll continue to investigate this and prepare a patch for
> it.

If there's something corrupting the superblock while the filesystem is
mounted, we need to find what is corrupting the SB and fix *that*. Not try
to paper over the problem by not using the on-disk data... Maybe journal
replay is corrupting the value or something like that?

								Honza

> index 260c1b3e3ef2..91d41e84da32 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block
> *sb, __u32 block_group,
>         crc = crc16(crc, (__u8 *)gdp, offset);
>         offset += sizeof(gdp->bg_checksum); /* skip checksum */
>         /* for checksum of struct ext4_group_desc do the rest...*/
> -       if (ext4_has_feature_64bit(sb) &&
> -           offset < le16_to_cpu(sbi->s_es->s_desc_size))
> +       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
>                 crc = crc16(crc, (__u8 *)gdp + offset,
> -                           le16_to_cpu(sbi->s_es->s_desc_size) -
> -                               offset);
> +                           sbi->s_desc_size - offset);
> 
>  out:
>         return cpu_to_le16(crc);
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-07 10:39     ` Jan Kara
@ 2023-03-07 11:02       ` Tudor Ambarus
  2023-03-13 11:11         ` Tudor Ambarus
  0 siblings, 1 reply; 20+ messages in thread
From: Tudor Ambarus @ 2023-03-07 11:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones



On 3/7/23 10:39, Jan Kara wrote:
> Hi!

Hi!

Thanks for taking the time to review the proposal!

> 
> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>> On 2/13/23 15:56, syzbot wrote:
>>> syzbot has found a reproducer for the following issue on:
>>>
>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>> git tree:       upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>
>>> Downloadable assets:
>>> disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>> vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>> kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
>>>
>>> ==================================================================
>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>
>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
>>> Call Trace:
>>>    <TASK>
>>>    __dump_stack lib/dump_stack.c:88 [inline]
>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>    print_address_description mm/kasan/report.c:306 [inline]
>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>    crc16+0x1fb/0x280 lib/crc16.c:58
>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>    evict+0x2a4/0x620 fs/inode.c:664
>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>    __do_sys_unlink fs/namei.c:4368 [inline]
>>>    __se_sys_unlink fs/namei.c:4366 [inline]
>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> RIP: 0033:0x7fbc85a8c0f9
>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>    </TASK>
>>>
>>> The buggy address belongs to the physical page:
>>> page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
>>> page dumped because: kasan: bad access detected
>>> page_owner tracks the page as freed
>>> page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>    prep_new_page mm/page_alloc.c:2531 [inline]
>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>    allocate_slab mm/slub.c:1998 [inline]
>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>    vma_expand+0x277/0x850 mm/mmap.c:541
>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> page last free stack trace:
>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>    slab_alloc_node mm/slub.c:3452 [inline]
>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
>>>    sock_sendmsg net/socket.c:734 [inline]
>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>    __do_sys_sendto net/socket.c:2129 [inline]
>>>    __se_sys_sendto net/socket.c:2125 [inline]
>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> Memory state around the buggy address:
>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>                      ^
>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ==================================================================
>>>
>>
>>
>> I think the patch from below should fix it.
>>
>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
>> super block in the buffer get corrupted sometime after the .get_tree
>> (which eventually calls __ext4_fill_super()) is called. So instead of
>> relying on the contents of the buffer, we should instead rely on the
>> s_desc_size initialized at the __ext4_fill_super() time.
>>
>> If someone finds this good (or bad), or has a more in depth explanation,
>> please let me know, it will help me better understand the subsystem. In
>> the meantime I'll continue to investigate this and prepare a patch for
>> it.
> 
> If there's something corrupting the superblock while the filesystem is
> mounted, we need to find what is corrupting the SB and fix *that*. Not try
> to paper over the problem by not using the on-disk data... Maybe journal
> replay is corrupting the value or something like that?
> 
> 								Honza
>

Ok, I agree. First thing would be to understand the reproducer and to
simplify it if possible. I haven't yet decoded what the syz repro is
doing at
https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
Will reply to this email thread once I understand what's happening. If 
you or someone else can decode the syz repro faster than me, shoot.

Cheers,
ta

>> index 260c1b3e3ef2..91d41e84da32 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block
>> *sb, __u32 block_group,
>>          crc = crc16(crc, (__u8 *)gdp, offset);
>>          offset += sizeof(gdp->bg_checksum); /* skip checksum */
>>          /* for checksum of struct ext4_group_desc do the rest...*/
>> -       if (ext4_has_feature_64bit(sb) &&
>> -           offset < le16_to_cpu(sbi->s_es->s_desc_size))
>> +       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
>>                  crc = crc16(crc, (__u8 *)gdp + offset,
>> -                           le16_to_cpu(sbi->s_es->s_desc_size) -
>> -                               offset);
>> +                           sbi->s_desc_size - offset);
>>
>>   out:
>>          return cpu_to_le16(crc);

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-07 11:02       ` Tudor Ambarus
@ 2023-03-13 11:11         ` Tudor Ambarus
  2023-03-13 11:57           ` Jan Kara
  0 siblings, 1 reply; 20+ messages in thread
From: Tudor Ambarus @ 2023-03-13 11:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones

Hi, Jan,

On 3/7/23 11:02, Tudor Ambarus wrote:
> 
> 
> On 3/7/23 10:39, Jan Kara wrote:
>> Hi!
> 
> Hi!
> 
> Thanks for taking the time to review the proposal!
> 
>>
>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>> On 2/13/23 15:56, syzbot wrote:
>>>> syzbot has found a reproducer for the following issue on:
>>>>
>>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>>> git tree:       upstream
>>>> console output:
>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>> kernel config: 
>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>> dashboard link:
>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>> for Debian) 2.35.2
>>>> syz repro:     
>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>
>>>> Downloadable assets:
>>>> disk image:
>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>> vmlinux:
>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>> kernel image:
>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>> mounted in repro:
>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>> commit:
>>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>
>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>> 6.2.0-rc8-syzkaller #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>> BIOS Google 01/21/2023
>>>> Call Trace:
>>>>    <TASK>
>>>>    __dump_stack lib/dump_stack.c:88 [inline]
>>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>    print_address_description mm/kasan/report.c:306 [inline]
>>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>    crc16+0x1fb/0x280 lib/crc16.c:58
>>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>    evict+0x2a4/0x620 fs/inode.c:664
>>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>    __do_sys_unlink fs/namei.c:4368 [inline]
>>>>    __se_sys_unlink fs/namei.c:4366 [inline]
>>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>> RIP: 0033:0x7fbc85a8c0f9
>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>    </TASK>
>>>>
>>>> The buggy address belongs to the physical page:
>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>> 0000000000000000
>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>> 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>> page_owner tracks the page as freed
>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>    prep_new_page mm/page_alloc.c:2531 [inline]
>>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>    allocate_slab mm/slub.c:1998 [inline]
>>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>    vma_expand+0x277/0x850 mm/mmap.c:541
>>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>> page last free stack trace:
>>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>    slab_alloc_node mm/slub.c:3452 [inline]
>>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>    sock_sendmsg net/socket.c:734 [inline]
>>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>    __do_sys_sendto net/socket.c:2129 [inline]
>>>>    __se_sys_sendto net/socket.c:2125 [inline]
>>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>
>>>> Memory state around the buggy address:
>>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>                      ^
>>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> ==================================================================
>>>>
>>>
>>>
>>> I think the patch from below should fix it.
>>>
>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
>>> super block in the buffer get corrupted sometime after the .get_tree
>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>> relying on the contents of the buffer, we should instead rely on the
>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>
>>> If someone finds this good (or bad), or has a more in depth explanation,
>>> please let me know, it will help me better understand the subsystem. In
>>> the meantime I'll continue to investigate this and prepare a patch for
>>> it.
>>
>> If there's something corrupting the superblock while the filesystem is
>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>> try
>> to paper over the problem by not using the on-disk data... Maybe journal
>> replay is corrupting the value or something like that?
>>
>>                                 Honza
>>
> 
> Ok, I agree. First thing would be to understand the reproducer and to
> simplify it if possible. I haven't yet decoded what the syz repro is
> doing at
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> Will reply to this email thread once I understand what's happening. If
> you or someone else can decode the syz repro faster than me, shoot.
> 

I can now explain how the super block contents in the buffer get
corrupted. After the ext4 fs is mounted on the target ("./bus"), the
reproducer maps 6 MiB starting at offset 0 of the target file ("./bus"),
then overwrites the mapped data using memcpy, memset and individual byte
stores. Does that mean that we shouldn't rely on the contents of the
super block in the buffer after we mount the file system? If so, then my
patch stands. I'll be happy to extend it if needed. Below is a
step-by-step interpretation of the reproducer.

We have a strace log for the same bug, but on Android 5.15:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000

Look for pid 328. You'll notice that the bpf() syscalls return an error,
so I commented them out in the C repro to confirm that they are not the
cause. The bug reproduced without the bpf() calls. The C repro is at:
https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000

Let's look at these calls, just before the bug was hit:
[pid   328] open("./bus", O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME, 000) = 4
[pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
[pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
[pid   328] mmap(0x20000000, 6291456, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED, 5, 0) = 0x20000000

- ./bus is created (if it does not exist), fd 4 is returned.
- /dev/loop0 is bind-mounted (MS_BIND) on ./bus
- then it creates a new file descriptor (5) for the same ./bus
- then it creates a shared mapping for ./bus starting at offset zero. The
mapped area is at 0x20000000 and is 0x600000 bytes long.
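
For readers who want to see the mapping step in isolation, here is a
minimal userspace sketch. It is not the syzkaller reproducer: a scratch
file under /tmp stands in for ./bus, the kernel picks the address instead
of MAP_FIXED at 0x20000000, and the helper name, POKE_OFF and POKE_LEN are
made up for illustration (0x300 is the offset of the first overwritten
address in the C repro). It only shows that plain stores through a
MAP_SHARED mapping change what is later read back through the file
descriptor.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN  0x600000UL	/* 6 MiB, the mmap() length in the strace log */
#define POKE_OFF 0x300UL	/* 0x20000300 - 0x20000000 */
#define POKE_LEN 16UL		/* arbitrary; the real repro writes many ranges */

/*
 * Returns 1 if bytes stored through a MAP_SHARED mapping are visible when
 * the same file is read back through its descriptor, 0 if not, -1 on error.
 * A scratch regular file stands in for ./bus (assumption of this sketch).
 */
static int shared_mapping_write_reaches_file(void)
{
	char path[] = "/tmp/bus-sketch-XXXXXX";
	unsigned char readback[POKE_LEN];
	unsigned char expect[POKE_LEN];
	unsigned char *map;
	int fd, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until close(fd) */

	if (ftruncate(fd, MAP_LEN) < 0)	/* zero-filled, 6 MiB */
		goto out;

	map = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	assert(POKE_OFF + POKE_LEN <= MAP_LEN);
	memset(map + POKE_OFF, 0xff, POKE_LEN);	/* plain store, no write() */
	msync(map, MAP_LEN, MS_SYNC);
	munmap(map, MAP_LEN);

	/* Read the same range back through the descriptor. */
	if (pread(fd, readback, POKE_LEN, POKE_OFF) != (ssize_t)POKE_LEN)
		goto out;

	memset(expect, 0xff, POKE_LEN);
	ret = memcmp(readback, expect, POKE_LEN) == 0;
out:
	close(fd);
	return ret;
}
```

Calling shared_mapping_write_reaches_file() should return 1 on a normal
Linux system. In the reproducer the mapped object is the mounted block
device rather than a scratch file, which is what makes the stores harmful.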

Now look again at the C reproducer. You'll see that after the mapping
lots of bytes are overwritten, starting at 0x20000300. If I comment out
all those byte modifications after the mmap, the bug no longer reproduces.
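
As a rough model of why those stores matter to ext4_group_desc_csum():
the length handed to the second crc16() call is derived from s_desc_size,
and I printed a value greater than EXT4_MAX_DESC_SIZE there. The sketch
below is plain userspace C, not kernel code. The constants are assumptions
(1024 for the maximum descriptor size, 32 for the offset just past
bg_checksum) and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DESC_SIZE 1024u	/* assumed to mirror EXT4_MAX_DESC_SIZE */
#define SKETCH_CSUM_END      32u	/* assumed offset just past bg_checksum */

/* Decode a little-endian 16-bit field, as le16_to_cpu() would. */
static uint16_t sketch_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Number of trailing descriptor bytes that would be checksummed when the
 * size is re-read from the (mutable) superblock buffer on every call.
 */
static size_t tail_len_from_buffer(const uint8_t *desc_size_field)
{
	uint16_t desc_size = sketch_le16(desc_size_field);

	return desc_size > SKETCH_CSUM_END ? desc_size - SKETCH_CSUM_END : 0;
}

/* Same computation from a value validated and cached once at mount time. */
static size_t tail_len_from_cached(uint16_t cached_desc_size)
{
	assert(cached_desc_size <= SKETCH_MAX_DESC_SIZE);
	return cached_desc_size > SKETCH_CSUM_END ?
	       cached_desc_size - SKETCH_CSUM_END : 0;
}
```

With a field holding 64 the tail is 32 bytes. After the field is
overwritten with 0xffff the same computation yields 65503 bytes, far past
any descriptor, while the cached value still gives 32.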

Cheers,
ta
> 
>>> index 260c1b3e3ef2..91d41e84da32 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct
>>> super_block
>>> *sb, __u32 block_group,
>>>          crc = crc16(crc, (__u8 *)gdp, offset);
>>>          offset += sizeof(gdp->bg_checksum); /* skip checksum */
>>>          /* for checksum of struct ext4_group_desc do the rest...*/
>>> -       if (ext4_has_feature_64bit(sb) &&
>>> -           offset < le16_to_cpu(sbi->s_es->s_desc_size))
>>> +       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
>>>                  crc = crc16(crc, (__u8 *)gdp + offset,
>>> -                           le16_to_cpu(sbi->s_es->s_desc_size) -
>>> -                               offset);
>>> +                           sbi->s_desc_size - offset);
>>>
>>>   out:
>>>          return cpu_to_le16(crc);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 11:11         ` Tudor Ambarus
@ 2023-03-13 11:57           ` Jan Kara
  2023-03-13 12:27             ` yebin
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jan Kara @ 2023-03-13 11:57 UTC (permalink / raw)
  To: Tudor Ambarus
  Cc: Jan Kara, syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm,
	nathan, ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones

Hi Tudor!

On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> On 3/7/23 11:02, Tudor Ambarus wrote:
> > On 3/7/23 10:39, Jan Kara wrote:
> >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> >>> On 2/13/23 15:56, syzbot wrote:
> >>>> syzbot has found a reproducer for the following issue on:
> >>>>
> >>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> >>>> git tree:       upstream
> >>>> console output:
> >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> >>>> kernel config: 
> >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> >>>> dashboard link:
> >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> >>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> >>>> for Debian) 2.35.2
> >>>> syz repro:     
> >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image:
> >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> >>>> vmlinux:
> >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> >>>> kernel image:
> >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> >>>> mounted in repro:
> >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> >>>> commit:
> >>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> >>>>
> >>>> ==================================================================
> >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> >>>>
> >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> >>>> 6.2.0-rc8-syzkaller #0
> >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >>>> BIOS Google 01/21/2023
> >>>> Call Trace:
> >>>>    <TASK>
> >>>>    __dump_stack lib/dump_stack.c:88 [inline]
> >>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> >>>>    print_address_description mm/kasan/report.c:306 [inline]
> >>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
> >>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
> >>>>    crc16+0x1fb/0x280 lib/crc16.c:58
> >>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> >>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> >>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> >>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> >>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> >>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> >>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> >>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> >>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> >>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> >>>>    evict+0x2a4/0x620 fs/inode.c:664
> >>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> >>>>    __do_sys_unlink fs/namei.c:4368 [inline]
> >>>>    __se_sys_unlink fs/namei.c:4366 [inline]
> >>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>> RIP: 0033:0x7fbc85a8c0f9
> >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> >>>>    </TASK>
> >>>>
> >>>> The buggy address belongs to the physical page:
> >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> >>>> 0000000000000000
> >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> >>>> 0000000000000000
> >>>> page dumped because: kasan: bad access detected
> >>>> page_owner tracks the page as freed
> >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> >>>>    prep_new_page mm/page_alloc.c:2531 [inline]
> >>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> >>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> >>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> >>>>    allocate_slab mm/slub.c:1998 [inline]
> >>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
> >>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> >>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> >>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> >>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
> >>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> >>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> >>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> >>>>    vma_expand+0x277/0x850 mm/mmap.c:541
> >>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> >>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> >>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>> page last free stack trace:
> >>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
> >>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
> >>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
> >>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> >>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> >>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> >>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> >>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> >>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
> >>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> >>>>    slab_alloc_node mm/slub.c:3452 [inline]
> >>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> >>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> >>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
> >>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> >>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> >>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> >>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
> >>>>    sock_sendmsg net/socket.c:734 [inline]
> >>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
> >>>>    __do_sys_sendto net/socket.c:2129 [inline]
> >>>>    __se_sys_sendto net/socket.c:2125 [inline]
> >>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>>
> >>>> Memory state around the buggy address:
> >>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>>                      ^
> >>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>> ==================================================================
> >>>>
> >>>
> >>>
> >>> I think the patch from below should fix it.
> >>>
> >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> >>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> >>> super block in the buffer get corrupted sometime after the .get_tree
> >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> >>> relying on the contents of the buffer, we should instead rely on the
> >>> s_desc_size initialized at the __ext4_fill_super() time.
> >>>
> >>> If someone finds this good (or bad), or has a more in depth explanation,
> >>> please let me know, it will help me better understand the subsystem. In
> >>> the meantime I'll continue to investigate this and prepare a patch for
> >>> it.
> >>
> >> If there's something corrupting the superblock while the filesystem is
> >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> >> try
> >> to paper over the problem by not using the on-disk data... Maybe journal
> >> replay is corrupting the value or something like that?
> >>
> >>                                 Honza
> >>
> > 
> > Ok, I agree. First thing would be to understand the reproducer and to
> > simplify it if possible. I haven't yet decoded what the syz repro is
> > doing at
> > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > Will reply to this email thread once I understand what's happening. If
> > you or someone else can decode the syz repro faster than me, shoot.
> > 
> 
> I can now explain how the contents of the super block of the buffer get
> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> reproducer maps 6MB of data starting at offset 0 in the target's file
> ("./bus"), then it starts overriding the data with something else, by
> using memcpy, memset, individual byte inits. Does that mean that we
> shouldn't rely on the contents of the super block in the buffer after we
> mount the file system? If so, then my patch stands. I'll be happy to
> extend it if needed. Below one may find a step by step interpretation of
> the reproducer.
> 
> We have a strace log for the same bug, but on Android 5.15:
> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> 
> Look for pid 328. You notice that the bpf() syscalls return error, so I
> commented them out in the c repro to confirm that they are not the
> cause. The bug reproduced without the bpf() calls. One can find the c
> repro at:
> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> 
> Let's look at these calls, just before the bug was hit:
> [pid   328] open("./bus",
> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> 000) = 4
> [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> [pid   328] mmap(0x20000000, 6291456,
> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> 5, 0) = 0x20000000

Yeah, looking at the reproducer, before this the reproducer also mounts
/dev/loop0 as ext4 filesystem.

> - ./bus is created (if it does not exist), fd 4 is returned.
> - /dev/loop0 is mounted to ./bus
> - then it creates a new file descriptor (5) for the same ./bus
> - then it creates a mapping for ./bus starting at offset zero. The
> mapped area is at 0x20000000 and is of 0x600000ul length.

So the result is that the reproducer modifies the block device while it is
mounted by the filesystem. We know cases like this can crash the kernel and
they are inherently difficult to fix. We have to trust the buffer cache
contents as otherwise the performance will be unacceptable. For historical
reasons we also have to allow modifications of buffer cache while ext4 is
mounted because tune2fs uses this to e.g. update the label of a mounted
filesystem.

Long-term we are moving ext4 in a direction where we can disallow block
device modifications while the fs is mounted but we are not there yet. I've
discussed a shorter-term solution to avoid such known problems with the
syzbot developers, and what seems plausible is a kconfig option to disallow
writing to a block device when it is exclusively open by someone else.
But so far I haven't got to trying whether this would reasonably work. Would
you be interested in having a look into this?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 11:57           ` Jan Kara
@ 2023-03-13 12:27             ` yebin
  2023-03-13 13:01               ` Jan Kara
  2023-03-13 14:53             ` Dmitry Vyukov
  2023-04-30  2:55             ` Theodore Ts'o
  2 siblings, 1 reply; 20+ messages in thread
From: yebin @ 2023-03-13 12:27 UTC (permalink / raw)
  To: Jan Kara, Tudor Ambarus
  Cc: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones



On 2023/3/13 19:57, Jan Kara wrote:
> Hi Tudor!
>
> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>> On 3/7/23 10:39, Jan Kara wrote:
>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>
>>>>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>>>>> git tree:       upstream
>>>>>> console output:
>>>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>>>> kernel config:
>>>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>>>> dashboard link:
>>>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>>>> for Debian) 2.35.2
>>>>>> syz repro:
>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image:
>>>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>>>> vmlinux:
>>>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>>>> kernel image:
>>>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>>>> mounted in repro:
>>>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>>> commit:
>>>>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
>>>>>>
>>>>>> ==================================================================
>>>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>>>
>>>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>>>> 6.2.0-rc8-syzkaller #0
>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>> BIOS Google 01/21/2023
>>>>>> Call Trace:
>>>>>>     <TASK>
>>>>>>     __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>     dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>>>     print_address_description mm/kasan/report.c:306 [inline]
>>>>>>     print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>>>     kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>>>     crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>     ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>>>     ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>>>     ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>>>     ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>>>     ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>>>     ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>>>     ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>>>     ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>>>     ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>>>     ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>>>     evict+0x2a4/0x620 fs/inode.c:664
>>>>>>     do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>>>     __do_sys_unlink fs/namei.c:4368 [inline]
>>>>>>     __se_sys_unlink fs/namei.c:4366 [inline]
>>>>>>     __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>> RIP: 0033:0x7fbc85a8c0f9
>>>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>>>     </TASK>
>>>>>>
>>>>>> The buggy address belongs to the physical page:
>>>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>>>> 0000000000000000
>>>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>>>> 0000000000000000
>>>>>> page dumped because: kasan: bad access detected
>>>>>> page_owner tracks the page as freed
>>>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>>>     prep_new_page mm/page_alloc.c:2531 [inline]
>>>>>>     get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>>>     __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>>>     alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>>>     allocate_slab mm/slub.c:1998 [inline]
>>>>>>     new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>>>     ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>>>     __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>>>     kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>>>     mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>>>     mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>>>     mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>>>     mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>>>     vma_expand+0x277/0x850 mm/mmap.c:541
>>>>>>     mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>>>     do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>>>     vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>> page last free stack trace:
>>>>>>     reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>     free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>>>     free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>>>     free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>>>     free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>>>     qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>>>     kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>>>     __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>>>     kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>>     slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>>>     slab_alloc_node mm/slub.c:3452 [inline]
>>>>>>     kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>>>     __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>>>     alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>>>     alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>>>     sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>>>     unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>>>     sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>     sock_sendmsg net/socket.c:734 [inline]
>>>>>>     __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>>>     __do_sys_sendto net/socket.c:2129 [inline]
>>>>>>     __se_sys_sendto net/socket.c:2125 [inline]
>>>>>>     __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>
>>>>>> Memory state around the buggy address:
>>>>>>     ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>     ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>                       ^
>>>>>>     ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>     ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>> ==================================================================
>>>>>>
>>>>>
>>>>> I think the patch from below should fix it.
>>>>>
>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>
>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>> please let me know, it will help me better understand the subsystem. In
>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>> it.
>>>> If there's something corrupting the superblock while the filesystem is
>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>>>> try
>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>> replay is corrupting the value or something like that?
>>>>
>>>>                                  Honza
>>>>
>>> Ok, I agree. First thing would be to understand the reproducer and to
>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>> doing at
>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>> Will reply to this email thread once I understand what's happening. If
>>> you or someone else can decode the syz repro faster than me, shoot.
>>>
>> I can now explain how the contents of the super block of the buffer get
>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>> reproducer maps 6MB of data starting at offset 0 in the target's file
>> ("./bus"), then it starts overriding the data with something else, by
>> using memcpy, memset, individual byte inits. Does that mean that we
>> shouldn't rely on the contents of the super block in the buffer after we
>> mount the file system? If so, then my patch stands. I'll be happy to
>> extend it if needed. Below one may find a step by step interpretation of
>> the reproducer.
>>
>> We have a strace log for the same bug, but on Android 5.15:
>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>
>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>> commented them out in the c repro to confirm that they are not the
>> cause. The bug reproduced without the bpf() calls. One can find the c
>> repro at:
>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>
>> Let's look at these calls, just before the bug was hit:
>> [pid   328] open("./bus",
>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>> 000) = 4
>> [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>> [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>> [pid   328] mmap(0x20000000, 6291456,
>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>> 5, 0) = 0x20000000
> Yeah, looking at the reproducer, before this the reproducer also mounts
> /dev/loop0 as ext4 filesystem.
>
>> - ./bus is created (if it does not exist), fd 4 is returned.
>> - /dev/loop0 is mounted to ./bus
>> - then it creates a new file descriptor (5) for the same ./bus
>> - then it creates a mapping for ./bus starting at offset zero. The
>> mapped area is at 0x20000000 and is of 0x600000ul length.
> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.
>
> Long-term we are moving ext4 in a direction where we can disallow block
> device modifications while the fs is mounted but we are not there yet. I've
> discussed some shorter-term solution to avoid such known problems with syzbot
> developers and what seems plausible would be a kconfig option to disallow
> writing to a block device when it is exclusively open by someone else.
> But so far I didn't get to trying whether this would reasonably work. Would
> you be interested in having a look into this?
I am interested in this job. File systems are often damaged by writes
to the underlying block device, which is a headache, and I have always
wanted to eradicate this kind of problem. A few months ago I tried to
add a mount parameter that prohibits modification of the block device
once it is mounted, but I ran into several problems and gave up. First,
the 32-bit super block flags are used up and would need to be extended.
Second, I don't know how to handle the read-only flag when there are
multiple mount points.

  "disallow writing to a block device when it is exclusively open by
  someone else."
-> Perhaps we can add a new ioctl command to control whether write
operations are allowed after the block device has been exclusively
opened. Is this feasible? Do you have any good suggestions?
> 								Honza


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 12:27             ` yebin
@ 2023-03-13 13:01               ` Jan Kara
  2023-03-13 13:17                 ` yebin (H)
  2023-03-13 14:43                 ` Tudor Ambarus
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Kara @ 2023-03-13 13:01 UTC (permalink / raw)
  To: yebin
  Cc: Jan Kara, Tudor Ambarus, syzbot, adilger.kernel, linux-ext4,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	tytso, Lee Jones

On Mon 13-03-23 20:27:34, yebin wrote:
> On 2023/3/13 19:57, Jan Kara wrote:
> > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > > On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > > > On 2/13/23 15:56, syzbot wrote:
> > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > > 
> > > > > > > HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > > > > > > git tree:       upstream
> > > > > > > console output:
> > > > > > > https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > > > > kernel config:
> > > > > > > https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > > > > dashboard link:
> > > > > > > https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > > > > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > > > > for Debian) 2.35.2
> > > > > > > syz repro:
> > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > > > > 
> > > > > > > Downloadable assets:
> > > > > > > disk image:
> > > > > > > https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > > > > vmlinux:
> > > > > > > https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > > > > kernel image:
> > > > > > > https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > > > > mounted in repro:
> > > > > > > https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > > > > 
> > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the
> > > > > > > commit:
> > > > > > > Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > > > > > > 
> > > > > > > ==================================================================
> > > > > > > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > > > > 
> > > > > > > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > > > > 6.2.0-rc8-syzkaller #0
> > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > BIOS Google 01/21/2023
> > > > > > > Call Trace:
> > > > > > >     <TASK>
> > > > > > >     __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > >     dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > > > >     print_address_description mm/kasan/report.c:306 [inline]
> > > > > > >     print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > > > >     kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > > > >     crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > >     ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > > > >     ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > > > >     ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > > > >     ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > > > >     ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > > > >     ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > > > >     ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > > > >     ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > > > >     ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > > > >     ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > > > >     evict+0x2a4/0x620 fs/inode.c:664
> > > > > > >     do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > > > >     __do_sys_unlink fs/namei.c:4368 [inline]
> > > > > > >     __se_sys_unlink fs/namei.c:4366 [inline]
> > > > > > >     __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > > > >     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > >     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > >     entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > RIP: 0033:0x7fbc85a8c0f9
> > > > > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > > > > 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > > > > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > > > > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > > > > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > > > >     </TASK>
> > > > > > > 
> > > > > > > The buggy address belongs to the physical page:
> > > > > > > page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > > > > mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > > > > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > > > > 0000000000000000
> > > > > > > raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > > > > 0000000000000000
> > > > > > > page dumped because: kasan: bad access detected
> > > > > > > page_owner tracks the page as freed
> > > > > > > page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > > > > 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > > > >     prep_new_page mm/page_alloc.c:2531 [inline]
> > > > > > >     get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > > > >     __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > > > >     alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > > > >     allocate_slab mm/slub.c:1998 [inline]
> > > > > > >     new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > > > >     ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > > > >     __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > > > >     kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > > > >     mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > > > >     mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > > > >     mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > > > >     mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > > > >     vma_expand+0x277/0x850 mm/mmap.c:541
> > > > > > >     mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > > > >     do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > > > >     vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > > > >     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > >     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > >     entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > page last free stack trace:
> > > > > > >     reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > > > >     free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > > > >     free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > > > >     free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > > > >     free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > > > >     qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > > > >     kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > > > >     __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > > > >     kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > > > >     slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > > > >     slab_alloc_node mm/slub.c:3452 [inline]
> > > > > > >     kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > > > >     __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > > > >     alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > > > >     alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > > > >     sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > > > >     unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > > > >     sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > > > >     sock_sendmsg net/socket.c:734 [inline]
> > > > > > >     __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > > > >     __do_sys_sendto net/socket.c:2129 [inline]
> > > > > > >     __se_sys_sendto net/socket.c:2125 [inline]
> > > > > > >     __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > > > >     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > >     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > >     entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > 
> > > > > > > Memory state around the buggy address:
> > > > > > >     ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > >     ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > >                       ^
> > > > > > >     ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > >     ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > ==================================================================
> > > > > > > 
> > > > > > 
> > > > > > I think the patch from below should fix it.
> > > > > > 
> > > > > > I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > > > EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > > > > > super block in the buffer get corrupted sometime after the .get_tree
> > > > > > (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > > > relying on the contents of the buffer, we should instead rely on the
> > > > > > s_desc_size initialized at the __ext4_fill_super() time.
> > > > > > 
> > > > > > If someone finds this good (or bad), or has a more in depth explanation,
> > > > > > please let me know, it will help me better understand the subsystem. In
> > > > > > the meantime I'll continue to investigate this and prepare a patch for
> > > > > > it.
> > > > > If there's something corrupting the superblock while the filesystem is
> > > > > mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > > > try
> > > > > to paper over the problem by not using the on-disk data... Maybe journal
> > > > > replay is corrupting the value or something like that?
> > > > > 
> > > > >                                  Honza
> > > > > 
> > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > doing at
> > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > Will reply to this email thread once I understand what's happening. If
> > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > 
> > > I can now explain how the contents of the super block of the buffer get
> > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > ("./bus"), then it starts overriding the data with something else, by
> > > using memcpy, memset, individual byte inits. Does that mean that we
> > > shouldn't rely on the contents of the super block in the buffer after we
> > > mount the file system? If so, then my patch stands. I'll be happy to
> > > extend it if needed. Below one may find a step by step interpretation of
> > > the reproducer.
> > > 
> > > We have a strace log for the same bug, but on Android 5.15:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > 
> > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > commented them out in the c repro to confirm that they are not the
> > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > repro at:
> > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > 
> > > Let's look at these calls, just before the bug was hit:
> > > [pid   328] open("./bus",
> > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > 000) = 4
> > > [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > [pid   328] mmap(0x20000000, 6291456,
> > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > 5, 0) = 0x20000000
> > Yeah, looking at the reproducer, before this the reproducer also mounts
> > /dev/loop0 as ext4 filesystem.
> > 
> > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > - /dev/loop0 is mounted to ./bus
> > > - then it creates a new file descriptor (5) for the same ./bus
> > > - then it creates a mapping for ./bus starting at offset zero. The
> > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > So the result is that the reproducer modified the block device while it is
> > mounted by the filesystem. We know cases like this can crash the kernel and
> > it is inherently difficult to fix. We have to trust the buffer cache
> > contents as otherwise the performance will be unacceptable. For historical
> > reasons we also have to allow modifications of buffer cache while ext4 is
> > mounted because tune2fs uses this to e.g. update the label of a mounted
> > filesystem.
> > 
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> I am interested in this job. The file system is often damaged by writing
> block devices, which is a headache. I have always wanted to eradicate
> this kind of problem.  A few months ago, I tried to add a mount parameter
> to prohibit modification after the block device is mounted. But I
> encountered several problems that led to the termination of my attempt.
> First of all, the 32-bit super block flags have been used up and need to
> be extended. Secondly, I don't know how to handle read-only flag in the
> case of multiple mount points.
>  "disallow writing to a block device when it is exclusively open by someone
> else. "
> -> Perhaps we can add a new IOCTL command to control whether write
> operations are allowed after the block device has been exclusively
> opened. I don't know if this is feasible?  Do you have any good
> suggestions?

Well, an ioctl() for syzbot would be possible as well, but as a start I'd
check whether the idea with the kconfig option works. Then it will be enough to
just make sure all kernels used for fuzzing are built with this option set.
Thanks for having a look into this!
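
To make the kconfig idea concrete, here is a minimal userspace sketch of
the policy only. It is not kernel code and not a proposed patch: every
name in it (toy_bdev, toy_claim_exclusive, toy_may_open_for_write, and
the TOY_BLOCK_WRITES_WHILE_EXCL macro standing in for the kconfig option)
is hypothetical, and how the real block layer would track the exclusive
holder is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy userspace model of the policy, NOT kernel code.
 * All names are hypothetical. TOY_BLOCK_WRITES_WHILE_EXCL stands in
 * for the proposed kconfig option (1 = option enabled).
 */
#ifndef TOY_BLOCK_WRITES_WHILE_EXCL
#define TOY_BLOCK_WRITES_WHILE_EXCL 1
#endif

struct toy_bdev {
	const void *excl_holder;	/* exclusive opener, or NULL if none */
};

/* Claim the device exclusively, as a mounting filesystem would. */
static bool toy_claim_exclusive(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->excl_holder && bdev->excl_holder != holder)
		return false;	/* somebody else already holds it */
	bdev->excl_holder = holder;
	return true;
}

/* Decide whether 'opener' may open the device for writing. */
static bool toy_may_open_for_write(const struct toy_bdev *bdev,
				   const void *opener)
{
	if (!TOY_BLOCK_WRITES_WHILE_EXCL)
		return true;	/* option off: today's behaviour */
	/* option on: only the exclusive holder (if any) may write */
	return bdev->excl_holder == NULL || bdev->excl_holder == opener;
}
```

With the option enabled, a second opener (such as the reproducer mapping
the device while ext4 has it mounted) would be refused write access,
while the holder itself is unaffected.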

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 13:01               ` Jan Kara
@ 2023-03-13 13:17                 ` yebin (H)
  2023-03-14 11:19                   ` Jan Kara
  2023-03-13 14:43                 ` Tudor Ambarus
  1 sibling, 1 reply; 20+ messages in thread
From: yebin (H) @ 2023-03-13 13:17 UTC (permalink / raw)
  To: Jan Kara, yebin
  Cc: Tudor Ambarus, syzbot, adilger.kernel, linux-ext4, linux-kernel,
	llvm, nathan, ndesaulniers, syzkaller-bugs, trix, tytso,
	Lee Jones



On 2023/3/13 21:01, Jan Kara wrote:
> On Mon 13-03-23 20:27:34, yebin wrote:
>> On 2023/3/13 19:57, Jan Kara wrote:
>>> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>>>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>>>> On 3/7/23 10:39, Jan Kara wrote:
>>>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>>>>>>> git tree:       upstream
>>>>>>>> console output:
>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>>>>>> kernel config:
>>>>>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>>>>>> dashboard link:
>>>>>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>>>>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>>>>>> for Debian) 2.35.2
>>>>>>>> syz repro:
>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>>>>>
>>>>>>>> Downloadable assets:
>>>>>>>> disk image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>>>>>> vmlinux:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>>>>>> kernel image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>>>>>> mounted in repro:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>>>>> commit:
>>>>>>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
>>>>>>>>
>>>>>>>> ==================================================================
>>>>>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>>>>>
>>>>>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>>>>>> 6.2.0-rc8-syzkaller #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>>>> BIOS Google 01/21/2023
>>>>>>>> Call Trace:
>>>>>>>>      <TASK>
>>>>>>>>      __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>>>      dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>>>>>      print_address_description mm/kasan/report.c:306 [inline]
>>>>>>>>      print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>>>>>      kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>>>>>      crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>>      ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>>>>>      ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>>>>>      ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>>>>>      ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>>>>>      ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>>>>>      ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>>>>>      ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>>>>>      ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>>>>>      ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>>>>>      ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>>>>>      evict+0x2a4/0x620 fs/inode.c:664
>>>>>>>>      do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>>>>>      __do_sys_unlink fs/namei.c:4368 [inline]
>>>>>>>>      __se_sys_unlink fs/namei.c:4366 [inline]
>>>>>>>>      __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>>>>>      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>      entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> RIP: 0033:0x7fbc85a8c0f9
>>>>>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>>>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>>>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>>>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>>>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>>>>>      </TASK>
>>>>>>>>
>>>>>>>> The buggy address belongs to the physical page:
>>>>>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>>>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>>>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>>>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>>>>>> 0000000000000000
>>>>>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>>>>>> 0000000000000000
>>>>>>>> page dumped because: kasan: bad access detected
>>>>>>>> page_owner tracks the page as freed
>>>>>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>>>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>>>>>      prep_new_page mm/page_alloc.c:2531 [inline]
>>>>>>>>      get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>>>>>      __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>>>>>      alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>>>>>      allocate_slab mm/slub.c:1998 [inline]
>>>>>>>>      new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>>>>>      ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>>>>>      __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>>>>>      kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>>>>>      mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>>>>>      mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>>>>>      mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>>>>>      mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>>>>>      vma_expand+0x277/0x850 mm/mmap.c:541
>>>>>>>>      mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>>>>>      do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>>>>>      vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>>>>>      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>      entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> page last free stack trace:
>>>>>>>>      reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>>>      free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>>>>>      free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>>>>>      free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>>>>>      free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>>>>>      qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>>>>>      kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>>>>>      __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>>>>>      kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>>>>      slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>>>>>      slab_alloc_node mm/slub.c:3452 [inline]
>>>>>>>>      kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>>>>>      __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>>>>>      alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>>>>>      alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>>>>>      sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>>>>>      unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>>>>>      sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>      sock_sendmsg net/socket.c:734 [inline]
>>>>>>>>      __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>>>>>      __do_sys_sendto net/socket.c:2129 [inline]
>>>>>>>>      __se_sys_sendto net/socket.c:2125 [inline]
>>>>>>>>      __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>>>>>      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>      entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>
>>>>>>>> Memory state around the buggy address:
>>>>>>>>      ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>>      ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>>                        ^
>>>>>>>>      ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>>      ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ==================================================================
>>>>>>>>
>>>>>>> I think the patch from below should fix it.
>>>>>>>
>>>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>>>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
>>>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>>>
>>>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>>>> please let me know, it will help me better understand the subsystem. In
>>>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>>>> it.
>>>>>> If there's something corrupting the superblock while the filesystem is
>>>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>>>>>> try
>>>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>>>> replay is corrupting the value or something like that?
>>>>>>
>>>>>>                                   Honza
>>>>>>
>>>>> Ok, I agree. First thing would be to understand the reproducer and to
>>>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>>>> doing at
>>>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>>>> Will reply to this email thread once I understand what's happening. If
>>>>> you or someone else can decode the syz repro faster than me, shoot.
>>>>>
>>>> I can now explain how the contents of the super block of the buffer get
>>>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>>>> reproducer maps 6MB of data starting at offset 0 in the target's file
>>>> ("./bus"), then it starts overriding the data with something else, by
>>>> using memcpy, memset, individual byte inits. Does that mean that we
>>>> shouldn't rely on the contents of the super block in the buffer after we
>>>> mount the file system? If so, then my patch stands. I'll be happy to
>>>> extend it if needed. Below one may find a step by step interpretation of
>>>> the reproducer.
>>>>
>>>> We have a strace log for the same bug, but on Android 5.15:
>>>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>>>
>>>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>>>> commented them out in the c repro to confirm that they are not the
>>>> cause. The bug reproduced without the bpf() calls. One can find the c
>>>> repro at:
>>>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>>>
>>>> Let's look at these calls, just before the bug was hit:
>>>> [pid   328] open("./bus",
>>>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>>>> 000) = 4
>>>> [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>>>> [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>>>> [pid   328] mmap(0x20000000, 6291456,
>>>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>>>> 5, 0) = 0x20000000
>>> Yeah, looking at the reproducer, before this the reproducer also mounts
>>> /dev/loop0 as ext4 filesystem.
>>>
>>>> - ./bus is created (if it does not exist), fd 4 is returned.
>>>> - /dev/loop0 is mounted to ./bus
>>>> - then it creates a new file descriptor (5) for the same ./bus
>>>> - then it creates a mapping for ./bus starting at offset zero. The
>>>> mapped area is at 0x20000000 and is of 0x600000ul length.
>>> So the result is that the reproducer modified the block device while it is
>>> mounted by the filesystem. We know cases like this can crash the kernel and
>>> it is inherently difficult to fix. We have to trust the buffer cache
>>> contents as otherwise the performance will be unacceptable. For historical
>>> reasons we also have to allow modifications of buffer cache while ext4 is
>>> mounted because tune2fs uses this to e.g. update the label of a mounted
>>> filesystem.
>>>
>>> Long-term we are moving ext4 in a direction where we can disallow block
>>> device modifications while the fs is mounted but we are not there yet. I've
>>> discussed some shorter-term solution to avoid such known problems with syzbot
>>> developers and what seems plausible would be a kconfig option to disallow
>>> writing to a block device when it is exclusively open by someone else.
>>> But so far I didn't get to trying whether this would reasonably work. Would
>>> you be interested in having a look into this?
>> I am interested in this job. The file system is often damaged by writing
>> block devices, which is a headache. I have always wanted to eradicate
>> this kind of problem.  A few months ago, I tried to add a mount parameter
>> to prohibit modification after the block device is mounted. But I
>> encountered several problems that led to the termination of my attempt.
>> First of all, the 32-bit super block flags have been used up and need to
>> be extended. Secondly, I don't know how to handle read-only flag in the
>> case of multiple mount points.
>>   "disallow writing to a block device when it is exclusively open by someone
>> else. "
>> -> Perhaps we can add a new IOCTL command to control whether write
>> operations are allowed after the block device has been exclusively
>> opened. I don't know if this is feasible?  Do you have any good
>> suggestions?
> Well, ioctl() for syzbot would be possible as well but for start I'd try
> whether the idea with kconfig option will work. Then it will be enough to
> just make sure all kernels used for fuzzing are built with this option set.
> Thanks for having a look into this!
In fact, I also want to solve the problem of file system damage caused
by writes to the raw disk in production environments. Controlling this
with a kconfig option alone would lose flexibility there.
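
As a rough illustration of the more flexible variant, the sketch below
models a per-device switch that is flipped at run time instead of at
build time. It is a userspace toy under stated assumptions, not kernel
code: struct toy_disk, toy_set_write_protect (standing in for the
suggested ioctl) and toy_write_allowed are all hypothetical names, and
letting only the exclusive holder flip the switch is my assumption, not
something agreed in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct toy_disk {
	const void *excl_holder;	/* exclusive opener, or NULL if none */
	bool deny_foreign_writes;	/* per-device runtime switch */
};

/*
 * Stand-in for the suggested ioctl. Assumption: only the current
 * exclusive holder may change the switch. Returns 0 on success,
 * -1 if the caller is not the exclusive holder.
 */
static int toy_set_write_protect(struct toy_disk *d, const void *caller,
				 bool on)
{
	if (d->excl_holder == NULL || d->excl_holder != caller)
		return -1;
	d->deny_foreign_writes = on;
	return 0;
}

/* Decide whether 'writer' may write to the device right now. */
static bool toy_write_allowed(const struct toy_disk *d, const void *writer)
{
	if (!d->deny_foreign_writes || d->excl_holder == NULL)
		return true;
	return d->excl_holder == writer;
}
```

The switch can be turned off again, so a tool that legitimately writes
to a mounted device (tune2fs updating the label, for example) could
still be allowed on systems that need it.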
>
> 								Honza


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 13:01               ` Jan Kara
  2023-03-13 13:17                 ` yebin (H)
@ 2023-03-13 14:43                 ` Tudor Ambarus
  1 sibling, 0 replies; 20+ messages in thread
From: Tudor Ambarus @ 2023-03-13 14:43 UTC (permalink / raw)
  To: Jan Kara, yebin
  Cc: syzbot, adilger.kernel, linux-ext4, linux-kernel, llvm, nathan,
	ndesaulniers, syzkaller-bugs, trix, tytso, Lee Jones

Hi, Jan, Ye,

On 3/13/23 13:01, Jan Kara wrote:
> On Mon 13-03-23 20:27:34, yebin wrote:
>> On 2023/3/13 19:57, Jan Kara wrote:
>>> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>>>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>>>> On 3/7/23 10:39, Jan Kara wrote:
>>>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>>>>>>> git tree:       upstream
>>>>>>>> console output:
>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>>>>>> kernel config:
>>>>>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>>>>>> dashboard link:
>>>>>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>>>>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>>>>>> for Debian) 2.35.2
>>>>>>>> syz repro:
>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>>>>>
>>>>>>>> Downloadable assets:
>>>>>>>> disk image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>>>>>> vmlinux:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>>>>>> kernel image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>>>>>> mounted in repro:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>>>>> commit:
>>>>>>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
>>>>>>>>
>>>>>>>> ==================================================================
>>>>>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>>>>>
>>>>>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>>>>>> 6.2.0-rc8-syzkaller #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>>>> BIOS Google 01/21/2023
>>>>>>>> Call Trace:
>>>>>>>>     <TASK>
>>>>>>>>     __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>>>     dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>>>>>     print_address_description mm/kasan/report.c:306 [inline]
>>>>>>>>     print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>>>>>     kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>>>>>     crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>>     ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>>>>>     ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>>>>>     ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>>>>>     ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>>>>>     ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>>>>>     ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>>>>>     ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>>>>>     ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>>>>>     ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>>>>>     ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>>>>>     evict+0x2a4/0x620 fs/inode.c:664
>>>>>>>>     do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>>>>>     __do_sys_unlink fs/namei.c:4368 [inline]
>>>>>>>>     __se_sys_unlink fs/namei.c:4366 [inline]
>>>>>>>>     __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> RIP: 0033:0x7fbc85a8c0f9
>>>>>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>>>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>>>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>>>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>>>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>>>>>     </TASK>
>>>>>>>>
>>>>>>>> The buggy address belongs to the physical page:
>>>>>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>>>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>>>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>>>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>>>>>> 0000000000000000
>>>>>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>>>>>> 0000000000000000
>>>>>>>> page dumped because: kasan: bad access detected
>>>>>>>> page_owner tracks the page as freed
>>>>>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>>>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>>>>>     prep_new_page mm/page_alloc.c:2531 [inline]
>>>>>>>>     get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>>>>>     __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>>>>>     alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>>>>>     allocate_slab mm/slub.c:1998 [inline]
>>>>>>>>     new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>>>>>     ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>>>>>     __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>>>>>     kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>>>>>     mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>>>>>     mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>>>>>     mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>>>>>     mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>>>>>     vma_expand+0x277/0x850 mm/mmap.c:541
>>>>>>>>     mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>>>>>     do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>>>>>     vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> page last free stack trace:
>>>>>>>>     reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>>>     free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>>>>>     free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>>>>>     free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>>>>>     free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>>>>>     qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>>>>>     kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>>>>>     __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>>>>>     kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>>>>     slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>>>>>     slab_alloc_node mm/slub.c:3452 [inline]
>>>>>>>>     kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>>>>>     __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>>>>>     alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>>>>>     alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>>>>>     sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>>>>>     unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>>>>>     sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>>     sock_sendmsg net/socket.c:734 [inline]
>>>>>>>>     __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>>>>>     __do_sys_sendto net/socket.c:2129 [inline]
>>>>>>>>     __se_sys_sendto net/socket.c:2125 [inline]
>>>>>>>>     __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>>>>>     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>>     do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>>     entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>
>>>>>>>> Memory state around the buggy address:
>>>>>>>>     ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>>     ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>>                       ^
>>>>>>>>     ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>>     ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ==================================================================
>>>>>>>>
>>>>>>>
>>>>>>> I think the patch from below should fix it.
>>>>>>>
>>>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>>>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
>>>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>>>
>>>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>>>> please let me know, it will help me better understand the subsystem. In
>>>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>>>> it.
>>>>>> If there's something corrupting the superblock while the filesystem is
>>>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>>>>>> try
>>>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>>>> replay is corrupting the value or something like that?
>>>>>>
>>>>>>                                  Honza
>>>>>>
>>>>> Ok, I agree. First thing would be to understand the reproducer and to
>>>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>>>> doing at
>>>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>>>> Will reply to this email thread once I understand what's happening. If
>>>>> you or someone else can decode the syz repro faster than me, shoot.
>>>>>
>>>> I can now explain how the contents of the super block of the buffer get
>>>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>>>> reproducer maps 6MB of data starting at offset 0 in the target's file
>>>> ("./bus"), then it starts overriding the data with something else, by
>>>> using memcpy, memset, individual byte inits. Does that mean that we
>>>> shouldn't rely on the contents of the super block in the buffer after we
>>>> mount the file system? If so, then my patch stands. I'll be happy to
>>>> extend it if needed. Below one may find a step by step interpretation of
>>>> the reproducer.
>>>>
>>>> We have a strace log for the same bug, but on Android 5.15:
>>>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>>>
>>>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>>>> commented them out in the c repro to confirm that they are not the
>>>> cause. The bug reproduced without the bpf() calls. One can find the c
>>>> repro at:
>>>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>>>
>>>> Let's look at these calls, just before the bug was hit:
>>>> [pid   328] open("./bus",
>>>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>>>> 000) = 4
>>>> [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>>>> [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>>>> [pid   328] mmap(0x20000000, 6291456,
>>>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>>>> 5, 0) = 0x20000000
>>> Yeah, looking at the reproducer, before this the reproducer also mounts
>>> /dev/loop0 as ext4 filesystem.
>>>
>>>> - ./bus is created (if it does not exist), fd 4 is returned.
>>>> - /dev/loop0 is mounted to ./bus
>>>> - then it creates a new file descriptor (5) for the same ./bus
>>>> - then it creates a mapping for ./bus starting at offset zero. The
>>>> mapped area is at 0x20000000 and is of 0x600000ul length.
>>> So the result is that the reproducer modified the block device while it is
>>> mounted by the filesystem. We know cases like this can crash the kernel and
>>> it is inherently difficult to fix. We have to trust the buffer cache
>>> contents as otherwise the performance will be unacceptable. For historical
>>> reasons we also have to allow modifications of buffer cache while ext4 is
>>> mounted because tune2fs uses this to e.g. update the label of a mounted
>>> filesystem.
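
To make the mechanism concrete, here is a minimal userspace sketch of what
the reproducer does, under stated assumptions: it uses an ordinary temporary
file instead of a mounted /dev/loop0, and the helper names
scribble_via_shared_mapping() and read_sees_mapped_write() are made up for
illustration. It only shows that stores through a MAP_SHARED mapping are
seen by any other reader of the same object; in the real case that other
reader is ext4 going through the buffer cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_LEN 4096

/*
 * Overwrite the first `len` bytes of the object behind `fd` through a
 * MAP_SHARED mapping, the way the reproducer scribbles over ./bus.
 * Returns 0 on success, -1 on failure.
 */
static int scribble_via_shared_mapping(int fd, unsigned char val, size_t len)
{
	unsigned char *map;

	assert(len > 0);
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;
	memset(map, val, len);
	return munmap(map, len);
}

/*
 * Hypothetical demo on a temporary file (NOT a block device): returns 1
 * if a plain pread() observes the bytes stored through the mapping,
 * 0 if it does not, -1 if the temporary file could not be created.
 */
static int read_sees_mapped_write(void)
{
	char path[] = "/tmp/shared-map-demo-XXXXXX";
	unsigned char buf[DEMO_LEN];
	int fd, ok = 0;
	size_t i;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	if (ftruncate(fd, DEMO_LEN) == 0 &&
	    scribble_via_shared_mapping(fd, 0xff, DEMO_LEN) == 0 &&
	    pread(fd, buf, DEMO_LEN, 0) == DEMO_LEN) {
		ok = 1;
		for (i = 0; i < DEMO_LEN; i++)
			if (buf[i] != 0xff)
				ok = 0;
	}
	close(fd);
	return ok;
}
```

With a mounted block device in place of the temporary file, the same kind
of store can land on the superblock buffer, which would explain a field
such as s_desc_size changing underneath the mounted filesystem.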
>>>
>>> Long-term we are moving ext4 in a direction where we can disallow block
>>> device modifications while the fs is mounted but we are not there yet. I've

sounds good.

>>> discussed some shorter-term solution to avoid such known problems with syzbot
>>> developers and what seems plausible would be a kconfig option to disallow
>>> writing to a block device when it is exclusively open by someone else.

How do we determine when a block device is exclusively open by someone else?

>>> But so far I didn't get to trying whether this would reasonably work. Would
>>> you be interested in having a look into this?
>>
>> I am interested in this job. The file system is often damaged by writing

I'm fine with Ye handling this. If that's not the case I can take a look
too, but I need more pointers than the ones already provided, as I've
recently started skimming over ext4.

>> block devices, which is a headache. I have always wanted to eradicate
>> this kind of problem.  A few months ago, I tried to add a mount parameter
>> to prohibit modification after the block device is mounted. But I
>> encountered several problems that led to the termination of my attempt.
>> First of all, the 32-bit super block flags have been used up and need to
>> be extended. Secondly, I don't know how to handle read-only flag in the
>> case of multiple mount points.
>>  "disallow writing to a block device when it is exclusively open by someone
>> else. "
>> -> Perhaps we can add a new IOCTL command to control whether write
>> operations are allowed after the block device has been exclusively
>> opened. I don't know if this is feasible?  Do you have any good
>> suggestions?
> 
> Well, ioctl() for syzbot would be possible as well but for start I'd try
> whether the idea with kconfig option will work. Then it will be enough to
> just make sure all kernels used for fuzzing are built with this option set.

How should we treat such bugs until the kconfig option is introduced? Do
we leave them open, or do we mark them as won't fix? The kconfig solution
feels a bit like a workaround: the bugs will still be hit by anyone not
selecting that config option.

Cheers,
ta

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 11:57           ` Jan Kara
  2023-03-13 12:27             ` yebin
@ 2023-03-13 14:53             ` Dmitry Vyukov
  2023-03-14  2:26               ` Theodore Ts'o
  2023-03-14  8:49               ` Jan Kara
  2023-04-30  2:55             ` Theodore Ts'o
  2 siblings, 2 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2023-03-13 14:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tudor Ambarus, syzbot, adilger.kernel, linux-ext4, linux-kernel,
	llvm, nathan, ndesaulniers, syzkaller-bugs, trix, tytso,
	Lee Jones

On Mon, 13 Mar 2023 at 12:57, Jan Kara <jack@suse.cz> wrote:
>
> Hi Tudor!
>
> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > On 3/7/23 10:39, Jan Kara wrote:
> > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > >>> On 2/13/23 15:56, syzbot wrote:
> > >>>> syzbot has found a reproducer for the following issue on:
> > >>>>
> > >>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > >>>> git tree:       upstream
> > >>>> console output:
> > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > >>>> kernel config:
> > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > >>>> dashboard link:
> > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > >>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> > >>>> for Debian) 2.35.2
> > >>>> syz repro:
> > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > >>>>
> > >>>> Downloadable assets:
> > >>>> disk image:
> > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > >>>> vmlinux:
> > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > >>>> kernel image:
> > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > >>>> mounted in repro:
> > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > >>>>
> > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > >>>> commit:
> > >>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > >>>>
> > >>>> ==================================================================
> > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > >>>>
> > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > >>>> 6.2.0-rc8-syzkaller #0
> > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > >>>> BIOS Google 01/21/2023
> > >>>> Call Trace:
> > >>>>    <TASK>
> > >>>>    __dump_stack lib/dump_stack.c:88 [inline]
> > >>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > >>>>    print_address_description mm/kasan/report.c:306 [inline]
> > >>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
> > >>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > >>>>    crc16+0x1fb/0x280 lib/crc16.c:58
> > >>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > >>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > >>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > >>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > >>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > >>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > >>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > >>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > >>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > >>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > >>>>    evict+0x2a4/0x620 fs/inode.c:664
> > >>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > >>>>    __do_sys_unlink fs/namei.c:4368 [inline]
> > >>>>    __se_sys_unlink fs/namei.c:4366 [inline]
> > >>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>> RIP: 0033:0x7fbc85a8c0f9
> > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > >>>>    </TASK>
> > >>>>
> > >>>> The buggy address belongs to the physical page:
> > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > >>>> 0000000000000000
> > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > >>>> 0000000000000000
> > >>>> page dumped because: kasan: bad access detected
> > >>>> page_owner tracks the page as freed
> > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > >>>>    prep_new_page mm/page_alloc.c:2531 [inline]
> > >>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > >>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > >>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > >>>>    allocate_slab mm/slub.c:1998 [inline]
> > >>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
> > >>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > >>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > >>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > >>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > >>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > >>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > >>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > >>>>    vma_expand+0x277/0x850 mm/mmap.c:541
> > >>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > >>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > >>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>> page last free stack trace:
> > >>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
> > >>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
> > >>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > >>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > >>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > >>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > >>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > >>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > >>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > >>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > >>>>    slab_alloc_node mm/slub.c:3452 [inline]
> > >>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > >>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > >>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
> > >>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > >>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > >>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > >>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
> > >>>>    sock_sendmsg net/socket.c:734 [inline]
> > >>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > >>>>    __do_sys_sendto net/socket.c:2129 [inline]
> > >>>>    __se_sys_sendto net/socket.c:2125 [inline]
> > >>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>>
> > >>>> Memory state around the buggy address:
> > >>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>>                      ^
> > >>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>> ==================================================================
> > >>>>
> > >>>
> > >>>
> > >>> I think the patch from below should fix it.
> > >>>
> > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > >>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> > >>> super block in the buffer get corrupted sometime after the .get_tree
> > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > >>> relying on the contents of the buffer, we should instead rely on the
> > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > >>>
> > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > >>> please let me know, it will help me better understand the subsystem. In
> > >>> the meantime I'll continue to investigate this and prepare a patch for
> > >>> it.
> > >>
> > >> If there's something corrupting the superblock while the filesystem is
> > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > >> try
> > >> to paper over the problem by not using the on-disk data... Maybe journal
> > >> replay is corrupting the value or something like that?
> > >>
> > >>                                 Honza
> > >>
> > >
> > > Ok, I agree. First thing would be to understand the reproducer and to
> > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > doing at
> > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > Will reply to this email thread once I understand what's happening. If
> > > you or someone else can decode the syz repro faster than me, shoot.
> > >
> >
> > I can now explain how the contents of the super block of the buffer get
> > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > reproducer maps 6MB of data starting at offset 0 in the target's file
> > ("./bus"), then it starts overriding the data with something else, by
> > using memcpy, memset, individual byte inits. Does that mean that we
> > shouldn't rely on the contents of the super block in the buffer after we
> > mount the file system? If so, then my patch stands. I'll be happy to
> > extend it if needed. Below one may find a step by step interpretation of
> > the reproducer.
> >
> > We have a strace log for the same bug, but on Android 5.15:
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> >
> > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > commented them out in the c repro to confirm that they are not the
> > cause. The bug reproduced without the bpf() calls. One can find the c
> > repro at:
> > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> >
> > Let's look at these calls, just before the bug was hit:
> > [pid   328] open("./bus",
> > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > 000) = 4
> > [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > [pid   328] mmap(0x20000000, 6291456,
> > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > 5, 0) = 0x20000000
>
> Yeah, looking at the reproducer, before this the reproducer also mounts
> /dev/loop0 as ext4 filesystem.
>
> > - ./bus is created (if it does not exist), fd 4 is returned.
> > - /dev/loop0 is mounted to ./bus
> > - then it creates a new file descriptor (5) for the same ./bus
> > - then it creates a mapping for ./bus starting at offset zero. The
> > mapped area is at 0x20000000 and is of 0x600000ul length.
>
> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.
>
> Long-term we are moving ext4 in a direction where we can disallow block
> device modifications while the fs is mounted but we are not there yet. I've
> discussed some shorter-term solution to avoid such known problems with syzbot
> developers and what seems plausible would be a kconfig option to disallow
> writing to a block device when it is exclusively open by someone else.
> But so far I didn't get to trying whether this would reasonably work. Would
> you be interested in having a look into this?

Hi Jan,

Does this affect only the loop device or also USB storage devices?
Say, if the USB device returns different contents during mount and on
subsequent reads?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 14:53             ` Dmitry Vyukov
@ 2023-03-14  2:26               ` Theodore Ts'o
  2023-03-14  9:45                 ` Dmitry Vyukov
  2023-03-14  8:49               ` Jan Kara
  1 sibling, 1 reply; 20+ messages in thread
From: Theodore Ts'o @ 2023-03-14  2:26 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jan Kara, Tudor Ambarus, syzbot, adilger.kernel, linux-ext4,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	Lee Jones

On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
> 
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

Modifying the block device while the file system is mounted is
something that we have to allow for now because tune2fs uses it to
modify the superblock.  It has historically also been used (rarely) by
people who know what they are doing to do surgery on a mounted file
system.  If we create a way for tune2fs to be able to update the
superblock via some kind of ioctl, we could disallow modifying the
block device while the file system is mounted.  Of course, it would
require waiting at least 5-6 years since sometimes people will update
the kernel without updating userspace.  We'd also need to check to
make sure there aren't boot loader installers (such as grub-install)
that depend on being able to modify the block device while the root
file system is mounted, at least in some rare cases.

The "how" to exclude mounted file systems is relatively easy.  The
kernel already knows when the file system is mounted, and it is
already a supported feature that a userspace application that wants to
be careful can open a block device with O_EXCL, and if it is in use by
the kernel --- mounted by a file system, being used by dm-thin, et al.
--- the open(2) system call will fail.  From the open(2) man page:

          In  general, the behavior of O_EXCL is undefined if it is used without
          O_CREAT.  There is one exception: on Linux 2.6 and later,  O_EXCL  can
          be  used without O_CREAT if pathname refers to a block device.  If the
          block device is in use by the system  (e.g.,  mounted),  open()  fails
          with the error EBUSY.

Something which syzbot could do today is to simply use O_EXCL
whenever trying to open a block device.  This would avoid a class of
syzbot false positives, since normally it requires root privileges
and/or an experienced sysadmin to try to modify a block device while
it is mounted and/or in use by LVM.
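
As a minimal sketch of what that could look like from userspace: the
helper name open_blockdev_exclusive() is made up for illustration, and the
only behaviour assumed is the one in the man page excerpt above (O_EXCL
without O_CREAT on a block device fails with EBUSY when the device is in
use).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-write with O_EXCL and without
 * O_CREAT.  Returns an open file descriptor on success or -errno on
 * failure.  For a block device that is in use by the system (e.g.
 * mounted), open(2) fails with EBUSY, so the caller sees -EBUSY.
 */
static int open_blockdev_exclusive(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR | O_EXCL);
	if (fd < 0)
		return -errno;
	return fd;
}
```

A caller would treat -EBUSY as "the device is in use, leave it alone" and
skip any writes; other errors (for example -ENOENT or -EACCES) are passed
through unchanged.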

      	      	     	       - Ted

P.S.  Trivia note: Approximately a month after I started work at VA Linux
Systems, a sysadmin intern who was given the root password to
sourceforge.net, while trying to fix a disk-to-disk backup, ran
mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
RAID 0 setup on which open source code critical to the community
(including, for example, OpenGL) was mounted and serving.  The intern
got about 50% of the way through zeroing the inode table on /dev/hdXX
before the file system noticed and threw an error, at which point
wiser heads stopped what the intern was doing and tried to clean up
the mess.  Of course, there were no backups, since that was what the
intern was trying to fix!

There are a couple of things that we could learn from this incident.
One was that giving the root password to an untrained intern not
familiar with the setup on the serving system was... an unfortunate
choice.  Another was that adding the above-mentioned O_EXCL feature
and teaching mkfs to use it was an obvious post-mortem action item to
prevent this kind of problem in the future...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 14:53             ` Dmitry Vyukov
  2023-03-14  2:26               ` Theodore Ts'o
@ 2023-03-14  8:49               ` Jan Kara
  2023-03-14  9:33                 ` Dmitry Vyukov
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Kara @ 2023-03-14  8:49 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jan Kara, Tudor Ambarus, syzbot, adilger.kernel, linux-ext4,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	tytso, Lee Jones

On Mon 13-03-23 15:53:57, Dmitry Vyukov wrote:
> On Mon, 13 Mar 2023 at 12:57, Jan Kara <jack@suse.cz> wrote:
> >
> > Hi Tudor!
> >
> > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > On 3/7/23 10:39, Jan Kara wrote:
> > > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > >>> On 2/13/23 15:56, syzbot wrote:
> > > >>>> syzbot has found a reproducer for the following issue on:
> > > >>>>
> > > >>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > > >>>> git tree:       upstream
> > > >>>> console output:
> > > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > >>>> kernel config:
> > > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > >>>> dashboard link:
> > > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > >>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > >>>> for Debian) 2.35.2
> > > >>>> syz repro:
> > > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > >>>>
> > > >>>> Downloadable assets:
> > > >>>> disk image:
> > > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > >>>> vmlinux:
> > > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > >>>> kernel image:
> > > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > >>>> mounted in repro:
> > > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > >>>>
> > > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > > >>>> commit:
> > > >>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > > >>>>
> > > >>>> ==================================================================
> > > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > >>>>
> > > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > >>>> 6.2.0-rc8-syzkaller #0
> > > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > >>>> BIOS Google 01/21/2023
> > > >>>> Call Trace:
> > > >>>>    <TASK>
> > > >>>>    __dump_stack lib/dump_stack.c:88 [inline]
> > > >>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > >>>>    print_address_description mm/kasan/report.c:306 [inline]
> > > >>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > >>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > >>>>    crc16+0x1fb/0x280 lib/crc16.c:58
> > > >>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > >>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > >>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > >>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > >>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > >>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > >>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > >>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > >>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > >>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > >>>>    evict+0x2a4/0x620 fs/inode.c:664
> > > >>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > >>>>    __do_sys_unlink fs/namei.c:4368 [inline]
> > > >>>>    __se_sys_unlink fs/namei.c:4366 [inline]
> > > >>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>> RIP: 0033:0x7fbc85a8c0f9
> > > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > >>>>    </TASK>
> > > >>>>
> > > >>>> The buggy address belongs to the physical page:
> > > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > >>>> 0000000000000000
> > > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > >>>> 0000000000000000
> > > >>>> page dumped because: kasan: bad access detected
> > > >>>> page_owner tracks the page as freed
> > > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > >>>>    prep_new_page mm/page_alloc.c:2531 [inline]
> > > >>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > >>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > >>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > >>>>    allocate_slab mm/slub.c:1998 [inline]
> > > >>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
> > > >>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > >>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > >>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > >>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > >>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > >>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > >>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > >>>>    vma_expand+0x277/0x850 mm/mmap.c:541
> > > >>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > >>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > >>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>> page last free stack trace:
> > > >>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
> > > >>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > >>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > >>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > >>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > >>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > >>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > >>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > >>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > >>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > >>>>    slab_alloc_node mm/slub.c:3452 [inline]
> > > >>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > >>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > >>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
> > > >>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > >>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > >>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > >>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
> > > >>>>    sock_sendmsg net/socket.c:734 [inline]
> > > >>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > >>>>    __do_sys_sendto net/socket.c:2129 [inline]
> > > >>>>    __se_sys_sendto net/socket.c:2125 [inline]
> > > >>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>>
> > > >>>> Memory state around the buggy address:
> > > >>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>>                      ^
> > > >>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>> ==================================================================
> > > >>>>
> > > >>>
> > > >>>
> > > >>> I think the patch from below should fix it.
> > > >>>
> > > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > >>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> > > >>> super block in the buffer get corrupted sometime after the .get_tree
> > > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > > >>> relying on the contents of the buffer, we should instead rely on the
> > > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > > >>>
> > > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > > >>> please let me know, it will help me better understand the subsystem. In
> > > >>> the meantime I'll continue to investigate this and prepare a patch for
> > > >>> it.
> > > >>
> > > >> If there's something corrupting the superblock while the filesystem is
> > > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > >> try
> > > >> to paper over the problem by not using the on-disk data... Maybe journal
> > > >> replay is corrupting the value or something like that?
> > > >>
> > > >>                                 Honza
> > > >>
> > > >
> > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > doing at
> > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > Will reply to this email thread once I understand what's happening. If
> > > > you or someone else can decode the syz repro faster than me, shoot.
> > > >
> > >
> > > I can now explain how the contents of the super block of the buffer get
> > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > ("./bus"), then it starts overriding the data with something else, by
> > > using memcpy, memset, individual byte inits. Does that mean that we
> > > shouldn't rely on the contents of the super block in the buffer after we
> > > mount the file system? If so, then my patch stands. I'll be happy to
> > > extend it if needed. Below one may find a step by step interpretation of
> > > the reproducer.
> > >
> > > We have a strace log for the same bug, but on Android 5.15:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > >
> > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > commented them out in the c repro to confirm that they are not the
> > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > repro at:
> > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > >
> > > Let's look at these calls, just before the bug was hit:
> > > [pid   328] open("./bus",
> > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > 000) = 4
> > > [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > [pid   328] mmap(0x20000000, 6291456,
> > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > 5, 0) = 0x20000000
> >
> > Yeah, looking at the reproducer, before this the reproducer also mounts
> > /dev/loop0 as ext4 filesystem.
> >
> > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > - /dev/loop0 is mounted to ./bus
> > > - then it creates a new file descriptor (5) for the same ./bus
> > > - then it creates a mapping for ./bus starting at offset zero. The
> > > mapped area is at 0x20000000 and is of 0x600000ul length.
> >
> > So the result is that the reproducer modified the block device while it is
> > mounted by the filesystem. We know cases like this can crash the kernel and
> > it is inherently difficult to fix. We have to trust the buffer cache
> > contents as otherwise the performance will be unacceptable. For historical
> > reasons we also have to allow modifications of buffer cache while ext4 is
> > mounted because tune2fs uses this to e.g. update the label of a mounted
> > filesystem.
> >
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
> 
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

So if the USB device returns different content, we are fine because we
verify the content each time we load it into the buffer cache. But if some
software opens the block device and modifies it, it modifies the buffer
cache directly and thus bypasses any checks we do when loading data from
the storage.
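
To illustrate the difference, here is a minimal userspace sketch, not
kernel code. It assumes that s_desc_size is a little-endian 16-bit field at
byte offset 0xFE of the superblock, and that the 64-bit bounds are 64 and
1024 bytes; the helper names are made up for the example. The check runs
once when the block is "loaded", and a later in-place write to the cached
copy is never re-validated:

```python
import struct

# Assumed values for illustration (ext4 on-disk format as I understand it).
S_DESC_SIZE_OFFSET = 0xFE        # byte offset of s_desc_size in the superblock
EXT4_MIN_DESC_SIZE_64BIT = 64
EXT4_MAX_DESC_SIZE = 1024


def read_desc_size(sb):
    """Read the little-endian 16-bit s_desc_size from a superblock buffer."""
    return struct.unpack_from("<H", sb, S_DESC_SIZE_OFFSET)[0]


def desc_size_is_valid(size):
    """Range and power-of-two check, in the spirit of the mount-time check."""
    in_range = EXT4_MIN_DESC_SIZE_64BIT <= size <= EXT4_MAX_DESC_SIZE
    return in_range and (size & (size - 1)) == 0


def simulate():
    # Hypothetical 1024-byte superblock image sitting in the "buffer cache".
    cached_sb = bytearray(1024)
    struct.pack_into("<H", cached_sb, S_DESC_SIZE_OFFSET, 64)

    # Verification happens once, when the block is loaded from storage.
    valid_at_load = desc_size_is_valid(read_desc_size(cached_sb))

    # A writer holding the block device open changes the cached copy in place.
    struct.pack_into("<H", cached_sb, S_DESC_SIZE_OFFSET, 0xFFFF)

    # Later readers of the cached field see the new value; no check re-ran.
    valid_at_use = desc_size_is_valid(read_desc_size(cached_sb))
    return valid_at_load, valid_at_use
```

Here simulate() reports the field as valid at load time and invalid at use
time, which is the window the reproducer exploits.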

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-14  8:49               ` Jan Kara
@ 2023-03-14  9:33                 ` Dmitry Vyukov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2023-03-14  9:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tudor Ambarus, syzbot, adilger.kernel, linux-ext4, linux-kernel,
	llvm, nathan, ndesaulniers, syzkaller-bugs, trix, tytso,
	Lee Jones, syzkaller

On Tue, 14 Mar 2023 at 09:49, Jan Kara <jack@suse.cz> wrote:
>
> On Mon 13-03-23 15:53:57, Dmitry Vyukov wrote:
> > On Mon, 13 Mar 2023 at 12:57, Jan Kara <jack@suse.cz> wrote:
> > >
> > > Hi Tudor!
> > >
> > > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > >>> On 2/13/23 15:56, syzbot wrote:
> > > > >>>> syzbot has found a reproducer for the following issue on:
> > > > >>>>
> > > > >>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > > > >>>> git tree:       upstream
> > > > >>>> console output:
> > > > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > >>>> kernel config:
> > > > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > >>>> dashboard link:
> > > > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > >>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > >>>> for Debian) 2.35.2
> > > > >>>> syz repro:
> > > > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > >>>>
> > > > >>>> Downloadable assets:
> > > > >>>> disk image:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > >>>> vmlinux:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > >>>> kernel image:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > >>>> mounted in repro:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > >>>>
> > > > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > > > >>>> commit:
> > > > >>>> Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > > > >>>>
> > > > >>>> ==================================================================
> > > > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > >>>>
> > > > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > >>>> 6.2.0-rc8-syzkaller #0
> > > > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > >>>> BIOS Google 01/21/2023
> > > > >>>> Call Trace:
> > > > >>>>    <TASK>
> > > > >>>>    __dump_stack lib/dump_stack.c:88 [inline]
> > > > >>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > >>>>    print_address_description mm/kasan/report.c:306 [inline]
> > > > >>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > >>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > >>>>    crc16+0x1fb/0x280 lib/crc16.c:58
> > > > >>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > >>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > >>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > >>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > >>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > >>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > >>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > >>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > >>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > >>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > >>>>    evict+0x2a4/0x620 fs/inode.c:664
> > > > >>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > >>>>    __do_sys_unlink fs/namei.c:4368 [inline]
> > > > >>>>    __se_sys_unlink fs/namei.c:4366 [inline]
> > > > >>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>> RIP: 0033:0x7fbc85a8c0f9
> > > > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > >>>>    </TASK>
> > > > >>>>
> > > > >>>> The buggy address belongs to the physical page:
> > > > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > >>>> 0000000000000000
> > > > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > >>>> 0000000000000000
> > > > >>>> page dumped because: kasan: bad access detected
> > > > >>>> page_owner tracks the page as freed
> > > > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > >>>>    prep_new_page mm/page_alloc.c:2531 [inline]
> > > > >>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > >>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > >>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > >>>>    allocate_slab mm/slub.c:1998 [inline]
> > > > >>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > >>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > >>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > >>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > >>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > >>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > >>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > >>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > >>>>    vma_expand+0x277/0x850 mm/mmap.c:541
> > > > >>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > >>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > >>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>> page last free stack trace:
> > > > >>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > >>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > >>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > >>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > >>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > >>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > >>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > >>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > >>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > >>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > >>>>    slab_alloc_node mm/slub.c:3452 [inline]
> > > > >>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > >>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > >>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > >>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > >>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > >>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > >>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > >>>>    sock_sendmsg net/socket.c:734 [inline]
> > > > >>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > >>>>    __do_sys_sendto net/socket.c:2129 [inline]
> > > > >>>>    __se_sys_sendto net/socket.c:2125 [inline]
> > > > >>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > >>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>>
> > > > >>>> Memory state around the buggy address:
> > > > >>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > >>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>>                      ^
> > > > >>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>> ==================================================================
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> I think the patch from below should fix it.
> > > > >>>
> > > > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > >>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> > > > >>> super block in the buffer get corrupted sometime after the .get_tree
> > > > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > >>> relying on the contents of the buffer, we should instead rely on the
> > > > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > > > >>>
> > > > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > > > >>> please let me know, it will help me better understand the subsystem. In
> > > > >>> the meantime I'll continue to investigate this and prepare a patch for
> > > > >>> it.
> > > > >>
> > > > >> If there's something corrupting the superblock while the filesystem is
> > > > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > > >> try
> > > > >> to paper over the problem by not using the on-disk data... Maybe journal
> > > > >> replay is corrupting the value or something like that?
> > > > >>
> > > > >>                                 Honza
> > > > >>
> > > > >
> > > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > > doing at
> > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > > Will reply to this email thread once I understand what's happening. If
> > > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > >
> > > >
> > > > I can now explain how the contents of the super block of the buffer get
> > > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > > ("./bus"), then it starts overriding the data with something else, by
> > > > using memcpy, memset, individual byte inits. Does that mean that we
> > > > shouldn't rely on the contents of the super block in the buffer after we
> > > > mount the file system? If so, then my patch stands. I'll be happy to
> > > > extend it if needed. Below one may find a step by step interpretation of
> > > > the reproducer.
> > > >
> > > > We have a strace log for the same bug, but on Android 5.15:
> > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > >
> > > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > > commented them out in the c repro to confirm that they are not the
> > > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > > repro at:
> > > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > >
> > > > Let's look at these calls, just before the bug was hit:
> > > > [pid   328] open("./bus",
> > > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > > 000) = 4
> > > > [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > > [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > > [pid   328] mmap(0x20000000, 6291456,
> > > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > > 5, 0) = 0x20000000
> > >
> > > Yeah, looking at the reproducer, before this the reproducer also mounts
> > > /dev/loop0 as ext4 filesystem.
> > >
> > > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > > - /dev/loop0 is mounted to ./bus
> > > > - then it creates a new file descriptor (5) for the same ./bus
> > > > - then it creates a mapping for ./bus starting at offset zero. The
> > > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > >
> > > So the result is that the reproducer modified the block device while it is
> > > mounted by the filesystem. We know cases like this can crash the kernel and
> > > it is inherently difficult to fix. We have to trust the buffer cache
> > > contents as otherwise the performance will be unacceptable. For historical
> > > reasons we also have to allow modifications of buffer cache while ext4 is
> > > mounted because tune2fs uses this to e.g. update the label of a mounted
> > > filesystem.
> > >
> > > Long-term we are moving ext4 in a direction where we can disallow block
> > > device modifications while the fs is mounted but we are not there yet. I've
> > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > developers and what seems plausible would be a kconfig option to disallow
> > > writing to a block device when it is exclusively open by someone else.
> > > But so far I didn't get to trying whether this would reasonably work. Would
> > > you be interested in having a look into this?
> >
> > Does this affect only the loop device or also USB storage devices?
> > Say, if the USB device returns different contents during mount and on
> > subsequent reads?
>
> So if USB returns a different content, we are fine because we verify the
> content each time when loading it into the buffer cache. But if something
> in the software opens the block device and modifies it, it modifies
> directly the buffer cache and thus bypasses any checks we do when loading
> data from the storage.

Thanks, I see. This is good.


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-14  2:26               ` Theodore Ts'o
@ 2023-03-14  9:45                 ` Dmitry Vyukov
  2023-03-14 10:05                   ` Dmitry Vyukov
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Vyukov @ 2023-03-14  9:45 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Tudor Ambarus, syzbot, adilger.kernel, linux-ext4,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	Lee Jones

On Tue, 14 Mar 2023 at 03:26, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > > Long-term we are moving ext4 in a direction where we can disallow block
> > > device modifications while the fs is mounted but we are not there yet. I've
> > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > developers and what seems plausible would be a kconfig option to disallow
> > > writing to a block device when it is exclusively open by someone else.
> > > But so far I didn't get to trying whether this would reasonably work. Would
> > > you be interested in having a look into this?
> >
> > Does this affect only the loop device or also USB storage devices?
> > Say, if the USB device returns different contents during mount and on
> > subsequent reads?
>
> Modifying the block device while the file system is mounted is
> something that we have to allow for now because tune2fs uses it to
> modify the superblock.  It has historically also been used (rarely) by
> people who know what they are doing to do surgery on a mounted file
> system.  If we create a way for tune2fs to be able to update the
> superblock via some kind of ioctl, we could disallow modifying the
> block device while the file system is mounted.  Of course, it would
> require waiting at least 5-6 years since sometimes people will update
> the kernel without updating userspace.  We'd also need to check to
> make sure there aren't boot loader installers (such as grub-install)
> that depend on being able to modify the block device while the root
> file system is mounted, at least in some rare cases.
>
> The "how" to exclude mounted file systems is relatively easy.  The
> kernel already knows when the file system is mounted, and it is
> already a supported feature that a userspace application that wants to
> be careful can open a block device with O_EXCL, and if it is in use by
> the kernel --- mounted by a file system, being used by dm-thin, et al.
> --- the open(2) system call will fail.  From the open(2) man page:
>
>           In  general, the behavior of O_EXCL is undefined if it is used without
>           O_CREAT.  There is one exception: on Linux 2.6 and later,  O_EXCL  can
>           be  used without O_CREAT if pathname refers to a block device.  If the
>           block device is in use by the system  (e.g.,  mounted),  open()  fails
>           with the error EBUSY.
>
> Something which syzbot could do today is to simply use O_EXCL
> whenever trying to open a block device.  This would avoid a class of
> syzbot false positives, since normally it requires root privileges
> and/or an experienced sysadmin to try to modify a block device while
> it is mounted and/or in use by LVM.
>
>                                - Ted
>
> P.S.  Trivia note: Approximately a month after I started work at VA Linux
> Systems, a sysadmin intern who was given the root password to
> sourceforge.net, while trying to fix a disk-to-disk backup, ran
> mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
> RAID 0 setup on which open source code critical to the community
> (including, for example, OpenGL) was mounted and serving.  The intern
> got about 50% of the way through zeroing the inode table on /dev/hdXX
> before the file system noticed and threw an error, at which point
> wiser heads stopped what the intern was doing and tried to clean up
> the mess.  Of course, there were no backups, since that was what the
> intern was trying to fix!
>
> There are a couple of things that we could learn from this incident.
> One was that giving the root password to an untrained intern not
> familiar with the setup on the serving system was... an unfortunate
> choice.  Another was that adding the above-mentioned O_EXCL feature
> and teaching mkfs to use it was an obvious post-mortem action item to
> prevent this kind of problem in the future...
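
For reference, the O_EXCL behaviour quoted from open(2) above can be
sketched from userspace roughly as follows. This is only an illustration in
Python, not what syzkaller does today; the function name is hypothetical,
and the path is expected to name a block device, since O_EXCL without
O_CREAT is only defined in that case:

```python
import errno
import os


def open_blockdev_exclusive(path, flags=os.O_RDWR):
    """Open a block device with O_EXCL.

    Returns a file descriptor, or None when the kernel reports EBUSY,
    which open(2) documents for a device that is in use (e.g. mounted).
    Any other error is passed on to the caller.
    """
    try:
        return os.open(path, flags | os.O_EXCL)
    except OSError as err:
        if err.errno == errno.EBUSY:
            return None
        raise
```

A caller that gets None back would skip the device instead of writing to
it, for example when handed /dev/loop0 while it is mounted.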

I am struggling to make up my mind about how to think about this case.

"root" is very overloaded, but generally it does not mean "randomly
corrupting memory". Normally it gives access to system-wide changes
but with the same protection/consistency guarantees as for
unprivileged system calls.

There are, of course, things like /dev/{mem,kmem}. But at the same
time there is also the lockdown LSM, and more distros enable it today.

Btw, should this "prohibit writes to mounted device" be part of
LOCKDOWN_INTEGRITY? It looks like it gives capabilities similar to
/dev/{mem,kmem}.

Disabling in testing something that's enabled in production is
generally not very useful.
So one option is to do nothing about this for now.
If it's a truly recognized issue that is in the process of being fixed,
syzbot will just show that it's still present. One of the goals of
syzbot is to show the current state of things in an objective manner.
If some kernel developers are aware of an issue, it does not mean that
most distros/users are aware.

It makes sense to disable in testing things that are also recommended
to be disabled in production settings.
And LOCKDOWN_INTEGRITY may play such a role: we would include this
restriction in LOCKDOWN_INTEGRITY and enable it on syzbot.
Unfortunately, we don't enable lockdown on syzbot yet because it
prohibits access to debugfs, which is required for fuzzing. We need to
ask the lockdown maintainers what they think about a
LOCKDOWN_TEST_ONLY_DONT_ENABLE_IN_PROD_INTEGRITY mode which would
whitelist debugfs.


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-14  9:45                 ` Dmitry Vyukov
@ 2023-03-14 10:05                   ` Dmitry Vyukov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Vyukov @ 2023-03-14 10:05 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Tudor Ambarus, syzbot, adilger.kernel, linux-ext4,
	linux-kernel, llvm, nathan, ndesaulniers, syzkaller-bugs, trix,
	Lee Jones

On Tue, 14 Mar 2023 at 10:45, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, 14 Mar 2023 at 03:26, Theodore Ts'o <tytso@mit.edu> wrote:
> >
> > On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > > > Long-term we are moving ext4 in a direction where we can disallow block
> > > > device modifications while the fs is mounted but we are not there yet. I've
> > > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > > developers and what seems plausible would be a kconfig option to disallow
> > > > writing to a block device when it is exclusively open by someone else.
> > > > But so far I didn't get to trying whether this would reasonably work. Would
> > > > you be interested in having a look into this?
> > >
> > > Does this affect only the loop device or also USB storage devices?
> > > Say, if the USB device returns different contents during mount and on
> > > subsequent reads?
> >
> > Modifying the block device while the file system is mounted is
> > something that we have to allow for now because tune2fs uses it to
> > modify the superblock.  It has historically also been used (rarely) by
> > people who know what they are doing to do surgery on a mounted file
> > system.  If we create a way for tune2fs to be able to update the
> > superblock via some kind of ioctl, we could disallow modifying the
> > block device while the file system is mounted.  Of course, it would
> > require waiting at least 5-6 years since sometimes people will update
> > the kernel without updating userspace.  We'd also need to check to
> > make sure there aren't boot loader installer (such as grub-install)
> > that depend on being able to modify the block device while the root
> > file system is mounted, at least in some rare cases.
> >
> > The "how" to exclude mounted file systems is relatively easy.  The
> > kernel already knows when the file system is mounted, and it is
> > already a supported feature that a userspace application that wants to
> > be careful can open a block device with O_EXCL, and if it is in use by
> > the kernel --- mounted by a file system, being used by dm-thin, et al.
> > --- the open(2) system call will fail.  From the open(2) man page:
> >
> >           In  general, the behavior of O_EXCL is undefined if it is used without
> >           O_CREAT.  There is one exception: on Linux 2.6 and later,  O_EXCL  can
> >           be  used without O_CREAT if pathname refers to a block device.  If the
> >           block device is in use by the system  (e.g.,  mounted),  open()  fails
> >           with the error EBUSY.
> >
> > Something which syzbot could do today is to simply use O_EXCL
> > whenever trying to open a block device.  This would avoid a class of
> > syzbot false positives, since normally it requires root privileges
> > and/or an experienced sysadmin to try to modify a block device while
> > it is mounted and/or in use by LVM.
> >
> >                                - Ted
> >
> > P.S.  Trivia note: Approximately a month after I started work at VA Linux
> > Systems, a sysadmin intern which was given the root password to
> > sourceforge.net, while trying to fix a disk-to-disk backup, ran
> > mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
> > RAID 0 setup on which open source code critical to the community
> > (including, for example, OpenGL) was mounted and serving.  The intern
> > got about 50% the way through zeroing the inode table on /dev/hdXX
> > before the file system noticed and threw an error, at which point
> > wiser heads stopped what the intern was doing and tried to clean up
> > the mess.  Of course, there were no backups, since that was what the
> > intern was trying to fix!
> >
> > There are a couple of things that we could learn from this incident.
> > One was that giving the root password to an untrained intern not
> > familiar with the setup on the serving system was... an unfortunate
> > choice.  Another was that adding the above-mentioned O_EXCL feature
> > and teaching mkfs to use it was an obvious post-mortem action item to
> > prevent this kind of problem in the future...
>
> I am struggling to make my mind re how to think about this case.
>
> "root" is very overloaded, but generally it does not mean "randomly
> corrupting memory". Normally it gives access to system-wide changes
> but with the same protection/consistency guarantees as for
> unprivileged system calls.
>
> There are, of course, things like /dev/{mem,kmem}. But at the same
> time there is also lockdown LSM and more distros today enable it.
>
> Btw, should this "prohibit writes to mounted device" be part of
> LOCKDOWN_INTEGRITY? It looks like it gives capabilities similar to
> /dev/{mem,kmem}.
>
> Disabling in testing something that's enabled in production is
> generally not very useful.
> So one option is to do nothing about this for now.
> If it's a true recognized issue that is in the process of fixing,
> syzbot will just show that it's still present. One of the goals of
> syzbot is to show the current state of things in an objective manner.
> If some kernel developers are aware of an issue, it does not mean that
> most distros/users are aware.
>
> It makes sense to disable in testing things that are also recommended
> to be disabled in production settings.
> And LOCKDOWN_INTEGRITY may play such a role: we include this
> restriction into LOCKDOWN_INTEGRITY and enable it on syzbot.
> Though, unfortunately, we still don't enable it because it prohibits
> access to debugfs, which is required for fuzzing. Need to ask lockdown
> maintainers what they think about
> LOCKDOWN_TEST_ONLY_DONT_ENABLE_IN_PROD_INTEGRITY which would whitelist
> debugfs.

Asked the lockdown maintainers about adding this to lockdown and adding
a special mode for fuzzing:
https://lore.kernel.org/all/CACT4Y+Z-9KCgKwkktvdJwNJZxxeA1f74zkP7KD6c=OmKXxXfjw@mail.gmail.com/
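
For reference, the O_EXCL behaviour quoted above from open(2) can be
sketched in userspace roughly as follows. This is only a minimal
illustration: try_claim_blkdev() is a hypothetical helper name, not
something syzkaller or e2fsprogs provides, and the -errno return
convention is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: open a block device only if the kernel is not
 * already using it.  Returns a file descriptor on success (the caller
 * must close() it), or -errno on failure.  -EBUSY means the device is
 * in use by the system, e.g. mounted.
 */
int try_claim_blkdev(const char *path, int flags)
{
	int fd;

	assert(path != NULL);
	/* O_EXCL without O_CREAT is only defined for block devices. */
	fd = open(path, (flags & ~O_CREAT) | O_EXCL);
	if (fd < 0)
		return -errno;
	return fd;
}
```

A caller would treat -EBUSY from try_claim_blkdev("/dev/loop0", O_RDWR)
as "somebody else owns this device, leave it alone" and skip the write.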


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 13:17                 ` yebin (H)
@ 2023-03-14 11:19                   ` Jan Kara
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Kara @ 2023-03-14 11:19 UTC (permalink / raw)
  To: yebin (H)
  Cc: Jan Kara, yebin, Tudor Ambarus, syzbot, adilger.kernel,
	linux-ext4, linux-kernel, llvm, nathan, ndesaulniers,
	syzkaller-bugs, trix, tytso, Lee Jones

On Mon 13-03-23 21:17:57, yebin (H) wrote:
> On 2023/3/13 21:01, Jan Kara wrote:
> > On Mon 13-03-23 20:27:34, yebin wrote:
> > > On 2023/3/13 19:57, Jan Kara wrote:
> > > > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > > > > On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > > > > > On 2/13/23 15:56, syzbot wrote:
> > > > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > > > > 
> > > > > > > > > HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> > > > > > > > > git tree:       upstream
> > > > > > > > > console output:
> > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > > > > > > kernel config:
> > > > > > > > > https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > > > > > > dashboard link:
> > > > > > > > > https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > > > > > > compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > > > > > > for Debian) 2.35.2
> > > > > > > > > syz repro:
> > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > > > > > > 
> > > > > > > > > Downloadable assets:
> > > > > > > > > disk image:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > > > > > > vmlinux:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > > > > > > kernel image:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > > > > > > mounted in repro:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > > > > > > 
> > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the
> > > > > > > > > commit:
> > > > > > > > > Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
> > > > > > > > > 
> > > > > > > > > ==================================================================
> > > > > > > > > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > > > > > > 
> > > > > > > > > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > > > > > > 6.2.0-rc8-syzkaller #0
> > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > BIOS Google 01/21/2023
> > > > > > > > > Call Trace:
> > > > > > > > >      <TASK>
> > > > > > > > >      __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > > > >      dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > > > > > >      print_address_description mm/kasan/report.c:306 [inline]
> > > > > > > > >      print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > > > > > >      kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > > > > > >      crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > >      ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > > > > > >      ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > > > > > >      ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > > > > > >      ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > > > > > >      ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > > > > > >      ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > > > > > >      ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > > > > > >      ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > > > > > >      ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > > > > > >      ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > > > > > >      evict+0x2a4/0x620 fs/inode.c:664
> > > > > > > > >      do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > > > > > >      __do_sys_unlink fs/namei.c:4368 [inline]
> > > > > > > > >      __se_sys_unlink fs/namei.c:4366 [inline]
> > > > > > > > >      __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > > > > > >      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > >      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > >      entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > RIP: 0033:0x7fbc85a8c0f9
> > > > > > > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > > > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > > > > > > 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > > > > > > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > > > > > > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > > > > > >      </TASK>
> > > > > > > > > 
> > > > > > > > > The buggy address belongs to the physical page:
> > > > > > > > > page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > > > > > > mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > > > > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > > > > > > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > > > > > > 0000000000000000
> > > > > > > > > raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > > > > > > 0000000000000000
> > > > > > > > > page dumped because: kasan: bad access detected
> > > > > > > > > page_owner tracks the page as freed
> > > > > > > > > page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > > > > > > 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > > > > > >      prep_new_page mm/page_alloc.c:2531 [inline]
> > > > > > > > >      get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > > > > > >      __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > > > > > >      alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > > > > > >      allocate_slab mm/slub.c:1998 [inline]
> > > > > > > > >      new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > > > > > >      ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > > > > > >      __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > > > > > >      kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > > > > > >      mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > > > > > >      mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > > > > > >      mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > > > > > >      mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > > > > > >      vma_expand+0x277/0x850 mm/mmap.c:541
> > > > > > > > >      mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > > > > > >      do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > > > > > >      vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > > > > > >      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > >      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > >      entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > page last free stack trace:
> > > > > > > > >      reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > > > > > >      free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > > > > > >      free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > > > > > >      free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > > > > > >      free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > > > > > >      qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > > > > > >      kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > > > > > >      __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > > > > > >      kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > > > > > >      slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > > > > > >      slab_alloc_node mm/slub.c:3452 [inline]
> > > > > > > > >      kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > > > > > >      __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > > > > > >      alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > > > > > >      alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > > > > > >      sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > > > > > >      unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > > > > > >      sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > > > > > >      sock_sendmsg net/socket.c:734 [inline]
> > > > > > > > >      __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > > > > > >      __do_sys_sendto net/socket.c:2129 [inline]
> > > > > > > > >      __se_sys_sendto net/socket.c:2125 [inline]
> > > > > > > > >      __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > > > > > >      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > >      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > >      entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > 
> > > > > > > > > Memory state around the buggy address:
> > > > > > > > >      ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > >      ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > >                        ^
> > > > > > > > >      ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > >      ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ==================================================================
> > > > > > > > > 
> > > > > > > > I think the patch from below should fix it.
> > > > > > > > 
> > > > > > > > I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > > > > > EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > > > > > > > super block in the buffer get corrupted sometime after the .get_tree
> > > > > > > > (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > > > > > relying on the contents of the buffer, we should instead rely on the
> > > > > > > > s_desc_size initialized at the __ext4_fill_super() time.
> > > > > > > > 
> > > > > > > > If someone finds this good (or bad), or has a more in depth explanation,
> > > > > > > > please let me know, it will help me better understand the subsystem. In
> > > > > > > > the meantime I'll continue to investigate this and prepare a patch for
> > > > > > > > it.
> > > > > > > If there's something corrupting the superblock while the filesystem is
> > > > > > > mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > > > > > try
> > > > > > > to paper over the problem by not using the on-disk data... Maybe journal
> > > > > > > replay is corrupting the value or something like that?
> > > > > > > 
> > > > > > >                                   Honza
> > > > > > > 
> > > > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > > > doing at
> > > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > > > Will reply to this email thread once I understand what's happening. If
> > > > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > > > 
> > > > > I can now explain how the contents of the super block of the buffer get
> > > > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > > > ("./bus"), then it starts overriding the data with something else, by
> > > > > using memcpy, memset, individual byte inits. Does that mean that we
> > > > > shouldn't rely on the contents of the super block in the buffer after we
> > > > > mount the file system? If so, then my patch stands. I'll be happy to
> > > > > extend it if needed. Below one may find a step by step interpretation of
> > > > > the reproducer.
> > > > > 
> > > > > We have a strace log for the same bug, but on Android 5.15:
> > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > > > 
> > > > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > > > commented them out in the c repro to confirm that they are not the
> > > > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > > > repro at:
> > > > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > > > 
> > > > > Let's look at these calls, just before the bug was hit:
> > > > > [pid   328] open("./bus",
> > > > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > > > 000) = 4
> > > > > [pid   328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > > > [pid   328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > > > [pid   328] mmap(0x20000000, 6291456,
> > > > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > > > 5, 0) = 0x20000000
> > > > Yeah, looking at the reproducer, before this the reproducer also mounts
> > > > /dev/loop0 as ext4 filesystem.
> > > > 
> > > > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > > > - /dev/loop0 is mounted to ./bus
> > > > > - then it creates a new file descriptor (5) for the same ./bus
> > > > > - then it creates a mapping for ./bus starting at offset zero. The
> > > > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > > > So the result is that the reproducer modified the block device while it is
> > > > mounted by the filesystem. We know cases like this can crash the kernel and
> > > > it is inherently difficult to fix. We have to trust the buffer cache
> > > > contents as otherwise the performance will be unacceptable. For historical
> > > > reasons we also have to allow modifications of buffer cache while ext4 is
> > > > mounted because tune2fs uses this to e.g. update the label of a mounted
> > > > filesystem.
> > > > 
> > > > Long-term we are moving ext4 in a direction where we can disallow block
> > > > device modifications while the fs is mounted but we are not there yet. I've
> > > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > > developers and what seems plausible would be a kconfig option to disallow
> > > > writing to a block device when it is exclusively open by someone else.
> > > > But so far I didn't get to trying whether this would reasonably work. Would
> > > > you be interested in having a look into this?
> > > I am interested in this job. The file system is often damaged by writing
> > > block devices, which is a headache. I have always wanted to eradicate
> > > this kind of problem.  A few months ago, I tried to add a mount parameter
> > > to prohibit modification after the block device is mounted. But I
> > > encountered several problems that led to the termination of my attempt.
> > > First of all, the 32-bit super block flags have been used up and need to
> > > be extended. Secondly, I don't know how to handle read-only flag in the
> > > case of multiple mount points.
> > >   "disallow writing to a block device when it is exclusively open by someone
> > > else. "
> > > -> Perhaps we can add a new IOCTL command to control whether write
> > > operations are allowed after the block device has been exclusively
> > > opened. I don't know if this is feasible?  Do you have any good
> > > suggestions?
> > Well, ioctl() for syzbot would be possible as well but for start I'd try
> > whether the idea with kconfig option will work. Then it will be enough to
> > just make sure all kernels used for fuzzing are built with this option set.
> > Thanks for having a look into this!
>
> In fact, I also want to solve the problem of file system damage caused by
> writing raw disks in the production environment. Use kconfig directly to
> control whether it loses flexibility in the production environment.

I see. But which protections do you exactly want in production? Since you
need to add somewhere the call to ioctl(2) to write-protect the device, you
could as well just "chmod ugo-w <device>" instead, couldn't you? And the
level of protection would be similar.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum
  2023-03-13 11:57           ` Jan Kara
  2023-03-13 12:27             ` yebin
  2023-03-13 14:53             ` Dmitry Vyukov
@ 2023-04-30  2:55             ` Theodore Ts'o
  2 siblings, 0 replies; 20+ messages in thread
From: Theodore Ts'o @ 2023-04-30  2:55 UTC (permalink / raw)
  To: Jan Kara
  Cc: Tudor Ambarus, syzbot, adilger.kernel, linux-ext4, linux-kernel,
	nathan, ndesaulniers, syzkaller-bugs, trix, Lee Jones,
	syzbot+1966db24521e5f6e23f7, syzbot+db6caad9ebd2c8022b41,
	syzbot+e2efa3efc15a1c9e95c3

On Mon, Mar 13, 2023 at 12:57:28PM +0100, Jan Kara wrote:
> > 
> > I can now explain how the contents of the super block of the buffer get
> > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > reproducer maps 6MB of data starting at offset 0 in the target's file
> > ("./bus"), then it starts overriding the data with something else, by
> > using memcpy, memset, individual byte inits. Does that mean that we
> > shouldn't rely on the contents of the super block in the buffer after we
> > mount the file system?

It's not reasonable to avoid relying on the contents of the superblock
under all cases.  HOWEVER, sometimes it might make sense.  See below...

> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.

I've been taking a look at some of the syzkaller reports for ext4, and
there are a number of syzbot reports which are caused by the
reproducer messing with the block device while the file system is
mounted, including:

KASAN: slab-out-of-bounds Read in get_max_inline_xattr_value_size
    https://syzkaller.appspot.com/bug?id=731e35eeed762019e385baa96953d9ec8eb63c10
    syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com

KASAN: slab-use-after-free Read in ext4_convert_inline_data_nolock
    https://syzkaller.appspot.com/bug?id=434a92f091e845da1ba387fb93f186412e30e35c
    syzbot+db6caad9ebd2c8022b41@syzkaller.appspotmail.com

kernel BUG in ext4_get_group_info
    https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
    syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com

(The easiest way to find them is to look at the Syzkaller reproducer,
and look for bind mounts of /dev/loopN to "./bus".  It's much less
painful than trying to find it in the C reproducer text file.)
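
For readers unfamiliar with the pattern: what these reproducers do
boils down to writing through a shared mapping of the device.  A
minimal sketch of that mechanism is below, assuming a regular file as a
harmless stand-in for /dev/loopN; demo_scribble_via_mmap() is a made-up
name, and this is not the syzkaller reproducer itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Overwrite the first len bytes of the file at path through a
 * MAP_SHARED mapping.  len must not exceed the file size, otherwise
 * touching pages beyond end-of-file raises SIGBUS.
 * Returns 0 on success, -1 on failure.
 */
int demo_scribble_via_mmap(const char *path, size_t len, unsigned char byte)
{
	void *map;
	int fd, ret;

	assert(path != NULL && len > 0);
	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	/* Stores to a shared mapping modify the backing object itself. */
	memset(map, byte, len);
	ret = msync(map, len, MS_SYNC);
	munmap(map, len);
	close(fd);
	return ret;
}
```

Any other user of the same backing object, such as a file system that
has it mounted, sees the new bytes; that is the whole problem.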

As Jan has pointed out, we can't disable writing to the block device,
because this would break real-world system administrator workloads,
including the ability to set the label and uuid, use tune2fs to set
various parameters on the file system, etc.  We do have ioctls that
allow for setting the label and uuid, and in maybe ten years we should
be able to get to the point where all of the enterprise kernels still
supported by Red Hat, SuSE, etc. can be guaranteed to support all of
the necessary ioctls --- some of which still need to be implemented.

So this will take a *while*, and especially while senior management
types at many companies are announcing layoffs, cutting travel, and
talking about "year of efficiency" and "sharpening focus"[1], I don't
think we'll have much luck getting funded head count to implement
missing ioctls, other than slowly, on volunteer time, and maybe as
intern projects.  So what should we do in the intervening
year(s)/decade?  I'd propose the following priorities.

[1]  while simultaneously whining about "kernel (security) disasters"
and blaming the upstream developers.  Sigh...

From a quality of implementation (QoI) perspective, once we've
determined that a report is caused by "messing with the block device
while it is mounted", if it just causes a denial of service, it should
be the lowest priority.  However, if there is an easy way to fix it,
AND if it fixes other issues OR makes the kernel smaller and/or more
efficient, I won't turn away those kinds of proposed patches.

For example, in the case of the syzkaller report discussed in this
thread ("KASAN: slab-out-of-bounds Read in ext4_group_desc_csum"),
Tudor's proposed change of replacing

	le16_to_cpu(sbi->s_es->s_desc_size)

with
	sbi->s_desc_size

will actually reduce ext4's compiled text size, and make the code more
efficient (we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside.  In fact, in
many places we use sbi->s_desc_size in preference to accessing the
s_es variable; that's why we put it in the ext4_super_info structure
in the first place!  So sure, we should make this change, and if it
avoids a potential KASAN / syzkaller failure, that's a bonus.
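
To illustrate the difference, here is a simplified userspace sketch.
It is not the actual ext4 code: every demo_* name, the struct layouts
and the size limits are made up for the example.  The point is only
that a value validated and cached at mount time stays sane even if the
raw superblock buffer is scribbled on afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MIN_DESC_SIZE 32	/* illustrative limits only */
#define DEMO_MAX_DESC_SIZE 1024

/* Stand-in for the on-disk superblock living in a shared buffer. */
struct demo_disk_sb {
	uint8_t s_desc_size_le[2];	/* little-endian on disk */
};

/* Stand-in for the in-memory superblock info. */
struct demo_sb_info {
	struct demo_disk_sb *s_es;	/* points into the mutable buffer */
	uint16_t s_desc_size;		/* validated once at mount time */
};

static uint16_t demo_le16_to_cpu(const uint8_t b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

/* Mount-time setup: validate the on-disk value, then cache it. */
int demo_fill_super(struct demo_sb_info *sbi, struct demo_disk_sb *es)
{
	uint16_t sz;

	assert(sbi != NULL && es != NULL);
	sz = demo_le16_to_cpu(es->s_desc_size_le);
	if (sz < DEMO_MIN_DESC_SIZE || sz > DEMO_MAX_DESC_SIZE ||
	    (sz & (sz - 1)) != 0)
		return -1;
	sbi->s_es = es;
	sbi->s_desc_size = sz;
	return 0;
}

/* Uses the cached copy: one load, no byte swap, no re-validation. */
uint16_t demo_desc_size_cached(const struct demo_sb_info *sbi)
{
	return sbi->s_desc_size;
}

/* Re-reads the buffer: extra indirection, and the value is unchecked. */
uint16_t demo_desc_size_reread(const struct demo_sb_info *sbi)
{
	return demo_le16_to_cpu(sbi->s_es->s_desc_size_le);
}
```

If something rewrites the buffer after demo_fill_super() has run, only
demo_desc_size_reread() picks up the bogus size.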


Slightly higher in priority are those bugs which might allow kernel
state to be leaked ("kernel confidentiality").  Of course, if the
process with root access can write to the block device, it can almost
certainly read that block device as well; but there might be critical
bits of kernel state (for example, an RSA private key) in kernel
memory which it would be sad to see leaked.


The highest priority would go to those where root access might be
leveraged to allow arbitrary code to be executed in kernel mode
("kernel integrity") --- which is unfortunate because it allows root
access to breach lockdown security.


Of course, since many of the people working syzbot reports for ext4
are volunteers and/or company engineers working on their own unfunded
personal time, we still can't *guarantee* anything.  In addition, I'd
still reject a patch which had an overly expensive CPU or memory
overhead with a "try harder".  So it would still be on a case-by-case
basis whether such patches would be accepted.  After all, some
business leaders have elected to disable some mitigations for
Spectre/Meltdown and related attacks because they were Too Damn
Expensive.  I reserve the right as upstream maintainer to make similar
judgement calls.

						- Ted

P.S.  As another example, over the weekend, I've been working on some
patches to address the third syzbot report listed above
("kernel BUG in ext4_get_group_info").  When I evaluated these
patches, I found that they increased the compiled text size by 2k when
I added the additional checks, none of which were in hot paths.  But
after I un-inlined ext4_get_group_info(), the compiled text size
shrank by 4k, for a net 2k byte *savings* in compiled kernel text
memory.

We already had similar checks and calls to ext4_error() in
ext4_get_group_desc(); this patch just adds a similar conditional
call to ext4_error() to ext4_get_group_info(), and changes the
callers of that function to check for a NULL return.  While this
change only prevents a denial of service attack, in my judgement the
QoI benefits outweigh the costs.
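
The shape of that change, as a rough userspace sketch (these are not
the real ext4 structures or the actual patch; all demo_* names are
hypothetical, and the error counter merely stands in for a call to
ext4_error()):

```c
#include <assert.h>
#include <stddef.h>

struct demo_group_info {
	unsigned long free_blocks;
};

struct demo_fs {
	unsigned int ngroups;
	struct demo_group_info *groups;
	unsigned int errors;	/* stand-in for ext4_error() reporting */
};

/*
 * Out-of-line lookup: report and return NULL on a bogus group number
 * instead of hitting a BUG_ON() or indexing past the array.
 */
struct demo_group_info *demo_get_group_info(struct demo_fs *fs,
					    unsigned int group)
{
	assert(fs != NULL);
	if (group >= fs->ngroups) {
		fs->errors++;
		return NULL;
	}
	return &fs->groups[group];
}

/* Every caller now has to cope with a NULL return. */
long demo_group_free_blocks(struct demo_fs *fs, unsigned int group)
{
	struct demo_group_info *grp = demo_get_group_info(fs, group);

	if (!grp)
		return -1;	/* real code would return a proper errno */
	return (long)grp->free_blocks;
}
```

A corrupted group number then turns into an error return that the
caller can propagate, rather than a crash.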


Thread overview: 20+ messages
2023-02-02  6:54 [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum syzbot
2023-02-13 15:56 ` syzbot
2023-03-01 12:13   ` Tudor Ambarus
2023-03-07 10:39     ` Jan Kara
2023-03-07 11:02       ` Tudor Ambarus
2023-03-13 11:11         ` Tudor Ambarus
2023-03-13 11:57           ` Jan Kara
2023-03-13 12:27             ` yebin
2023-03-13 13:01               ` Jan Kara
2023-03-13 13:17                 ` yebin (H)
2023-03-14 11:19                   ` Jan Kara
2023-03-13 14:43                 ` Tudor Ambarus
2023-03-13 14:53             ` Dmitry Vyukov
2023-03-14  2:26               ` Theodore Ts'o
2023-03-14  9:45                 ` Dmitry Vyukov
2023-03-14 10:05                   ` Dmitry Vyukov
2023-03-14  8:49               ` Jan Kara
2023-03-14  9:33                 ` Dmitry Vyukov
2023-04-30  2:55             ` Theodore Ts'o
2023-03-03 21:43 ` syzbot
