linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* scsi: use-after-free in bio_copy_from_iter
@ 2016-11-25 19:08 Dmitry Vyukov
  2016-12-02 16:50 ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2016-11-25 19:08 UTC (permalink / raw)
  To: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block
  Cc: syzkaller

Hello,

The following program triggers use-after-free in bio_copy_from_iter:
https://gist.githubusercontent.com/dvyukov/80cd94b4e4c288f16ee4c787d404118b/raw/10536069562444da51b758bb39655b514ff93b45/gistfile1.txt


==================================================================
BUG: KASAN: use-after-free in copy_from_iter+0xf30/0x15e0 at addr
ffff880062c6e02a
Read of size 4096 by task a.out/8529
page:ffffea00018b1b80 count:2 mapcount:0 mapping:ffff88006c80e9d0 index:0x1695
flags: 0x5fffc0000000864(referenced|lru|active|private)
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff88003ebcea00
CPU: 1 PID: 8529 Comm: a.out Not tainted 4.9.0-rc6+ #55
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff880039ca6770 ffffffff834c3bb9 ffffffff00000001 1ffff10007394c81
 ffffed0007394c79 0000000041b58ab3 ffffffff89575c30 ffffffff834c38cb
 0000000000000000 0000000000000000 0000000000000001 0000000000000000
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff834c3bb9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
 [<     inline     >] print_address_description mm/kasan/report.c:211
 [<     inline     >] kasan_report_error mm/kasan/report.c:285
 [<ffffffff819f18b8>] kasan_report+0x418/0x440 mm/kasan/report.c:305
 [<     inline     >] check_memory_region_inline mm/kasan/kasan.c:308
 [<ffffffff819f0509>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315
 [<ffffffff819f0a43>] memcpy+0x23/0x50 mm/kasan/kasan.c:350
 [<ffffffff83525d10>] copy_from_iter+0xf30/0x15e0 lib/iov_iter.c:559
 [<ffffffff83526834>] copy_page_from_iter+0x474/0x860 lib/iov_iter.c:614
 [<     inline     >] bio_copy_from_iter block/bio.c:1025
 [<ffffffff833dfc10>] bio_copy_user_iov+0xb50/0xf10 block/bio.c:1226
 [<     inline     >] __blk_rq_map_user_iov block/blk-map.c:56
 [<ffffffff83421c15>] blk_rq_map_user_iov+0x295/0x930 block/blk-map.c:130
 [<ffffffff834223e9>] blk_rq_map_user+0x139/0x1e0 block/blk-map.c:159
 [<     inline     >] sg_start_req drivers/scsi/sg.c:1757
 [<ffffffff8470269a>] sg_common_write.isra.20+0x12da/0x1b20
drivers/scsi/sg.c:772
 [<ffffffff8470774a>] sg_write+0x78a/0xda0 drivers/scsi/sg.c:675
 [<ffffffff81a6fabe>] __vfs_write+0x65e/0x830 fs/read_write.c:510
 [<ffffffff81a6fd7c>] __kernel_write+0xec/0x340 fs/read_write.c:532
 [<ffffffff81b41e7c>] write_pipe_buf+0x19c/0x260 fs/splice.c:814
 [<     inline     >] splice_from_pipe_feed fs/splice.c:519
 [<ffffffff81b426af>] __splice_from_pipe+0x31f/0x750 fs/splice.c:643
 [<ffffffff81b45dcc>] splice_from_pipe+0x1dc/0x300 fs/splice.c:678
 [<ffffffff81b45f95>] default_file_splice_write+0x45/0x90 fs/splice.c:826
 [<     inline     >] do_splice_from fs/splice.c:868
 [<ffffffff81b3e2ba>] direct_splice_actor+0x12a/0x190 fs/splice.c:1035
 [<ffffffff81b40706>] splice_direct_to_actor+0x2c6/0x840 fs/splice.c:990
 [<ffffffff81b40f4e>] do_splice_direct+0x2ce/0x470 fs/splice.c:1078
 [<ffffffff81a7459f>] do_sendfile+0x73f/0x1060 fs/read_write.c:1401
 [<     inline     >] SYSC_sendfile64 fs/read_write.c:1456
 [<ffffffff81a7677e>] SyS_sendfile64+0xee/0x1a0 fs/read_write.c:1448
 [<ffffffff8814cf85>] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209
Memory state around the buggy address:
 ffff880062c6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff880062c6ef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880062c6f000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff880062c6f080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880062c6f100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
BUG: unable to handle kernel paging request at ffff880062c6f000
IP: [<ffffffff835051c2>] __memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
PGD c53d067 [  494.351750] PUD c540067
PMD 7fdea067 [  494.351750] PTE 8000000062c6f060

Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 1 PID: 8529 Comm: a.out Tainted: G    B           4.9.0-rc6+ #55
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003c54e3c0 task.stack: ffff880039ca0000
RIP: 0010:[<ffffffff835051c2>]  [<ffffffff835051c2>]
__memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
RSP: 0018:ffff880039ca6838  EFLAGS: 00010246
RAX: ffff880031b99000 RBX: 0000000000001000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffff880062c6effa RDI: ffff880031b99fd0
RBP: ffff880039ca6858 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000200 R11: ffffed00063733ff R12: ffff880031b99000
R13: ffff880062c6e02a R14: ffff880039ca6f70 R15: ffff880039ca6d18
FS:  00007f7a78013700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff880062c6f000 CR3: 000000006411f000 CR4: 00000000000006e0
Stack:
 ffffffff819f0a65 ffff880039ca71a0 0000000000001000 0000000000002000
 ffff880039ca6d40 ffffffff83525d10 0000000041b58ab3 ffffffff89576240
 ffffffff00000001 0000000000000000 0000000041b58ab3 ffffffff894d07b0
Call Trace:
 [<ffffffff83525d10>] copy_from_iter+0xf30/0x15e0 lib/iov_iter.c:559
 [<ffffffff83526834>] copy_page_from_iter+0x474/0x860 lib/iov_iter.c:614
 [<     inline     >] bio_copy_from_iter block/bio.c:1025
 [<ffffffff833dfc10>] bio_copy_user_iov+0xb50/0xf10 block/bio.c:1226
 [<     inline     >] __blk_rq_map_user_iov block/blk-map.c:56
 [<ffffffff83421c15>] blk_rq_map_user_iov+0x295/0x930 block/blk-map.c:130
 [<ffffffff834223e9>] blk_rq_map_user+0x139/0x1e0 block/blk-map.c:159
 [<     inline     >] sg_start_req drivers/scsi/sg.c:1757
 [<ffffffff8470269a>] sg_common_write.isra.20+0x12da/0x1b20
drivers/scsi/sg.c:772
 [<ffffffff8470774a>] sg_write+0x78a/0xda0 drivers/scsi/sg.c:675
 [<ffffffff81a6fabe>] __vfs_write+0x65e/0x830 fs/read_write.c:510
 [<ffffffff81a6fd7c>] __kernel_write+0xec/0x340 fs/read_write.c:532
 [<ffffffff81b41e7c>] write_pipe_buf+0x19c/0x260 fs/splice.c:814
 [<     inline     >] splice_from_pipe_feed fs/splice.c:519
 [<ffffffff81b426af>] __splice_from_pipe+0x31f/0x750 fs/splice.c:643
 [<ffffffff81b45dcc>] splice_from_pipe+0x1dc/0x300 fs/splice.c:678
 [<ffffffff81b45f95>] default_file_splice_write+0x45/0x90 fs/splice.c:826
 [<     inline     >] do_splice_from fs/splice.c:868
 [<ffffffff81b3e2ba>] direct_splice_actor+0x12a/0x190 fs/splice.c:1035
 [<ffffffff81b40706>] splice_direct_to_actor+0x2c6/0x840 fs/splice.c:990
 [<ffffffff81b40f4e>] do_splice_direct+0x2ce/0x470 fs/splice.c:1078
 [<ffffffff81a7459f>] do_sendfile+0x73f/0x1060 fs/read_write.c:1401
 [<     inline     >] SYSC_sendfile64 fs/read_write.c:1456
 [<ffffffff81a7677e>] SyS_sendfile64+0xee/0x1a0 fs/read_write.c:1448
 [<ffffffff8814cf85>] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:209
Code: 4e fe e9 4d ff ff ff e8 9d c7 4e fe eb 8f e8 96 c7 4e fe e9 66
ff ff ff 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3>
48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
RIP  [<ffffffff835051c2>] __memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
 RSP <ffff880039ca6838>
CR2: ffff880062c6f000
---[ end trace 13b61a6864c2c008 ]---
==================================================================


There are no alloc/free stacks in the report, so maybe it is not
use-after-free but rather an out-of-bounds.

On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-11-25 19:08 scsi: use-after-free in bio_copy_from_iter Dmitry Vyukov
@ 2016-12-02 16:50 ` Dmitry Vyukov
  2016-12-03 10:38   ` Johannes Thumshirn
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2016-12-02 16:50 UTC (permalink / raw)
  To: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block, David Rientjes
  Cc: syzkaller

On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> The following program triggers use-after-free in bio_copy_from_iter:
> https://gist.githubusercontent.com/dvyukov/80cd94b4e4c288f16ee4c787d404118b/raw/10536069562444da51b758bb39655b514ff93b45/gistfile1.txt
>
>
> ==================================================================
> BUG: KASAN: use-after-free in copy_from_iter+0xf30/0x15e0 at addr
> ffff880062c6e02a
> Read of size 4096 by task a.out/8529
> page:ffffea00018b1b80 count:2 mapcount:0 mapping:ffff88006c80e9d0 index:0x1695
> flags: 0x5fffc0000000864(referenced|lru|active|private)
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff88003ebcea00
> CPU: 1 PID: 8529 Comm: a.out Not tainted 4.9.0-rc6+ #55
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  ffff880039ca6770 ffffffff834c3bb9 ffffffff00000001 1ffff10007394c81
>  ffffed0007394c79 0000000041b58ab3 ffffffff89575c30 ffffffff834c38cb
>  0000000000000000 0000000000000000 0000000000000001 0000000000000000
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff834c3bb9>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
>  [<     inline     >] print_address_description mm/kasan/report.c:211
>  [<     inline     >] kasan_report_error mm/kasan/report.c:285
>  [<ffffffff819f18b8>] kasan_report+0x418/0x440 mm/kasan/report.c:305
>  [<     inline     >] check_memory_region_inline mm/kasan/kasan.c:308
>  [<ffffffff819f0509>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315
>  [<ffffffff819f0a43>] memcpy+0x23/0x50 mm/kasan/kasan.c:350
>  [<ffffffff83525d10>] copy_from_iter+0xf30/0x15e0 lib/iov_iter.c:559
>  [<ffffffff83526834>] copy_page_from_iter+0x474/0x860 lib/iov_iter.c:614
>  [<     inline     >] bio_copy_from_iter block/bio.c:1025
>  [<ffffffff833dfc10>] bio_copy_user_iov+0xb50/0xf10 block/bio.c:1226
>  [<     inline     >] __blk_rq_map_user_iov block/blk-map.c:56
>  [<ffffffff83421c15>] blk_rq_map_user_iov+0x295/0x930 block/blk-map.c:130
>  [<ffffffff834223e9>] blk_rq_map_user+0x139/0x1e0 block/blk-map.c:159
>  [<     inline     >] sg_start_req drivers/scsi/sg.c:1757
>  [<ffffffff8470269a>] sg_common_write.isra.20+0x12da/0x1b20
> drivers/scsi/sg.c:772
>  [<ffffffff8470774a>] sg_write+0x78a/0xda0 drivers/scsi/sg.c:675
>  [<ffffffff81a6fabe>] __vfs_write+0x65e/0x830 fs/read_write.c:510
>  [<ffffffff81a6fd7c>] __kernel_write+0xec/0x340 fs/read_write.c:532
>  [<ffffffff81b41e7c>] write_pipe_buf+0x19c/0x260 fs/splice.c:814
>  [<     inline     >] splice_from_pipe_feed fs/splice.c:519
>  [<ffffffff81b426af>] __splice_from_pipe+0x31f/0x750 fs/splice.c:643
>  [<ffffffff81b45dcc>] splice_from_pipe+0x1dc/0x300 fs/splice.c:678
>  [<ffffffff81b45f95>] default_file_splice_write+0x45/0x90 fs/splice.c:826
>  [<     inline     >] do_splice_from fs/splice.c:868
>  [<ffffffff81b3e2ba>] direct_splice_actor+0x12a/0x190 fs/splice.c:1035
>  [<ffffffff81b40706>] splice_direct_to_actor+0x2c6/0x840 fs/splice.c:990
>  [<ffffffff81b40f4e>] do_splice_direct+0x2ce/0x470 fs/splice.c:1078
>  [<ffffffff81a7459f>] do_sendfile+0x73f/0x1060 fs/read_write.c:1401
>  [<     inline     >] SYSC_sendfile64 fs/read_write.c:1456
>  [<ffffffff81a7677e>] SyS_sendfile64+0xee/0x1a0 fs/read_write.c:1448
>  [<ffffffff8814cf85>] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
> Memory state around the buggy address:
>  ffff880062c6ef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  ffff880062c6ef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>ffff880062c6f000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>                    ^
>  ffff880062c6f080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>  ffff880062c6f100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
> Disabling lock debugging due to kernel taint
> BUG: unable to handle kernel paging request at ffff880062c6f000
> IP: [<ffffffff835051c2>] __memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
> PGD c53d067 [  494.351750] PUD c540067
> PMD 7fdea067 [  494.351750] PTE 8000000062c6f060
>
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
> Modules linked in:
> CPU: 1 PID: 8529 Comm: a.out Tainted: G    B           4.9.0-rc6+ #55
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88003c54e3c0 task.stack: ffff880039ca0000
> RIP: 0010:[<ffffffff835051c2>]  [<ffffffff835051c2>]
> __memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
> RSP: 0018:ffff880039ca6838  EFLAGS: 00010246
> RAX: ffff880031b99000 RBX: 0000000000001000 RCX: 0000000000000006
> RDX: 0000000000000000 RSI: ffff880062c6effa RDI: ffff880031b99fd0
> RBP: ffff880039ca6858 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000200 R11: ffffed00063733ff R12: ffff880031b99000
> R13: ffff880062c6e02a R14: ffff880039ca6f70 R15: ffff880039ca6d18
> FS:  00007f7a78013700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff880062c6f000 CR3: 000000006411f000 CR4: 00000000000006e0
> Stack:
>  ffffffff819f0a65 ffff880039ca71a0 0000000000001000 0000000000002000
>  ffff880039ca6d40 ffffffff83525d10 0000000041b58ab3 ffffffff89576240
>  ffffffff00000001 0000000000000000 0000000041b58ab3 ffffffff894d07b0
> Call Trace:
>  [<ffffffff83525d10>] copy_from_iter+0xf30/0x15e0 lib/iov_iter.c:559
>  [<ffffffff83526834>] copy_page_from_iter+0x474/0x860 lib/iov_iter.c:614
>  [<     inline     >] bio_copy_from_iter block/bio.c:1025
>  [<ffffffff833dfc10>] bio_copy_user_iov+0xb50/0xf10 block/bio.c:1226
>  [<     inline     >] __blk_rq_map_user_iov block/blk-map.c:56
>  [<ffffffff83421c15>] blk_rq_map_user_iov+0x295/0x930 block/blk-map.c:130
>  [<ffffffff834223e9>] blk_rq_map_user+0x139/0x1e0 block/blk-map.c:159
>  [<     inline     >] sg_start_req drivers/scsi/sg.c:1757
>  [<ffffffff8470269a>] sg_common_write.isra.20+0x12da/0x1b20
> drivers/scsi/sg.c:772
>  [<ffffffff8470774a>] sg_write+0x78a/0xda0 drivers/scsi/sg.c:675
>  [<ffffffff81a6fabe>] __vfs_write+0x65e/0x830 fs/read_write.c:510
>  [<ffffffff81a6fd7c>] __kernel_write+0xec/0x340 fs/read_write.c:532
>  [<ffffffff81b41e7c>] write_pipe_buf+0x19c/0x260 fs/splice.c:814
>  [<     inline     >] splice_from_pipe_feed fs/splice.c:519
>  [<ffffffff81b426af>] __splice_from_pipe+0x31f/0x750 fs/splice.c:643
>  [<ffffffff81b45dcc>] splice_from_pipe+0x1dc/0x300 fs/splice.c:678
>  [<ffffffff81b45f95>] default_file_splice_write+0x45/0x90 fs/splice.c:826
>  [<     inline     >] do_splice_from fs/splice.c:868
>  [<ffffffff81b3e2ba>] direct_splice_actor+0x12a/0x190 fs/splice.c:1035
>  [<ffffffff81b40706>] splice_direct_to_actor+0x2c6/0x840 fs/splice.c:990
>  [<ffffffff81b40f4e>] do_splice_direct+0x2ce/0x470 fs/splice.c:1078
>  [<ffffffff81a7459f>] do_sendfile+0x73f/0x1060 fs/read_write.c:1401
>  [<     inline     >] SYSC_sendfile64 fs/read_write.c:1456
>  [<ffffffff81a7677e>] SyS_sendfile64+0xee/0x1a0 fs/read_write.c:1448
>  [<ffffffff8814cf85>] entry_SYSCALL_64_fastpath+0x23/0xc6
> arch/x86/entry/entry_64.S:209
> Code: 4e fe e9 4d ff ff ff e8 9d c7 4e fe eb 8f e8 96 c7 4e fe e9 66
> ff ff ff 90 0f 1f 44 00 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3>
> 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3
> RIP  [<ffffffff835051c2>] __memcpy+0x12/0x20 arch/x86/lib/memcpy_64.S:37
>  RSP <ffff880039ca6838>
> CR2: ffff880062c6f000
> ---[ end trace 13b61a6864c2c008 ]---
> ==================================================================
>
>
> There are no alloc/free stacks in the report, so maybe it is not
> use-after-free but rather an out-of-bounds.
>
> On commit 16ae16c6e5616c084168740990fc508bda6655d4 (Nov 24).


+David did some debugging of a similar case. His 0x400 at location
0x2000efdc refers to 0xffff at 0x20012fdc in the provided reproducer:
    NONFAILING(*(uint32_t*)0x20012fdc = (uint32_t)0xffff);
Here is his explanation:
=====
That's not nice, it's passing a reply_len mismatch of 0x400 which is going
to cause an sg_write warning (in 988 bytes, out 38 bytes).  input_size
will be 74 since count == 80, the mxsize is 0x400 (!) so after subtracting
SZ_SG_HEADER we get the 988 bytes in.  This forces SG_DXFER_TO_FROM_DEV
when we really want SG_DXFER_TO_DEV.  If you set the value at 0x2000efdc
to 0x24 rather than 0x400 so there's a legitimate reply_len equal to
SG_DXFER_TO_FROM_DEV, I'd assume this passes just fine, assuming your
container can get enough memory.
=====

I've tried to replace 0xffff in the provided reproducer with 0x24 and
it indeed does not crash.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-02 16:50 ` Dmitry Vyukov
@ 2016-12-03 10:38   ` Johannes Thumshirn
  2016-12-03 15:22     ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2016-12-03 10:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block, David Rientjes, syzkaller, Hannes Reinecke

On Fri, Dec 02, 2016 at 05:50:39PM +0100, Dmitry Vyukov wrote:
> On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:

[...]

> 
> +David did some debugging of a similar case. His 0x400 at location
> 0x2000efdc refers to 0xffff at 0x20012fdc in the provided reproducer:
>     NONFAILING(*(uint32_t*)0x20012fdc = (uint32_t)0xffff);
> Here is his explanation:
> =====
> That's not nice, it's passing a reply_len mismatch of 0x400 which is going
> to cause an sg_write warning (in 988 bytes, out 38 bytes).  input_size
> will be 74 since count == 80, the mxsize is 0x400 (!) so after subtracting
> SZ_SG_HEADER we get the 988 bytes in.  This forces SG_DXFER_TO_FROM_DEV
> when we really want SG_DXFER_TO_DEV.  If you set the value at 0x2000efdc
> to 0x24 rather than 0x400 so there's a legitimate reply_len equal to
> SG_DXFER_TO_FROM_DEV, I'd assume this passes just fine, assuming your
> container can get enough memory.
> =====
> 
> I've tried to replace 0xffff in the provided reproducer with 0x24 and
> it indeed does not crash.

This is somewhat expected, AFAICS the value at 0x20012fdc ends up as
hp->dxfer_len in the SG driver. I did a lot of debugging (actually I spend the
whole last work week soley on this) but I can't find where it does the actual
use-after-free, so if you have anything to share I'd be glad.

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-03 10:38   ` Johannes Thumshirn
@ 2016-12-03 15:22     ` Dmitry Vyukov
  2016-12-03 18:19       ` Johannes Thumshirn
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2016-12-03 15:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block, David Rientjes, syzkaller, Hannes Reinecke

On Sat, Dec 3, 2016 at 11:38 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Fri, Dec 02, 2016 at 05:50:39PM +0100, Dmitry Vyukov wrote:
>> On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> [...]
>
>>
>> +David did some debugging of a similar case. His 0x400 at location
>> 0x2000efdc refers to 0xffff at 0x20012fdc in the provided reproducer:
>>     NONFAILING(*(uint32_t*)0x20012fdc = (uint32_t)0xffff);
>> Here is his explanation:
>> =====
>> That's not nice, it's passing a reply_len mismatch of 0x400 which is going
>> to cause an sg_write warning (in 988 bytes, out 38 bytes).  input_size
>> will be 74 since count == 80, the mxsize is 0x400 (!) so after subtracting
>> SZ_SG_HEADER we get the 988 bytes in.  This forces SG_DXFER_TO_FROM_DEV
>> when we really want SG_DXFER_TO_DEV.  If you set the value at 0x2000efdc
>> to 0x24 rather than 0x400 so there's a legitimate reply_len equal to
>> SG_DXFER_TO_FROM_DEV, I'd assume this passes just fine, assuming your
>> container can get enough memory.
>> =====
>>
>> I've tried to replace 0xffff in the provided reproducer with 0x24 and
>> it indeed does not crash.
>
> This is somewhat expected, AFAICS the value at 0x20012fdc ends up as
> hp->dxfer_len in the SG driver. I did a lot of debugging (actually I spend the
> whole last work week soley on this) but I can't find where it does the actual
> use-after-free, so if you have anything to share I'd be glad.

Hi Johannes,

Thanks for looking into this!

As I noted I don't think this is use-after-free, more likely it is an
out-of-bounds access against non-slab range.

Report says that we are copying 0x1000 bytes starting at 0xffff880062c6e02a.
The first bad address is 0xffff880062c6f000, this address was freed
previously and that's why KASAN reports UAF.
But this is already next page, and KASAN does not insert redzones
around pages (only around slab allocations).
So most likely the code should have not touch 0xffff880062c6f000 as it
is not his memory.
Also I noticed that the report happens after few minutes of repeatedly
running this program, so I would expect that this is some kind of race
-- either between kernel threads, or maybe between user space threads
and kernel. Or maybe it's just that the next page is not always marked
as free, so we just don't detect the bad access.

Does it all make any sense to you?
Can you think of any additional sanity checks that will ensure that
this code copies only memory it owns?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-03 15:22     ` Dmitry Vyukov
@ 2016-12-03 18:19       ` Johannes Thumshirn
  2016-12-05 14:31         ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2016-12-03 18:19 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block, David Rientjes, syzkaller, Hannes Reinecke

On Sat, Dec 03, 2016 at 04:22:39PM +0100, Dmitry Vyukov wrote:
> On Sat, Dec 3, 2016 at 11:38 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> > On Fri, Dec 02, 2016 at 05:50:39PM +0100, Dmitry Vyukov wrote:
> >> On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:

[...]

Hi Dmitry,

> 
> Thanks for looking into this!
> 
> As I noted I don't think this is use-after-free, more likely it is an
> out-of-bounds access against non-slab range.
> 
> Report says that we are copying 0x1000 bytes starting at 0xffff880062c6e02a.
> The first bad address is 0xffff880062c6f000, this address was freed
> previously and that's why KASAN reports UAF.

We're copying 65499 bytes (65535 - sizeof(sg_header)) and we've got 2 order 3
page allocations to do this. It fails somewhere in there. I have seen fails at
0x2000, 0xe000 and all (0x1000 aligned) offsets inbetween.

> But this is already next page, and KASAN does not insert redzones
> around pages (only around slab allocations).
> So most likely the code should have not touch 0xffff880062c6f000 as it
> is not his memory.
> Also I noticed that the report happens after few minutes of repeatedly
> running this program, so I would expect that this is some kind of race
> -- either between kernel threads, or maybe between user space threads
> and kernel.

I somehow think it's a race as well, especially as I have to run the
reproducer in an endless loop and break out of it once I have the 1st
stacktrace in dmesg. This takes between some minutes up to one hour on my
setup.

But the race against a userspace thread... Could it be that the reproducer has
already exited it's threads while the copy_from_iter() is still running?
Normally I'd say no, as user-space shouldn't run while the kernel is doing
things in it's address space, but this is highly suspicious.

> Or maybe it's just that the next page is not always marked
> as free, so we just don't detect the bad access.

Could be, but I lack the memory management knowledge to say more than a 'could
be'.

> 
> Does it all make any sense to you?
> Can you think of any additional sanity checks that will ensure that
> this code copies only memory it owns?

Given that we pass the 0xffff as dxfer_len it thinks it owns all memory, so
this is OK, kinda. All that could be would be that user-space has already
exited and thus it's memory is already freed.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-03 18:19       ` Johannes Thumshirn
@ 2016-12-05 14:31         ` Dmitry Vyukov
  2016-12-05 15:17           ` Johannes Thumshirn
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2016-12-05 14:31 UTC (permalink / raw)
  To: syzkaller
  Cc: Doug Gilbert, jejb, Martin K. Petersen, linux-scsi, LKML, axboe,
	linux-block, David Rientjes, Hannes Reinecke, Johannes Thumshirn

On Sat, Dec 3, 2016 at 7:19 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Sat, Dec 03, 2016 at 04:22:39PM +0100, Dmitry Vyukov wrote:
>> On Sat, Dec 3, 2016 at 11:38 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
>> > On Fri, Dec 02, 2016 at 05:50:39PM +0100, Dmitry Vyukov wrote:
>> >> On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> [...]
>
> Hi Dmitry,
>
>>
>> Thanks for looking into this!
>>
>> As I noted I don't think this is use-after-free, more likely it is an
>> out-of-bounds access against non-slab range.
>>
>> Report says that we are copying 0x1000 bytes starting at 0xffff880062c6e02a.
>> The first bad address is 0xffff880062c6f000, this address was freed
>> previously and that's why KASAN reports UAF.
>
> We're copying 65499 bytes (65535 - sizeof(sg_header)) and we've got 2 order 3
> page allocations to do this. It fails somewhere in there. I have seen fails at
> 0x2000, 0xe000 and all (0x1000 aligned) offsets inbetween.
>
>> But this is already next page, and KASAN does not insert redzones
>> around pages (only around slab allocations).
>> So most likely the code should have not touch 0xffff880062c6f000 as it
>> is not his memory.
>> Also I noticed that the report happens after few minutes of repeatedly
>> running this program, so I would expect that this is some kind of race
>> -- either between kernel threads, or maybe between user space threads
>> and kernel.
>
> I somehow think it's a race as well, especially as I have to run the
> reproducer in an endless loop and break out of it once I have the 1st
> stacktrace in dmesg. This takes between some minutes up to one hour on my
> setup.
>
> But the race against a userspace thread... Could it be that the reproducer has
> already exited it's threads while the copy_from_iter() is still running?
> Normally I'd say no, as user-space shouldn't run while the kernel is doing
> things in it's address space, but this is highly suspicious.
>
>> Or maybe it's just that the next page is not always marked
>> as free, so we just don't detect the bad access.
>
> Could be, but I lack the memory management knowledge to say more than a 'could
> be'.
>
>>
>> Does it all make any sense to you?
>> Can you think of any additional sanity checks that will ensure that
>> this code copies only memory it owns?
>
> Given that we pass the 0xffff as dxfer_len it thinks it owns all memory, so
> this is OK, kinda. All that could be would be that user-space has already
> exited and thus it's memory is already freed.


The crash happens in the context of sendfile syscall, we the address
space should be alive. But the crash happens on address
0xffff880062c6f000 which is not a user-space address, right? This is
something that kernel has allocated previously.
Do you have CONFIG_DEBUG_PAGEALLOC enabled? I have it enabled. Maybe
it increases changes of triggering the bug.

Do we know where that memory that we are copying was allocated? Is it
slab or page alloc? We could extend KASAN output with more details.
E.g. print allocation stack for the _first_ byte of memcpy, or
memorize page alloc/free stacks in page struct using lib/stackdepot.c.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-05 14:31         ` Dmitry Vyukov
@ 2016-12-05 15:17           ` Johannes Thumshirn
  2016-12-05 19:03             ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2016-12-05 15:17 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzkaller, Doug Gilbert, jejb, Martin K. Petersen, linux-scsi,
	LKML, axboe, linux-block, David Rientjes, Hannes Reinecke,
	mhocko

On Mon, Dec 05, 2016 at 03:31:43PM +0100, Dmitry Vyukov wrote:
> On Sat, Dec 3, 2016 at 7:19 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> > On Sat, Dec 03, 2016 at 04:22:39PM +0100, Dmitry Vyukov wrote:
> >> On Sat, Dec 3, 2016 at 11:38 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> >> > On Fri, Dec 02, 2016 at 05:50:39PM +0100, Dmitry Vyukov wrote:
> >> >> On Fri, Nov 25, 2016 at 8:08 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > [...]
> >
> > Hi Dmitry,
> >
> >>
> >> Thanks for looking into this!
> >>
> >> As I noted I don't think this is use-after-free, more likely it is an
> >> out-of-bounds access against non-slab range.
> >>
> >> Report says that we are copying 0x1000 bytes starting at 0xffff880062c6e02a.
> >> The first bad address is 0xffff880062c6f000, this address was freed
> >> previously and that's why KASAN reports UAF.
> >
> > We're copying 65499 bytes (65535 - sizeof(sg_header)) and we've got 2 order 3
> > page allocations to do this. It fails somewhere in there. I have seen fails at
> > 0x2000, 0xe000 and all (0x1000 aligned) offsets inbetween.
> >
> >> But this is already next page, and KASAN does not insert redzones
> >> around pages (only around slab allocations).
> >> So most likely the code should have not touch 0xffff880062c6f000 as it
> >> is not his memory.
> >> Also I noticed that the report happens after few minutes of repeatedly
> >> running this program, so I would expect that this is some kind of race
> >> -- either between kernel threads, or maybe between user space threads
> >> and kernel.
> >
> > I somehow think it's a race as well, especially as I have to run the
> > reproducer in an endless loop and break out of it once I have the 1st
> > stacktrace in dmesg. This takes between some minutes up to one hour on my
> > setup.
> >
> > But the race against a userspace thread... Could it be that the reproducer has
> > already exited it's threads while the copy_from_iter() is still running?
> > Normally I'd say no, as user-space shouldn't run while the kernel is doing
> > things in it's address space, but this is highly suspicious.
> >
> >> Or maybe it's just that the next page is not always marked
> >> as free, so we just don't detect the bad access.
> >
> > Could be, but I lack the memory management knowledge to say more than a 'could
> > be'.
> >
> >>
> >> Does it all make any sense to you?
> >> Can you think of any additional sanity checks that will ensure that
> >> this code copies only memory it owns?
> >
> > Given that we pass the 0xffff as dxfer_len it thinks it owns all memory, so
> > this is OK, kinda. All that could be would be that user-space has already
> > exited and thus it's memory is already freed.
> 
> 
> The crash happens in the context of sendfile syscall, we the address
> space should be alive. But the crash happens on address
> 0xffff880062c6f000 which is not a user-space address, right? This is
> something that kernel has allocated previously.
> Do you have CONFIG_DEBUG_PAGEALLOC enabled? I have it enabled. Maybe
> it increases changes of triggering the bug.
> 
> Do we know where that memory that we are copying was allocated? Is it
> slab or page alloc? We could extend KASAN output with more details.
> E.g. print allocation stack for the _first_ byte of memcpy, or
> memorize page alloc/free stacks in page struct using lib/stackdepot.c.

It comes in this way:

drivers/scsi/sg.c:
sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
580         struct sg_header old_hdr;
581         sg_io_hdr_t *hp;
582         unsigned char cmnd[SG_MAX_CDB_SIZE];
[...]
598         if (__copy_from_user(&old_hdr, buf, SZ_SG_HEADER))
599                 return -EFAULT;
[...]
612         buf += SZ_SG_HEADER;
613         __get_user(opcode, buf);
[...]
614         if (sfp->next_cmd_len > 0) {
615                 cmd_size = sfp->next_cmd_len;
616                 sfp->next_cmd_len = 0;  /* reset so only this write() effected */
617         } else {
618                 cmd_size = COMMAND_SIZE(opcode);        /* based on SCSI command group */
619                 if ((opcode >= 0xc0) && old_hdr.twelve_byte)
620                         cmd_size = 12;
621         }
[...]
625         input_size = count - cmd_size;
626         mxsize = (input_size > old_hdr.reply_len) ? input_size : old_hdr.reply_len;
627         mxsize -= SZ_SG_HEADER;
633         hp = &srp->header;
[...]
646                 hp->dxferp = (char __user *)buf + cmd_size;
[...]
654         if (__copy_from_user(cmnd, buf, cmd_size))
655                 return -EFAULT;
[...]
675         k = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking);
[...]
752 static int
753 sg_common_write(Sg_fd * sfp, Sg_request * srp,
754                 unsigned char *cmnd, int timeout, int blocking)
755 {
[...]
772         k = sg_start_req(srp, cmnd);
[...]
1653 static int
1654 sg_start_req(Sg_request *srp, unsigned char *cmd)
1655 {
[...]
1757                 res = blk_rq_map_user(q, rq, md, hp->dxferp,
1758                                       hp->dxfer_len, GFP_ATOMIC);
[...]

block/blk-map.c:
148 int blk_rq_map_user(struct request_queue *q, struct request *rq,
149                     struct rq_map_data *map_data, void __user *ubuf,
150                     unsigned long len, gfp_t gfp_mask)
151 {
[...]
154         int ret = import_single_range(rq_data_dir(rq), ubuf, len, &iov, &i);

lib/iov_iter.c:
1209 int import_single_range(int rw, void __user *buf, size_t len,
1210                  struct iovec *iov, struct iov_iter *i)
1211 {
1217         iov->iov_base = buf;

block/blk-map.c:
148 int blk_rq_map_user(struct request_queue *q, struct request *rq,
149                     struct rq_map_data *map_data, void __user *ubuf,
150                     unsigned long len, gfp_t gfp_mask)
151 {
[...]
159         return blk_rq_map_user_iov(q, rq, map_data, &i, gfp_mask);

and so on....

So the memory for hp->dxferp comes from:
633         hp = &srp->header;
a.k.a.
sg_add_sfp()'s:
2151         sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
And then taken out of the pool by sg_add_request() methinks. So yes it is a
kernel pointer (and the address proves it as well). 

>From my debug instrumentation I see that the dxferp ends up in the
iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
bio).

HTH,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-05 15:17           ` Johannes Thumshirn
@ 2016-12-05 19:03             ` Al Viro
  2016-12-06  9:32               ` Johannes Thumshirn
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-12-05 19:03 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Dmitry Vyukov, syzkaller, Doug Gilbert, jejb, Martin K. Petersen,
	linux-scsi, LKML, axboe, linux-block, David Rientjes,
	Hannes Reinecke, mhocko

On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
> 633         hp = &srp->header;
> [...]
> 646                 hp->dxferp = (char __user *)buf + cmd_size;

> So the memory for hp->dxferp comes from:
> 633         hp = &srp->header;

????

> >From my debug instrumentation I see that the dxferp ends up in the
> iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
> 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
> bio).

_Address_ of hp->dxferp comes from that assignment; the value is 'buf'
argument of sg_write() + small offset.  In this case, it should point
inside a pipe buffer, which is, indeed, at a kernel address.  Who'd
allocated srp is irrelevant.

And if you end up dereferencing more than one page worth there, you do have
a problem - pipe buffers are not going to be that large.  Could you slap
	WARN_ON((size_t)input_size > count);
right after the calculation of input_size in sg_write() and see if it triggers
on your reproducer?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-05 19:03             ` Al Viro
@ 2016-12-06  9:32               ` Johannes Thumshirn
  2016-12-06  9:43                 ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2016-12-06  9:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Dmitry Vyukov, syzkaller, Doug Gilbert, jejb, Martin K. Petersen,
	linux-scsi, LKML, axboe, linux-block, David Rientjes,
	Hannes Reinecke, mhocko

On Mon, Dec 05, 2016 at 07:03:39PM +0000, Al Viro wrote:
> On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
> > 633         hp = &srp->header;
> > [...]
> > 646                 hp->dxferp = (char __user *)buf + cmd_size;
> 
> > So the memory for hp->dxferp comes from:
> > 633         hp = &srp->header;
> 
> ????
> 
> > >From my debug instrumentation I see that the dxferp ends up in the
> > iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
> > 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
> > bio).
> 
> _Address_ of hp->dxferp comes from that assignment; the value is 'buf'
> argument of sg_write() + small offset.  In this case, it should point
> inside a pipe buffer, which is, indeed, at a kernel address.  Who'd
> allocated srp is irrelevant.

Yes I realized that as well when I had enough distance between me and the
code...

> 
> And if you end up dereferencing more than one page worth there, you do have
> a problem - pipe buffers are not going to be that large.  Could you slap
> 	WARN_ON((size_t)input_size > count);
> right after the calculation of input_size in sg_write() and see if it triggers
> on your reproducer?

I did and it didn't trigger. What triggers is (as expected) a
	WARN_ON((size_t)mxsize > count);
We have count at 80 and mxsize (which ends in hp->dxfer_len) at 65499. But the
65499 bytes are the len of the data we're suppost to be copying in via the
iov. I'm still rather confused what's happening here, sorry.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-06  9:32               ` Johannes Thumshirn
@ 2016-12-06  9:43                 ` Dmitry Vyukov
  2016-12-06 15:38                   ` Johannes Thumshirn
  0 siblings, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2016-12-06  9:43 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Al Viro, syzkaller, Doug Gilbert, jejb, Martin K. Petersen,
	linux-scsi, LKML, axboe, linux-block, David Rientjes,
	Hannes Reinecke, Michal Hocko

On Tue, Dec 6, 2016 at 10:32 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Mon, Dec 05, 2016 at 07:03:39PM +0000, Al Viro wrote:
>> On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
>> > 633         hp = &srp->header;
>> > [...]
>> > 646                 hp->dxferp = (char __user *)buf + cmd_size;
>>
>> > So the memory for hp->dxferp comes from:
>> > 633         hp = &srp->header;
>>
>> ????
>>
>> > >From my debug instrumentation I see that the dxferp ends up in the
>> > iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
>> > 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
>> > bio).
>>
>> _Address_ of hp->dxferp comes from that assignment; the value is 'buf'
>> argument of sg_write() + small offset.  In this case, it should point
>> inside a pipe buffer, which is, indeed, at a kernel address.  Who'd
>> allocated srp is irrelevant.
>
> Yes I realized that as well when I had enough distance between me and the
> code...
>
>>
>> And if you end up dereferencing more than one page worth there, you do have
>> a problem - pipe buffers are not going to be that large.  Could you slap
>>       WARN_ON((size_t)input_size > count);
>> right after the calculation of input_size in sg_write() and see if it triggers
>> on your reproducer?
>
> I did and it didn't trigger. What triggers is (as expected) a
>         WARN_ON((size_t)mxsize > count);
> We have count at 80 and mxsize (which ends in hp->dxfer_len) at 65499. But the
> 65499 bytes are the len of the data we're suppost to be copying in via the
> iov. I'm still rather confused what's happening here, sorry.


I think the critical piece here is some kind of race or timing
condition. Note that the test program executes all of
memfd_create/write/open/sendfile twice. Second time the calls race
with each other, but they also can race with the first execution of
the calls.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-06  9:43                 ` Dmitry Vyukov
@ 2016-12-06 15:38                   ` Johannes Thumshirn
  2016-12-06 15:46                     ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Thumshirn @ 2016-12-06 15:38 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Al Viro, syzkaller, Doug Gilbert, jejb, Martin K. Petersen,
	linux-scsi, LKML, axboe, linux-block, David Rientjes,
	Hannes Reinecke, Michal Hocko

On Tue, Dec 06, 2016 at 10:43:57AM +0100, Dmitry Vyukov wrote:
> On Tue, Dec 6, 2016 at 10:32 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> > On Mon, Dec 05, 2016 at 07:03:39PM +0000, Al Viro wrote:
> >> On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
> >> > 633         hp = &srp->header;
> >> > [...]
> >> > 646                 hp->dxferp = (char __user *)buf + cmd_size;
> >>
> >> > So the memory for hp->dxferp comes from:
> >> > 633         hp = &srp->header;
> >>
> >> ????
> >>
> >> > >From my debug instrumentation I see that the dxferp ends up in the
> >> > iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
> >> > 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
> >> > bio).
> >>
> >> _Address_ of hp->dxferp comes from that assignment; the value is 'buf'
> >> argument of sg_write() + small offset.  In this case, it should point
> >> inside a pipe buffer, which is, indeed, at a kernel address.  Who'd
> >> allocated srp is irrelevant.
> >
> > Yes I realized that as well when I had enough distance between me and the
> > code...
> >
> >>
> >> And if you end up dereferencing more than one page worth there, you do have
> >> a problem - pipe buffers are not going to be that large.  Could you slap
> >>       WARN_ON((size_t)input_size > count);
> >> right after the calculation of input_size in sg_write() and see if it triggers
> >> on your reproducer?
> >
> > I did and it didn't trigger. What triggers is (as expected) a
> >         WARN_ON((size_t)mxsize > count);
> > We have count at 80 and mxsize (which ends in hp->dxfer_len) at 65499. But the
> > 65499 bytes are the len of the data we're suppost to be copying in via the
> > iov. I'm still rather confused what's happening here, sorry.
> 
> 
> I think the critical piece here is some kind of race or timing
> condition. Note that the test program executes all of
> memfd_create/write/open/sendfile twice. Second time the calls race
> with each other, but they also can race with the first execution of
> the calls.

FWIW I've just run the reproducer once instead of looping it to check how it
would normally behave and it bailes out at:

604         if (count < (SZ_SG_HEADER + 6))
605                 return -EIO;    /* The minimum scsi command length is 6 bytes. */

That means, weren't going down the copy_form_iter() road at all. Usually, but
sometimes we do. And then we try to copy 16 pages from the pipe buffer (is
this correct?).
The reproducer does: sendfile("/dev/sg0", memfd, offset_in_memfd, 0x10000);

I don't see how we get there? Could it be random data from the mmap() we point
the memfd to?

This bug is confusing to be honest.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: scsi: use-after-free in bio_copy_from_iter
  2016-12-06 15:38                   ` Johannes Thumshirn
@ 2016-12-06 15:46                     ` Dmitry Vyukov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2016-12-06 15:46 UTC (permalink / raw)
  To: syzkaller
  Cc: Al Viro, Doug Gilbert, jejb, Martin K. Petersen, linux-scsi,
	LKML, axboe, linux-block, David Rientjes, Hannes Reinecke,
	Michal Hocko

On Tue, Dec 6, 2016 at 4:38 PM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
> On Tue, Dec 06, 2016 at 10:43:57AM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 6, 2016 at 10:32 AM, Johannes Thumshirn <jthumshirn@suse.de> wrote:
>> > On Mon, Dec 05, 2016 at 07:03:39PM +0000, Al Viro wrote:
>> >> On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
>> >> > 633         hp = &srp->header;
>> >> > [...]
>> >> > 646                 hp->dxferp = (char __user *)buf + cmd_size;
>> >>
>> >> > So the memory for hp->dxferp comes from:
>> >> > 633         hp = &srp->header;
>> >>
>> >> ????
>> >>
>> >> > >From my debug instrumentation I see that the dxferp ends up in the
>> >> > iovec_iter's kvec->iov_base and the faulting address is always dxferp + n *
>> >> > 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
>> >> > bio).
>> >>
>> >> _Address_ of hp->dxferp comes from that assignment; the value is 'buf'
>> >> argument of sg_write() + small offset.  In this case, it should point
>> >> inside a pipe buffer, which is, indeed, at a kernel address.  Who'd
>> >> allocated srp is irrelevant.
>> >
>> > Yes I realized that as well when I had enough distance between me and the
>> > code...
>> >
>> >>
>> >> And if you end up dereferencing more than one page worth there, you do have
>> >> a problem - pipe buffers are not going to be that large.  Could you slap
>> >>       WARN_ON((size_t)input_size > count);
>> >> right after the calculation of input_size in sg_write() and see if it triggers
>> >> on your reproducer?
>> >
>> > I did and it didn't trigger. What triggers is (as expected) a
>> >         WARN_ON((size_t)mxsize > count);
>> > We have count at 80 and mxsize (which ends in hp->dxfer_len) at 65499. But the
>> > 65499 bytes are the len of the data we're suppost to be copying in via the
>> > iov. I'm still rather confused what's happening here, sorry.
>>
>>
>> I think the critical piece here is some kind of race or timing
>> condition. Note that the test program executes all of
>> memfd_create/write/open/sendfile twice. Second time the calls race
>> with each other, but they also can race with the first execution of
>> the calls.
>
> FWIW I've just run the reproducer once instead of looping it to check how it
> would normally behave and it bailes out at:
>
> 604         if (count < (SZ_SG_HEADER + 6))
> 605                 return -EIO;    /* The minimum scsi command length is 6 bytes. */
>
> That means, weren't going down the copy_form_iter() road at all. Usually, but
> sometimes we do. And then we try to copy 16 pages from the pipe buffer (is
> this correct?).
> The reproducer does: sendfile("/dev/sg0", memfd, offset_in_memfd, 0x10000);
>
> I don't see how we get there? Could it be random data from the mmap() we point
> the memfd to?
>
> This bug is confusing to be honest.


Where does this count come from? What address in the user program? Is
it 0x20012fxx?
One possibility for non-deterministically changing inputs is that this part:

  case 2:
    NONFAILING(*(uint32_t*)0x20012fd8 = (uint32_t)0x28);
    NONFAILING(*(uint32_t*)0x20012fdc = (uint32_t)0xffff);
    NONFAILING(*(uint64_t*)0x20012fe0 = (uint64_t)0x0);
    NONFAILING(*(uint64_t*)0x20012fe8 = (uint64_t)0xffffffffffff993f);
    NONFAILING(*(uint64_t*)0x20012ff0 = (uint64_t)0xa8b);
    NONFAILING(*(uint32_t*)0x20012ff8 = (uint32_t)0xff);
    r[9] = syscall(__NR_write, r[2], 0x20012fd8ul, 0x28ul, 0, 0,
                           0, 0, 0, 0);

runs concurrently with this part:

  case 0:
    r[0] =
        syscall(__NR_mmap, 0x20000000ul, 0x13000ul, 0x3ul,
                        0x32ul, 0xfffffffffffffffful, 0x0ul, 0, 0, 0);

So all of the input data to the write, or a subset of the input data,
can be zeros.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-12-06 15:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-25 19:08 scsi: use-after-free in bio_copy_from_iter Dmitry Vyukov
2016-12-02 16:50 ` Dmitry Vyukov
2016-12-03 10:38   ` Johannes Thumshirn
2016-12-03 15:22     ` Dmitry Vyukov
2016-12-03 18:19       ` Johannes Thumshirn
2016-12-05 14:31         ` Dmitry Vyukov
2016-12-05 15:17           ` Johannes Thumshirn
2016-12-05 19:03             ` Al Viro
2016-12-06  9:32               ` Johannes Thumshirn
2016-12-06  9:43                 ` Dmitry Vyukov
2016-12-06 15:38                   ` Johannes Thumshirn
2016-12-06 15:46                     ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).