Is ext4_dio_read_iter() broken?

* Is ext4_dio_read_iter() broken?
@ 2020-03-12  9:19 David Howells
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: David Howells @ 2020-03-12  9:19 UTC (permalink / raw)
  To: mbobrowski; +Cc: dhowells, jack, linux-ext4, linux-fsdevel

Hi Matthew,

Is ext4_dio_read_iter() broken?  It calls:

	file_accessed(iocb->ki_filp);

at the end of the function - but surely iocb should be expected to have been
freed when iocb->ki_complete() was called?

In my cachefiles rewrite, I'm seeing the attached kasan dump.  The offending
RIP, ext4_file_read_iter+0x12b is at the above line, where it is trying to
read iocb->ki_filp.

Here's an excerpt of the relevant bits from my code:

	static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
	{
		struct cachefiles_kiocb *ki =
			container_of(iocb, struct cachefiles_kiocb, iocb);
		struct fscache_io_request *req = ki->req;
	...
		fput(ki->iocb.ki_filp);
		kfree(ki);
		fscache_end_io_operation(req->cookie);
	...
	}

	int cachefiles_read(struct fscache_object *obj,
			    struct fscache_io_request *req,
			    struct iov_iter *iter)
	{
		struct cachefiles_object *object =
			container_of(obj, struct cachefiles_object, fscache);
		struct cachefiles_kiocb *ki;
		struct file *file = object->backing_file;
		ssize_t ret = -ENOBUFS;
	...
		ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
		if (!ki)
			goto presubmission_error;

		ki->iocb.ki_filp	= get_file(file);
		ki->iocb.ki_pos		= req->pos;
		ki->iocb.ki_flags	= IOCB_DIRECT;
		ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
		ki->iocb.ki_ioprio	= get_current_ioprio();
		ki->req			= req;

		if (req->io_done)
			ki->iocb.ki_complete = cachefiles_read_complete;

		ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, iov_iter_count(iter));
		if (ret < 0)
			goto presubmission_error_free;

		fscache_get_io_request(req);
		trace_cachefiles_read(object, file_inode(file), req);
		ret = call_read_iter(file, &ki->iocb, iter);
		...
	}

The allocation point, cachefiles_read+0xd0, is the kzalloc() you can see and
the free point, cachefiles_read_complete+0x86, is the kfree() in the callback
function.

Most of the time it works, but just occasionally the callback happens first.

David
---
EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: user_xattr
FS-Cache: Cache "mycache" added (type cachefiles)
CacheFiles: File cache on sda3 registered
CacheFiles: Dirty object in cache
==================================================================
BUG: KASAN: use-after-free in ext4_file_read_iter+0x12b/0x181
Read of size 8 at addr ffff8883577d7700 by task bash/5140

CPU: 0 PID: 5140 Comm: bash Not tainted 5.6.0-rc1-fscache+ #1285
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 dump_stack+0x97/0xd3
 print_address_description.constprop.0+0x1c/0x2b5
 ? ext4_file_read_iter+0x12b/0x181
 __kasan_report+0x144/0x189
 ? ext4_file_read_iter+0x12b/0x181
 kasan_report+0xe/0x12
 ext4_file_read_iter+0x12b/0x181
 cachefiles_read+0x381/0x459
 ? trace_event_raw_event_cachefiles_map+0x14d/0x14d
 fscache_read_helper+0x9a6/0xa5f
 ? fscache_read_helper_single+0x387/0x387
 ? kmem_cache_alloc_trace+0x13f/0x167
 ? test_bit+0x1d/0x27
 afs_prefetch_for_write+0x2d4/0x397
 ? afs_req_issue_op+0x9d/0x9d
 ? hlock_class+0x27/0x86
 ? mark_lock+0x97/0x2d5
 ? timestamp_truncate+0xda/0x175
 ? inode_init_owner+0x114/0x114
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 ? filemap_range_has_page+0x171/0x171
 ? file_update_time+0x11e/0x1c0
 ? alloc_inode+0xbf/0xbf
 ? generic_write_checks+0x139/0x17c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 ? new_sync_write+0xfc/0x186
 new_sync_write+0x10c/0x186
 ? __lock_is_held+0x2a/0x87
 ? new_sync_read+0x17e/0x17e
 ? rcu_read_lock_any_held+0xa5/0xe8
 ? rcu_read_lock_sched_held+0xc5/0xc5
 ? vfs_write+0xd1/0x144
 ? rcu_sync_is_idle+0x46/0x4f
 ? __sb_start_write+0xc1/0x162
 ? vfs_write+0xd1/0x144
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 ? __ia32_sys_read+0x41/0x41
 ? get_vtime_delta+0x9c/0xaa
 ? mark_held_locks+0x1f/0x78
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fe0ecdc0ff8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5f2ca6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007fe0ecdc0ff8
RDX: 000000000000000b RSI: 0000564008fe1b30 RDI: 0000000000000001
RBP: 0000564008fe1b30 R08: 000000000000000a R09: 00007fe0ece52e80
R10: 000000000000000a R11: 0000000000000246 R12: 00007fe0ece94780
R13: 000000000000000b R14: 00007fe0ece8f740 R15: 000000000000000b

Allocated by task 5140:
 save_stack+0x1b/0x6a
 __kasan_kmalloc.constprop.0+0x84/0x95
 kmem_cache_alloc_trace+0x124/0x167
 cachefiles_read+0xd0/0x459
 fscache_read_helper+0x9a6/0xa5f
 afs_prefetch_for_write+0x2d4/0x397
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 new_sync_write+0x10c/0x186
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19100:
 save_stack+0x1b/0x6a
 __kasan_slab_free+0xd9/0xf9
 kfree+0x10d/0x140
 cachefiles_read_complete+0x86/0x111
 iomap_dio_bio_end_io+0x1b2/0x1e5
 blk_update_request+0x335/0x45b
 scsi_end_request+0x4f/0x27d
 scsi_io_completion+0x22e/0x5a7
 blk_done_softirq+0x142/0x17a
 __do_softirq+0x22b/0x492

The buggy address belongs to the object at ffff8883577d7700
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff8883577d7700, ffff8883577d7740)
The buggy address belongs to the page:
page:ffffea000d5df5c0 refcount:1 mapcount:0 mapping:ffff8883ce400380 index:0xffff8883577d7000
flags: 0x200000000000200(slab)
raw: 0200000000000200 ffffea000d8cdd08 ffffea000dc0af88 ffff8883ce400380
raw: ffff8883577d7000 ffff8883577d7000 0000000100000013 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8883577d7600: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8883577d7680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8883577d7700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff8883577d7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8883577d7800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
BUG: unable to handle page fault for address: 0000000100000058
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 5140 Comm: bash Tainted: G    B             5.6.0-rc1-fscache+ #1285
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:touch_atime+0x82/0x137
Code: 00 00 48 89 44 24 58 31 c0 e8 8a bf fb ff 49 8d 7c 24 08 4d 8b 34 24 e8 7c bf fb ff 49 8b 5c 24 08 48 8d 7b 58 e8 6e bf fb ff <48> 8b 5b 58 4c 89 e7 48 89 de e8 b4 fd ff ff 84 c0 74 6e 4c 8d 63
RSP: 0018:ffff8883c8617800 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000100000000 RCX: ffffffff812f3d9f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000100000058
RBP: ffff8883c8617880 R08: 0000000000000003 R09: 0000000000000001
R10: fffffbfff05f7dc5 R11: 0000000000000000 R12: ffff8883679d5010
R13: 1ffff110790c2f00 R14: 0000100000000000 R15: ffff8883577d7720
FS:  00007fe0eccd0740(0000) GS:ffff8883d0200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000100000058 CR3: 000000037201c002 CR4: 00000000001606f0
Call Trace:
 ? atime_needs_update+0x1bb/0x1bb
 ? __kasan_report+0x169/0x189
 ? ext4_file_read_iter+0x12b/0x181
 ext4_file_read_iter+0x147/0x181
 cachefiles_read+0x381/0x459
 ? trace_event_raw_event_cachefiles_map+0x14d/0x14d
 fscache_read_helper+0x9a6/0xa5f
 ? fscache_read_helper_single+0x387/0x387
 ? kmem_cache_alloc_trace+0x13f/0x167
 ? test_bit+0x1d/0x27
 afs_prefetch_for_write+0x2d4/0x397
 ? afs_req_issue_op+0x9d/0x9d
 ? hlock_class+0x27/0x86
 ? mark_lock+0x97/0x2d5
 ? timestamp_truncate+0xda/0x175
 ? inode_init_owner+0x114/0x114
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 ? filemap_range_has_page+0x171/0x171
 ? file_update_time+0x11e/0x1c0
 ? alloc_inode+0xbf/0xbf
 ? generic_write_checks+0x139/0x17c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 ? new_sync_write+0xfc/0x186
 new_sync_write+0x10c/0x186
 ? __lock_is_held+0x2a/0x87
 ? new_sync_read+0x17e/0x17e
 ? rcu_read_lock_any_held+0xa5/0xe8
 ? rcu_read_lock_sched_held+0xc5/0xc5
 ? vfs_write+0xd1/0x144
 ? rcu_sync_is_idle+0x46/0x4f
 ? __sb_start_write+0xc1/0x162
 ? vfs_write+0xd1/0x144
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 ? __ia32_sys_read+0x41/0x41
 ? get_vtime_delta+0x9c/0xaa
 ? mark_held_locks+0x1f/0x78
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fe0ecdc0ff8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5f2ca6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007fe0ecdc0ff8
RDX: 000000000000000b RSI: 0000564008fe1b30 RDI: 0000000000000001
RBP: 0000564008fe1b30 R08: 000000000000000a R09: 00007fe0ece52e80
R10: 000000000000000a R11: 0000000000000246 R12: 00007fe0ece94780
R13: 000000000000000b R14: 00007fe0ece8f740 R15: 000000000000000b
Modules linked in:
CR2: 0000000100000058
---[ end trace 051c3ce276c972cc ]---
RIP: 0010:touch_atime+0x82/0x137
Code: 00 00 48 89 44 24 58 31 c0 e8 8a bf fb ff 49 8d 7c 24 08 4d 8b 34 24 e8 7c bf fb ff 49 8b 5c 24 08 48 8d 7b 58 e8 6e bf fb ff <48> 8b 5b 58 4c 89 e7 48 89 de e8 b4 fd ff ff 84 c0 74 6e 4c 8d 63
RSP: 0018:ffff8883c8617800 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000100000000 RCX: ffffffff812f3d9f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000100000058
RBP: ffff8883c8617880 R08: 0000000000000003 R09: 0000000000000001
R10: fffffbfff05f7dc5 R11: 0000000000000000 R12: ffff8883679d5010
R13: 1ffff110790c2f00 R14: 0000100000000000 R15: ffff8883577d7720
FS:  00007fe0eccd0740(0000) GS:ffff8883d0200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000100000058 CR3: 000000037201c002 CR4: 00000000001606f0

^ permalink raw reply	[flat|nested] 8+ messages in thread