linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Is ext4_dio_read_iter() broken?
@ 2020-03-12  9:19 David Howells
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: David Howells @ 2020-03-12  9:19 UTC (permalink / raw)
  To: mbobrowski; +Cc: dhowells, jack, linux-ext4, linux-fsdevel

Hi Matthew,

Is ext4_dio_read_iter() broken?  It calls:

	file_accessed(iocb->ki_filp);

at the end of the function - but surely iocb should be expected to have been
freed when iocb->ki_complete() was called?

In my cachefiles rewrite, I'm seeing the attached kasan dump.  The offending
RIP, ext4_file_read_iter+0x12b is at the above line, where it is trying to
read iocb->ki_filp.

Here's an excerpt of the relevant bits from my code:

	static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
	{
		struct cachefiles_kiocb *ki =
			container_of(iocb, struct cachefiles_kiocb, iocb);
		struct fscache_io_request *req = ki->req;
	...
		fput(ki->iocb.ki_filp);
		kfree(ki);
		fscache_end_io_operation(req->cookie);
	...
	}

	int cachefiles_read(struct fscache_object *obj,
			    struct fscache_io_request *req,
			    struct iov_iter *iter)
	{
		struct cachefiles_object *object =
			container_of(obj, struct cachefiles_object, fscache);
		struct cachefiles_kiocb *ki;
		struct file *file = object->backing_file;
		ssize_t ret = -ENOBUFS;
	...
		ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
		if (!ki)
			goto presubmission_error;

		ki->iocb.ki_filp	= get_file(file);
		ki->iocb.ki_pos		= req->pos;
		ki->iocb.ki_flags	= IOCB_DIRECT;
		ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
		ki->iocb.ki_ioprio	= get_current_ioprio();
		ki->req			= req;

		if (req->io_done)
			ki->iocb.ki_complete = cachefiles_read_complete;

		ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, iov_iter_count(iter));
		if (ret < 0)
			goto presubmission_error_free;

		fscache_get_io_request(req);
		trace_cachefiles_read(object, file_inode(file), req);
		ret = call_read_iter(file, &ki->iocb, iter);
		...
	}

The allocation point, cachefiles_read+0xd0, is the kzalloc() you can see and
the free point, cachefiles_read_complete+0x86, is the kfree() in the callback
function.

Most of the time it works, but just occasionally the callback happens first.

David
---
EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: user_xattr
FS-Cache: Cache "mycache" added (type cachefiles)
CacheFiles: File cache on sda3 registered
CacheFiles: Dirty object in cache
==================================================================
BUG: KASAN: use-after-free in ext4_file_read_iter+0x12b/0x181
Read of size 8 at addr ffff8883577d7700 by task bash/5140

CPU: 0 PID: 5140 Comm: bash Not tainted 5.6.0-rc1-fscache+ #1285
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 dump_stack+0x97/0xd3
 print_address_description.constprop.0+0x1c/0x2b5
 ? ext4_file_read_iter+0x12b/0x181
 __kasan_report+0x144/0x189
 ? ext4_file_read_iter+0x12b/0x181
 kasan_report+0xe/0x12
 ext4_file_read_iter+0x12b/0x181
 cachefiles_read+0x381/0x459
 ? trace_event_raw_event_cachefiles_map+0x14d/0x14d
 fscache_read_helper+0x9a6/0xa5f
 ? fscache_read_helper_single+0x387/0x387
 ? kmem_cache_alloc_trace+0x13f/0x167
 ? test_bit+0x1d/0x27
 afs_prefetch_for_write+0x2d4/0x397
 ? afs_req_issue_op+0x9d/0x9d
 ? hlock_class+0x27/0x86
 ? mark_lock+0x97/0x2d5
 ? timestamp_truncate+0xda/0x175
 ? inode_init_owner+0x114/0x114
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 ? filemap_range_has_page+0x171/0x171
 ? file_update_time+0x11e/0x1c0
 ? alloc_inode+0xbf/0xbf
 ? generic_write_checks+0x139/0x17c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 ? new_sync_write+0xfc/0x186
 new_sync_write+0x10c/0x186
 ? __lock_is_held+0x2a/0x87
 ? new_sync_read+0x17e/0x17e
 ? rcu_read_lock_any_held+0xa5/0xe8
 ? rcu_read_lock_sched_held+0xc5/0xc5
 ? vfs_write+0xd1/0x144
 ? rcu_sync_is_idle+0x46/0x4f
 ? __sb_start_write+0xc1/0x162
 ? vfs_write+0xd1/0x144
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 ? __ia32_sys_read+0x41/0x41
 ? get_vtime_delta+0x9c/0xaa
 ? mark_held_locks+0x1f/0x78
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fe0ecdc0ff8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5f2ca6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007fe0ecdc0ff8
RDX: 000000000000000b RSI: 0000564008fe1b30 RDI: 0000000000000001
RBP: 0000564008fe1b30 R08: 000000000000000a R09: 00007fe0ece52e80
R10: 000000000000000a R11: 0000000000000246 R12: 00007fe0ece94780
R13: 000000000000000b R14: 00007fe0ece8f740 R15: 000000000000000b

Allocated by task 5140:
 save_stack+0x1b/0x6a
 __kasan_kmalloc.constprop.0+0x84/0x95
 kmem_cache_alloc_trace+0x124/0x167
 cachefiles_read+0xd0/0x459
 fscache_read_helper+0x9a6/0xa5f
 afs_prefetch_for_write+0x2d4/0x397
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 new_sync_write+0x10c/0x186
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 19100:
 save_stack+0x1b/0x6a
 __kasan_slab_free+0xd9/0xf9
 kfree+0x10d/0x140
 cachefiles_read_complete+0x86/0x111
 iomap_dio_bio_end_io+0x1b2/0x1e5
 blk_update_request+0x335/0x45b
 scsi_end_request+0x4f/0x27d
 scsi_io_completion+0x22e/0x5a7
 blk_done_softirq+0x142/0x17a
 __do_softirq+0x22b/0x492

The buggy address belongs to the object at ffff8883577d7700
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 0 bytes inside of
 64-byte region [ffff8883577d7700, ffff8883577d7740)
The buggy address belongs to the page:
page:ffffea000d5df5c0 refcount:1 mapcount:0 mapping:ffff8883ce400380 index:0xffff8883577d7000
flags: 0x200000000000200(slab)
raw: 0200000000000200 ffffea000d8cdd08 ffffea000dc0af88 ffff8883ce400380
raw: ffff8883577d7000 ffff8883577d7000 0000000100000013 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8883577d7600: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8883577d7680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8883577d7700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                   ^
 ffff8883577d7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff8883577d7800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
BUG: unable to handle page fault for address: 0000000100000058
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 5140 Comm: bash Tainted: G    B             5.6.0-rc1-fscache+ #1285
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
RIP: 0010:touch_atime+0x82/0x137
Code: 00 00 48 89 44 24 58 31 c0 e8 8a bf fb ff 49 8d 7c 24 08 4d 8b 34 24 e8 7c bf fb ff 49 8b 5c 24 08 48 8d 7b 58 e8 6e bf fb ff <48> 8b 5b 58 4c 89 e7 48 89 de e8 b4 fd ff ff 84 c0 74 6e 4c 8d 63
RSP: 0018:ffff8883c8617800 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000100000000 RCX: ffffffff812f3d9f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000100000058
RBP: ffff8883c8617880 R08: 0000000000000003 R09: 0000000000000001
R10: fffffbfff05f7dc5 R11: 0000000000000000 R12: ffff8883679d5010
R13: 1ffff110790c2f00 R14: 0000100000000000 R15: ffff8883577d7720
FS:  00007fe0eccd0740(0000) GS:ffff8883d0200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000100000058 CR3: 000000037201c002 CR4: 00000000001606f0
Call Trace:
 ? atime_needs_update+0x1bb/0x1bb
 ? __kasan_report+0x169/0x189
 ? ext4_file_read_iter+0x12b/0x181
 ext4_file_read_iter+0x147/0x181
 cachefiles_read+0x381/0x459
 ? trace_event_raw_event_cachefiles_map+0x14d/0x14d
 fscache_read_helper+0x9a6/0xa5f
 ? fscache_read_helper_single+0x387/0x387
 ? kmem_cache_alloc_trace+0x13f/0x167
 ? test_bit+0x1d/0x27
 afs_prefetch_for_write+0x2d4/0x397
 ? afs_req_issue_op+0x9d/0x9d
 ? hlock_class+0x27/0x86
 ? mark_lock+0x97/0x2d5
 ? timestamp_truncate+0xda/0x175
 ? inode_init_owner+0x114/0x114
 afs_write_begin+0xc8/0x354
 generic_perform_write+0x154/0x29c
 ? filemap_range_has_page+0x171/0x171
 ? file_update_time+0x11e/0x1c0
 ? alloc_inode+0xbf/0xbf
 ? generic_write_checks+0x139/0x17c
 __generic_file_write_iter+0x16e/0x1c3
 generic_file_write_iter+0x62/0xc6
 afs_file_write+0xdc/0x125
 ? new_sync_write+0xfc/0x186
 new_sync_write+0x10c/0x186
 ? __lock_is_held+0x2a/0x87
 ? new_sync_read+0x17e/0x17e
 ? rcu_read_lock_any_held+0xa5/0xe8
 ? rcu_read_lock_sched_held+0xc5/0xc5
 ? vfs_write+0xd1/0x144
 ? rcu_sync_is_idle+0x46/0x4f
 ? __sb_start_write+0xc1/0x162
 ? vfs_write+0xd1/0x144
 vfs_write+0xef/0x144
 ksys_write+0xd2/0x147
 ? __ia32_sys_read+0x41/0x41
 ? get_vtime_delta+0x9c/0xaa
 ? mark_held_locks+0x1f/0x78
 do_syscall_64+0x6e/0x8a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fe0ecdc0ff8
Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
RSP: 002b:00007ffd5f2ca6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007fe0ecdc0ff8
RDX: 000000000000000b RSI: 0000564008fe1b30 RDI: 0000000000000001
RBP: 0000564008fe1b30 R08: 000000000000000a R09: 00007fe0ece52e80
R10: 000000000000000a R11: 0000000000000246 R12: 00007fe0ece94780
R13: 000000000000000b R14: 00007fe0ece8f740 R15: 000000000000000b
Modules linked in:
CR2: 0000000100000058
---[ end trace 051c3ce276c972cc ]---
RIP: 0010:touch_atime+0x82/0x137
Code: 00 00 48 89 44 24 58 31 c0 e8 8a bf fb ff 49 8d 7c 24 08 4d 8b 34 24 e8 7c bf fb ff 49 8b 5c 24 08 48 8d 7b 58 e8 6e bf fb ff <48> 8b 5b 58 4c 89 e7 48 89 de e8 b4 fd ff ff 84 c0 74 6e 4c 8d 63
RSP: 0018:ffff8883c8617800 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000100000000 RCX: ffffffff812f3d9f
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000100000058
RBP: ffff8883c8617880 R08: 0000000000000003 R09: 0000000000000001
R10: fffffbfff05f7dc5 R11: 0000000000000000 R12: ffff8883679d5010
R13: 1ffff110790c2f00 R14: 0000100000000000 R15: ffff8883577d7720
FS:  00007fe0eccd0740(0000) GS:ffff8883d0200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000100000058 CR3: 000000037201c002 CR4: 00000000001606f0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read()
  2020-03-12  9:19 Is ext4_dio_read_iter() broken? David Howells
@ 2020-03-12 10:07 ` David Howells
  2020-03-12 10:42   ` Christoph Hellwig
  2020-03-12 10:49   ` David Howells
  2020-03-12 10:20 ` Is ext4_dio_read_iter() broken? Matthew Bobrowski
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 8+ messages in thread
From: David Howells @ 2020-03-12 10:07 UTC (permalink / raw)
  To: mbobrowski, darrick.wong
  Cc: dhowells, jack, hch, linux-ext4, linux-xfs, linux-fsdevel

David Howells <dhowells@redhat.com> wrote:

> Is ext4_dio_read_iter() broken?  It calls:
> 
> 	file_accessed(iocb->ki_filp);
> 
> at the end of the function - but surely iocb should be expected to have been
> freed when iocb->ki_complete() was called?

I think it's actually worse than that.  You also can't call
inode_unlock_shared(inode) because you no longer own a ref on the inode since
->ki_complete() is expected to call fput() on iocb->ki_filp.

Yes, you own a shared lock on it, but unless somewhere along the
fput-dput-iput chain the inode lock is taken exclusively, the inode can be
freed whilst you're still holding the lock.

Oh - and ext4_dax_read_iter() is also similarly broken.

And xfs_file_dio_aio_read() appears to be broken as it touches the inode after
calling iomap_dio_rw() to unlock it.

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken?
  2020-03-12  9:19 Is ext4_dio_read_iter() broken? David Howells
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
@ 2020-03-12 10:20 ` Matthew Bobrowski
  2020-03-12 10:26 ` btrfs may be broken too - " David Howells
  2020-03-12 23:23 ` Dave Chinner
  3 siblings, 0 replies; 8+ messages in thread
From: Matthew Bobrowski @ 2020-03-12 10:20 UTC (permalink / raw)
  To: David Howells; +Cc: jack, linux-ext4, linux-fsdevel

On Thu, Mar 12, 2020 at 09:19:39AM +0000, David Howells wrote:
> Hi Matthew,
> 
> Is ext4_dio_read_iter() broken?  It calls:
> 
> 	file_accessed(iocb->ki_filp);
> 
> at the end of the function - but surely iocb should be expected to have been
> freed when iocb->ki_complete() was called?
> 
> In my cachefiles rewrite, I'm seeing the attached kasan dump.  The offending
> RIP, ext4_file_read_iter+0x12b is at the above line, where it is trying to
> read iocb->ki_filp.
>
> Here's an excerpt of the relevant bits from my code:
> 
> 	static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
> 	{
> 		struct cachefiles_kiocb *ki =
> 			container_of(iocb, struct cachefiles_kiocb, iocb);
> 		struct fscache_io_request *req = ki->req;
> 	...
> 		fput(ki->iocb.ki_filp);
> 		kfree(ki);
> 		fscache_end_io_operation(req->cookie);
> 	...
> 	}

I'm not exactly sure what you're rewriting, although from this excerpt
the way that this is implemented would sure cause the UAF in
ext4_dio_read_iter().

I don't forsee any issues with calling file_accessed(iocb->ki_filp);
prior to calling into the iomap infrastructure, unless I'm totally
missing something obvious...

/M

^ permalink raw reply	[flat|nested] 8+ messages in thread

* btrfs may be broken too - Re: Is ext4_dio_read_iter() broken?
  2020-03-12  9:19 Is ext4_dio_read_iter() broken? David Howells
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
  2020-03-12 10:20 ` Is ext4_dio_read_iter() broken? Matthew Bobrowski
@ 2020-03-12 10:26 ` David Howells
  2020-03-12 23:23 ` Dave Chinner
  3 siblings, 0 replies; 8+ messages in thread
From: David Howells @ 2020-03-12 10:26 UTC (permalink / raw)
  To: clm, josef, dsterba
  Cc: dhowells, mbobrowski, darrick.wong, jack, hch, linux-ext4,
	linux-xfs, linux-btrfs, linux-fsdevel

David Howells <dhowells@redhat.com> wrote:

> > Is ext4_dio_read_iter() broken?  It calls:
> > 
> > 	file_accessed(iocb->ki_filp);
> > 
> > at the end of the function - but surely iocb should be expected to have been
> > freed when iocb->ki_complete() was called?
> 
> I think it's actually worse than that.  You also can't call
> inode_unlock_shared(inode) because you no longer own a ref on the inode since
> ->ki_complete() is expected to call fput() on iocb->ki_filp.
> 
> Yes, you own a shared lock on it, but unless somewhere along the
> fput-dput-iput chain the inode lock is taken exclusively, the inode can be
> freed whilst you're still holding the lock.
> 
> Oh - and ext4_dax_read_iter() is also similarly broken.
> 
> And xfs_file_dio_aio_read() appears to be broken as it touches the inode after
> calling iomap_dio_rw() to unlock it.

Seems btrfs_file_write_iter() is also broken:

	if (iocb->ki_flags & IOCB_DIRECT) {
		num_written = __btrfs_direct_write(iocb, from);
	} else {
		num_written = btrfs_buffered_write(iocb, from);
		if (num_written > 0)
			iocb->ki_pos = pos + num_written;
		if (clean_page)
			pagecache_isize_extended(inode, oldsize,
						i_size_read(inode));
	}

	inode_unlock(inode);

But if __btrfs_direct_write() returned -EIOCBQUEUED then inode may have been
deallocated by the point it's calling inode_unlock().  Holding the lock is not
a preventative measure that I can see.

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read()
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
@ 2020-03-12 10:42   ` Christoph Hellwig
  2020-03-12 10:49   ` David Howells
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-03-12 10:42 UTC (permalink / raw)
  To: David Howells
  Cc: mbobrowski, darrick.wong, jack, hch, linux-ext4, linux-xfs,
	linux-fsdevel

On Thu, Mar 12, 2020 at 10:07:57AM +0000, David Howells wrote:
> David Howells <dhowells@redhat.com> wrote:
> 
> > Is ext4_dio_read_iter() broken?  It calls:
> > 
> > 	file_accessed(iocb->ki_filp);
> > 
> > at the end of the function - but surely iocb should be expected to have been
> > freed when iocb->ki_complete() was called?

The iocb is refcounted and only completed when the refcount hits zero,
and an extra reference is held until the submission has completed.
Take a look at iocb_put().

> I think it's actually worse than that.  You also can't call
> inode_unlock_shared(inode) because you no longer own a ref on the inode since
> ->ki_complete() is expected to call fput() on iocb->ki_filp.

the file reference also hold an inode reference.

> 
> Yes, you own a shared lock on it, but unless somewhere along the
> fput-dput-iput chain the inode lock is taken exclusively, the inode can be
> freed whilst you're still holding the lock.
> 
> Oh - and ext4_dax_read_iter() is also similarly broken.

In addition to that DAX never executes asynchronously.

> And xfs_file_dio_aio_read() appears to be broken as it touches the inode after
> calling iomap_dio_rw() to unlock it.

Same as above.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read()
  2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
  2020-03-12 10:42   ` Christoph Hellwig
@ 2020-03-12 10:49   ` David Howells
  2020-03-12 10:53     ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: David Howells @ 2020-03-12 10:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, mbobrowski, darrick.wong, jack, linux-ext4, linux-xfs,
	linux-fsdevel

Christoph Hellwig <hch@lst.de> wrote:

> > > at the end of the function - but surely iocb should be expected to have
> > > been freed when iocb->ki_complete() was called?
> 
> The iocb is refcounted and only completed when the refcount hits zero,
> and an extra reference is held until the submission has completed.
> Take a look at iocb_put().

Ah...  This is in struct aio_kiocb and not struct kiocb - that's why I missed
it.  Thanks.

David


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read()
  2020-03-12 10:49   ` David Howells
@ 2020-03-12 10:53     ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2020-03-12 10:53 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, mbobrowski, darrick.wong, jack, linux-ext4,
	linux-xfs, linux-fsdevel

On Thu, Mar 12, 2020 at 10:49:40AM +0000, David Howells wrote:
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > > > at the end of the function - but surely iocb should be expected to have
> > > > been freed when iocb->ki_complete() was called?
> > 
> > The iocb is refcounted and only completed when the refcount hits zero,
> > and an extra reference is held until the submission has completed.
> > Take a look at iocb_put().
> 
> Ah...  This is in struct aio_kiocb and not struct kiocb - that's why I missed
> it.  Thanks.

That being said we have a few other spots using ->ki_complete for
asynchronous execution, which might not be as careful.  As someone
having written one or two of those I have my doubts I got everthing
right and will audit those.

> 
> David
---end quoted text---

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Is ext4_dio_read_iter() broken?
  2020-03-12  9:19 Is ext4_dio_read_iter() broken? David Howells
                   ` (2 preceding siblings ...)
  2020-03-12 10:26 ` btrfs may be broken too - " David Howells
@ 2020-03-12 23:23 ` Dave Chinner
  3 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2020-03-12 23:23 UTC (permalink / raw)
  To: David Howells; +Cc: mbobrowski, jack, linux-ext4, linux-fsdevel

On Thu, Mar 12, 2020 at 09:19:39AM +0000, David Howells wrote:
> Hi Matthew,
> 
> Is ext4_dio_read_iter() broken?  It calls:
> 
> 	file_accessed(iocb->ki_filp);
> 
> at the end of the function - but surely iocb should be expected to have been
> freed when iocb->ki_complete() was called?
> 
> In my cachefiles rewrite, I'm seeing the attached kasan dump.  The offending
> RIP, ext4_file_read_iter+0x12b is at the above line, where it is trying to
> read iocb->ki_filp.
> 
> Here's an excerpt of the relevant bits from my code:
> 
> 	static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
> 	{
> 		struct cachefiles_kiocb *ki =
> 			container_of(iocb, struct cachefiles_kiocb, iocb);
> 		struct fscache_io_request *req = ki->req;
> 	...
> 		fput(ki->iocb.ki_filp);
> 		kfree(ki);
> 		fscache_end_io_operation(req->cookie);
> 	...
> 	}
> 
> 	int cachefiles_read(struct fscache_object *obj,
> 			    struct fscache_io_request *req,
> 			    struct iov_iter *iter)
> 	{
> 		struct cachefiles_object *object =
> 			container_of(obj, struct cachefiles_object, fscache);
> 		struct cachefiles_kiocb *ki;
> 		struct file *file = object->backing_file;
> 		ssize_t ret = -ENOBUFS;
> 	...
> 		ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
> 		if (!ki)
> 			goto presubmission_error;
> 
> 		ki->iocb.ki_filp	= get_file(file);
> 		ki->iocb.ki_pos		= req->pos;
> 		ki->iocb.ki_flags	= IOCB_DIRECT;
> 		ki->iocb.ki_hint	= ki_hint_validate(file_write_hint(file));
> 		ki->iocb.ki_ioprio	= get_current_ioprio();
> 		ki->req			= req;
> 
> 		if (req->io_done)
> 			ki->iocb.ki_complete = cachefiles_read_complete;
> 
> 		ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, iov_iter_count(iter));
> 		if (ret < 0)
> 			goto presubmission_error_free;
> 
> 		fscache_get_io_request(req);
> 		trace_cachefiles_read(object, file_inode(file), req);
> 		ret = call_read_iter(file, &ki->iocb, iter);
> 		...
> 	}
> 
> The allocation point, cachefiles_read+0xd0, is the kzalloc() you can see and
> the free point, cachefiles_read_complete+0x86, is the kfree() in the callback
> function.

It looks to me like you are creating your own iocb that you are
doing AIO+DIO on, and you aren't taking a reference count to the
iocb yourself to ensure that it exists for the entire submission
path. i.e. see how the AIO code uses ki_refcnt and iocb_put() to
ensure the iocb does not get destroyed by IO completion before the
submission path has completed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-03-12 23:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-12  9:19 Is ext4_dio_read_iter() broken? David Howells
2020-03-12 10:07 ` Is ext4_dio_read_iter() broken? - and xfs_file_dio_aio_read() David Howells
2020-03-12 10:42   ` Christoph Hellwig
2020-03-12 10:49   ` David Howells
2020-03-12 10:53     ` Christoph Hellwig
2020-03-12 10:20 ` Is ext4_dio_read_iter() broken? Matthew Bobrowski
2020-03-12 10:26 ` btrfs may be broken too - " David Howells
2020-03-12 23:23 ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).