* BUG: iomap_dio_rw() accesses freed memory
@ 2019-01-10 10:17 Chandan Rajendra
  2019-01-10 14:25 ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2019-01-10 10:17 UTC (permalink / raw)
  To: linux-xfs; +Cc: hch, darrick.wong

The following call trace is observed on next-20190110 when the generic/323 test
is executed on a 4k block sized XFS filesystem on a ppc64le machine:

[ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3
[ T7710] Faulting instruction address: 0xc00000000090372c
[ T7710] Oops: Kernel access of bad area, sig: 11 [#12]
[ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
[ T7710] Modules linked in:
[ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G      D           5.0.0-rc1-next-20190110 #36
[ T7710] NIP:  c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0
[ T7710] REGS: c00000062fc67730 TRAP: 0380   Tainted: G      D            (5.0.0-rc1-next-20190110)
[ T7710] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44428224  XER: 20000000
[ T7710] CFAR: c0000000004bf758 IRQMASK: 0 
[ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b 
[ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001 
[ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000 
[ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30 
[ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8 
[ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918 
[ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000 
[ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98 
[ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0
[ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0
[ T7710] Call Trace:
[ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0
[ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250
[ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160
[ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0
[ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0
[ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380
[ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70
[ T7710] Instruction dump:
[ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0 
[ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6 
[ T7710] ---[ end trace b14f7219ccf85a08 ]---
[ T7710]


This is happening due to the following sequence of events:

1. iomap_dio_rw() allocates a 'struct iomap_dio'. The ref count is set to 1.
2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
   !atomic_dec_and_test(&dio->ref) evaluates to true.
4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
   decrements dio->ref to 0. This in turn calls iomap_dio_complete(),
   which frees the iomap_dio structure.
5. iomap_dio_rw() can now dereference the freed structure via either
   "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
   dio->submit.cookie, true)".


-- 
chandan


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-10 10:17 BUG: iomap_dio_rw() accesses freed memory Chandan Rajendra
@ 2019-01-10 14:25 ` Christoph Hellwig
  2019-01-10 16:49   ` Chandan Rajendra
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-01-10 14:25 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: linux-xfs, hch, darrick.wong

On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> The following call trace is observed on next-20190110, when generic/323 test
> is executed on a 4k block sized XFS filesystem on ppc64le machine,
> 
> [ call trace snipped ]
> 
> 
> This is happening due to the following sequence of events:
> 
> 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1.
> 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio.
> 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero,
>    !atomic_dec_and_test(&dio->ref) would evaluate to true.
> 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
>    decrements dio->ref to the value of 0. This would in turn call
>    iomap_dio_complete() which frees the iomap_dio structure.

If iomap_dio_bio_end_io completes the dio this should be an AIO request.

> 5. iomap_dio_rw() can now dereference a freed structure via either
>    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
>    dio->submit.cookie, true)" 

But then again this code is only executed for the wait_for_completion
case.  So I wonder how you manage to hit this.


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-10 14:25 ` Christoph Hellwig
@ 2019-01-10 16:49   ` Chandan Rajendra
  2019-01-10 17:09     ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2019-01-10 16:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, darrick.wong

On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> > The following call trace is observed on next-20190110, when generic/323 test
> > is executed on a 4k block sized XFS filesystem on ppc64le machine,
> > 
> > [ call trace snipped ]
> > 
> > 
> > This is happening due to the following sequence of events:
> > 
> > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1.
> > 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio.
> > 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero,
> >    !atomic_dec_and_test(&dio->ref) would evaluate to true.
> > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
> >    decrements dio->ref to the value of 0. This would in turn call
> >    iomap_dio_complete() which frees the iomap_dio structure.
> 
> If iomap_dio_bio_end_io completes the dio this should be an AIO request.
> 
> > 5. iomap_dio_rw() can now dereference a freed structure via either
> >    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
> >    dio->submit.cookie, true)" 
> 
> But then again this code is only executed for the wait_for_completion
> case.  So I wonder how you manage to hit this.

A call to printk() right after " if (!atomic_dec_and_test(&dio->ref)) { " in
iomap_dio_rw() gave:

iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107

This is most probably because the freed dio's memory has already been reused
and overwritten by its new owner.

-- 
chandan


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-10 16:49   ` Chandan Rajendra
@ 2019-01-10 17:09     ` Darrick J. Wong
  2019-01-15 18:27       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2019-01-10 17:09 UTC (permalink / raw)
  To: Chandan Rajendra; +Cc: Christoph Hellwig, linux-xfs

On Thu, Jan 10, 2019 at 10:19:26PM +0530, Chandan Rajendra wrote:
> On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote:
> > On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> > > The following call trace is observed on next-20190110, when generic/323 test
> > > is executed on a 4k block sized XFS filesystem on ppc64le machine,
> > > 
> > > [ call trace snipped ]
> > > 
> > > 
> > > This is happening due to the following sequence of events:
> > > 
> > > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Ref count is set to 1.
> > > 2. iomap_dio_bio_actor increments ref count to 2 and submits a bio.
> > > 3. iomap_dio_rw() decrements ref count. Since dio->ref is non-zero,
> > >    !atomic_dec_and_test(&dio->ref) would evaluate to true.
> > > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
> > >    decrements dio->ref to the value of 0. This would in turn call
> > >    iomap_dio_complete() which frees the iomap_dio structure.
> > 
> > If iomap_dio_bio_end_io completes the dio this should be an AIO request.
> > 
> > > 5. iomap_dio_rw() can now dereference a freed structure via either
> > >    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
> > >    dio->submit.cookie, true)" 
> > 
> > But then again this code is only executed for the wait_for_completion
> > case.  So I wonder how you manage to hit this.
> 
> A call to printk() right after " if (!atomic_dec_and_test(&dio->ref)) { " in
> iomap_dio_rw() gave,
> 
> iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107
> 
> This is most probably because the freed dio's memory has already been reused
> and overwritten by its new owner.

generic/323 on a very fast SCSI device reproduces (and hangs) easily.
The UAF happens (on my machine, anyway) because the bio endio function
gets called before the submitting process even reaches blk_finish_plug.

This is the exact same issue being discussed in "Broken dio refcounting
leads to livelock?"  I think Dave Chinner was working on a less gross
fix than the one we'd come up with in that other thread.

--D

> -- 
> chandan
> 
> 
> 


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-10 17:09     ` Darrick J. Wong
@ 2019-01-15 18:27       ` Christoph Hellwig
  2019-01-15 20:51         ` Dave Chinner
  2019-01-16  5:59         ` Chandan Rajendra
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2019-01-15 18:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Chandan Rajendra, Dave Chinner, linux-xfs

So after reviewing that thread and this thread I still don't really
see a refcounting problem, just a use-after-free of the
wait_for_completion member for fast AIO completions.

Chandan, can you give this patch a spin?

diff --git a/fs/iomap.c b/fs/iomap.c
index cb184ff68680..47362397cb82 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1813,6 +1813,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	loff_t pos = iocb->ki_pos, start = pos;
 	loff_t end = iocb->ki_pos + count - 1, ret = 0;
 	unsigned int flags = IOMAP_DIRECT;
+	bool wait_for_completion = is_sync_kiocb(iocb);
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
@@ -1832,7 +1833,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
-	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
@@ -1887,7 +1887,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
+	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1903,7 +1903,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
-				dio->wait_for_completion = true;
+				wait_for_completion = true;
 				ret = 0;
 			}
 			break;
@@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	/*
+	 * We are about to drop our additional submission reference, which
+	 * might be the last reference to the dio.  There are three
+	 * different ways we can progress here:
+	 *
+	 *  (a) If this is the last reference we will always complete and free
+	 *	the dio ourselves, right here.
+	 *  (b) If this is not the last reference, and we serve an asynchronous
+	 *	iocb, we must never touch the dio after the decrement, the
+	 *	I/O completion handler will complete and free it.
+	 *  (c) If this is not the last reference, but we serve a synchronous
+	 *	iocb, the I/O completion handler will wake us up on the drop
+	 *	of the final reference, and we will complete and free it here
+	 *	after we got woken by the I/O completion handler.
+	 */
+	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!dio->wait_for_completion)
+		if (!wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
@@ -1943,9 +1959,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	ret = iomap_dio_complete(dio);
-
-	return ret;
+	return iomap_dio_complete(dio);
 
 out_free_dio:
 	kfree(dio);


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-15 18:27       ` Christoph Hellwig
@ 2019-01-15 20:51         ` Dave Chinner
  2019-01-15 21:09           ` Christoph Hellwig
  2019-01-16  5:59         ` Chandan Rajendra
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-01-15 20:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs

On Tue, Jan 15, 2019 at 07:27:45PM +0100, Christoph Hellwig wrote:
> So after reviewing that thread and this thread I still don't really
> see a refcounting problem, just a use-after-free of the
> wait_for_completion member for fast AIO completions.
> 
> Chandan, can you give this patch a spin?
> 
> [ earlier hunks snipped ]
> +	/*
> +	 * We are about to drop our additional submission reference, which
> +	 * might be the last reference to the dio.  There are three
> +	 * different ways we can progress here:
> +	 *
> +	 *  (a) If this is the last reference we will always complete and free
> +	 *	the dio ourselves, right here.
> +	 *  (b) If this is not the last reference, and we serve an asynchronous
> +	 *	iocb, we must never touch the dio after the decrement, the
> +	 *	I/O completion handler will complete and free it.
> +	 *  (c) If this is not the last reference, but we serve a synchronous
> +	 *	iocb, the I/O completion handler will wake us up on the drop
> +	 *	of the final reference, and we will complete and free it here
> +	 *	after we got woken by the I/O completion handler.
> +	 */
> +	dio->wait_for_completion = wait_for_completion;
>  	if (!atomic_dec_and_test(&dio->ref)) {

Atomic operations don't imply a memory barrier for dependent data,
right? So in completion we do:

	if (atomic_dec_and_test(&dio->ref)) {
		if (dio->wait_for_completion) {
			/* do wakeup */
		} else ....

So if we get:

CPU 0 (submission)		CPU 1 (completion)
set dio->wfc
dec dio->ref, val = 1		dec dio->ref, val = 0
enter wait loop			check dio->wfc

Where's the memory barrier to ensure that CPU 1 sees the value
that CPU 0 set in dio->wait_for_completion?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-15 20:51         ` Dave Chinner
@ 2019-01-15 21:09           ` Christoph Hellwig
  2019-01-15 23:18             ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-01-15 21:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Darrick J. Wong, Chandan Rajendra, linux-xfs

On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> Atomic operations don't imply a memory barrier for dependent data,
> right?

Documentation/atomic_t.txt says:

-------------------------- snip --------------------------
The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

[...]

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.


-------------------------- snip --------------------------

I think atomic_dec_and_test clearly falls into the third category,
and I can't see how much of the kernel could work if that wasn't the
case.


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-15 21:09           ` Christoph Hellwig
@ 2019-01-15 23:18             ` Dave Chinner
  2019-01-15 23:24               ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2019-01-15 23:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs

On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> > Atomic operations don't imply a memory barrier for dependent data,
> > right?
> 
> Documentation/atomic_t.txt says:
> 
> -------------------------- snip --------------------------
> The rule of thumb:
> 
>  - non-RMW operations are unordered;
> 
>  - RMW operations that have no return value are unordered;
> 
>  - RMW operations that have a return value are fully ordered;
> 
> [...]
> 
> Fully ordered primitives are ordered against everything prior and everything
> subsequent. Therefore a fully ordered primitive is like having an smp_mb()
> before and an smp_mb() after the primitive.

I guess I haven't looked at the documentation for a while. Or the
implementation for that matter.

/me goes off and looks.

Oh, they are now implemented with built in, explicit
smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok,
so the necessary barriers are there, my brain was telling me they
still needed to be added manually and needed updating.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-15 23:18             ` Dave Chinner
@ 2019-01-15 23:24               ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2019-01-15 23:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Chandan Rajendra, linux-xfs

On Wed, Jan 16, 2019 at 10:18:51AM +1100, Dave Chinner wrote:
> On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote:
> > On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> > > Atomic operations don't imply a memory barrier for dependent data,
> > > right?
> > 
> > Documentation/atomic_t.txt says:
> > 
> > -------------------------- snip --------------------------
> > The rule of thumb:
> > 
> >  - non-RMW operations are unordered;
> > 
> >  - RMW operations that have no return value are unordered;
> > 
> >  - RMW operations that have a return value are fully ordered;
> > 
> > [...]
> > 
> > Fully ordered primitives are ordered against everything prior and everything
> > subsequent. Therefore a fully ordered primitive is like having an smp_mb()
> > before and an smp_mb() after the primitive.
> 
> I guess I haven't looked at the documentation for a while. Or the
> implementation for that matter.
> 
> /me goes off and looks.
> 
> Oh, they are now implemented with built in, explicit
> smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok,
> so the necessary barriers are there, my brain was telling me they
> still needed to be added manually and needed updating.....

FWIW this fixes the machine that was cranking out generic/323 hangs,
having run it repeatedly in a loop for the ~4 hours in between falling
down the stairs this morning and finally getting back from all the
errands I had to do today.

Ready for this crap year to be over with, which means this ought to have
someone other than me look over it:

Tested-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: BUG: iomap_dio_rw() accesses freed memory
  2019-01-15 18:27       ` Christoph Hellwig
  2019-01-15 20:51         ` Dave Chinner
@ 2019-01-16  5:59         ` Chandan Rajendra
  1 sibling, 0 replies; 10+ messages in thread
From: Chandan Rajendra @ 2019-01-16  5:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

On Tuesday, January 15, 2019 11:57:45 PM IST Christoph Hellwig wrote:
> So after reviewing that thread and this thread I still don't really
> see a refcounting problem, just a use-after-free of the
> wait_for_completion member for fast AIO completions.
> 
> Chandan, can you give this patch a spin?
>

This fixes the bug,

Tested-by: Chandan Rajendra <chandan@linux.ibm.com>

> [ patch snipped ]


-- 
chandan

