* BUG: iomap_dio_rw() accesses freed memory
@ 2019-01-10 10:17 Chandan Rajendra
2019-01-10 14:25 ` Christoph Hellwig
0 siblings, 1 reply; 10+ messages in thread
From: Chandan Rajendra @ 2019-01-10 10:17 UTC (permalink / raw)
To: linux-xfs; +Cc: hch, darrick.wong
The following call trace is observed on next-20190110, when the generic/323 test
is executed on a 4k block sized XFS filesystem on a ppc64le machine:
[ T7710] BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6be3
[ T7710] Faulting instruction address: 0xc00000000090372c
[ T7710] Oops: Kernel access of bad area, sig: 11 [#12]
[ T7710] LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
[ T7710] Modules linked in:
[ T7710] CPU: 7 PID: 7710 Comm: aio-last-ref-he Tainted: G D 5.0.0-rc1-next-20190110 #36
[ T7710] NIP: c00000000090372c LR: c0000000004bf75c CTR: c00000000091cfe0
[ T7710] REGS: c00000062fc67730 TRAP: 0380 Tainted: G D (5.0.0-rc1-next-20190110)
[ T7710] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44428224 XER: 20000000
[ T7710] CFAR: c0000000004bf758 IRQMASK: 0
[ T7710] GPR00: c0000000004bf75c c00000062fc679c0 c0000000017f4b00 6b6b6b6b6b6b6b6b
[ T7710] GPR04: 000000006b6b6b6b 0000000000000001 0000000000000000 0000000000000001
[ T7710] GPR08: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000000 0000000000000000
[ T7710] GPR12: 0000000044428224 c00000003ff93700 00000000000f0000 0000000105611a30
[ T7710] GPR16: 00007fff52fde088 00000001056119e8 00007fff52fde288 00000001056119c8
[ T7710] GPR20: 0000000000000010 c000000634258bb8 00000000000effff c000000627e94918
[ T7710] GPR24: c000000000fb3290 c000000627e94700 fffffffffffffdef 00000000000f0000
[ T7710] GPR28: 0000000000000000 0000000000000002 c0000006247ae180 c000000634258b98
[ T7710] NIP [c00000000090372c] blk_poll+0x2c/0x3d0
[ T7710] LR [c0000000004bf75c] iomap_dio_rw+0x45c/0x4f0
[ T7710] Call Trace:
[ T7710] [c00000062fc67a60] [c0000000004bf7b8] iomap_dio_rw+0x4b8/0x4f0
[ T7710] [c00000062fc67b20] [c0000000006ed3a4] xfs_file_dio_aio_read+0xb4/0x250
[ T7710] [c00000062fc67b80] [c0000000006edbc4] xfs_file_read_iter+0x114/0x160
[ T7710] [c00000062fc67bc0] [c00000000049476c] aio_read+0x12c/0x1d0
[ T7710] [c00000062fc67cc0] [c000000000495094] io_submit_one+0x634/0xbd0
[ T7710] [c00000062fc67d90] [c0000000004956f0] sys_io_submit+0xc0/0x380
[ T7710] [c00000062fc67e20] [c00000000000bae4] system_call+0x5c/0x70
[ T7710] Instruction dump:
[ T7710] 60000000 3c4c00ef 38421400 7c0802a6 60000000 7d800026 2f84ffff fb81ffe0
[ T7710] 3b800000 91810008 f821ff61 419e01a4 <e9230078> 793c6fe3 41820198 7c0802a6
[ T7710] ---[ end trace b14f7219ccf85a08 ]---
[ T7710]
This is happening due to the following sequence of events:

1. iomap_dio_rw() allocates a 'struct iomap_dio'. Its ref count is set to 1.
2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
   !atomic_dec_and_test(&dio->ref) evaluates to true.
4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
   decrements dio->ref to 0, which in turn calls iomap_dio_complete() and
   frees the iomap_dio structure.
5. iomap_dio_rw() can now dereference the freed structure via either
   "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
   dio->submit.cookie, true)".
--
chandan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Christoph Hellwig @ 2019-01-10 14:25 UTC (permalink / raw)
To: Chandan Rajendra; +Cc: linux-xfs, hch, darrick.wong
On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> The following call trace is observed on next-20190110, when generic/323 test
> is executed on a 4k block sized XFS filesystem on ppc64le machine,
>
> [ call trace snipped ]
>
> This is happening due to the following sequence of events:
>
> 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Its ref count is set to 1.
> 2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
> 3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
>    !atomic_dec_and_test(&dio->ref) evaluates to true.
> 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
>    decrements dio->ref to 0, which in turn calls iomap_dio_complete() and
>    frees the iomap_dio structure.
If iomap_dio_bio_end_io completes the dio, this should be an AIO request.
> 5. iomap_dio_rw() can now dereference the freed structure via either
>    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
>    dio->submit.cookie, true)".
But then again this code is only executed for the wait_for_completion
case. So I wonder how you manage to hit this.
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Chandan Rajendra @ 2019-01-10 16:49 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs, darrick.wong
On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> > The following call trace is observed on next-20190110, when generic/323 test
> > is executed on a 4k block sized XFS filesystem on ppc64le machine,
> >
> > [ call trace snipped ]
> >
> > This is happening due to the following sequence of events:
> >
> > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Its ref count is set to 1.
> > 2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
> > 3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
> >    !atomic_dec_and_test(&dio->ref) evaluates to true.
> > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
> >    decrements dio->ref to 0, which in turn calls iomap_dio_complete() and
> >    frees the iomap_dio structure.
>
> If iomap_dio_bio_end_io completes the dio this should be an AIO request.
>
> > 5. iomap_dio_rw() can now dereference the freed structure via either
> >    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
> >    dio->submit.cookie, true)".
>
> But then again this code is only executed for the wait_for_completion
> case. So I wonder how you manage to hit this.
A call to printk() right after " if (!atomic_dec_and_test(&dio->ref)) { " in
iomap_dio_rw() gave,
iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107
This is most probably because the freed dio's memory is already being written
to by its new owner; dio->wait_for_completion is a bool and could never
legitimately hold the value 107.
--
chandan
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Darrick J. Wong @ 2019-01-10 17:09 UTC (permalink / raw)
To: Chandan Rajendra; +Cc: Christoph Hellwig, linux-xfs
On Thu, Jan 10, 2019 at 10:19:26PM +0530, Chandan Rajendra wrote:
> On Thursday, January 10, 2019 7:55:52 PM IST Christoph Hellwig wrote:
> > On Thu, Jan 10, 2019 at 03:47:09PM +0530, Chandan Rajendra wrote:
> > > The following call trace is observed on next-20190110, when generic/323 test
> > > is executed on a 4k block sized XFS filesystem on ppc64le machine,
> > >
> > > [ call trace snipped ]
> > >
> > > This is happening due to the following sequence of events:
> > >
> > > 1. iomap_dio_rw() allocates a 'struct iomap_dio'. Its ref count is set to 1.
> > > 2. iomap_dio_bio_actor() increments the ref count to 2 and submits a bio.
> > > 3. iomap_dio_rw() decrements the ref count. Since dio->ref is still non-zero,
> > >    !atomic_dec_and_test(&dio->ref) evaluates to true.
> > > 4. Meanwhile the bio submitted in step 2 completes I/O. iomap_dio_bio_end_io()
> > >    decrements dio->ref to 0, which in turn calls iomap_dio_complete() and
> > >    frees the iomap_dio structure.
> >
> > If iomap_dio_bio_end_io completes the dio this should be an AIO request.
> >
> > > 5. iomap_dio_rw() can now dereference the freed structure via either
> > >    "!dio->submit.last_queue" or "!blk_poll(dio->submit.last_queue,
> > >    dio->submit.cookie, true)".
> >
> > But then again this code is only executed for the wait_for_completion
> > case. So I wonder how you manage to hit this.
>
> A call to printk() right after " if (!atomic_dec_and_test(&dio->ref)) { " in
> iomap_dio_rw() gave,
>
> iomap_dio_rw: dio = 00000000540abd3d; dio->submit.last_queue = 000000000788c132; dio->wait_for_completion = 107
>
> This is most probably because the freed dio's memory is already being written
> to by its new owner; dio->wait_for_completion is a bool and could never
> legitimately hold the value 107.
generic/323 on a very fast scsi device reproduces (and hangs) easily.
The UAF happens (on my machine, anyway) because the bio endio function
gets called before the submitting process even reaches blk_finish_plug.
This is the exact same issue being discussed in "Broken dio refcounting
leads to livelock?" I think Dave Chinner was working on a less gross
fix than the one we'd come up with in that other thread.
--D
> --
> chandan
>
>
>
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Christoph Hellwig @ 2019-01-15 18:27 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Chandan Rajendra, Dave Chinner, linux-xfs
So after reviewing that thread and this thread I still don't really
see a refcounting problem, just a use-after-free of the
wait_for_completion member for fast AIO completions.
Chandan, can you give this patch a spin?
diff --git a/fs/iomap.c b/fs/iomap.c
index cb184ff68680..47362397cb82 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1813,6 +1813,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	loff_t pos = iocb->ki_pos, start = pos;
 	loff_t end = iocb->ki_pos + count - 1, ret = 0;
 	unsigned int flags = IOMAP_DIRECT;
+	bool wait_for_completion = is_sync_kiocb(iocb);
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
@@ -1832,7 +1833,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
-	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
@@ -1887,7 +1887,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
+	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1903,7 +1903,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
 			if (ret == -ENOTBLK) {
-				dio->wait_for_completion = true;
+				wait_for_completion = true;
 				ret = 0;
 			}
 			break;
@@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_WRITE_FUA)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
+	/*
+	 * We are about to drop our additional submission reference, which
+	 * might be the last reference to the dio.  There are three different
+	 * ways we can progress here:
+	 *
+	 *  (a) If this is the last reference we will always complete and free
+	 *	the dio ourselves right here.
+	 *  (b) If this is not the last reference, and we serve an asynchronous
+	 *	iocb, we must never touch the dio after the decrement, the
+	 *	I/O completion handler will complete and free it.
+	 *  (c) If this is not the last reference, but we serve a synchronous
+	 *	iocb, the I/O completion handler will wake us up on the drop
+	 *	of the final reference, and we will complete and free it here
+	 *	after we got woken by the I/O completion handler.
+	 */
+	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!dio->wait_for_completion)
+		if (!wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {
@@ -1943,9 +1959,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	ret = iomap_dio_complete(dio);
-
-	return ret;
+	return iomap_dio_complete(dio);
 
 out_free_dio:
 	kfree(dio);
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Dave Chinner @ 2019-01-15 20:51 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs
On Tue, Jan 15, 2019 at 07:27:45PM +0100, Christoph Hellwig wrote:
> So after review that thread and this thread I still don't really
> see a refcounting problem, just a use after free of the
> wait_for_completion member for fast aio completions.
>
> Chandan, can you give this patch a spin?
>
> [ earlier hunks snipped ]
> @@ -1925,8 +1925,24 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (dio->flags & IOMAP_DIO_WRITE_FUA)
>  		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
>  
> +	/*
> +	 * We are about to drop our additional submission reference, which
> +	 * might be the last reference to the dio.  There are three different
> +	 * ways we can progress here:
> +	 *
> +	 *  (a) If this is the last reference we will always complete and free
> +	 *	the dio ourselves right here.
> +	 *  (b) If this is not the last reference, and we serve an asynchronous
> +	 *	iocb, we must never touch the dio after the decrement, the
> +	 *	I/O completion handler will complete and free it.
> +	 *  (c) If this is not the last reference, but we serve a synchronous
> +	 *	iocb, the I/O completion handler will wake us up on the drop
> +	 *	of the final reference, and we will complete and free it here
> +	 *	after we got woken by the I/O completion handler.
> +	 */
> +	dio->wait_for_completion = wait_for_completion;
>  	if (!atomic_dec_and_test(&dio->ref)) {
Atomic operations don't imply a memory barrier for dependent data,
right? So in completion we do:
	if (atomic_dec_and_test(&dio->ref)) {
		if (dio->wait_for_completion) {
			/* do wakeup */
		} else ....
So if we get:

	CPU 0 (submission)		CPU 1 (completion)

	set dio->wfc
	dec dio->ref, val = 1		dec dio->ref, val = 0
	enter wait loop			check dio->wfc

Where's the memory barrier to ensure that CPU 1 sees the value
that CPU 0 set in dio->wait_for_completion?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Christoph Hellwig @ 2019-01-15 21:09 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, Darrick J. Wong, Chandan Rajendra, linux-xfs
On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> Atomic operations don't imply a memory barrier for dependent data,
> right?
Documentation/atomic_t.txt says:
-------------------------- snip --------------------------
The rule of thumb:
- non-RMW operations are unordered;
- RMW operations that have no return value are unordered;
- RMW operations that have a return value are fully ordered;
[...]
Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.
-------------------------- snip --------------------------
I think atomic_dec_and_test clearly falls into the third category,
and I can't see how much of the kernel could work if that wasn't the
case.
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Dave Chinner @ 2019-01-15 23:18 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, Chandan Rajendra, linux-xfs
On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> > Atomic operations don't imply a memory barrier for dependent data,
> > right?
>
> Documentation/atomic_t.txt says:
>
> -------------------------- snip --------------------------
> The rule of thumb:
>
> - non-RMW operations are unordered;
>
> - RMW operations that have no return value are unordered;
>
> - RMW operations that have a return value are fully ordered;
>
> [...]
>
> Fully ordered primitives are ordered against everything prior and everything
> subsequent. Therefore a fully ordered primitive is like having an smp_mb()
> before and an smp_mb() after the primitive.
I guess I haven't looked at the documentation for a while. Or the
implementation for that matter.
/me goes off and looks.
Oh, they are now implemented with built-in, explicit
smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok,
so the necessary barriers are there, my brain was telling me they
still needed to be added manually and needed updating.....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Darrick J. Wong @ 2019-01-15 23:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, Chandan Rajendra, linux-xfs
On Wed, Jan 16, 2019 at 10:18:51AM +1100, Dave Chinner wrote:
> On Tue, Jan 15, 2019 at 10:09:03PM +0100, Christoph Hellwig wrote:
> > On Wed, Jan 16, 2019 at 07:51:41AM +1100, Dave Chinner wrote:
> > > Atomic operations don't imply a memory barrier for dependent data,
> > > right?
> >
> > Documentation/atomic_t.txt says:
> >
> > -------------------------- snip --------------------------
> > The rule of thumb:
> >
> > - non-RMW operations are unordered;
> >
> > - RMW operations that have no return value are unordered;
> >
> > - RMW operations that have a return value are fully ordered;
> >
> > [...]
> >
> > Fully ordered primitives are ordered against everything prior and everything
> > subsequent. Therefore a fully ordered primitive is like having an smp_mb()
> > before and an smp_mb() after the primitive.
>
> I guess I haven't looked at the documentation for a while. Or the
> implementation for that matter.
>
> /me goes off and looks.
>
> Oh, they are now implemented with built in, explicit
> smp_mb__before_atomic() and smp_mb__after_atomic() barriers. Ok,
> so the necessary barriers are there, my brain was telling me they
> still needed to be added manually and needed updating.....
FWIW this fixes the machine that was cranking out generic/323 hangs,
having run it repeatedly in a loop for the ~4 hours in between falling
down the stairs this morning and finally getting back from all the
errands I had to do today.
Ready for this crap year to be over with, which means this ought to have
someone other than me look over it:
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
* Re: BUG: iomap_dio_rw() accesses freed memory
From: Chandan Rajendra @ 2019-01-16 5:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs
On Tuesday, January 15, 2019 11:57:45 PM IST Christoph Hellwig wrote:
> So after review that thread and this thread I still don't really
> see a refcounting problem, just a use after free of the
> wait_for_completion member for fast aio completions.
>
> Chandan, can you give this patch a spin?
>
This fixes the bug,
Tested-by: Chandan Rajendra <chandan@linux.ibm.com>
> [ patch snipped ]
--
chandan