* Re: BUG: aio/direct-io data corruption in 4.7
@ 2018-11-05 15:16 Gregory Shapiro
2018-11-06 7:28 ` Jack Wang
0 siblings, 1 reply; 6+ messages in thread
From: Gregory Shapiro @ 2018-11-05 15:16 UTC (permalink / raw)
To: hch, jnicklin
Cc: linux-kernel, linux-fsdevel, gregory.shapiro, Gregory Shapiro
Hello, my name is Gregory Shapiro and I am a newbie on this list.
I recently encountered data corruption as I got a kernel to
acknowledge write ("io_getevents" system call with a correct number of
bytes) but undergoing write to disk failed.
After investigating the problem I found it is identical to issue found
in direct-io.c mentioned the bellow thread.
https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
Is there a reason proposed patch didn't apply to the kernel?
When can I expect it to be applied?
Thanks,
Gregory
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: BUG: aio/direct-io data corruption in 4.7
2018-11-05 15:16 BUG: aio/direct-io data corruption in 4.7 Gregory Shapiro
@ 2018-11-06 7:28 ` Jack Wang
2018-11-06 11:31 ` Gregory Shapiro
0 siblings, 1 reply; 6+ messages in thread
From: Jack Wang @ 2018-11-06 7:28 UTC (permalink / raw)
To: shapiro.gregory
Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro
Gregory Shapiro <shapiro.gregory@gmail.com> 于2018年11月5日周一 下午4:19写道:
>
> Hello, my name is Gregory Shapiro and I am a newbie on this list.
> I recently encountered data corruption as I got a kernel to
> acknowledge write ("io_getevents" system call with a correct number of
> bytes) but undergoing write to disk failed.
> After investigating the problem I found it is identical to issue found
> in direct-io.c mentioned the bellow thread.
> https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
> Is there a reason proposed patch didn't apply to the kernel?
> When can I expect it to be applied?
> Thanks,
> Gregory
Hi Gregory,
Thanks for your info.
Have you tried with latest kernel other than 4.7, is the problem still there?
Could you share your test case?
Regards,
Jack Wang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: BUG: aio/direct-io data corruption in 4.7
2018-11-06 7:28 ` Jack Wang
@ 2018-11-06 11:31 ` Gregory Shapiro
2018-11-09 15:44 ` Jack Wang
0 siblings, 1 reply; 6+ messages in thread
From: Gregory Shapiro @ 2018-11-06 11:31 UTC (permalink / raw)
To: jack.wang.usish
Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro
Hi Jack,
I tested it in 4.9.102 and I checked the latest code from elixir
(versions 4.19 and 4.20) and the error in code is still present there.
More on the scenario and the bug:
I experienced data corruption in my application (nvme based storage).
The issue was caused because of faulty hardware, but the real problem
is I got a correct number of bytes in io_getevents thus couldn't
recognize it correctly the error.
Looking at the /var/log/messages and I saw the following errors in
time of coruption:
Oct 11 14:55:15 block01-node05 kernel: [19272.951015]
blk_update_request: I/O error, dev nvme2n3, sector 117359360
Oct 11 14:55:15 block01-node05 kernel: [19272.952786]
blk_update_request: I/O error, dev nvme2n3, sector 117359872
Oct 11 14:55:16 block01-node05 kernel: [19273.544374]
blk_update_request: I/O error, dev nvme2n3, sector 117360384
...
So the block level does receive information about the error, but I
don't see it in the application.
running ftrace and doing code reading I find out that dio error status
is overridden.
In dio_complete it is propagated in (dio->io_error and if
dio->io_error is not zero in we are in async write
the status is overridden by transferred.
static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
{
...
if (ret == 0)
ret = dio->page_errors;
if (ret == 0)
ret = dio->io_error;
if (ret == 0)
ret = transferred;
...
if (is_async) {
/*
* generic_write_sync expects ki_pos to have been updated
* already, but the submission path only does this for
* synchronous I/O.
*/
dio->iocb->ki_pos += transferred;
if (dio->op == REQ_OP_WRITE)
ret = generic_write_sync(dio->iocb, transferred);
dio->iocb->ki_complete(dio->iocb, ret, 0);
For your convenience I am attaching ftrace log to for easier tracking
the flow in the code:
26) | nvme_complete_rq [nvme_core]() {
26) | blk_mq_end_request() {
26) | blk_update_request() { <----
log is from here
26) 0.563 us | blk_account_io_completion();
26) 0.263 us | bio_advance();
26) | bio_endio() {
26) | dio_bio_end_aio() {
26) | dio_bio_complete() {
26) | bio_check_pages_dirty() {
26) | bio_put() {
26) | bio_free() {
26) | __bio_free() {
26) 0.045 us | bio_disassociate_task();
26) 0.497 us | }
26) 0.042 us | bvec_free();
26) | mempool_free() {
26) | mempool_free_slab() {
26) 0.264 us | kmem_cache_free();
26) 0.606 us | }
26) 1.125 us | }
26) 2.588 us | }
26) 2.920 us | }
26) 3.979 us | }
26) 4.712 us | }
26) 0.040 us | _raw_spin_lock_irqsave();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | dio_complete() {
dio_complete(dio, 0, true);
26) | aio_complete() {
dio->iocb->ki_complete(dio->iocb, ret, 0); <<aio_complete(struct kiocb
*kiocb, long res, long res2)>>
26) 0.073 us | _raw_spin_lock_irqsave();
26) 0.114 us | refill_reqs_available();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | kiocb_free() {
26) 0.171 us | fput();
26) 0.102 us | kmem_cache_free();
26) 0.902 us | }
}
On Tue, Nov 6, 2018 at 9:29 AM Jack Wang <jack.wang.usish@gmail.com> wrote:
>
> Gregory Shapiro <shapiro.gregory@gmail.com> 于2018年11月5日周一 下午4:19写道:
> >
> > Hello, my name is Gregory Shapiro and I am a newbie on this list.
> > I recently encountered data corruption as I got a kernel to
> > acknowledge write ("io_getevents" system call with a correct number of
> > bytes) but undergoing write to disk failed.
> > After investigating the problem I found it is identical to issue found
> > in direct-io.c mentioned the bellow thread.
> > https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
> > Is there a reason proposed patch didn't apply to the kernel?
> > When can I expect it to be applied?
> > Thanks,
> > Gregory
>
> Hi Gregory,
>
> Thanks for your info.
> Have you tried with latest kernel other than 4.7, is the problem still there?
>
> Could you share your test case?
>
> Regards,
> Jack Wang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: BUG: aio/direct-io data corruption in 4.7
2018-11-06 11:31 ` Gregory Shapiro
@ 2018-11-09 15:44 ` Jack Wang
0 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2018-11-09 15:44 UTC (permalink / raw)
To: shapiro.gregory
Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro
Gregory Shapiro <shapiro.gregory@gmail.com> 于2018年11月6日周二 下午12:31写道:
>
> Hi Jack,
> I tested it in 4.9.102 and I checked the latest code from elixir
> (versions 4.19 and 4.20) and the error in code is still present there.
> More on the scenario and the bug:
> I experienced data corruption in my application (nvme based storage).
> The issue was caused because of faulty hardware, but the real problem
> is I got a correct number of bytes in io_getevents thus couldn't
> recognize it correctly the error.
> Looking at the /var/log/messages and I saw the following errors in
> time of coruption:
Thanks for the info, Gregory.
I noticed guys from Amazon pushing the fix to upstream:
https://lore.kernel.org/patchwork/patch/1008443/
I hope it will be in upstream soon.
Regards,
Jack Wang
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: BUG: aio/direct-io data corruption in 4.7
2016-09-12 18:38 Jonathan Nicklin
@ 2016-09-21 14:15 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2016-09-21 14:15 UTC (permalink / raw)
To: Jonathan Nicklin; +Cc: linux-kernel, linux-fsdevel
Hi Jonathan,
please keep linux-fsdevel on the Cc list for something like, and if
you already track down a commit the author of that commit.
> Description: "fs: simplify the generic_write_sync prototype"
> Committed: Apr 7, 2016
> Hash: e259221763a40403d5bb232209998e8c45804ab8
> Affects: 4.7-rc1 - master
>
> I have confirmed a fix for the AIO/Direct-IO failure condition but
> have not reviewed the rest of the changes associated with that commit.
> If you would like a small patch for direct-io.c, let me know.
On travel right now, but I suspect you want something like this fix?
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7c3ce73..891f71f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -276,7 +276,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
dio->iocb->ki_pos += transferred;
if (dio->op == REQ_OP_WRITE)
- ret = generic_write_sync(dio->iocb, transferred);
+ ret = generic_write_sync(dio->iocb, ret);
dio->iocb->ki_complete(dio->iocb, ret, 0);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* BUG: aio/direct-io data corruption in 4.7
@ 2016-09-12 18:38 Jonathan Nicklin
2016-09-21 14:15 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nicklin @ 2016-09-12 18:38 UTC (permalink / raw)
To: linux-kernel
In 4.7.2, the kernel is acknowledging block writes that have not
completed to disk. To reproduce: create an MD array, run FIO (direct +
libaio), and pull all drives. FIO will continue to run without
receiving I/O errors. I have also reproduced the bug using physical
drives. In this case, only a limited number of I/Os are incorrectly
acknowledged; FIO eventually receives an I/O error after the device
reference is removed.
The root cause of the problem is that dio_complete() does not
correctly propagate I/O errors in the is_async case. Specifically,
generic_write_sync() appears to be overwriting the return status
destined for ki_complete().
This bug appears to have been introduced by the following commit:
Description: "fs: simplify the generic_write_sync prototype"
Committed: Apr 7, 2016
Hash: e259221763a40403d5bb232209998e8c45804ab8
Affects: 4.7-rc1 - master
I have confirmed a fix for the AIO/Direct-IO failure condition but
have not reviewed the rest of the changes associated with that commit.
If you would like a small patch for direct-io.c, let me know.
Regards,
-Jonathan
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-11-09 15:44 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-05 15:16 BUG: aio/direct-io data corruption in 4.7 Gregory Shapiro
2018-11-06 7:28 ` Jack Wang
2018-11-06 11:31 ` Gregory Shapiro
2018-11-09 15:44 ` Jack Wang
-- strict thread matches above, loose matches on Subject: below --
2016-09-12 18:38 Jonathan Nicklin
2016-09-21 14:15 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).