LKML Archive on lore.kernel.org
* Re: BUG: aio/direct-io data corruption in 4.7
@ 2018-11-05 15:16 Gregory Shapiro
  2018-11-06  7:28 ` Jack Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Shapiro @ 2018-11-05 15:16 UTC (permalink / raw)
  To: hch, jnicklin
  Cc: linux-kernel, linux-fsdevel, gregory.shapiro, Gregory Shapiro

Hello, my name is Gregory Shapiro and I am new to this list.
I recently encountered data corruption: the kernel acknowledged a
write (the "io_getevents" system call returned the correct number of
bytes), but the underlying write to disk had failed.
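To make this concrete, here is a minimal sketch of the submit/reap
pattern my application uses (illustrative only: the device path and
sizes are made up, error handling is simplified, and it needs to be
built with -laio):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf;
        int fd, ret;

        fd = open("/dev/nvme2n3", O_WRONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT needs an aligned buffer */
        if (posix_memalign(&buf, 4096, 4096)) return 1;
        memset(buf, 0xab, 4096);

        /* libaio calls return -errno instead of setting errno */
        ret = io_setup(1, &ctx);
        if (ret < 0) { fprintf(stderr, "io_setup: %s\n", strerror(-ret)); return 1; }

        io_prep_pwrite(&cb, fd, buf, 4096, 0);
        ret = io_submit(ctx, 1, cbs);
        if (ret != 1) { fprintf(stderr, "io_submit: %s\n", strerror(-ret)); return 1; }

        ret = io_getevents(ctx, 1, 1, &ev, NULL);
        if (ret != 1) { fprintf(stderr, "io_getevents: %s\n", strerror(-ret)); return 1; }

        /*
         * ev.res should be a negative errno when the write failed, but
         * with this bug it comes back as the full 4096 even though the
         * block layer reported an I/O error, so the failure is unseen.
         */
        if (ev.res != 4096)
                fprintf(stderr, "write failed: res=%ld\n", (long)ev.res);

        io_destroy(ctx);
        return 0;
}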
After investigating the problem, I found it is identical to the issue
in direct-io.c discussed in the thread below:
https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
Is there a reason the proposed patch was never applied to the kernel?
When can I expect it to be applied?
Thanks,
 Gregory


* Re: BUG: aio/direct-io data corruption in 4.7
  2018-11-05 15:16 BUG: aio/direct-io data corruption in 4.7 Gregory Shapiro
@ 2018-11-06  7:28 ` Jack Wang
  2018-11-06 11:31   ` Gregory Shapiro
  0 siblings, 1 reply; 6+ messages in thread
From: Jack Wang @ 2018-11-06  7:28 UTC (permalink / raw)
  To: shapiro.gregory
  Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro

Gregory Shapiro <shapiro.gregory@gmail.com> wrote on Mon, Nov 5, 2018 at 4:19 PM:
>
> Hello, my name is Gregory Shapiro and I am new to this list.
> I recently encountered data corruption: the kernel acknowledged a
> write (the "io_getevents" system call returned the correct number of
> bytes), but the underlying write to disk had failed.
> After investigating the problem, I found it is identical to the issue
> in direct-io.c discussed in the thread below:
> https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
> Is there a reason the proposed patch was never applied to the kernel?
> When can I expect it to be applied?
> Thanks,
>  Gregory

Hi Gregory,

Thanks for your report.
Have you tried a newer kernel than 4.7? Is the problem still there?

Could you share your test case?

Regards,
Jack Wang


* Re: BUG: aio/direct-io data corruption in 4.7
  2018-11-06  7:28 ` Jack Wang
@ 2018-11-06 11:31   ` Gregory Shapiro
  2018-11-09 15:44     ` Jack Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Shapiro @ 2018-11-06 11:31 UTC (permalink / raw)
  To: jack.wang.usish
  Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro

Hi Jack,
I tested it on 4.9.102, and I also checked the latest code on elixir
(versions 4.19 and 4.20); the error is still present there.
More on the scenario and the bug:
I experienced data corruption in my application (NVMe-based storage).
The corruption itself was caused by faulty hardware, but the real
problem is that io_getevents returned the correct number of bytes, so
my application could not detect the error.
Looking at /var/log/messages, I saw the following errors at the time
of the corruption:

Oct 11 14:55:15 block01-node05 kernel: [19272.951015]
blk_update_request: I/O error, dev nvme2n3, sector 117359360
Oct 11 14:55:15 block01-node05 kernel: [19272.952786]
blk_update_request: I/O error, dev nvme2n3, sector 117359872
Oct 11 14:55:16 block01-node05 kernel: [19273.544374]
blk_update_request: I/O error, dev nvme2n3, sector 117360384
...
So the block layer does receive the error, but I don't see it in the
application.
Running ftrace and reading the code, I found that the dio error status
is overridden. In dio_complete() the error is propagated into ret via
dio->io_error, but even when dio->io_error is non-zero, for an async
write the status is then overridden by transferred through the
generic_write_sync() call:

static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
{
...
        if (ret == 0)
                ret = dio->page_errors;
        if (ret == 0)
                ret = dio->io_error;    /* the I/O error is picked up into ret here */
        if (ret == 0)
                ret = transferred;
...
        if (is_async) {
                /*
                 * generic_write_sync expects ki_pos to have been updated
                 * already, but the submission path only does this for
                 * synchronous I/O.
                 */
                dio->iocb->ki_pos += transferred;

                if (dio->op == REQ_OP_WRITE)
                        /*
                         * On a successful sync this returns transferred,
                         * silently discarding any error held in ret.
                         */
                        ret = generic_write_sync(dio->iocb,  transferred);
                dio->iocb->ki_complete(dio->iocb, ret, 0);



For your convenience, I am attaching the ftrace log to make the flow
through the code easier to follow:


 26)               |                nvme_complete_rq [nvme_core]() {
 26)               |                  blk_mq_end_request() {
 26)               |                    blk_update_request() { <---- log is from here
 26)   0.563 us    |                      blk_account_io_completion();
 26)   0.263 us    |                      bio_advance();
 26)               |                      bio_endio() {
 26)               |                        dio_bio_end_aio() {
 26)               |                          dio_bio_complete() {
 26)               |                            bio_check_pages_dirty() {
 26)               |                              bio_put() {
 26)               |                                bio_free() {
 26)               |                                  __bio_free() {
 26)   0.045 us    |                                    bio_disassociate_task();
 26)   0.497 us    |                                  }
 26)   0.042 us    |                                  bvec_free();
 26)               |                                  mempool_free() {
 26)               |                                    mempool_free_slab() {
 26)   0.264 us    |                                      kmem_cache_free();
 26)   0.606 us    |                                    }
 26)   1.125 us    |                                  }
 26)   2.588 us    |                                }
 26)   2.920 us    |                              }
 26)   3.979 us    |                            }
 26)   4.712 us    |                          }
 26)   0.040 us    |                          _raw_spin_lock_irqsave();
 26)   0.048 us    |                          _raw_spin_unlock_irqrestore();
 26)               |                          dio_complete() { <---- dio_complete(dio, 0, true)
 26)               |                            aio_complete() { <---- dio->iocb->ki_complete(dio->iocb, ret, 0), i.e. aio_complete(struct kiocb *kiocb, long res, long res2)
 26)   0.073 us    |                              _raw_spin_lock_irqsave();
 26)   0.114 us    |                              refill_reqs_available();
 26)   0.048 us    |                              _raw_spin_unlock_irqrestore();
 26)               |                              kiocb_free() {
 26)   0.171 us    |                                fput();
 26)   0.102 us    |                                kmem_cache_free();
 26)   0.902 us    |                              }



On Tue, Nov 6, 2018 at 9:29 AM Jack Wang <jack.wang.usish@gmail.com> wrote:
>
> Gregory Shapiro <shapiro.gregory@gmail.com> wrote on Mon, Nov 5, 2018 at 4:19 PM:
> >
> > Hello, my name is Gregory Shapiro and I am new to this list.
> > I recently encountered data corruption: the kernel acknowledged a
> > write (the "io_getevents" system call returned the correct number of
> > bytes), but the underlying write to disk had failed.
> > After investigating the problem, I found it is identical to the issue
> > in direct-io.c discussed in the thread below:
> > https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
> > Is there a reason the proposed patch was never applied to the kernel?
> > When can I expect it to be applied?
> > Thanks,
> >  Gregory
>
> Hi Gregory,
>
> Thanks for your report.
> Have you tried a newer kernel than 4.7? Is the problem still there?
>
> Could you share your test case?
>
> Regards,
> Jack Wang


* Re: BUG: aio/direct-io data corruption in 4.7
  2018-11-06 11:31   ` Gregory Shapiro
@ 2018-11-09 15:44     ` Jack Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Jack Wang @ 2018-11-09 15:44 UTC (permalink / raw)
  To: shapiro.gregory
  Cc: hch, jnicklin, linux-kernel, linux-fsdevel, gregory.shapiro

Gregory Shapiro <shapiro.gregory@gmail.com> wrote on Tue, Nov 6, 2018 at 12:31 PM:
>
> Hi Jack,
> I tested it on 4.9.102, and I also checked the latest code on elixir
> (versions 4.19 and 4.20); the error is still present there.
> More on the scenario and the bug:
> I experienced data corruption in my application (NVMe-based storage).
> The corruption itself was caused by faulty hardware, but the real
> problem is that io_getevents returned the correct number of bytes, so
> my application could not detect the error.
> Looking at /var/log/messages, I saw the following errors at the time
> of the corruption:
Thanks for the info, Gregory.

I noticed engineers from Amazon are pushing a fix upstream:
https://lore.kernel.org/patchwork/patch/1008443/
I hope it lands upstream soon.


Regards,
Jack Wang


* Re: BUG: aio/direct-io data corruption in 4.7
  2016-09-12 18:38 Jonathan Nicklin
@ 2016-09-21 14:15 ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2016-09-21 14:15 UTC (permalink / raw)
  To: Jonathan Nicklin; +Cc: linux-kernel, linux-fsdevel

Hi Jonathan,

please keep linux-fsdevel on the Cc list for something like this, and
if you have already tracked down a commit, also Cc the author of that
commit.

> Description: "fs: simplify the generic_write_sync prototype"
> Committed: Apr 7, 2016
> Hash: e259221763a40403d5bb232209998e8c45804ab8
> Affects: 4.7-rc1 - master
> 
> I have confirmed a fix for the AIO/Direct-IO failure condition but
> have not reviewed the rest of the changes associated with that commit.
> If you would like a small patch for direct-io.c, let me know.

On travel right now, but I suspect you want something like this fix?

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7c3ce73..891f71f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -276,7 +276,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
 		dio->iocb->ki_pos += transferred;
 
 		if (dio->op == REQ_OP_WRITE)
-			ret = generic_write_sync(dio->iocb,  transferred);
+			ret = generic_write_sync(dio->iocb, ret);
 		dio->iocb->ki_complete(dio->iocb, ret, 0);
 	}
 

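For context, the 4.7-era generic_write_sync() looks roughly like this
(paraphrased from include/linux/fs.h, not an exact copy), which is why
passing ret instead of transferred preserves the error: for a plain
non-O_DSYNC write it simply returns count unchanged, so a negative ret
flows straight through to ki_complete().

/* rough paraphrase of the 4.7-era helper, for illustration only */
static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
{
	if (iocb->ki_flags & IOCB_DSYNC) {
		int ret = vfs_fsync_range(iocb->ki_filp,
				iocb->ki_pos - count, iocb->ki_pos - 1,
				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
		if (ret)
			return ret;
	}
	return count;
}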

* BUG: aio/direct-io data corruption in 4.7
@ 2016-09-12 18:38 Jonathan Nicklin
  2016-09-21 14:15 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Nicklin @ 2016-09-12 18:38 UTC (permalink / raw)
  To: linux-kernel

In 4.7.2, the kernel is acknowledging block writes that have not
completed to disk. To reproduce: create an MD array, run FIO (direct +
libaio) against it with a job along the lines of the sketch below, and
pull all drives. FIO will continue to run without receiving I/O
errors. I have also reproduced the bug using physical drives; in that
case, only a limited number of I/Os are incorrectly acknowledged, and
FIO eventually receives an I/O error after the device reference is
removed.
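
An illustrative fio job (the device path, sizes, and runtime here are
assumptions for the sketch, not my exact configuration):

[global]
; submit via io_submit, reap via io_getevents
ioengine=libaio
; O_DIRECT, so writes go through fs/direct-io.c
direct=1
rw=randwrite
bs=4k
iodepth=32
time_based=1
runtime=300

[md-write]
; the MD array under test; pull its member drives mid-run
filename=/dev/md0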

The root cause of the problem is that dio_complete() does not
correctly propagate I/O errors in the is_async case. Specifically,
generic_write_sync() appears to be overwriting the return status
destined for ki_complete().

This bug appears to have been introduced by the following commit:

Description: "fs: simplify the generic_write_sync prototype"
Committed: Apr 7, 2016
Hash: e259221763a40403d5bb232209998e8c45804ab8
Affects: 4.7-rc1 - master

I have confirmed a fix for the AIO/Direct-IO failure condition but
have not reviewed the rest of the changes associated with that commit.
If you would like a small patch for direct-io.c, let me know.

Regards,
-Jonathan


