LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Gregory Shapiro <shapiro.gregory@gmail.com>
To: jack.wang.usish@gmail.com
Cc: hch@infradead.org, jnicklin@blockbridge.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	gregory.shapiro@kaminario.com
Subject: Re: BUG: aio/direct-io data corruption in 4.7
Date: Tue, 6 Nov 2018 13:31:44 +0200
Message-ID: <CAAGb-1eLHwRy2L5tp+skWdV5zJ7jfCwTe-1UnJq_n5yjQfdJug@mail.gmail.com> (raw)
In-Reply-To: <CA+res+SDaTwjkKQpnREP1dW8XmtQsk7NGLj=heYGbGeL3Dr-OQ@mail.gmail.com>

Hi Jack,
I tested it in 4.9.102 and I checked the latest code from elixir
(versions 4.19 and 4.20) and the error in code is still present there.
More on the scenario and the bug:
I experienced data corruption in my application (nvme based storage).
The issue was caused because of faulty hardware, but the real problem
is I got a correct number of bytes in io_getevents thus couldn't
recognize it correctly the error.
Looking at the /var/log/messages and  I saw the following errors in
time of coruption:

Oct 11 14:55:15 block01-node05 kernel: [19272.951015]
blk_update_request: I/O error, dev nvme2n3, sector 117359360
Oct 11 14:55:15 block01-node05 kernel: [19272.952786]
blk_update_request: I/O error, dev nvme2n3, sector 117359872
Oct 11 14:55:16 block01-node05 kernel: [19273.544374]
blk_update_request: I/O error, dev nvme2n3, sector 117360384
...
So the block level does receive information about the error, but I
don't see it in the application.
running ftrace and doing code reading I find out that dio error status
is overridden.
In dio_complete it is propagated in (dio->io_error and if
dio->io_error is not zero in we are in async write
the status is overridden by transferred.

static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
{
...
        if (ret == 0)
                ret = dio->page_errors;
        if (ret == 0)
                ret = dio->io_error;
        if (ret == 0)
                ret = transferred;
...
        if (is_async) {
                /*
                 * generic_write_sync expects ki_pos to have been updated
                 * already, but the submission path only does this for
                 * synchronous I/O.
                 */
                dio->iocb->ki_pos += transferred;

                if (dio->op == REQ_OP_WRITE)
                        ret = generic_write_sync(dio->iocb,  transferred);
                dio->iocb->ki_complete(dio->iocb, ret, 0);



For your convenience I am attaching ftrace log to for easier tracking
the flow in the code:


 26)               |                nvme_complete_rq [nvme_core]() {
 26)               |                  blk_mq_end_request() {
 26)               |                    blk_update_request() { <----
log is from here
 26)   0.563 us    |                      blk_account_io_completion();
 26)   0.263 us    |                      bio_advance();
 26)               |                      bio_endio() {
 26)               |                        dio_bio_end_aio() {
 26)               |                          dio_bio_complete() {
 26)               |                            bio_check_pages_dirty() {
 26)               |                              bio_put() {
 26)               |                                bio_free() {
 26)               |                                  __bio_free() {
 26)   0.045 us    |                                    bio_disassociate_task();
 26)   0.497 us    |                                  }
 26)   0.042 us    |                                  bvec_free();
 26)               |                                  mempool_free() {
 26)               |                                    mempool_free_slab() {
 26)   0.264 us    |                                      kmem_cache_free();
 26)   0.606 us    |                                    }
 26)   1.125 us    |                                  }
 26)   2.588 us    |                                }
 26)   2.920 us    |                              }
 26)   3.979 us    |                            }
 26)   4.712 us    |                          }
 26)   0.040 us    |                          _raw_spin_lock_irqsave();
 26)   0.048 us    |                          _raw_spin_unlock_irqrestore();
 26)               |                          dio_complete() {
dio_complete(dio, 0, true);
 26)               |                            aio_complete() {
dio->iocb->ki_complete(dio->iocb, ret, 0); <<aio_complete(struct kiocb
*kiocb, long res, long res2)>>
 26)   0.073 us    |                              _raw_spin_lock_irqsave();
 26)   0.114 us    |                              refill_reqs_available();
 26)   0.048 us    |                              _raw_spin_unlock_irqrestore();
 26)               |                              kiocb_free() {
 26)   0.171 us    |                                fput();
 26)   0.102 us    |                                kmem_cache_free();
 26)   0.902 us    |                              }


        }

On Tue, Nov 6, 2018 at 9:29 AM Jack Wang <jack.wang.usish@gmail.com> wrote:
>
> Gregory Shapiro <shapiro.gregory@gmail.com> 于2018年11月5日周一 下午4:19写道:
> >
> > Hello, my name is Gregory Shapiro and I am a newbie on this list.
> > I recently encountered data corruption as I got a kernel to
> > acknowledge write ("io_getevents" system call with a correct number of
> > bytes) but undergoing write to disk failed.
> > After investigating the problem I found it is identical to issue found
> > in direct-io.c mentioned the bellow thread.
> > https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/
> > Is there a reason proposed patch didn't apply to the kernel?
> > When can I expect it to be applied?
> > Thanks,
> >  Gregory
>
> Hi Gregory,
>
> Thanks for your info.
> Have you tried with latest kernel other than 4.7, is the problem still there?
>
> Could you share your test case?
>
> Regards,
> Jack Wang

  reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05 15:16 Gregory Shapiro
2018-11-06  7:28 ` Jack Wang
2018-11-06 11:31   ` Gregory Shapiro [this message]
2018-11-09 15:44     ` Jack Wang
  -- strict thread matches above, loose matches on Subject: below --
2016-09-12 18:38 Jonathan Nicklin
2016-09-21 14:15 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAGb-1eLHwRy2L5tp+skWdV5zJ7jfCwTe-1UnJq_n5yjQfdJug@mail.gmail.com \
    --to=shapiro.gregory@gmail.com \
    --cc=gregory.shapiro@kaminario.com \
    --cc=hch@infradead.org \
    --cc=jack.wang.usish@gmail.com \
    --cc=jnicklin@blockbridge.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git