linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Cc: Masato Suzuki <masato.suzuki@wdc.com>,
	Naohiro Aota <Naohiro.Aota@wdc.com>
Subject: Re: [PATCH] block: Fix __blkdev_direct_IO()
Date: Fri, 2 Aug 2019 04:11:37 +0000	[thread overview]
Message-ID: <BYAPR04MB581664B63FCC7A77B8D48A77E7D90@BYAPR04MB5816.namprd04.prod.outlook.com> (raw)
In-Reply-To: 0bf921d3-162b-8e51-63fa-5da757ccbbd2@kernel.dk

On 2019/08/02 12:57, Jens Axboe wrote:
> On 8/1/19 7:05 PM, Damien Le Moal wrote:
>> On 2019/08/02 4:51, Jens Axboe wrote:
>>> On 8/1/19 4:21 AM, Damien Le Moal wrote:
>>>> The recent fix to properly handle IOCB_NOWAIT for async O_DIRECT IO
>>>> (patch 6a43074e2f46) introduced two problems with BIO fragment handling
>>>> for direct IOs:
>>>> 1) The dio size processed is claculated by incrementing the ret variable
>>>> by the size of the bio fragment issued for the dio. However, this size
>>>> is obtained directly from bio->bi_iter.bi_size AFTER the bio submission
>>>> which may result in referencing the bi_size value after the bio
>>>> completed, resulting in an incorrect value use.
>>>> 2) The ret variable is not incremented by the size of the last bio
>>>> fragment issued for the bio, leading to an invalid IO size being
>>>> returned to the user.
>>>>
>>>> Fix both problem by using dio->size (which is incremented before the bio
>>>> submission) to update the value of ret after bio submissions, including
>>>> for the last bio fragment issued.
>>>
>>> Thanks, applied. Do you have a test case? I ran this through the usual
>>> xfstests and block bits, but didn't catch anything.
>>>
>>
>> The problem was detected with our weekly RC test runs for zoned block
>> devices.  RC1 last week was OK, but failures happen on RC2 Monday. We
>> never hit a oops for the BIO reference after submission but we were
>> getting a lot of unaligned write errors for all types of zoned drive
>> tested (real SMR disks, tcmu-runner ZBC handler disks and nullblk in
>> zoned mode) using various applications (fio, dd, ...)
>>
>> Masato isolated the problem to very large direct writes and could
>> reliably recreate the problem with a dd doing a single 8MB write to a
>> sequential zone.  With this case, blktrace showed that the 8MB write
>> was split into multiple BIOs (expected) and the whole 8MB being
>> cleanly written sequentially. But this was followed by a stream of
>> small 4K writes starting around the end of the 8MB chunk, but within
>> it, causing unaligned write errors (overwrite in sequential zones not
>> being possible).
>>
>> dd if=/dev/zero of=/dev/nullb0 bs=8M oflag=direct count=1
>>
>> On a nullblk disk in zoned mode should recreate the problem, and
>> blktrace revealing that dd sees a short write for the 8M IO and issues
>> remaining as 4K requests.
>>
>> Using a regular disk, this however does not generate any error at all,
>> which may explain why you did not see any problem. I think we can
>> create a blktest case for this using nullblk in zoned mode. Would you
>> like us to send one ?
> 
> Thanks for the detailed explanation. I think we should absolutely have a
> blktests case for this, even if O_DIRECT isn't strictly block level,
> it's definitely in a grey zone that I think we should cover.
> 

OK. We will work on a test case and post it.

-- 
Damien Le Moal
Western Digital Research

  reply	other threads:[~2019-08-02  4:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-01 10:21 [PATCH] block: Fix __blkdev_direct_IO() Damien Le Moal
2019-08-01 19:51 ` Jens Axboe
2019-08-02  1:05   ` Damien Le Moal
2019-08-02  3:57     ` Jens Axboe
2019-08-02  4:11       ` Damien Le Moal [this message]
2019-08-05 17:11 ` Bart Van Assche
2019-08-06 19:53   ` Jens Axboe
2019-08-09 13:01     ` Damien Le Moal
2019-08-09 15:52       ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR04MB581664B63FCC7A77B8D48A77E7D90@BYAPR04MB5816.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=masato.suzuki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).