* Re: Unaligned zoned write with bsrange ‘3k-1025K’ always fails for the reason being ‘data corruption’
       [not found] <CAE5BwqHyQzTx7gtGMXb7wE1xvysxeiopsCBijSHEzOx29PvzHg@mail.gmail.com>
@ 2016-09-19 20:52 ` Jens Axboe
  2016-09-19 21:09   ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2016-09-19 20:52 UTC (permalink / raw)
  To: Sunil Nadumutlu; +Cc: fio

On 09/19/2016 06:30 AM, Sunil Nadumutlu wrote:
> Hi Jens,
>
> For the past couple of days I have been seeing an issue where write
> verification fails (reporting data corruption) while running unaligned
> zoned writes with bsrange ‘3k-1025K’.
> I tried the following fio command lines on a RHEL 7.0 client, against
> both a raw device and a filesystem on a raw device.
>
> Here are the fio command lines:
> Raw device IO:
> ==============
>   fio  --runtime=43200  --filename=/dev/sdc --rw=randrw
> --ioengine=libaio --direct=1 --time_based --verify=md5 --verify_dump=1
> --verify_fatal=1 --threads=8 --zonesize=1m --zoneskip=1024
> --name=zoning-unaligned-large --bsrange=3k-1025k --runtime=86400
>
> File system (ext4) IO:
> ======================
> fio  --runtime=43200  --filename=/tmp/xyz/fio3  --rw=randrw
> --ioengine=libaio --direct=1 --time_based --verify=md5 --verify_dump=1
> --verify_fatal=1 --threads=8 --zonesize=1m --zoneskip=1024
> --name=zoning-unaligned-large --bsrange=3k-1025k --runtime=86400
> --create_on_open=1 --filesize=500m  --buffer_pattern=0x48656c6c6f776f746c64
>
> Write verification always failed within a few seconds on the filesystem,
> and within 2-3 minutes on the raw device, in both cases because data
> corruption was detected during verification.
> On the raw device, the data mismatch was observed after 1K (sometimes it
> spanned blocks), whereas on the filesystem the corruption was observed
> across 3076 block.
>
> To narrow down the issue, I ran a similar workload with bs=3k (no bsrange,
> fixed block size only) using the vdbench and medusa tools, and those tests
> passed. Hence I think this is not an issue with the storage array. At this
> time, I am just curious to know whether fio has any known bug in this
> area. Note that vdbench and medusa don’t have block zoning or bsrange
> options; they always use a fixed block size.
>
> Attached are the dumps collected from the two tests above:
> File system fio dump:
> fio3.1390592.expected
> fio3.1390592.received
>
> Raw dev fio dump:
> sdc.3325952.expected
> sdc.3325952.received
>
> Anticipating your help in this regard.

Please send questions to the fio mailing list, fio@vger.kernel.org. I
don't have time to answer all queries personally; plenty more people are
capable of doing that.

That said, you have multiple threads writing to the same file or device 
in both cases.

-- 
Jens Axboe




* Re: Unaligned zoned write with bsrange ‘3k-1025K’ always fails for the reason being ‘data corruption’
  2016-09-19 20:52 ` Unaligned zoned write with bsrange ‘3k-1025K’ always fails for the reason being ‘data corruption’ Jens Axboe
@ 2016-09-19 21:09   ` Jens Axboe
  2016-09-19 23:09     ` Sunil Nadumutlu
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2016-09-19 21:09 UTC (permalink / raw)
  To: Sunil Nadumutlu; +Cc: fio

On 09/19/2016 02:52 PM, Jens Axboe wrote:
> Please send questions to the fio mailing list, fio@vger.kernel.org. I
> don't have time to answer all queries personally, plenty more people are
> capable of doing that.
>
> That said, you have multiple threads writing to the same file or device
> in both cases.

Actually, I take that back, looks like it's just a bad use case of
'thread' - it's a bool, so just 0/1 applies here. There's just one job
running in your test case.

Can you try and add experimental_verify=1 and see if that changes
anything for you?
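
For reference, the two points above could be applied to the raw-device
command roughly as follows. This is only a sketch: the helper name
fio_verify_cmd is made up for illustration, the function just prints the
command line instead of running fio, and it assumes the intent behind
'--threads=8' was simply to enable thread mode. The report passes
--runtime twice; only one value is kept here.

```shell
#!/bin/sh
# Hypothetical helper (not part of fio): print the raw-device command from
# the report with the two changes discussed in this reply.
fio_verify_cmd() {
    dev="$1"   # target device or file; running the printed command overwrites it
    # 'thread' is a boolean option, so it is passed without a count.
    # experimental_verify=1 is the option suggested above.
    echo fio --name=zoning-unaligned-large --filename="$dev" \
        --rw=randrw --ioengine=libaio --direct=1 \
        --time_based --runtime=86400 \
        --bsrange=3k-1025k --zonesize=1m --zoneskip=1024 \
        --verify=md5 --verify_dump=1 --verify_fatal=1 \
        --thread --experimental_verify=1
}

# Print the command for review before running it by hand.
fio_verify_cmd /dev/sdc
```

Printing rather than executing is deliberate, since the job writes to the
target device; copy the output and run it once the device name is
confirmed.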

-- 
Jens Axboe




* Re: Unaligned zoned write with bsrange ‘3k-1025K’ always fails for the reason being ‘data corruption’
  2016-09-19 21:09   ` Jens Axboe
@ 2016-09-19 23:09     ` Sunil Nadumutlu
  0 siblings, 0 replies; 3+ messages in thread
From: Sunil Nadumutlu @ 2016-09-19 23:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Sure, Jens. I will update the mailing list with my findings.
I appreciate your time.
Regards,
Sunil


