All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sitsofe Wheeler <sitsofe@gmail.com>
To: Jeff Furlong <jeff.furlong@wdc.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: fio offset with ba
Date: Sat, 21 Oct 2017 05:21:39 +0100	[thread overview]
Message-ID: <CALjAwxiXinxTL6jeSNNoMCmu5--5CKBdWapSX-hOQ2+KYCJ6Ng@mail.gmail.com> (raw)
In-Reply-To: <BN3PR0401MB14573713E0FDB73C2E788A9B8F400@BN3PR0401MB1457.namprd04.prod.outlook.com>

Hmm,

I see a few choices for trying to solve alignment of the offset and
forced alignment of sequential blocks:
* Introduce an "offset_align" option solely for aligning the "initial"
offset and stop trying to use blockalign in the offset. Document it
well to stop confusion with blockalign.
* Document blockalign as ONLY working for random I/O or
* Do blockalign for sequential I/O on a per I/O basis (potentially
creating gappy I/O). This is also the request from
https://github.com/axboe/fio/issues/341 . Technically this is has a
big overlap with zoned I/O and the existing "generated offset" used in
rw option but perhaps it's just different enough to be justified?

Thoughts?

On 21 October 2017 at 01:13, Jeff Furlong <jeff.furlong@wdc.com> wrote:
> Yes, the bits around line 845 show the min() being used.  It seems get_start_offset() is used for the first IO and subsequent IO's, making things difficult.  I believe the correct fix here would be to set the ba specific to the io type.  The ba parameter allows for read/write/trim alignments.  So if io_u->ddir==DDIR_READ then we could just align to o->ba[DDIR_READ].  But I don't see how we would access io_u at this point, we don't know of the potential io_u at this time?  It could be a mixed read/write/trim.
>
> Lacking that, brute force would suggest:
>
>                 if (fio_option_is_set(o, ba)) {
>                         align_bs = (unsigned long long) o->ba[DDIR_READ];
>                         if(o->ba[DDIR_READ] != o->ba[DDIR_WRITE])
>                                 align_bs = (unsigned long long) o->ba[DDIR_READ] * (unsigned long long) o->ba[DDIR_WRITE];
>                         if(align_bs != (unsigned long long) o->ba[DDIR_TRIM])
>                                 align_bs = align_bs * (unsigned long long) o->ba[DDIR_TRIM];
>
> But I see another problem in that o->ba[DDIR_READ] is not set for sequential workloads.  In fixup_options() we have:
>
>         if (!o->ba[DDIR_READ] || !td_random(td))
>                 o->ba[DDIR_READ] = o->min_bs[DDIR_READ];
>         if (!o->ba[DDIR_WRITE] || !td_random(td))
>                 o->ba[DDIR_WRITE] = o->min_bs[DDIR_WRITE];
>         if (!o->ba[DDIR_TRIM] || !td_random(td))
>                 o->ba[DDIR_TRIM] = o->min_bs[DDIR_TRIM];
>
> I don't follow that code, other than if sequential, set ba to min_bs?  If we remove that code and use above change, we can get the starting LBA to be aligned to the ba in the case of
>
> fio --name=test_job --ioengine=libaio --direct=1 --rw=read --iodepth=1 --size=100% --bs=4k --filename=/dev/nvme1n1 --number_ios=8 --offset=50% --log_offset=1 --write_iops_log=test_job --ba=8k
>
> # cat test_job_iops.1.log
> 0, 1, 0, 4096, 1600315899904
>
> It seems to be a hack, so I didn't create a patch for it.  Would like to better understand what I'm missing before breaking something.
>
> Thanks.
>
> Regards,
> Jeff
>
>
> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Friday, October 20, 2017 2:01 PM
> To: Jeff Furlong <jeff.furlong@wdc.com>
> Cc: fio@vger.kernel.org
> Subject: Re: fio offset with ba
>
> Hi,
>
> On 20 October 2017 at 20:08, Jeff Furlong <jeff.furlong@wdc.com> wrote:
>>
>> I don't quite follow the logic in the calculate offset function.  The offset parameter recently allows a percentage.  Suppose we set it to 50% and want to block align the IO's starting at 50% of device capacity, then block aligned to 8KB.
>>
>> # fio -version
>> fio-3.1-60-g71aa
>>
>> # blockdev --getsize64 /dev/nvme1n1
>> 3200631791616
>>
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=read
>> --iodepth=1 --size=100% --bs=4k --filename=/dev/nvme1n1 --runtime=1s
>> --offset=50% --log_offset=1 --write_iops_log=test_job --ba=8k
>>
>> # cat test_job_iops.1.log
>> 0, 1, 0, 4096, 1600315895808
>> 0, 1, 0, 4096, 1600315899904
>> 0, 1, 0, 4096, 1600315904000
>> 0, 1, 0, 4096, 1600315908096
>>
>> So we can see the device has 3200631791616 bytes, 50% of which is
>> 1600315895808 bytes, which happens to be 4KB aligned, but not 8KB
>> aligned.  Even though we set the --ba=8k parameter, the offset LBA as
>> logged in the iops.1.log shows
>
> Hmm I see the same problem with this job:
> fio --name=test_job --ioengine=null --rw=read --iodepth=1
> --size=3200631791616 --bs=4k --number_ios=1 --offset=50% --ba=8k --debug=io
>
> [...]
> io       15013 fill_io_u: io_u 0x236ad80:
> off=1600315895808/len=4096/ddir=0io       15013 /test_job.0.0io
> 15013
>
> I think your guess about only impacting random I/O is probably right because fio --name=test_job --randrepeat=0 --ioengine=null --rw=randread
> --iodepth=1 --size=3200631791616 --bs=4k --number_ios=1 --offset=50% --ba=8k --debug=io
>
> picks offsets that are 8k aligned.
>
>> 4KB alignment.  Does --ba work for all IO's or only random IO's?  If all, does get_start_offset() control the raw offset value?  I don't see why the min(ba, bs) is used in the calculation, but perhaps I am missing something.  Thanks.
>
> Where is min(ba, bs) done - do you mean the bits around https://github.com/axboe/fio/commit/89978a6b26f81bdbd63228e2e2a86f604ee46c56#diff-4abbf037246dd2e450dc3f6a2ac77180R845?
> I agree you probably want to take the maximum of all the block alignments but what if one of the smaller ones is not a multiple of the largest one?
>
> Would you like to propose a patch?
>
> --
> Sitsofe | http://sucs.org/~sits/

-- 
Sitsofe | http://sucs.org/~sits/


  reply	other threads:[~2017-10-21  4:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-20 19:08 fio offset with ba Jeff Furlong
2017-10-20 21:01 ` Sitsofe Wheeler
2017-10-21  0:13   ` Jeff Furlong
2017-10-21  4:21     ` Sitsofe Wheeler [this message]
2017-10-23 23:56       ` Jeff Furlong
2017-10-24  6:21         ` Sitsofe Wheeler
2017-10-26  0:27           ` Jeff Furlong
2017-10-26  4:24             ` Jens Axboe
2017-10-26  6:27               ` Sitsofe Wheeler
2017-10-26 14:25                 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALjAwxiXinxTL6jeSNNoMCmu5--5CKBdWapSX-hOQ2+KYCJ6Ng@mail.gmail.com \
    --to=sitsofe@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=fio@vger.kernel.org \
    --cc=jeff.furlong@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.