From: Michael Lyle <mlyle@lyle.org>
To: Coly Li <i@coly.li>
Cc: Junhui Tang <tang.junhui@zte.com.cn>,
	linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
	Kent Overstreet <kent.overstreet@gmail.com>
Subject: Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better
Date: Sat, 30 Sep 2017 01:31:33 -0700
Message-ID: <CAJ+L6qe9XjPOw=6NpWzZhw-PLC5EXhstOrPaapbqvDcD0+_7QA@mail.gmail.com>
In-Reply-To: <CAJ+L6qf_zG37s4nGhxTbNvsdvDr5fwbmYocwP5m4rkjMLQKjtg@mail.gmail.com>

Just one more note---

IO merging is not happening properly right now.

It's easy to construct a case where essentially all of the writeback
is sequential.  E.g. if your writeback dirty data target is 15GB, do
something like:

$ sync; fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=test --bs=8k --iodepth=256 --size=30G \
    --readwrite=randwrite --ramp_time=4

This creates a situation where bcache has lots of 8K segments and
they're almost all sequential.

Wait for the test to complete, then watch iostat 1.  You'll see that
the ratio of writeback KB written to tps is only a little more than 8,
despite the spinning disks doing IO at their maximum rate.  You can
feel the disk and tell it is seeking constantly.  The IOs are not
being merged properly.
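
(Illustrative arithmetic, not a measured result: if iostat showed
~4,000 kB/s written at ~480 tps, that is 4000/480 ≈ 8.3 kB per
request, i.e. nearly every write reaching the backing disk is still a
single 8k extent, with no neighbors merged in.)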

Mike

On Sat, Sep 30, 2017 at 1:23 AM, Michael Lyle <mlyle@lyle.org> wrote:
> On Sat, Sep 30, 2017 at 1:03 AM, Coly Li <i@coly.li> wrote:
>>>> If writeback_rate is not minimum value, it means there are front end
>>>> write requests existing.
>>>
>>> This is wrong.  Else we'd never make it back to the target.
>>>
>>
>> Of course we can :-) When dirty data is far beyond the dirty percent,
>> the writeback rate is quite high, and in that case dc->last_read takes
>> effect.  But this is not the situation which your patch handles better.
>
> ... and dirty data can be beyond the dirty percent without front-end
> write requests happening, right?  Dirty data being significantly beyond
> the dirty percent means that we fell far behind at some point in the
> past, because the spinning disk couldn't keep up with the writeback
> rate...
>
> I really don't even understand what you're saying.  Front-end write
> requests that will make their way to the backing disk only happen if
> writeback is bypassed or if they're judged to be sequential.  The
> point of writeback is to eat these front-end write requests.
>
>> You are right. But when writeback rate is high, dc->last_read can solve
>> the seeking problem as your patch does.
>
> Sure, but it doesn't lay the groundwork for plugging.
>
>> Do you mean that write I/O cannot be issued within slice_idle, or that
>> it is issued within slice_idle but the underlying elevator does not
>> merge the contiguous requests?
>
> I mean this:  imagine the writeback rate is high with the current
> code, and the writeback code issues read requests for backing
> locations 1,2,3,4,5,6, which are noncontiguous on the SSD.
>
> 1, 3, 4, 5 are read quickly and the underlying elevator merges 3, 4, and 5.
>
> 2 & 6 arrive 2ms later.
>
> What happens?
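>
> (Most likely: the writes for 1 and the merged 3-5 have already been
> dispatched by the time 2 and 6 complete, so 2 and 6 go out as two
> separate, seeky writes instead of one fully merged request.)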
>
>> I meant the dirty_io list in your future patch, not the current 5-bios
>> patch.  Hmm, maybe you can ignore my noise here; I just jumped off the
>> topic.
>
> Keeping the set of dirty_io objects that are being issued in a list so
> that it's easy to iterate contiguous ones doesn't change the
> commitment behavior.
>
>>>> And the plug list will be unplugged automatically by default when a
>>>> context switch happens.  If you perform read I/Os to the btrees, a
>>>> context switch is likely to happen, and then you won't keep a large
>>>> bio list ...
>>>
>>> Thankfully we'll have 5 things to fire immediately after each other,
>>> so we don't need to worry about automatic unplug.
>>>
>>
>> Forgive me, again I meant the dirty_io list stuff, not this patch...
>
> The idea is this: the group of up-to-5 IOs that the current patch
> gathers are put in a list.  When all 5 have been read, the device is
> plugged, the writes are issued together, and then the device is
> unplugged.  Kent suggested this to me on IRC.
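>
> A rough sketch of the idea (illustrative only; wb_io and
> issue_writeback_batch are made-up names, not from the actual patch):
>
>     #include <linux/bio.h>
>     #include <linux/blkdev.h>
>     #include <linux/list.h>
>
>     /* Hypothetical batch entry; the real patch tracks its bios in
>      * its own structures. */
>     struct wb_io {
>             struct bio       bio;
>             struct list_head list;
>     };
>
>     /* Once every read in the batch has completed, plug the backing
>      * device, issue the contiguous writes back to back, and unplug
>      * so the block layer can merge them before dispatch. */
>     static void issue_writeback_batch(struct list_head *batch)
>     {
>             struct blk_plug plug;
>             struct wb_io *io;
>
>             blk_start_plug(&plug);
>             list_for_each_entry(io, batch, list)
>                     submit_bio(&io->bio);
>             blk_finish_plug(&plug);  /* merged request(s) dispatch here */
>     }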
>
>> Could you please show an exact benchmark result?  Then it will be
>> easier to change my mind.  I only know that if writeback I/O is not
>> issued continuously within slice_idle, it is mostly because the dirty
>> read is delayed at line 238 of read_dirty():
>> 235     if (KEY_START(&w->key) != dc->last_read ||
>> 236         jiffies_to_msecs(delay) > 50)
>> 237             while (!kthread_should_stop() && delay)
>> 238                     delay = schedule_timeout_interruptible(delay);
>>
>> If the writeback rate is high and the writeback I/O is contiguous, the
>> read I/O will be issued without delay.  Then the writeback I/O will be
>> issued by write_dirty() very shortly after.
>
> What this code says is that you're willing to get up to 50ms "ahead"
> on the writeback for contiguous I/O.
>
> This is bad, because: A) there is no bound on how much you are willing
> to aggregate.  If the writeback rate is high, 50ms could be a lot of
> data, and a lot of IOPs to the cache device.  The new code bounds
> this.  B) Even if a single I/O could put you 50ms ahead, it could
> still be advantageous to do multiple writes.
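>
> (Illustrative numbers, not measured: at a writeback rate of 100 MB/s,
> 50ms of headroom is ~5 MB, which at 8k per extent is roughly 640
> reads outstanding against the cache device.)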
>
>> Therefore without a real performance comparison, I cannot imagine why
>> adjacent write requests cannot be merged in the elevator... Unless I
>> do the test myself.
>
> See the example above about the lack of write-ordering enforcement.
>
> Mike
>
>>
>>
>> --
>> Coly Li

