* Re: bupsplit.c copyright and patching
       [not found]                       ` <CAHqTa-3KHrC67r9tZs5kFNF7bSh5Dt_AYHnAEhHziqvAijD_wA@mail.gmail.com>
@ 2018-04-23 20:03                         ` Nix
  2018-04-23 20:22                           ` Avery Pennarun
  2018-04-24 16:47                           ` Dave Chinner
  0 siblings, 2 replies; 6+ messages in thread
From: Nix @ 2018-04-23 20:03 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

[Cc:ed in the xfs list to ask a question: see the last quoted section
 below]

On 23 Apr 2018, Avery Pennarun stated:

> On Mon, Apr 23, 2018 at 1:44 PM, Nix <nix@esperi.org.uk> wrote:
>> Hm. Checking the documentation it looks like the scheduler is smarter
>> than I thought: it does try to batch the requests and service as many as
>> possible in each sweep across the disk surface, but it is indeed only
>> tunable on a systemwide basis :(
>
> Yeah, my understanding was that only cfq actually cares about ionice.

Yes, though idle versus non-idle can be used by other components too: it
can tell bcache not to cache low-priority reads, for instance (pretty
crucial if you've just done an index deletion, or the next bup run would
destroy your entire bcache!)
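
For reference, the 'idle' class here is just IOPRIO_CLASS_IDLE; there
is no glibc wrapper for the syscall, so a process that wants to drop
itself into that class does something like this (an untested sketch,
with the constants copied from the kernel's ioprio.h):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* No glibc wrapper for ioprio_set(): mirror the kernel's constants. */
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_CLASS_IDLE   3
#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_PRIO_VALUE(class, data) \
        (((class) << IOPRIO_CLASS_SHIFT) | (data))

int main(void)
{
        /* who == 0 means the calling process; idle takes no level. */
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) < 0) {
                perror("ioprio_set");
                return 1;
        }
        /* ... do the low-priority I/O, or exec the backup job, here ... */
        return 0;
}

(That is essentially all ionice -c3 does before exec()ing its
argument.)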

> That's really a shame: bup does a great job (basically zero
> performance impact) when run at ionice 'idle' priority, especially
> since it uses fadvise() to tell the kernel when it's done with files,
> so it doesn't get other people's stuff evicted from page cache.

Yeah. Mind you, I don't actually notice its performance impact here,
with the deadline scheduler, but part of that is bcache and the rest is
128GiB RAM. We can't require users to have something like *that*. :P
(heck most of my systems are much smaller.)

> On the other hand, maybe what you actually want is just cfq with your
> high-priority tasks given a higher-than-average ionice priority.  I

The XFS FAQ claims that this is, ah, not good for xfs performance, but
this may be one of those XFS canards that is wildly out of date, like
almost all the online tuning hints telling you to do something on the
mkfs.xfs line, most of which actually make performance *worse*.

xfs folks, could you confirm that the deadline scheduler really is still
necessary for XFS atop md, and that CFQ is still a distinctly bad idea?

I'm trying to evaluate possible bup improvements before they're made
(involving making *lots* of I/O requests in parallel, i.e. dozens, and
relying on the I/O scheduler to sort it out, even if the number of
requests is way above the number of rotating-rust spindles), and it has
been put forward that CFQ will do the right thing here and deadline is
likely to be just terrible.
<http://xfs.org/index.php/XFS_FAQ#Q:_Which_I.2FO_scheduler_for_XFS.3F>
suggests otherwise (particularly for parallel workloads such as, uh,
this one), but even on xfs.org I am somewhat concerned about stale
recommendations causing trouble...
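
Aside, for anyone reproducing this: the scheduler is chosen per block
device via /sys/block/<dev>/queue/scheduler (there is no per-process
knob, only per-device), and the active one is the name shown in square
brackets; writing another name into that file as root switches it. A
trivial check, with "sda" standing in for whatever device you care
about -- cat does the same job, of course:

#include <stdio.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");

        if (!f) {
                perror("scheduler");
                return 1;
        }
        /* Prints e.g. "noop deadline [cfq]"; the bracketed one is active. */
        if (fgets(line, sizeof line, f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}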


* Re: bupsplit.c copyright and patching
  2018-04-23 20:03                         ` bupsplit.c copyright and patching Nix
@ 2018-04-23 20:22                           ` Avery Pennarun
  2018-04-23 21:53                             ` Nix
  2018-04-24 16:47                           ` Dave Chinner
  1 sibling, 1 reply; 6+ messages in thread
From: Avery Pennarun @ 2018-04-23 20:22 UTC (permalink / raw)
  To: Nix; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 4:03 PM, Nix <nix@esperi.org.uk> wrote:
> [Cc:ed in the xfs list to ask a question: see the last quoted section
>  below]
>
> On 23 Apr 2018, Avery Pennarun stated:
>
>> On Mon, Apr 23, 2018 at 1:44 PM, Nix <nix@esperi.org.uk> wrote:
>>> Hm. Checking the documentation it looks like the scheduler is smarter
>>> than I thought: it does try to batch the requests and service as many as
>>> possible in each sweep across the disk surface, but it is indeed only
>>> tunable on a systemwide basis :(
>>
>> Yeah, my understanding was that only cfq actually cares about ionice.
>
> Yes, though idle versus non-idle can be used by other components too: it
> can tell bcache not to cache low-priority reads, for instance (pretty
> crucial if you've just done an index deletion, or the next bup run would
> destroy your entire bcache!)

Hmm, I don't know about that.  It's hard to completely avoid caching
things at all, because the usual way of things is to load stuff into
the cache, then feed it back to userspace sometime shortly afterward.
Not doing so can make things much worse (eg. if someone tries to
stat() the same file twice in a row or something).  bup's way of doing
fadvise() when it's done with a file seems to work pretty well in my
experience, explicitly not churning the kernel cache even when doing a
full backup from scratch.
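
The pattern is roughly this -- a sketch in C of the general idea, not
bup's actual code (bup does it from python, and IIRC drops the cache
incrementally as it reads rather than in one go at the end):

#include <fcntl.h>
#include <unistd.h>

/* Read a file sequentially, then tell the kernel we're done with its
 * pages so they can be dropped instead of pushing out someone else's
 * cache.  Error handling mostly elided. */
static void slurp_and_forget(const char *path)
{
        char buf[1 << 16];
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return;
        while (read(fd, buf, sizeof buf) > 0)
                ;               /* hash/split/pack the data here */

        /* offset 0, len 0 == the whole file */
        posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        close(fd);
}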

>> That's really a shame: bup does a great job (basically zero
>> performance impact) when run at ionice 'idle' priority, especially
>> since it uses fadvise() to tell the kernel when it's done with files,
>> so it doesn't get other people's stuff evicted from page cache.
>
> Yeah. Mind you, I don't actually notice its performance impact here,
> with the deadline scheduler, but part of that is bcache and the rest is
> 128GiB RAM. We can't require users to have something like *that*. :P
> (heck most of my systems are much smaller.)

Inherently, doing a system backup will need to cause a huge number of
disk seeks.  If you are checking a couple of realtime streams to make
sure they don't miss any deadlines, then you might not notice any
impact, but you ought to notice a reduction in total available
throughput while a big backup task is running (unless it's running
using idle priority).  This would presumably be even worse if the
backup task is running many file reads/stats in parallel.

Incidentally, I have a tool that we used on a DVR product to ensure we
could support multiple realtime streams under heavy load (ie.
something like 12 readers + 12 writers on a single 7200 RPM disk).
For that use case, it was easy to see that deadline was better for
keeping deadlines (imagine!) than cfq.  But cfq got more total
throughput.  This was on ext4 with preallocation though, not xfs.  The
tool I wrote is diskbench, available here:
https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/diskbench.c

> <http://xfs.org/index.php/XFS_FAQ#Q:_Which_I.2FO_scheduler_for_XFS.3F>
> suggests otherwise (particularly for parallel workloads such as, uh,
> this one), but even on xfs.org I am somewhat concerned about stale
> recommendations causing trouble...

That section seems... not so well supported.  If your deadlines are
incredibly short, there might be something to it, but the whole point
of cfq and this sort of time slicing is to minimize seeks.  It might
cause the disk to linger longer in a particular section without
seeking, but if (eg.) one of the tasks decides it wants to read from
all over the disk, it wouldn't make sense to just let that task do
whatever it wants, then switch to another task that does whatever it
wants, and so on.  That would seem to *maximize* seeks as well as
latency, which is just bad for everyone.  Empirically cfq is quite
good :)

Have fun,

Avery


* Re: bupsplit.c copyright and patching
  2018-04-23 20:22                           ` Avery Pennarun
@ 2018-04-23 21:53                             ` Nix
  2018-04-23 22:06                               ` Avery Pennarun
  0 siblings, 1 reply; 6+ messages in thread
From: Nix @ 2018-04-23 21:53 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On 23 Apr 2018, Avery Pennarun stated:
> On Mon, Apr 23, 2018 at 4:03 PM, Nix <nix@esperi.org.uk> wrote:
>> On 23 Apr 2018, Avery Pennarun stated:
>> Yes, though idle versus non-idle can be used by other components too: it
>> can tell bcache not to cache low-priority reads, for instance (pretty
>> crucial if you've just done an index deletion, or the next bup run would
>> destroy your entire bcache!)
>
> Hmm, I don't know about that.  It's hard to completely avoid caching
> things at all, because the usual way of things is to load stuff into
> the cache, then feed it back to userspace sometime shortly afterward.
> Not doing so can make things much worse (eg. if someone tries to
> stat() the same file twice in a row or something).  bup's way of doing
> fadvise() when it's done with a file seems to work pretty well in my
> experience, explicitly not churning the kernel cache even when doing a
> full backup from scratch.

Bear in mind that the 'cache' I mentioned above is a bcache cache, i.e.
*writing the data to SSD*. :) It'll still get read into the page cache!

>>> That's really a shame: bup does a great job (basically zero
>>> performance impact) when run at ionice 'idle' priority, especially
>>> since it uses fadvise() to tell the kernel when it's done with files,
>>> so it doesn't get other people's stuff evicted from page cache.
>>
>> Yeah. Mind you, I don't actually notice its performance impact here,
>> with the deadline scheduler, but part of that is bcache and the rest is
>> 128GiB RAM. We can't require users to have something like *that*. :P
>> (heck most of my systems are much smaller.)
>
> Inherently, doing a system backup will need to cause a huge number of
> disk seeks.

This is actually something where having a few hundred GiB of bcache
helps a great deal. A lot of seeky metadata gets cached during normal
usage, and bup will then use it, even if it's not allowed to populate
the cache any further.

>             If you are checking a couple of realtime streams to make
> sure they don't miss any deadlines, then you might not notice any
> impact, but you ought to notice a reduction in total available
> throughput while a big backup task is running (unless it's running
> using idle priority).

Of course it is. Why would you want to run a backup task at higher
priority than that? :)

> Incidentally, I have a tool that we used on a DVR product to ensure we
> could support multiple realtime streams under heavy load (ie.
> something like 12 readers + 12 writers on a single 7200 RPM disk).

This is also what xfs's realtime stuff was meant for, back in the day.

-- 
NULL && (void)


* Re: bupsplit.c copyright and patching
  2018-04-23 21:53                             ` Nix
@ 2018-04-23 22:06                               ` Avery Pennarun
  0 siblings, 0 replies; 6+ messages in thread
From: Avery Pennarun @ 2018-04-23 22:06 UTC (permalink / raw)
  To: Nix; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 5:53 PM, Nix <nix@esperi.org.uk> wrote:
> On 23 Apr 2018, Avery Pennarun stated:
>>             If you are checking a couple of realtime streams to make
>> sure they don't miss any deadlines, then you might not notice any
>> impact, but you ought to notice a reduction in total available
>> throughput while a big backup task is running (unless it's running
>> using idle priority).
>
> Of course it is. Why would you want to run a backup task at higher
> priority than that? :)

Well, assuming your block scheduler supports it :)  Anyway, the idea
is that this is relatively easy to benchmark empirically, as long as
you know what to look for.

>> Incidentally, I have a tool that we used on a DVR product to ensure we
>> could support multiple realtime streams under heavy load (ie.
>> something like 12 readers + 12 writers on a single 7200 RPM disk).
>
> This is also what xfs's realtime stuff was meant for, back in the day.

Oops, I wasn't clear.  diskbench is a tool for checking whether your
cpu + disk + filesystem + scheduler can handle the load.  It doesn't
actually do the work.  That way you can do things like compare ext4,
ext4 + large prealloc, xfs, different disk schedulers, etc.  For our
dvr it was clear that the deadline scheduler did better, but only
because we had virtually no non-dvr disk accesses.  It would have been
really nice to be able to deprioritize all the non-realtime disk
accesses.


* Re: bupsplit.c copyright and patching
  2018-04-23 20:03                         ` bupsplit.c copyright and patching Nix
  2018-04-23 20:22                           ` Avery Pennarun
@ 2018-04-24 16:47                           ` Dave Chinner
  2018-05-02  8:57                             ` Nix
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2018-04-24 16:47 UTC (permalink / raw)
  To: Nix; +Cc: Avery Pennarun, Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 09:03:26PM +0100, Nix wrote:
> The XFS FAQ claims that this is, ah, not good for xfs performance, but
> this may be one of those XFS canards that is wildly out of date, like
> almost all the online tuning hints telling you to do something on the
> mkfs.xfs line, most of which actually make performance *worse*.
> 
> xfs folks, could you confirm that the deadline scheduler really is still
> necessary for XFS atop md, and that CFQ is still a distinctly bad idea?

Yes, the problem still exists. CFQ just doesn't work well with
workloads that issue concurrent, dependent IOs from multiple
processes, nor does it work well with hardware raid arrays with
non-volatile caches that have unpredictable IO performance. This
sort of workload and hardware is common in the sorts of high
performance applications we see run on XFS filesystems, and we avoid
CFQ as much as possible....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: bupsplit.c copyright and patching
  2018-04-24 16:47                           ` Dave Chinner
@ 2018-05-02  8:57                             ` Nix
  0 siblings, 0 replies; 6+ messages in thread
From: Nix @ 2018-05-02  8:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Avery Pennarun, Rob Browning, Robert Evans, bup-list, linux-xfs

On 24 Apr 2018, Dave Chinner said:
> Yes, the problem still exists. CFQ just doesn't work well with
> workloads that issue concurrent, dependent IOs from multiple
> processes, nor does it work well with hardware raid arrays with
> non-volatile caches that have unpredictable IO performance. This
> sort of workload and hardware is common in the sorts of high
> performance applications we see run on XFS filesystems, and we avoid
> CFQ as much as possible....

Oh! I thought its problem was with concurrent *independent* I/Os. If
it's fine with those, and you're using md (not hardware RAID), it sounds
like the worst problems with CFQ may be avoided? (This is probably not
too surprising in hindsight, since those are the characteristics of
things like kernel compile runs, the one workload Linux will always work
well with!)

