* Re: bupsplit.c copyright and patching
From: Nix @ 2018-04-23 20:03 UTC (permalink / raw)
To: Avery Pennarun; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

[Cc:ed in the xfs list to ask a question: see the last quoted section
below]

On 23 Apr 2018, Avery Pennarun stated:

> On Mon, Apr 23, 2018 at 1:44 PM, Nix <nix@esperi.org.uk> wrote:
>> Hm. Checking the documentation it looks like the scheduler is smarter
>> than I thought: it does try to batch the requests and service as many
>> as possible in each sweep across the disk surface, but it is indeed
>> only tunable on a systemwide basis :(
>
> Yeah, my understanding was that only cfq actually cares about ionice.

Yes, though idle versus non-idle can be used by other components too: it
can tell bcache not to cache low-priority reads, for instance (pretty
crucial if you've just done an index deletion, or the next bup run would
destroy your entire bcache!)

> That's really a shame: bup does a great job (basically zero
> performance impact) when run at ionice 'idle' priority, especially
> since it uses fadvise() to tell the kernel when it's done with files,
> so it doesn't get other people's stuff evicted from page cache.

Yeah. Mind you, I don't actually notice its performance impact here,
with the deadline scheduler, but part of that is bcache and the rest is
128GiB RAM. We can't require users to have something like *that*. :P
(heck, most of my systems are much smaller.)

> On the other hand, maybe what you actually want is just cfq with your
> high-priority tasks given a higher-than-average ionice priority.

The XFS FAQ claims that this is, ah, not good for xfs performance, but
this may be one of those XFS canards that is wildly out of date, like
almost all the online tuning hints telling you to do something on the
mkfs.xfs line, most of which actually make performance *worse*.

xfs folks, could you confirm that the deadline scheduler really is
still necessary for XFS atop md, and that CFQ is still a distinctly bad
idea? I'm trying to evaluate possible bup improvements before they're
made (involving making *lots* of I/O requests in parallel, i.e. dozens,
and relying on the I/O scheduler to sort it out, even if the number of
requests is way above the number of rotating-rust spindles), and it has
been put forward that CFQ will do the right thing here and deadline is
likely to be just terrible.

<http://xfs.org/index.php/XFS_FAQ#Q:_Which_I.2FO_scheduler_for_XFS.3F>
suggests otherwise (particularly for parallel workloads such as, uh,
this one), but even on xfs.org I am somewhat concerned about stale
recommendations causing trouble...
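For concreteness: the "only tunable on a systemwide basis" knob above is the per-device scheduler file in sysfs, and the idle class is requested with ionice. A minimal sketch (device name and backup command are illustrative placeholders, not anything from this thread) of checking which scheduler is active and launching a backup in the idle class:

```python
# Sketch: read the active I/O scheduler for a block device and run a
# command under the idle ionice class.  Only cfq fully honours the idle
# class; the device name and command below are hypothetical examples.
import subprocess
from pathlib import Path


def parse_active(line: str) -> str:
    """Extract the bracketed (i.e. active) scheduler from a sysfs
    scheduler line such as "noop deadline [cfq]"."""
    return line[line.index("[") + 1 : line.index("]")]


def active_scheduler(dev: str) -> str:
    """Return the scheduler currently selected for /dev/<dev>."""
    text = Path(f"/sys/block/{dev}/queue/scheduler").read_text()
    return parse_active(text)


def run_idle_backup(cmd: list[str]) -> int:
    """Run cmd in the idle I/O class (ionice -c3) and return its status."""
    return subprocess.call(["ionice", "-c", "3", *cmd])


if __name__ == "__main__":
    print(parse_active("noop deadline [cfq]"))
```

Switching the scheduler systemwide for one device is just a write to the same sysfs file (e.g. `echo deadline > /sys/block/sda/queue/scheduler`), which is exactly why it cannot be set per-process.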
* Re: bupsplit.c copyright and patching
From: Avery Pennarun @ 2018-04-23 20:22 UTC (permalink / raw)
To: Nix; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 4:03 PM, Nix <nix@esperi.org.uk> wrote:
> [Cc:ed in the xfs list to ask a question: see the last quoted section
> below]
>
> On 23 Apr 2018, Avery Pennarun stated:
>
>> On Mon, Apr 23, 2018 at 1:44 PM, Nix <nix@esperi.org.uk> wrote:
>>> Hm. Checking the documentation it looks like the scheduler is smarter
>>> than I thought: it does try to batch the requests and service as many
>>> as possible in each sweep across the disk surface, but it is indeed
>>> only tunable on a systemwide basis :(
>>
>> Yeah, my understanding was that only cfq actually cares about ionice.
>
> Yes, though idle versus non-idle can be used by other components too: it
> can tell bcache not to cache low-priority reads, for instance (pretty
> crucial if you've just done an index deletion, or the next bup run would
> destroy your entire bcache!)

Hmm, I don't know about that. It's hard to completely avoid caching
things at all, because the usual way of things is to load stuff into
the cache, then feed it back to userspace sometime shortly afterward.
Not doing so can make things much worse (eg. if someone tries to
stat() the same file twice in a row or something). bup's way of doing
fadvise() when it's done with a file seems to work pretty well in my
experience, explicitly not churning the kernel cache even when doing a
full backup from scratch.

>> That's really a shame: bup does a great job (basically zero
>> performance impact) when run at ionice 'idle' priority, especially
>> since it uses fadvise() to tell the kernel when it's done with files,
>> so it doesn't get other people's stuff evicted from page cache.
>
> Yeah. Mind you, I don't actually notice its performance impact here,
> with the deadline scheduler, but part of that is bcache and the rest is
> 128GiB RAM. We can't require users to have something like *that*. :P
> (heck most of my systems are much smaller.)

Inherently, doing a system backup will need to cause a huge number of
disk seeks. If you are checking a couple of realtime streams to make
sure they don't miss any deadlines, then you might not notice any
impact, but you ought to notice a reduction in total available
throughput while a big backup task is running (unless it's running
using idle priority). This would presumably be even worse if the
backup task is running many file reads/stats in parallel.

Incidentally, I have a tool that we used on a DVR product to ensure we
could support multiple realtime streams under heavy load (ie.
something like 12 readers + 12 writers on a single 7200 RPM disk).
For that use case, it was easy to see that deadline was better for
keeping deadlines (imagine!) than cfq. But cfq got more total
throughput. This was on ext4 with preallocation though, not xfs.

The tool I wrote is diskbench, available here:
https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/diskbench.c

> <http://xfs.org/index.php/XFS_FAQ#Q:_Which_I.2FO_scheduler_for_XFS.3F>
> suggests otherwise (particularly for parallel workloads such as, uh,
> this one), but even on xfs.org I am somewhat concerned about stale
> recommendations causing trouble...

That section seems... not so well supported. If your deadlines are
incredibly short, there might be something to it, but the whole point
of cfq and this sort of time slicing is to minimize seeks. It might
cause the disk to linger longer in a particular section without
seeking, but if (eg.) one of the tasks decides it wants to read from
all over the disk, it wouldn't make sense to just let that task do
whatever it wants, then switch to another task that does whatever it
wants, and so on. That would seem to *maximize* seeks as well as
latency, which is just bad for everyone. Empirically cfq is quite
good :)

Have fun,

Avery
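The fadvise() pattern being discussed — read a file for backup, then tell the kernel its cached pages are no longer needed — can be sketched as follows. This is a simplified illustration of the idea, not bup's actual code; it assumes Linux semantics for POSIX_FADV_DONTNEED and Python 3.3+ for os.posix_fadvise:

```python
# Sketch of the "fadvise when done with a file" pattern: consume a file,
# then hint that its page-cache pages can be reclaimed first, so a full
# backup pass does not evict everyone else's working set.
import os


def read_then_drop(path: str, chunk: int = 1 << 16) -> int:
    """Read a whole file (stand-in for hashing/packing it), then advise
    the kernel that its cached pages are unneeded.  Returns bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)
        # offset=0, length=0 means "to end of file": mark the whole
        # file's pages as reclaimable.  This is advisory only.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Because the advice is only a hint, a file stat()ed or re-read immediately afterwards still works; the kernel simply prefers those pages when it next needs memory.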
* Re: bupsplit.c copyright and patching
From: Nix @ 2018-04-23 21:53 UTC (permalink / raw)
To: Avery Pennarun; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On 23 Apr 2018, Avery Pennarun stated:

> On Mon, Apr 23, 2018 at 4:03 PM, Nix <nix@esperi.org.uk> wrote:
>> On 23 Apr 2018, Avery Pennarun stated:
>> Yes, though idle versus non-idle can be used by other components too: it
>> can tell bcache not to cache low-priority reads, for instance (pretty
>> crucial if you've just done an index deletion, or the next bup run would
>> destroy your entire bcache!)
>
> Hmm, I don't know about that. It's hard to completely avoid caching
> things at all, because the usual way of things is to load stuff into
> the cache, then feed it back to userspace sometime shortly afterward.
> Not doing so can make things much worse (eg. if someone tries to
> stat() the same file twice in a row or something). bup's way of doing
> fadvise() when it's done with a file seems to work pretty well in my
> experience, explicitly not churning the kernel cache even when doing a
> full backup from scratch.

Bear in mind that the 'cache' I mentioned above is a bcache cache, i.e.
*writing the data to SSD*. :) It'll still get read into the page cache!

>>> That's really a shame: bup does a great job (basically zero
>>> performance impact) when run at ionice 'idle' priority, especially
>>> since it uses fadvise() to tell the kernel when it's done with files,
>>> so it doesn't get other people's stuff evicted from page cache.
>>
>> Yeah. Mind you, I don't actually notice its performance impact here,
>> with the deadline scheduler, but part of that is bcache and the rest is
>> 128GiB RAM. We can't require users to have something like *that*. :P
>> (heck most of my systems are much smaller.)
>
> Inherently, doing a system backup will need to cause a huge number of
> disk seeks.

This is actually something where having a few hundred GiB of bcache
helps a great deal. A great deal of seeky metadata gets cached during
normal usage, and then bup will use it, even if it's not allowed to
populate it more.

> If you are checking a couple of realtime streams to make
> sure they don't miss any deadlines, then you might not notice any
> impact, but you ought to notice a reduction in total available
> throughput while a big backup task is running (unless it's running
> using idle priority).

Of course it is. Why would you want to run a backup task at higher
priority than that? :)

> Incidentally, I have a tool that we used on a DVR product to ensure we
> could support multiple realtime streams under heavy load (ie.
> something like 12 readers + 12 writers on a single 7200 RPM disk).

This is also what xfs's realtime stuff was meant for, back in the day.

-- 
NULL && (void)
* Re: bupsplit.c copyright and patching
From: Avery Pennarun @ 2018-04-23 22:06 UTC (permalink / raw)
To: Nix; +Cc: Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 5:53 PM, Nix <nix@esperi.org.uk> wrote:
> On 23 Apr 2018, Avery Pennarun stated:
>> If you are checking a couple of realtime streams to make
>> sure they don't miss any deadlines, then you might not notice any
>> impact, but you ought to notice a reduction in total available
>> throughput while a big backup task is running (unless it's running
>> using idle priority).
>
> Of course it is. Why would you want to run a backup task at higher
> priority than that? :)

Well, assuming your block scheduler supports it :) Anyway, the idea is
that this is relatively easy to benchmark empirically, as long as you
know what to look for.

>> Incidentally, I have a tool that we used on a DVR product to ensure we
>> could support multiple realtime streams under heavy load (ie.
>> something like 12 readers + 12 writers on a single 7200 RPM disk).
>
> This is also what xfs's realtime stuff was meant for, back in the day.

Oops, I wasn't clear. diskbench is a tool for checking whether your
cpu + disk + filesystem + scheduler can handle the load. It doesn't
actually do the work. That way you can do things like compare ext4,
ext4 + large prealloc, xfs, different disk schedulers, etc.

For our dvr it was clear that the deadline scheduler did better, but
only because we had virtually no non-dvr disk accesses. It would have
been really nice to be able to deprioritize all the non-realtime disk
accesses.
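The real diskbench is linked earlier in the thread; as a toy stand-in for the kind of measurement it makes, the figure a realtime reader cares about is not average throughput but the worst per-chunk read latency, since that is what eats into its deadline margin:

```python
# Toy stand-in (NOT diskbench): read a stream in fixed-size chunks and
# report the worst latency any single chunk read took.  Run it against
# a file on the filesystem/scheduler combination under test while a
# competing workload (e.g. a backup) runs in the background.
import os
import time


def worst_chunk_latency(path: str, chunk: int = 1 << 16) -> float:
    """Return the worst single-read latency (seconds) over the file."""
    worst = 0.0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            t0 = time.monotonic()
            buf = os.read(fd, chunk)
            worst = max(worst, time.monotonic() - t0)
            if not buf:
                break
    finally:
        os.close(fd)
    return worst
```

Comparing this number across cfq and deadline, with and without a backup running, is the empirical benchmark being suggested: deadline should bound the worst case better, while cfq should win on total throughput.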
* Re: bupsplit.c copyright and patching
From: Dave Chinner @ 2018-04-24 16:47 UTC (permalink / raw)
To: Nix; +Cc: Avery Pennarun, Rob Browning, Robert Evans, bup-list, linux-xfs

On Mon, Apr 23, 2018 at 09:03:26PM +0100, Nix wrote:
> The XFS FAQ claims that this is, ah, not good for xfs performance, but
> this may be one of those XFS canards that is wildly out of date, like
> almost all the online tuning hints telling you to do something on the
> mkfs.xfs line, most of which actually make performance *worse*.
>
> xfs folks, could you confirm that the deadline scheduler really is still
> necessary for XFS atop md, and that CFQ is still a distinctly bad idea?

Yes, the problem still exists. CFQ just doesn't work well with
workloads that issue concurrent, dependent IOs from multiple
processes, nor does it work well with hardware raid arrays with
non-volatile caches that have unpredictable IO performance. This
sort of workload and hardware is common in the sorts of high
performance applications we see run on XFS filesystems, and we avoid
CFQ as much as possible....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: bupsplit.c copyright and patching
From: Nix @ 2018-05-02 8:57 UTC (permalink / raw)
To: Dave Chinner
Cc: Avery Pennarun, Rob Browning, Robert Evans, bup-list, linux-xfs

On 24 Apr 2018, Dave Chinner said:

> Yes, the problem still exists. CFQ just doesn't work well with
> workloads that issue concurrent, dependent IOs from multiple
> processes, nor does it work well with hardware raid arrays with
> non-volatile caches that have unpredictable IO performance. This
> sort of workload and hardware is common in the sorts of high
> performance applications we see run on XFS filesystems, and we avoid
> CFQ as much as possible....

Oh! I thought its problem was with concurrent *independent* I/Os. If
it's fine with those, and you're using md (not hardware RAID), it
sounds like the worst problems with CFQ may be avoided?

(This is probably not too surprising in hindsight, since those are the
characteristics of things like kernel compile runs, the one workload
Linux will always work well with!)