* [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Paolo Valente
Date: 2016-09-16 7:55 UTC
To: ksummit-discuss
Cc: b.zolnierkie, Jens Axboe, hare, Tejun Heo, osandov, hch

Linux systems suffer from long-standing high-latency problems, at both the
system and the application level, related to I/O. For example, they usually
suffer from poor responsiveness--or even starvation, depending on the
workload--while, e.g., one or more files are being read/written/copied. On a
similar note, background workloads may cause audio/video playback/streaming
to stutter, even with long gaps. A lot of test results on this problem can be
found here [1] (I'm citing only this resource because I'm familiar with it,
but evidence can be found in countless technical reports, scientific papers,
forum discussions, and so on).

These problems are caused mainly by poor I/O scheduling, although I/O
schedulers are not the only culprit. To address these issues, eight years ago
I started to work on a new I/O scheduler, named BFQ [1], with other
researchers and developers. Since then, we have improved and fine-tuned BFQ
steadily. In particular, we have tested it extensively, especially on desktop
systems. In our easily repeatable experiments, BFQ proves able to solve
latency issues in many, if not most, use cases [2]. For example, regardless
of the background workload considered in [2], application start-up times are
about the same as when the storage device is idle. Similarly, audio/video
playback is always perfectly smooth. The feedback received so far confirms
our results. In this respect, BFQ is, e.g., the default I/O scheduler in a
few distributions, including Sabayon and Arch Linux ARM, as well as in
CyanogenMod for several devices.
BFQ has been submitted to lkml several times over the last eight years, most
recently by me. But it has not made it, apparently for reasons other than how
serious the latency problem is, or how effectively BFQ solves it. In short,
the problem with the first patchsets was that they added a new scheduler,
while it was decided that they should have replaced CFQ instead [3]. Then
time passed in various submit-and-revise rounds. Meanwhile blk-mq entered
mainline, and a new objection was raised: it is not sensible to touch code
(blk) that will eventually be deprecated [4].

In view of these facts, I would like to propose a discussion on this topic,
and in particular on the following points:

1) If blk will still be used in a considerable number of systems for at least
one or two more years, as many think is the case, is it sensible to prevent a
lot of users from enjoying a responsive and smooth system? It does not seem a
good idea, also because having BFQ, or an even better variant of it, in blk
would provide a strong reference benchmark to drive the development of
effective I/O scheduling in blk-mq as well.

2) Work is going on in blk-mq to add I/O scheduling, but IMHO current
approaches and ideas may not be sufficient to solve the above latency
problems. So, again IMO, latency issues may get even worse for low- and
medium-speed single-queue devices in the transition to blk-mq, as there is no
I/O scheduling yet in blk-mq, and these issues may remain if no
accurate-enough scheduler is added. In contrast, solving latency issues, and
not only improving throughput, is probably quite important to speed up the
transition to blk-mq for these devices.

3) Can we join forces to solve latency problems in blk-mq? By working on BFQ,
I have gained some experience with I/O scheduling and with providing strong
service guarantees (low latency, accurate bandwidth distribution, ...), yet
I'm anything but an expert on blk-mq's inner workings and issues.
I'm willing to help in all areas in this regard, including tasks related to
the previous point.

Thanks,
Paolo

[1] http://algogroup.unimore.it/people/paolo/disk_sched/
[2] http://algogroup.unimore.it/people/paolo/disk_sched/results.php
[3] https://lists.linux-foundation.org/pipermail/containers/2014-June/034704.html
[4] https://lkml.org/lkml/2016/8/8/207

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Greg KH
Date: 2016-09-16 8:24 UTC
To: Paolo Valente
Cc: b.zolnierkie, ksummit-discuss, Jens Axboe, hare, Tejun Heo, osandov, hch

On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote:
> Linux systems suffers from long-standing high-latency problems, at
> system and application level, related to I/O. For example, they
> usually suffer from poor responsiveness--or even starvation, depending
> on the workload--while, e.g., one or more files are being
> read/written/copied. On a similar note, background workloads may
> cause audio/video playback/streaming to stutter, even with long gaps.
> A lot of test results on this problem can be found here [1] (I'm
> citing only this resource just because I'm familiar with it, but
> evidence can be found in countless technical reports, scientific
> papers, forum discussions, and so on).

<snip>

Isn't this a better topic for the Vault conference, or the storage mini
conference?

thanks,

greg k-h
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Linus Walleij
Date: 2016-09-16 8:59 UTC
To: Greg KH
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig

On Fri, Sep 16, 2016 at 10:24 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote:
>> Linux systems suffers from long-standing high-latency problems, at
>> system and application level, related to I/O.
>> <snip>
>
> Isn't this a better topic for the Vault conference, or the storage mini
> conference?

Paolo was invited to the kernel summit and I guess so are the core block
maintainers: Jens, Tejun, Christoph. The right people are there, so why not
take the opportunity?

If for nothing else, just to have a formal chat.
Overall I personally think the most KS-related discussion would be to address
the problems Paolo has had breaking into the block layer development
community, and the conflicting responses to the patch sets, which generated a
few flak comments under the last LWN article:
http://lwn.net/Articles/674308/

The main problem is that, unlike some random driver, this cannot be put into
staging, and adding it as a secondary (or tertiary or whatever) scheduling
policy in block/* was explicitly nixed.

AFAICT there is no clear answer from the block maintainers regarding:

- Is the old blk layer deprecated or not? Christoph seems to say "yes, forget
  it, work on mq", but I am still unsure about Jens's and Tejun's positions
  here. Some consensus would be nice. If it is deprecated, it would make
  sense not to merge any new code using it, right?

- When is an all-out transition to mq really going to happen? "When it's
  ready and all blk consumers are migrated" is a good answer, but pretty
  unhelpful for developers like Paolo. Can we get a clearer picture?

- What will subsystems (especially my pet peeve, MMC/SD, which is
  single-queue by nature) that experience a performance regression with a
  switch to mq do? Not switch until mq has a scheduling policy? Switch and
  suck up the performance regression, multiplied by the number of Android
  handheld devices on the planet?

I only have handwavy arguments about the latter being the case, which is why
I'm working on a patch to MMC/SD to switch to mq as an RFT. It's taking some
time though, alas I'm not very smart.

Yours,
Linus Walleij
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Bart Van Assche
Date: 2016-09-16 9:10 UTC
To: Linus Walleij, Greg KH
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig

On 09/16/2016 10:59 AM, Linus Walleij wrote:
> - What will subsystems (especially my pet peeve about MMC/SD
> which is single-queue by nature) that experience a performance
> regression with a switch to mq do? Not switch until mq has a
> scheduling policy? Switch and suck up the performance regression,
> multiplied by the number of Android handheld devices on the
> planet?
>
> I only have handwavy arguments about the latter being the
> case which is why I'm working on a patch to MMC/SD to
> switch to mq as an RFT. It's taking some time though, alas
> I'm not very smart.

Hello Linus,

What was your reference when comparing blk-mq MMC/SD performance with the
current implementation? Which I/O scheduler was used when measuring
performance with the traditional block layer? If it was not noop, how does
blk-mq performance of MMC/SD compare to the performance of the current
implementation with the noop scheduler?

Thanks,

Bart.
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Linus Walleij
Date: 2016-09-16 11:24 UTC
To: Bart Van Assche
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig

On Fri, Sep 16, 2016 at 11:10 AM, Bart Van Assche
<bart.vanassche@sandisk.com> wrote:
> What was your reference when comparing blk-mq MMC/SD performance with the
> current implementation?

I have *NOT* compared the performance, since I have not yet managed to
replace blk with blk-mq in MMC/SD. If someone else has more experience and
can do this in 5 minutes to get a rough measure, I would appreciate seeing
it. I am working on it from the bottom up, trying to make a not-too-stupid
search-and-substitute replacement. As MMC does a lot of request stacking and
looking ahead and behind and what not, this needs to be done thoroughly.

But these are the reference tests I have used for CFQ vs BFQ comparisons so
far.

Hardware:
- ARM Integrator/AP IM-PD1 SD-card at 300kHz (!)
- Ux500 with 7.18GiB eMMC
- Ux500 with SanDisk 4GiB uSD card
- ARM Juno with 2GiB Kingston uSD card
- ARM Juno with SanDisk 4GiB uSD card
- Marvell Kirkwood Feroceon ARM with 2GiB SD card

First the standard dd write/read test of course, because if you have
performance issues there you can just forget about everything else. It looks
something like:

time dd if=/dev/mmcblk0 of=/dev/null bs=1M count=1024 iflag=direct

That is with busybox dd/time.
Then I used iozone, which is something the mobile industry has traditionally
used to provide figures on storage throughput; as many just want a figure to
put on their whitepaper, they use iozone, which will read and write a number
of blocks of varying size, re-read and re-write them, and also perform reads
and writes at random offsets:
http://www.iozone.org/

I usually use it like so:

mount /dev/mmcblk0p1 /mnt
iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test

Both of these are simple to cross-compile and run from an initramfs on ARM
targets.

Then I use Jens Axboe's fio. This is a more complicated beast intended to
generate real-world workloads to emulate the load on your random Google or
Facebook database server or image cluster or I-don't-know-what:
https://github.com/axboe/fio

It is not super-useful on MMC/SD cards, because the load will simply bog down
everything, and your typical embedded system will start to behave like an
updating Android phone "optimizing applications", which is a known issue
caused by the slowness of eMMC. It also eats memory quickly and that way just
kills any embedded system with OOM before you can make any meaningful tests.
But it can spawn any number of readers and writers and stress out your device
very efficiently, if you have enough memory and CPU. (It is apparently
designed to test systems with lots of memory and CPU power.)

I mainly used fio on NAS-type devices. For example, on a Marvell Kirkwood
Pogoplug 4 with SATA, I can do a test like this against a dm-crypt
device-mapper target:

fio --filename=/dev/dm-0 --direct=1 --iodepth=1 --rw=read --bs=64K \
    --size=1G --group_reporting --numjobs=1 --name=test_read

> Which I/O scheduler was used when measuring
> performance with the traditional block layer?

I used CFQ, deadline, noop, and of course the BFQ patches.
With BFQ I reproduced the figures reported by Paolo on a laptop, but since
his test cases use fio to stress the system and eMMC/SD are so slow, I
couldn't come up with any good use case using fio.

Any hints on better tests are welcome! In the kernel logs I only see people
doing a lot of dd tests, which I think is silly; you need more serious test
cases, so it's good if we can build some consensus there.

What do you guys at SanDisk use?

Yours,
Linus Walleij
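[Editor's note: the per-scheduler dd comparison described above can be
scripted along the following lines. This is a sketch, not anything from the
thread: the device name and the scheduler list are assumptions (BFQ only
exists if the out-of-tree patches are applied), and it merely prints the
command sequence, which can be piped to sh as root on the target.]

```shell
#!/bin/sh
# Emit a cold-cache sequential-read test for each legacy (blk) I/O
# scheduler. DEV and the scheduler list are assumptions -- adjust them
# to the hardware and kernel at hand.
DEV=${DEV:-mmcblk0}

plan() {
    for sched in noop deadline cfq bfq; do
        # Select the scheduler for this device.
        echo "echo $sched > /sys/block/$DEV/queue/scheduler"
        # Drop the page cache so every run reads from the device itself.
        echo "echo 3 > /proc/sys/vm/drop_caches"
        # The same busybox-style dd test used above.
        echo "time dd if=/dev/$DEV of=/dev/null bs=1M count=1024 iflag=direct"
    done
}

plan    # print the command sequence; execute it with: plan | sh
```

Writing an unavailable scheduler name to the sysfs file just fails with
EINVAL, so runs for schedulers missing from the kernel can simply be skipped.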
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Arnd Bergmann
Date: 2016-09-16 11:46 UTC
To: ksummit-discuss
Cc: Bartlomiej Zolnierkiewicz, Greg KH, Jens Axboe, hare, Tejun Heo, Bart Van Assche, osandov, Christoph Hellwig

On Friday, September 16, 2016 1:24:07 PM CEST Linus Walleij wrote:
> It is not super-useful on MMC/SD cards, because the load
> will simply bog down everything and your typical embedded
> system will start to behave like an updating Android phone
> "optimizing applications" which is a known issue that is
> caused by the slowness of eMMC. It also eats memory
> quickly and that way just kills any embedded system because
> of OOM before you can make any meaningful tests. But it
> can spawn any number of readers & writers and stress out
> your device very efficiently if you have enough memory
> and CPU. (It is apparently designed to test systems with
> lots of memory and CPU power.)

I think it's more complex than "the slowness of eMMC": I would expect that in
a read-only scenario, eMMC (or SD cards and most USB sticks) doesn't do that
badly; it may be one order of magnitude slower than a hard drive, but it
doesn't suffer from seeks during reads by nearly as much. For writes, the
situation is completely different on these devices: you can hit extremely
long delays (up to a second) on a single write whenever the device goes into
garbage-collection mode, during which no other I/O is done, and that ends up
stalling any process that is waiting for a read request.

> I mainly used fio on NAS type devices.
> For example on Marvell Kirkwood Pogoplug 4 with SATA, I
> can do a test like this to test an dmcrypt devicemapper thing:
>
> fio --filename=/dev/dm-0 --direct=1 --iodepth=1 --rw=read --bs=64K \
>     --size=1G --group_reporting --numjobs=1 --name=test_read
>
>> Which I/O scheduler was used when measuring
>> performance with the traditional block layer?
>
> I used CFQ, deadline, noop, and of course the BFQ patches.
> With BFQ I reproduced the figures reported by Paolo on a
> laptop but since his test cases use fio to stress the system
> and eMMC/SD are so slow, I couldn't come up with any good
> usecase using fio.
>
> Any hints on better tests are welcome!
> In the kernel logs I only see peole doing a lot of dd
> tests which I think is silly, you need more serious
> test cases so it's good if we can build some consensus
> there.

My guess is that the impact of the file system is much greater than that of
the I/O scheduler. If the file system is well tuned to the storage device
(e.g. f2fs should be near ideal), you can avoid most of the stalls regardless
of the scheduler, while with file systems that are not aware of flash
geometry at all (e.g. the now-removed ext3 code, especially with journaling),
the scheduler won't be able to help that much either.

What file system did you use for testing, and which tuning did you do for
your storage devices?

Maybe a better long-term strategy is to improve the important file systems
(ext4, xfs, btrfs) further to work well with flash storage through blk-mq.

Arnd
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Paolo Valente
Date: 2016-09-16 13:10 UTC
To: Arnd Bergmann
Cc: ksummit-discuss, Bartlomiej Zolnierkiewicz, Greg KH, Jens Axboe, hare, Tejun Heo, Bart Van Assche, osandov, Christoph Hellwig

> Il giorno 16 set 2016, alle ore 13:46, Arnd Bergmann <arnd@arndb.de> ha scritto:
>
> <snip>
>
> My guess is that the impact of the file system is much greater
> than the I/O scheduler. If the file system is well tuned
> to the storage device (e.g. f2fs should be near ideal),
> you can avoid most of the stalls regardless of the scheduler,
> while with file systems that are not aware of flash geometry
> at all (e.g. the now-removed ext3 code, especially with
> journaling), the scheduler won't be able to help that much
> either.

If I have not misunderstood your guess, then it actually does not match our
results for any test case [1]. More precisely, certain filesystems do improve
performance for certain, or sometimes most, workloads, but responsiveness,
starvation and frame-drop issues remain basically unchanged. Per-filesystem
results are not reported in [1], but, if you want, I can reproduce them for
the filesystems you suggest.

According to our experience, the fundamental problem is that either:

1) The I/O scheduler goes on choosing the wrong I/O requests to dispatch for
a very long time: seconds, minutes, or forever, depending on the workload and
the scheduler. For example, one tries to start a new application while one or
more files are being copied, and the I/O requests of the starting application
are served very rarely, or not served at all until the copy is finished. Then
the application takes a very long time to start, or simply does not start
until the copy is finished.
or

2) The I/O scheduler does nothing (noop), or does not exist (blk-mq), so the
service order is FIFO plus internal reordering in the storage device, where
internal reordering is most often aimed at maximizing throughput. In this
case, the problem described for the previous case usually gets much worse,
because any Linux scheduler apart from noop tends somehow to achieve fairness
and reduce latency.

Thanks,
Paolo

[1] http://algogroup.unimore.it/people/paolo/disk_sched/results.php

> What file system did you use for testing, and which tuning
> did you do for your storage devices?
>
> Maybe a better long-term strategy is to improve the important
> file systems (ext4, xfs, btrfs) further to work well with
> flash storage through blk-mq.
>
> Arnd
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Linus Walleij
Date: 2016-09-16 13:36 UTC
To: Arnd Bergmann
Cc: ksummit-discuss, Bartlomiej Zolnierkiewicz, Greg KH, Jens Axboe, hare, Tejun Heo, Bart Van Assche, Omar Sandoval, Christoph Hellwig

On Fri, Sep 16, 2016 at 1:46 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> What file system did you use for testing, and which tuning
> did you do for your storage devices?

These were all with ext4.

Yours,
Linus Walleij
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Bart Van Assche
Date: 2016-09-16 11:53 UTC
To: Linus Walleij
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig

On 09/16/2016 01:24 PM, Linus Walleij wrote:
> What do you guys at SanDisk use?

Hello Linus,

We use fio for block device performance measurements. Before we run fio we
disable C-state and P-state transitions to make sure that the results will
not depend on any frequency-scaling algorithm. Furthermore, we install a udev
rule that sets the following block layer parameters for non-rotational
devices: add_random=0 and rq_affinity=2.

Bart.
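[Editor's note: as a sketch of the rule Bart describes -- the file name is an
assumption, while `add_random` and `rq_affinity` are the standard
`queue/` sysfs attributes of block devices -- such a udev rule could look
like:]

```
# /etc/udev/rules.d/60-nonrot-tuning.rules  (file name is an assumption)
# For non-rotational block devices: stop contributing to the entropy pool
# (add_random=0) and force request completion on the submitting CPU group
# (rq_affinity=2).
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", ATTR{queue/add_random}="0", ATTR{queue/rq_affinity}="2"
```

After installing the file, `udevadm control --reload` followed by
`udevadm trigger --subsystem-match=block` re-applies the attributes without a
reboot.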
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Ulf Hansson
Date: 2016-09-22 9:18 UTC
To: Linus Walleij
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig

On 16 September 2016 at 10:59, Linus Walleij <linus.walleij@linaro.org> wrote:
> On Fri, Sep 16, 2016 at 10:24 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote:
>>> Linux systems suffers from long-standing high-latency problems, at
>>> system and application level, related to I/O.
>>> <snip>
>>
>> Isn't this a better topic for the Vault conference, or the storage mini
>> conference?
>
> Paolo was invited to the kernel summit and I guess so are the
> core block maintainers: Jens, Tejun, Christoph. The right people are
> there so why not take the opportunity.
>
> If for nothing else just have a formal chat.

Whatever form works for me! Although I may only join on Tuesday, as I will be
at LPC.
> Overall I personally think the most KS-related discussion would be
> to address the problems Paolo has had to break into the block layer
> development community and the conflicting responses to the patch
> sets, which generated a few flak comments under the last LWN
> article:
> http://lwn.net/Articles/674308/
>
> The main problem is that unlike some random driver this cannot
> be put into staging and adding it as a secondary (or tertiary or
> whatever) scheduling policy in block/* was explicitly nixed.
>
> AFAICT there is no clear answer from the block maintainers
> regarding:
>
> - Is the old blk layer deprecated or not?
> <snip>
>
> - When is an all-out transition to mq really going to happen?
> <snip>
>
> - What will subsystems (especially my pet peeve about MMC/SD
> which is single-queue by nature) that experience a performance
> regression with a switch to mq do? Not switch until mq has a
> scheduling policy? Switch and suck up the performance regression,
> multiplied by the number of Android handheld devices on the
> planet?

With my MMC hat on, I would of course appreciate reaching a consensus on the
three topics above. To me, the KS seems like a very good opportunity to meet
and discuss this, especially since it seems like many important stakeholders
will be there.

> I only have handwavy arguments about the latter being the
> case which is why I'm working on a patch to MMC/SD to
> switch to mq as an RFT. It's taking some time though, alas
> I'm not very smart.

I appreciate this!
I don't expect it to be easy, as you would probably have to rip out most of
the mmc block/core code related to request management.

For example, I guess the asynchronous request mechanism doesn't really fit
into blk-mq, does it?

Kind regards
Ulf Hansson
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Linus Walleij
Date: 2016-09-22 11:06 UTC
To: Ulf Hansson
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, Omar Sandoval, Christoph Hellwig

On Thu, Sep 22, 2016 at 11:18 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 16 September 2016 at 10:59, Linus Walleij <linus.walleij@linaro.org> wrote:
>> I only have handwavy arguments about the latter being the
>> case which is why I'm working on a patch to MMC/SD to
>> switch to mq as an RFT. It's taking some time though, alas
>> I'm not very smart.
>
> I appreciate this! I don't expect it to be easy, as you would probably
> have to rip out most of the mmc block/core code related to request
> management.
>
> For example, I guess the asynchronous request mechanism doesn't really
> fit into blkmq, does it?

Nope. I have no idea how to make that work.

I got blk-mq running for MMC/SD today and I see a gross performance
regression, from 37 MB/s to 27 MB/s, on Ux500 7.38 GB eMMC with a simple dd
test:

BEFORE switching to MQ:

time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1073741824 bytes (1.0GB) copied, 27.530335 seconds, 37.2MB/s
real    0m 27.54s
user    0m 0.02s
sys     0m 7.56s

AFTER switching to MQ:

time dd if=/dev/mmcblk3 of=/dev/null bs=1M count=1024
1073741824 bytes (1.0GB) copied, 37.170990 seconds, 27.5MB/s
real    0m 37.18s
user    0m 0.02s
sys     0m 7.32s

I will however post my hacky patch as an RFD to the blockdevs and the block
maintainers, along with the numbers and some speculation about what may be
causing the regression. Asynchronous requests (request pipelining) are one
thing; another is front/back merging in the block layer, I guess.
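[Editor's note: dd figures like the ones above can be sanity-checked by
recomputing throughput from the byte count and elapsed time. Busybox dd
reports binary megabytes (MiB/s), so a small sketch of such a helper, with
the figures above plugged in:]

```shell
# Recompute dd-style throughput: bytes / seconds, in binary megabytes
# (MiB), which is the unit busybox dd prints as "MB/s".
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

mbps 1073741824 27.530335   # before the switch to MQ -> 37.2
mbps 1073741824 37.170990   # after the switch to MQ  -> 27.5
```

Both results match the numbers dd reported, confirming the regression is in
the elapsed time, not a reporting artifact.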
I think I should give the blockdevs the opportunity to tell me off for all
the stupid ways in which I should *not* be using MQ before we draw any
conclusions from this...

Yours,
Linus Walleij
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: James Bottomley
Date: 2016-09-16 15:15 UTC
To: Greg KH, Paolo Valente
Cc: b.zolnierkie, ksummit-discuss, Jens Axboe, hare, Tejun Heo, osandov, hch

On Fri, 2016-09-16 at 10:24 +0200, Greg KH wrote:
> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote:
>> Linux systems suffers from long-standing high-latency problems, at
>> system and application level, related to I/O.
>> <snip>
>
> Isn't this a better topic for the Vault conference, or the storage
> mini conference?

LSF/MM would be the place to have the technical discussion, yes. It will be
in Cambridge (MA, USA, not the real one) in the Feb/March time frame in 2017.
Far more of the storage experts (who likely want to weigh in) will be
present.

My understanding of the patch set is that you've only sent it as an RFC, and
the main criticism was that it only applied to our legacy interface, not the
new mq one. You sent out an RFD for ideas around mq in August, but the main
criticism was that your ideas would introduce a contention point. Omar
Sandoval is also working on something similar in mq; are you actually talking
to him?
James
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O
From: Paolo Valente
Date: 2016-09-16 18:48 UTC
To: James Bottomley
Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, hch

> Il giorno 16 set 2016, alle ore 17:15, James Bottomley
> <James.Bottomley@HansenPartnership.com> ha scritto:
>
> <snip>
>
> LSF/MM would be the place to have the technical discussion, yes. It
> will be in Cambridge (MA,USA not the real one) in the Feb/March time
> frame in 2017. Far more of the storage experts (who likely want to
> weigh in) will be present.

Perfect venue. It would just be a pity, IMO, to waste the opportunity of my
being at KS with other people working on the components involved in these
high-latency issues, and to delay a discussion of possible solutions by
several more months.
> My understanding of the patch set is that you've only sent it as an RFC Actually, in last submission the RFC tag was gone. > and the main criticism was that it only applied to our legacy > interface, not the new mq one. Yes. What puzzles me a little bit is that, over these years, virtually no ack or objection concerned how relevant/irrelevant the addressed latency problems are, or how effective/ineffective BFQ is in solving them. > You sent out an RFD for ideas around mq > in August, but the main criticism was that your ideas would introduce a > contention point. Yes, that criticism concerned one of my questions: I asked whether io contexts or something like that could be used for I/O scheduling in blk-mq. Since I have just started thinking about possible solutions to solve effectively latency issues in blk-mq, I'm trying to understand on what ground they could be based. Naively, I didn't realize that io contexts, in their current incarnation, are just unfeasible in a parallel framework. > Omar Sandoval is also working on something similar > in mq, are you actually talking to him? > One of the purposes of my RFD was exactly to talk with somebody like Omar. He did reply providing very useful information. As of now, my interaction with Omar consists just in the exchange of emails in that thread. That exchange is currently stuck at my last email, sent about three weeks ago, and containing some considerations and questions about the information Omar provided me in his email. Thanks, Paolo > James > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 18:48 ` Paolo Valente @ 2016-09-16 19:36 ` James Bottomley 2016-09-16 20:13 ` Paolo Valente ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: James Bottomley @ 2016-09-16 19:36 UTC (permalink / raw) To: Paolo Valente Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, hch On Fri, 2016-09-16 at 20:48 +0200, Paolo Valente wrote: > > Il giorno 16 set 2016, alle ore 17:15, James Bottomley < > > James.Bottomley@HansenPartnership.com> ha scritto: > > > > On Fri, 2016-09-16 at 10:24 +0200, Greg KH wrote: > > > On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote: > > > > Linux systems suffers from long-standing high-latency problems, > > > > at system and application level, related to I/O. For example, > > > > they usually suffer from poor responsiveness--or even > > > > starvation, depending on the workload--while, e.g., one or more > > > > files are being read/written/copied. On a similar note, > > > > background workloads may cause audio/video playback/streaming > > > > to stutter, even with long gaps. A lot of test results on this > > > > problem can be found here [1] (I'm citing only this resource > > > > just because I'm familiar with it, but evidence can be found in > > > > countless technical reports, scientific papers, forum > > > > discussions, and so on). > > > > > > <snip> > > > > > > Isn't this a better topic for the Vault conference, or the > > > storage mini conference? > > > > LSF/MM would be the place to have the technical discussion, yes. > > It will be in Cambridge (MA,USA not the real one) in the Feb/March > > time frame in 2017. Far more of the storage experts (who likely > > want to weigh in) will be present. > > > > Perfect venue. 
Just it would be a pity IMO to waste the opportunity > of my being at KS with other people working on the components > involved in high-latency issues, and to delay by more months a > discussion on possible solutions. OK, so the problem with a formal discussion of something like this at KS is that of the 80 or so people in the room, likely only 10 have any interest whatsoever, leading to intense boredom for the remaining 70. And for those 10, there were likely another 10 who didn't get invited who wanted the chance to express an opinion. Realistically, this is why we no longer do technical discussions at KS: audience too broad and not enough specific subject matter experts. However, nothing says you can't have a discussion in the hallway if you're already going. > > My understanding of the patch set is that you've only sent it as an > > RFC > > Actually, in last submission the RFC tag was gone. > > > and the main criticism was that it only applied to our legacy > > interface, not the new mq one. > > Yes. What puzzles me a little bit is that, over these years, > virtually no ack or objection concerned how relevant/irrelevant the > addressed latency problems are, or how effective/ineffective BFQ is > in solving them. Where have you been posting them for years? I stay pretty close to block issues, but the first time I actually noticed was when you posted to linux-block on 1 Feb this year. > > You sent out an RFD for ideas around mq in August, but the main > > criticism was that your ideas would introduce a contention point. > > Yes, that criticism concerned one of my questions: I asked whether io > contexts or something like that could be used for I/O scheduling in > blk-mq. Since I have just started thinking about possible solutions > to solve effectively latency issues in blk-mq, I'm trying to > understand on what ground they could be based. Naively, I didn't > realize that io contexts, in their current incarnation, are just > unfeasible in a parallel framework.
Well, I understand, but you're trying to get the attention of people who believe nothing now is important except blk-mq ... I'm afraid it means you do need to understand and adapt to the new toy. > > Omar Sandoval is also working on something similar > > in mq, are you actually talking to him? > > > > One of the purposes of my RFD was exactly to talk with somebody like > Omar. He did reply providing very useful information. As of now, my > interaction with Omar consists just in the exchange of emails in that > thread. That exchange is currently stuck at my last email, sent > about three weeks ago, and containing some considerations and > questions about the information Omar provided me in his email. My hazy recollection of Omar from the last LSF/MM is that he's quite a recent FB developer and he's got quite a lot to do ... he may just need reminding. James ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 19:36 ` James Bottomley @ 2016-09-16 20:13 ` Paolo Valente 2016-09-19 8:17 ` Jan Kara 2016-09-17 10:31 ` Linus Walleij 2016-09-21 13:51 ` Grant Likely 2 siblings, 1 reply; 20+ messages in thread From: Paolo Valente @ 2016-09-16 20:13 UTC (permalink / raw) To: James Bottomley Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, osandov, Christoph Hellwig > Il giorno 16 set 2016, alle ore 21:36, James Bottomley <James.Bottomley@HansenPartnership.com> ha scritto: > > On Fri, 2016-09-16 at 20:48 +0200, Paolo Valente wrote: >>> Il giorno 16 set 2016, alle ore 17:15, James Bottomley < >>> James.Bottomley@HansenPartnership.com> ha scritto: >>> >>> On Fri, 2016-09-16 at 10:24 +0200, Greg KH wrote: >>>> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote: >>>>> Linux systems suffers from long-standing high-latency problems, >>>>> at system and application level, related to I/O. For example, >>>>> they usually suffer from poor responsiveness--or even >>>>> starvation, depending on the workload--while, e.g., one or more >>>>> files are being read/written/copied. On a similar note, >>>>> background workloads may cause audio/video playback/streaming >>>>> to stutter, even with long gaps. A lot of test results on this >>>>> problem can be found here [1] (I'm citing only this resource >>>>> just because I'm familiar with it, but evidence can be found in >>>>> countless technical reports, scientific papers, forum >>>>> discussions, and so on). >>>> >>>> <snip> >>>> >>>> Isn't this a better topic for the Vault conference, or the >>>> storage mini conference? >>> >>> LSF/MM would be the place to have the technical discussion, yes. >>> It will be in Cambridge (MA,USA not the real one) in the Feb/March >>> time frame in 2017. Far more of the storage experts (who likely >>> want to weigh in) will be present. >>> >> >> Perfect venue. 
Just it would be a pity IMO to waste the opportunity >> of my being at KS with other people working on the components >> involved in high-latency issues, and to delay by more months a >> discussion on possible solutions. > > OK, so the problem with a formal discussion of something like this at > KS is that of the 80 or so people in the room, likely only 10 have any > interest whatsoever, leading to intense boredom for the remaining 70. No no, that would be scary to me, given the level of the audience! I thought it would have been possible to arrange some sort of sub-discussions with limited groups (although maybe the fact that Linux still suffers from high latencies might somehow worry all people that care about the kernel). I'm sorry, but this will be my first time at KS. > > And for those 10, there were likely another 10 who didn't get invited > who wanted the chance to express an opinion. Realistically, this is > why we no longer do technical discussions at KS: audience too broad and > not enough specific subject matter experts. > > However, nothing says you can't have a discussion in the hallway if > you're already going. > Which may be enough to raise more awareness. >>> My understanding of the patch set is that you've only sent it as an >>> RFC >> >> Actually, in last submission the RFC tag was gone. >> >>> and the main criticism was that it only applied to our legacy >>> interface, not the new mq one. >> >> Yes. What puzzles me a little bit is that, over these years, >> virtually no ack or objection concerned how relevant/irrelevant the >> addressed latency problems are, or how effective/ineffective BFQ is >> in solving them. > > Where have you been posting them for years? I stay pretty close to > block issues, but the first time I actually noticed was when you posted > to linux-block on 1 Feb this year.
> I forgot all BFQ submissions too :) (please have a look at the last link in the following list) After a little search, here are the very first ones: https://lkml.org/lkml/2008/4/1/234 https://lkml.org/lkml/2008/11/11/148 Then the first one with the new version of BFQ: https://lkml.org/lkml/2014/5/27/314 After a few other rounds in the last two years, the last one (which you already saw according to your summary): https://lkml.org/lkml/2016/8/8/207 And, maybe even more relevant, a very short feedback highlighting that the problem seems to be still alive, well and serious: https://lkml.org/lkml/2016/9/9/154 >>> You sent out an RFD for ideas around mq in August, but the main >>> criticism was that your ideas would introduce a contention point. >> >> Yes, that criticism concerned one of my questions: I asked whether io >> contexts or something like that could be used for I/O scheduling in >> blk-mq. Since I have just started thinking about possible solutions >> to solve effectively latency issues in blk-mq, I'm trying to >> understand on what ground they could be based. Naively, I didn't >> realize that io contexts, in their current incarnation, are just >> unfeasible in a parallel framework. > > Well, I understand, but you're trying to get the attention of people > who believe nothing now is important except blk-mq ... I'm afraid it > means you do need to understand and adapt to the new toy. > I did notice it ... And we are trying to find a good way to retrofit strong low-latency guarantees in blk-mq too. Anyway, a concrete problem is that, if I'm not completely mistaken, there is a huge number of users and sysadmins that could already enjoy much better Linux systems, while waiting for the transition to blk-mq to be completed. >>> Omar Sandoval is also working on something similar >>> in mq, are you actually talking to him? >>> >> >> One of the purposes of my RFD was exactly to talk with somebody like >> Omar. He did reply providing very useful information. 
As of now, my >> interaction with Omar consists just in the exchange of emails in that >> thread. That exchange is currently stuck at my last email, sent >> about three weeks ago, and containing some considerations and >> questions about the information Omar provided me in his email. > > My hazy recollection of Omar from the last LSF/MM is that he's quite a > recent FB developer and he's got quite a lot to do ... he may just need > reminding. > Then I will follow your advice. Thank you very much, Paolo > James ^ permalink raw reply [flat|nested] 20+ messages in thread
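[Editor's note: the "users and sysadmins that could already enjoy much better Linux systems" mentioned above would, in practice, do so by selecting BFQ per device through sysfs, on kernels carrying the out-of-tree BFQ patch set. A minimal sketch of that workflow follows; the device name `sda` is a placeholder, and `bfq` only appears in the list on kernels actually patched with it.]

```shell
# List each block device together with its available I/O schedulers.
# The active scheduler is shown in brackets, e.g. "noop deadline [cfq]".
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue   # skip silently if sysfs exposes no devices
    dev=${f#/sys/block/}
    dev=${dev%/queue/scheduler}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
done

# Selecting a scheduler requires root; "sda" and "bfq" are placeholders:
#   echo bfq > /sys/block/sda/queue/scheduler
```

[This only applies to the legacy (single-queue) request path under discussion; blk-mq devices had no scheduler to select at the time of this thread.]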
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 20:13 ` Paolo Valente @ 2016-09-19 8:17 ` Jan Kara 0 siblings, 0 replies; 20+ messages in thread From: Jan Kara @ 2016-09-19 8:17 UTC (permalink / raw) To: Paolo Valente Cc: Jens Axboe, Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, James Bottomley, hare, Tejun Heo, osandov, Christoph Hellwig On Fri 16-09-16 22:13:44, Paolo Valente wrote: > > Il giorno 16 set 2016, alle ore 21:36, James Bottomley <James.Bottomley@HansenPartnership.com> ha scritto: > > > > On Fri, 2016-09-16 at 20:48 +0200, Paolo Valente wrote: > >>> Il giorno 16 set 2016, alle ore 17:15, James Bottomley < > >>> James.Bottomley@HansenPartnership.com> ha scritto: > >>> > >>> On Fri, 2016-09-16 at 10:24 +0200, Greg KH wrote: > >>>> On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote: > >>>>> Linux systems suffers from long-standing high-latency problems, > >>>>> at system and application level, related to I/O. For example, > >>>>> they usually suffer from poor responsiveness--or even > >>>>> starvation, depending on the workload--while, e.g., one or more > >>>>> files are being read/written/copied. On a similar note, > >>>>> background workloads may cause audio/video playback/streaming > >>>>> to stutter, even with long gaps. A lot of test results on this > >>>>> problem can be found here [1] (I'm citing only this resource > >>>>> just because I'm familiar with it, but evidence can be found in > >>>>> countless technical reports, scientific papers, forum > >>>>> discussions, and so on). > >>>> > >>>> <snip> > >>>> > >>>> Isn't this a better topic for the Vault conference, or the > >>>> storage mini conference? > >>> > >>> LSF/MM would be the place to have the technical discussion, yes. > >>> It will be in Cambridge (MA,USA not the real one) in the Feb/March > >>> time frame in 2017. Far more of the storage experts (who likely > >>> want to weigh in) will be present. 
> >>> > >> > >> Perfect venue. Just it would be a pity IMO to waste the opportunity > >> of my being at KS with other people working on the components > >> involved in high-latency issues, and to delay by more months a > >> discussion on possible solutions. > > > > OK, so the problem with a formal discussion of something like this at > > KS is that of the 80 or so people in the room, likely only 10 have any > > interest whatsoever, leading to intense boredom for the remaining 70. > > No no, that would be scary to me, given the level of the audience! I > thought it would have been possible to arrange some sort of > sub-discussions with limited groups (although maybe the fact that Linux > still suffers from high latencies might somehow worry all people that > care about the kernel). I'm sorry, but this will be my first time at KS. Yeah, so I'll be at KS and I'd be interested in this discussion. Actually I expect to have Jens Axboe and Christoph Hellwig around as well, who are the biggest blk-mq proponents, so I think the most important people for discussing what the blockers for merging are will be there. I agree that LSF/MM is a better venue for a discussion of the details of the scheduling algorithm, but at least for a process discussion about the conditions under which BFQ is mergeable, KS is OK. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 19:36 ` James Bottomley 2016-09-16 20:13 ` Paolo Valente @ 2016-09-17 10:31 ` Linus Walleij 2016-09-21 13:51 ` Grant Likely 2 siblings, 0 replies; 20+ messages in thread From: Linus Walleij @ 2016-09-17 10:31 UTC (permalink / raw) To: James Bottomley Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, hare, Tejun Heo, Omar Sandoval, Christoph Hellwig On Fri, Sep 16, 2016 at 9:36 PM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > OK, so the problem with a formal discussion of something like this at > KS is that of the 80 or so people in the room, likely only 10 have any > interest whatsoever, leading to intense boredom for the remaining 70. If it is about the semantics of CFQ, BFQ and MQ yes. But what is lurking here is also a social problem and that needs to be addressed at KS IMO. Following this from a laid-back bystander position I recognize patterns that we saw earlier in the CPU scheduler community when the fuss was all about the interactive qualities of the O(1) scheduler vs rotating staircase and the eventual merge of the (awesome) CFS scheduler. And the kind of attention that brought about the (cool) CPU deadline scheduler. I think the problem is in the original Thomas Kuhnian sense paradigmatic. That is, loosely defined, what kind of questions should be addressed and what type of answers could be expected, a set of working assumptions for the community so that it can make steady progress and not be disturbed by irrelevant noise. The block layer people are working inside a paradigm, that is something like "use the hardware optimally to maximize throughput". It is obvious from things like mq and the fio tool that this is what is perceived as the problem space it sets out to maneuver. Now comes along this Italian who says something totally perpendicular like "but I care much more about latency", i.e.
how interactive the system is for a user, start-up time of applications under load, no skipping in the media players under system load and things like that. Well that is no storage cluster use case... (Modified truth: Paolo actually consulted for a storage provider that did not provide a certain average throughput, but instead as exact a throughput rate as possible, which made BFQ fit their use case better. But you get the point.) The usual reaction from people working inside the paradigm to concepts alien to them will be a series of shrugs and yawns. And that is human. Don't rock the boat. Sit down. Or even "can't you just take a bigger and faster nvram disk? Well, I think you will be able to in two years so give up right now." Even more intimidating when he's making research reports and measurements and developing repeatable test cases to prove the point. In the past CPU scheduler debate some people have done lame handwavy arguments as to why this or that scheduler is so much better, but that is not the case here. Paolo's tests are very real. Scientific, repeatable, hard measures. The point is, I suspect that the block layer community is all about throughput and the talk about latency and interactivity is seen as an annoying distraction. Like the kids making noise about doing detours for catching Pokémons in the back seat of the car while you're in the driving seat, driving to some perceived important destination. If you see what I mean. Their problem is not really your problem, so you don't care much. It will be more "yeah yeah, we'll see about your Pokémons. Someday." But as in the case with the CPU schedulers, what we risk getting out there amongst the comments in LWN and Phoronix and sites like that is a conspiracy theory: that the block layer devs are living in their ivory tower and not caring about interactivity of Linux and the desktop user experience and all that old yada-yada we've heard a million times by now.
The point is not about the Linux desktop even, if you ask me. The point for me is that for everyone using an Android phone, Linux block layer interactivity matters, every time an application lags at start-up on a stressed Android, and for spurious writes like "optimizing applications" it's even worse. (Disclaimer: I represent the embedded, tablet and handset industry. I might be tainted.) The people who think interactivity of the block layer is important to them want a voice. And Paolo is there for them, at the KS. I would take this opportunity to listen to him, whether formally or informally. ALSO to get the vibe from the kernel developer community at large: hands down: what matters to us? Data storage clusters of nvrams or embedded eMMC cards in Android phones? Or both? Is that even a question so silly that it should not be asked? Don't ask me, ask the KS attendees. I think it's relevant. (If for nothing else we do a good job at kicking up dust on this mailing list already, and we've been told it is actually more important than the KS itself.) Yours, Linus Walleij ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 19:36 ` James Bottomley 2016-09-16 20:13 ` Paolo Valente 2016-09-17 10:31 ` Linus Walleij @ 2016-09-21 13:51 ` Grant Likely 2 siblings, 0 replies; 20+ messages in thread From: Grant Likely @ 2016-09-21 13:51 UTC (permalink / raw) To: James Bottomley Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Greg KH, Jens Axboe, Hannes Reinecke, Tejun Heo, osandov, Christoph Hellwig On Fri, Sep 16, 2016 at 8:36 PM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Fri, 2016-09-16 at 20:48 +0200, Paolo Valente wrote: >> > Il giorno 16 set 2016, alle ore 17:15, James Bottomley < >> > James.Bottomley@HansenPartnership.com> ha scritto: >> > >> > On Fri, 2016-09-16 at 10:24 +0200, Greg KH wrote: >> > > On Fri, Sep 16, 2016 at 09:55:45AM +0200, Paolo Valente wrote: >> > > > Linux systems suffers from long-standing high-latency problems, >> > > > at system and application level, related to I/O. For example, >> > > > they usually suffer from poor responsiveness--or even >> > > > starvation, depending on the workload--while, e.g., one or more >> > > > files are being read/written/copied. On a similar note, >> > > > background workloads may cause audio/video playback/streaming >> > > > to stutter, even with long gaps. A lot of test results on this >> > > > problem can be found here [1] (I'm citing only this resource >> > > > just because I'm familiar with it, but evidence can be found in >> > > > countless technical reports, scientific papers, forum >> > > > discussions, and so on). >> > > >> > > <snip> >> > > >> > > Isn't this a better topic for the Vault conference, or the >> > > storage mini conference? >> > >> > LSF/MM would be the place to have the technical discussion, yes. >> > It will be in Cambridge (MA,USA not the real one) in the Feb/March >> > time frame in 2017. Far more of the storage experts (who likely >> > want to weigh in) will be present. 
>> > >> >> Perfect venue. Just it would be a pity IMO to waste the opportunity >> of my being at KS with other people working on the components >> involved in high-latency issues, and to delay by more months a >> discussion on possible solutions. > > OK, so the problem with a formal discussion of something like this at > KS is that of the 80 or so people in the room, likely only 10 have any > interest whatsoever, leading to intense boredom for the remaining 70. > And for those 10, there were likely another 10 who didn't get invited > who wanted the chance to express an opinion. Realistically, this is > why we no longer do technical discussions at KS: audience too broad and > not enough specific subject matter experts. > > However, nothing says you can't have a discussion in the hallway if > you're already going. Maybe we can set aside a slot or two for smaller-scale BoF sessions? If there are other topics in this vein, it would be good to have a list of them ahead of time. I've been considering a BoF related to our device model for example. g. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-16 7:55 [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O Paolo Valente 2016-09-16 8:24 ` Greg KH @ 2016-09-21 14:30 ` Bart Van Assche 2016-09-21 14:37 ` Paolo Valente 1 sibling, 1 reply; 20+ messages in thread From: Bart Van Assche @ 2016-09-21 14:30 UTC (permalink / raw) To: Paolo Valente, ksummit-discuss Cc: b.zolnierkie, Jens Axboe, hare, Tejun Heo, osandov, hch On 09/16/16 00:55, Paolo Valente wrote: > Linux systems suffers from long-standing high-latency problems, at > system and application level, related to I/O. Hello Paolo, Are you aware of Jens' throttled background buffered writeback work? If not, can you repeat your measurements against a kernel on which these patches have been applied? See also http://www.spinics.net/lists/linux-fsdevel/msg101391.html. Thanks, Bart. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O 2016-09-21 14:30 ` Bart Van Assche @ 2016-09-21 14:37 ` Paolo Valente 0 siblings, 0 replies; 20+ messages in thread From: Paolo Valente @ 2016-09-21 14:37 UTC (permalink / raw) To: Bart Van Assche Cc: Bartlomiej Zolnierkiewicz, ksummit-discuss, Jens Axboe, hare, Tejun Heo, osandov, hch > Il giorno 21 set 2016, alle ore 16:30, Bart Van Assche <bart.vanassche@sandisk.com> ha scritto: > > On 09/16/16 00:55, Paolo Valente wrote: >> Linux systems suffers from long-standing high-latency problems, at >> system and application level, related to I/O. > > Hello Paolo, > Hi > Are you aware of Jens' throttled background buffered writeback work? If not, can you repeat your measurements against a kernel on which these patches have been applied? Already done (see below). > See also http://www.spinics.net/lists/linux-fsdevel/msg101391.html. > A brief report of the outcome of my measurements is in this email of mine on the same thread: http://www.spinics.net/lists/linux-fsdevel/msg101430.html In short, application start-up time happened to be about the same with and without writeback throttling (actually slightly higher with throttling, for, e.g., gnome-terminal). Thanks, Paolo > Thanks, > > Bart. > ^ permalink raw reply [flat|nested] 20+ messages in thread
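[Editor's note: the core of the start-up-time measurements compared above is simply "time a foreground read while a sequential writer competes for the device". The following is a rough, self-contained sketch of that methodology, not Paolo's actual benchmark suite (which is linked as [1]/[2] in the original post); file sizes and paths are arbitrary stand-ins, and the "application" is just a file being read.]

```shell
tmp=$(mktemp -d)

# Create a file to play the role of the application being started.
dd if=/dev/zero of="$tmp/app" bs=1M count=8 2>/dev/null

# Background workload: a sequential writer competing for the device.
dd if=/dev/zero of="$tmp/background" bs=1M count=64 2>/dev/null &
writer=$!

# Time the foreground read. A real test would first drop the page
# cache (echo 3 > /proc/sys/vm/drop_caches, as root) and read a cold
# application binary instead of a freshly written file.
start=$(date +%s%N)
cat "$tmp/app" > /dev/null
end=$(date +%s%N)
echo "foreground read took $(( (end - start) / 1000000 )) ms"

wait "$writer"
rm -rf "$tmp"
```

[The interesting comparison is this foreground latency with and without the background writer, repeated under different schedulers and with/without writeback throttling, which is what the numbers exchanged in this subthread refer to.]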
Thread overview: 20+ messages -- 2016-09-16 7:55 [Ksummit-discuss] [TECH TOPIC] Addressing long-standing high-latency problems related to I/O Paolo Valente 2016-09-16 8:24 ` Greg KH 2016-09-16 8:59 ` Linus Walleij 2016-09-16 9:10 ` Bart Van Assche 2016-09-16 11:24 ` Linus Walleij 2016-09-16 11:46 ` Arnd Bergmann 2016-09-16 13:10 ` Paolo Valente 2016-09-16 13:36 ` Linus Walleij 2016-09-16 11:53 ` Bart Van Assche 2016-09-22 9:18 ` Ulf Hansson 2016-09-22 11:06 ` Linus Walleij 2016-09-16 15:15 ` James Bottomley 2016-09-16 18:48 ` Paolo Valente 2016-09-16 19:36 ` James Bottomley 2016-09-16 20:13 ` Paolo Valente 2016-09-19 8:17 ` Jan Kara 2016-09-17 10:31 ` Linus Walleij 2016-09-21 13:51 ` Grant Likely 2016-09-21 14:30 ` Bart Van Assche 2016-09-21 14:37 ` Paolo Valente