On Thu, 2009-12-31 at 11:34 +0100, Corrado Zoccolo wrote:
> Hi Yanmin,
> On Thu, Dec 31, 2009 at 10:16 AM, Zhang, Yanmin wrote:
> > Comparing with kernel 2.6.32, fio mmap randread 64k has more than 40%
> > regression with 2.6.33-rc1.

Thanks for your timely reply. Some comments are inlined below.

> Can you compare the performance also with 2.6.31?

We did. We run the Linux Kernel Performance Tracking project and run many
benchmarks whenever an -rc kernel is released. The 2.6.31 result is quite
similar to the 2.6.32 one, but the 2.6.30 result is about 8% better than
the 2.6.31 one.

> I think I understand what causes your problem.
> 2.6.32, with default settings, handled even random readers as
> sequential ones to provide fairness. This has benefits on single disks
> and JBODs, but causes harm on raids.

I didn't test RAID, as the machine with the hardware RAID HBA has crashed.
But when we turn on hardware RAID in the HBA, we mostly use the noop I/O
scheduler.

> For 2.6.33, we changed the way in which this is handled, restoring the
> enable_idle = 0 for seeky queues as it was in 2.6.31:
>
> @@ -2218,13 +2352,10 @@ cfq_update_idle_window(struct cfq_data *cfqd,
> 				struct cfq_queue *cfqq,
>  	enable_idle = old_idle = cfq_cfqq_idle_window(cfqq);
>
>  	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
> -	    (!cfqd->cfq_latency && cfqd->hw_tag && CFQQ_SEEKY(cfqq)))
> +	    (sample_valid(cfqq->seek_samples) && CFQQ_SEEKY(cfqq)))
>  		enable_idle = 0;
>
> (compare with 2.6.31:
>
>  	if (!atomic_read(&cic->ioc->nr_tasks) || !cfqd->cfq_slice_idle ||
>  	    (cfqd->hw_tag && CIC_SEEKY(cic)))
>  		enable_idle = 0;
>
> excluding the sample_valid check, it should be equivalent for you (I
> assume you have NCQ disks))
> and we provide fairness for them by servicing all seeky queues
> together, and then idling before switching to other ones.

As for the function cfq_update_idle_window, you are right. But since 2.6.32,
CFQ has merged many patches, and those patches have an impact on each other.
> The mmap 64k randreader will have a large seek_mean, resulting in
> being marked seeky, but will send 16 * 4k sequential requests one
> after the other, so alternating between those seeky queues will cause
> harm.
>
> I'm working on a new way to compute seekiness of queues, that should
> fix your issue, correctly identifying those queues as non-seeky (for
> me, a queue should be considered seeky only if it submits more than 1
> seeky request for 8 sequential ones).
>
> > The test scenario: 1 JBOD has 12 disks and every disk has 2 partitions.
> > Create 8 1-GB files per partition and start 8 processes to do random
> > reads on the 8 files per partition. There are 8*24 processes in total.
> > The randread block size is 64K.
> >
> > We found the regression on 2 machines. One machine has 8GB memory and
> > the other has 6GB.
> >
> > Bisecting is very unstable; several patches are involved rather than
> > just one.
> >
> > 1) commit 8e550632cccae34e265cb066691945515eaa7fb5
> >    Author: Corrado Zoccolo
> >    Date:   Thu Nov 26 10:02:58 2009 +0100
> >
> >        cfq-iosched: fix corner cases in idling logic
> >
> > This patch introduces a bit less than 20% regression. I just reverted
> > the section below and this part of the regression disappeared. It shows
> > this regression is stable and not impacted by other patches.
> >
> > @@ -1253,9 +1254,9 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
> >  		return;
> >
> >  	/*
> > -	 * still requests with the driver, don't idle
> > +	 * still active requests from this queue, don't idle
> >  	 */
> > -	if (rq_in_driver(cfqd))
> > +	if (cfqq->dispatched)
> >  		return;

Although 5 patches are related to the regression, the line above is quite
independent. Reverting it always improves the result by about 20%.

> This shouldn't affect you if all queues are marked as idle.

Do you mean using the ionice command to mark them as idle class? I didn't
try it.
> Does just your patch:
> > -	    (!cfq_cfqq_deep(cfqq) && sample_valid(cfqq->seek_samples)
> > -	     && CFQQ_SEEKY(cfqq)))
> > +	    (!cfqd->cfq_latency && !cfq_cfqq_deep(cfqq) &&
> > +	     sample_valid(cfqq->seek_samples) && CFQQ_SEEKY(cfqq)))
> fix most of the regression without touching arm_slice_timer?

No. To fix the regression completely, I need to apply the above patch plus
a debug patch. The debug patch just works around the 3 patches reported in
Shaohua's tiobench regression report. Without the debug patch, the
regression isn't resolved. Below is the debug patch.

diff -Nraup linux-2.6.33_rc1/block/cfq-iosched.c linux-2.6.33_rc1_randread64k/block/cfq-iosched.c
--- linux-2.6.33_rc1/block/cfq-iosched.c	2009-12-23 14:12:03.000000000 +0800
+++ linux-2.6.33_rc1_randread64k/block/cfq-iosched.c	2009-12-30 17:12:28.000000000 +0800
@@ -592,6 +592,9 @@ cfq_set_prio_slice(struct cfq_data *cfqd
 	cfqq->slice_start = jiffies;
 	cfqq->slice_end = jiffies + slice;
 	cfqq->allocated_slice = slice;
+	/* YMZHANG */
+	cfqq->slice_end = cfq_prio_to_slice(cfqd, cfqq) + jiffies;
+
 	cfq_log_cfqq(cfqd, cfqq, "set_slice=%lu", cfqq->slice_end - jiffies);
 }

@@ -1836,7 +1839,8 @@ static void cfq_arm_slice_timer(struct c
 	/*
 	 * still active requests from this queue, don't idle
 	 */
-	if (cfqq->dispatched)
+	/* if (cfqq->dispatched) */
+	if (rq_in_driver(cfqd))
 		return;

 	/*
@@ -1941,6 +1945,9 @@ static void cfq_setup_merge(struct cfq_q
 		new_cfqq = __cfqq;
 	}

+	/* YMZHANG debug */
+	return;
+
 	process_refs = cfqq_process_refs(cfqq);
 	/*
 	 * If the process for the cfqq has gone away, there is no

> I guess
> 5db5d64277bf390056b1a87d0bb288c8b8553f96
> will still introduce a 10% regression, but this is needed to improve
> latency, and you can just disable low_latency to avoid it.

You are right. I did a quick test. With my patch applied, the 2 patches
reverted, and 5db5d64 kept, the regression is about 20%. But low_latency=0
doesn't work the way we imagined.
If I apply my patch, revert the 2 patches, and keep 5db5d64 while setting
low_latency=0, the regression is still there. One reason is that my patch
doesn't take effect when low_latency=0.

> Thanks,
> Corrado

I attach the fio job file for your reference. I caught a cold, so I will
continue working on this next week.

Yanmin
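[The attached job file is not reproduced in this archive. As a rough,
guessed reconstruction only, a fio job matching the workload described in
this thread (mmap engine, 64k random reads, 8 processes on 1GB files per
partition) would look something like the following; the directory path is
a placeholder, not the real test path:]

```ini
; Hypothetical sketch of the described workload, not the actual attachment.
[global]
ioengine=mmap      ; mmap-based reads, as in the reported regression
rw=randread        ; random read pattern
bs=64k             ; 64k block size
size=1g            ; 1GB file per job
numjobs=8          ; 8 processes per partition

[job1]
directory=/mnt/partition1
```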