From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964998AbaE2JJf (ORCPT ); Thu, 29 May 2014 05:09:35 -0400
Received: from spostino.sms.unimo.it ([155.185.44.3]:41522 "EHLO
spostino.sms.unimo.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934258AbaE2JGw (ORCPT ); Thu, 29 May 2014 05:06:52 -0400
From: Paolo Valente
To: Jens Axboe, Tejun Heo, Li Zefan
Cc: Fabio Checconi, Arianna Avanzini, Paolo Valente, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, cgroups@vger.kernel.org, Paolo Valente
Subject: [PATCH RFC - TAKE TWO - 04/12] block, bfq: modify the peak-rate estimator
Date: Thu, 29 May 2014 11:05:35 +0200
Message-Id: <1401354343-5527-5-git-send-email-paolo.valente@unimore.it>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1401354343-5527-1-git-send-email-paolo.valente@unimore.it>
References: <20140528221929.GG1419@htj.dyndns.org> <1401354343-5527-1-git-send-email-paolo.valente@unimore.it>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
UNIMORE-X-SA-Score: -2.9
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Unless the maximum budget B_max that BFQ can assign to a queue is set
explicitly by the user, BFQ automatically updates B_max. In
particular, BFQ dynamically sets B_max to the number of sectors that
can be read, at the current estimated peak rate, during the maximum
time, T_max, allowed before a budget timeout occurs. In formulas, if
we denote as R_est the estimated peak rate, then B_max = T_max ∗
R_est. Hence, the more R_est exceeds the actual disk peak rate, the
more likely processes are to incur budget timeouts unjustly. Besides,
too high a value of B_max unnecessarily increases the deviation from
an ideal, smooth service.

To filter out spikes, the estimated peak rate is updated only on the
expiration of queues that have been served for a long-enough time. As
a first step, the estimator computes the device rate, R_meas, during
the service of the queue. After that, without this patch, if R_est <
R_meas, then R_est is set to R_meas.
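
As an illustration only (this is not code from the patch), the budget
formula and the pre-patch update rule described above can be sketched
in plain user-space C. The function names and the units (sectors per
second for the rate, milliseconds for T_max) are assumptions made for
readability; the kernel code uses its own fixed-point representation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * B_max = T_max * R_est.
 * Assumed units (illustration only): r_est in sectors per second,
 * t_max_ms in milliseconds, result in sectors.
 */
static uint64_t max_budget(uint64_t r_est, uint64_t t_max_ms)
{
	if (t_max_ms == 0)
		return 0;
	assert(r_est <= UINT64_MAX / t_max_ms);	/* guard the product */
	return r_est * t_max_ms / 1000;
}

/* Pre-patch rule: R_est only grows, tracking the highest sample. */
static uint64_t max_tracking_update(uint64_t r_est, uint64_t r_meas)
{
	return r_meas > r_est ? r_meas : r_est;
}
```

With this rule a single inflated sample, for instance one produced by
drive-cache hits, raises R_est permanently, and B_max grows with it.
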

Unfortunately, our experiments highlighted the following two
problems. First, because of ZBR (zone bit recording), depending on the
locality of the workload, the estimator may easily converge to a value
that is appropriate only for part of a disk. Second, R_est may jump
(and remain forever equal) to a much higher value than the actual
device peak rate, in case of hits in the drive cache, which may let
sectors be transferred in practice at bus rate.

To try to converge to the actual average peak rate over the disk
surface (in case of rotational devices), and to smooth the spikes
caused by the drive cache, this patch changes the estimator as
follows. In the description of the changes, we refer to a queue
containing random requests as 'seeky', according to the terminology
used in the code, and inherited from CFQ.

- First, R_est may now also be updated when the just-expired queue,
  despite not being detected as seeky, has nevertheless been unable to
  consume all of its budget within the maximum time slice T_max. In
  fact, this is an indication that B_max is too large. Since B_max =
  T_max ∗ R_est, R_est is then probably too large, and should be
  reduced.

- Second, to filter the spikes in R_meas, a discrete low-pass filter
  is now used to update R_est instead of just keeping the highest rate
  sampled. The rationale is that the average peak rate of a disk over
  its surface is a relatively stable quantity, hence a low-pass filter
  should converge more or less quickly to the right value.

With the current values of the constants used in the filter (a weight
of 7/8 for the old estimate and 1/8 for the new sample), the filter
seems to effectively smooth fluctuations and allow the estimator to
converge to the actual peak rate with all the devices we tested.
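
For readers who want to experiment outside the kernel, the two changes
above can be sketched as stand-alone C. This is a simplified sketch,
not the patch itself: the names should_update() and low_pass_update()
and the plain boolean arguments are hypothetical stand-ins for
BFQQ_SEEKY() and the BFQ_BFQQ_BUDGET_TIMEOUT expiration reason, and
ordinary 64-bit division replaces do_div().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Update when the sample exceeds the estimate, or when a non-seeky
 * queue expired because of a budget timeout.
 */
static bool should_update(uint64_t r_est, uint64_t r_meas,
			  bool seeky, bool budget_timeout)
{
	return r_meas > r_est || (!seeky && budget_timeout);
}

/*
 * Discrete low-pass filter with alpha = 7/8:
 *   new_rate = (7/8) * old_rate + (1/8) * r_meas
 * using integer (truncating) division. A sample whose eighth rounds
 * down to zero leaves the estimate unchanged.
 */
static uint64_t low_pass_update(uint64_t r_est, uint64_t r_meas)
{
	uint64_t sample = r_meas / 8;

	if (sample == 0)
		return r_est;
	assert(r_est <= UINT64_MAX / 7);	/* guard the product */
	return r_est * 7 / 8 + sample;
}
```

In this sketch a tenfold spike (800 followed by a sample of 8000)
moves the estimate only to 1700, and later samples close to the real
rate pull it back down, whereas keeping the highest sample would leave
it at 8000.
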

Signed-off-by: Paolo Valente
Signed-off-by: Arianna Avanzini
---
 block/bfq-iosched.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 49ff1da..2a4e03d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -818,7 +818,7 @@ static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
  * throughput. See the code for more details.
  */
 static int bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-				int compensate)
+				int compensate, enum bfqq_expiration reason)
 {
 	u64 bw, usecs, expected, timeout;
 	ktime_t delta;
@@ -854,10 +854,23 @@ static int bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	 * the peak rate estimation.
 	 */
 	if (usecs > 20000) {
-		if (bw > bfqd->peak_rate) {
-			bfqd->peak_rate = bw;
+		if (bw > bfqd->peak_rate ||
+		   (!BFQQ_SEEKY(bfqq) &&
+		    reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
+			bfq_log(bfqd, "measured bw =%llu", bw);
+			/*
+			 * To smooth oscillations use a low-pass filter with
+			 * alpha=7/8, i.e.,
+			 * new_rate = (7/8) * old_rate + (1/8) * bw
+			 */
+			do_div(bw, 8);
+			if (bw == 0)
+				return 0;
+			bfqd->peak_rate *= 7;
+			do_div(bfqd->peak_rate, 8);
+			bfqd->peak_rate += bw;
 			update = 1;
-			bfq_log(bfqd, "new peak_rate=%llu", bw);
+			bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
 		}
 
 		update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
@@ -936,7 +949,7 @@ static void bfq_bfqq_expire(struct bfq_data *bfqd,
 	/* Update disk peak rate for autotuning and check whether the
 	 * process is slow (see bfq_update_peak_rate).
 	 */
-	slow = bfq_update_peak_rate(bfqd, bfqq, compensate);
+	slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
 
 	/*
 	 * As above explained, 'punish' slow (i.e., seeky), timed-out
-- 
1.9.2