From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.9 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 146EAC00523 for ; Fri, 3 Jan 2020 07:55:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0138821D7D for ; Fri, 3 Jan 2020 07:55:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="a4p9aCO1" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726005AbgACHz2 (ORCPT ); Fri, 3 Jan 2020 02:55:28 -0500 Received: from smtp-fw-4101.amazon.com ([72.21.198.25]:36750 "EHLO smtp-fw-4101.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725890AbgACHz2 (ORCPT ); Fri, 3 Jan 2020 02:55:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1578038127; x=1609574127; h=from:to:cc:subject:date:message-id:in-reply-to: mime-version; bh=BNj2ZpEqscC16GT1ujYI1MhyWZyxnccFrqY6TpA1/wU=; b=a4p9aCO18n0D9tWTRIkfJhUEzU3WBiJlxiGQAfRG7FydhrohcEa8Tio1 zm7f4Whv7K4kWp3BtWwYoWZfXxmti+pylCMktT4ANEVZgFn26uu8qbqoi 1e69hYBISrDC64hcnasUkh5EV4gF5rAWdXd3BtFKW0uNrKyBdLCa96h6u M=; IronPort-SDR: QwSJXMSAWI5rBa/AVXZwfYid/EHff+9L/s8IDNWg6pTX+v8EnMtlCqIc9bOvfc1hEdMe38punz qQ2Ta0LX70cg== X-IronPort-AV: E=Sophos;i="5.69,389,1571702400"; d="scan'208";a="10782255" Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1a-16acd5e0.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-4101.iad4.amazon.com with ESMTP; 03 Jan 2020 07:55:25 +0000 Received: from EX13MTAUEA001.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan2.iad.amazon.com [10.40.159.162]) by email-inbound-relay-1a-16acd5e0.us-east-1.amazon.com (Postfix) with ESMTPS id 5FEC8A2575; Fri, 3 Jan 2020 07:55:22 +0000 (UTC) Received: from EX13D31EUA001.ant.amazon.com (10.43.165.15) by EX13MTAUEA001.ant.amazon.com (10.43.61.82) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Fri, 3 Jan 2020 07:55:21 +0000 Received: from u886c93fd17d25d.ant.amazon.com (10.43.160.109) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1367.3; Fri, 3 Jan 2020 07:55:16 +0000 From: SeongJae Park To: , SeongJae Park CC: , , , , , SeongJae Park , , , Subject: Re: [Xen-devel] [PATCH v13 3/5] xen/blkback: Squeeze page pools if a memory pressure is detected Date: Fri, 3 Jan 2020 08:54:48 +0100 Message-ID: <20200103075448.12994-1-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20191218183718.31719-4-sjpark@amazon.com> (raw) MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.43.160.109] X-ClientProxiedBy: EX13d09UWC003.ant.amazon.com (10.43.162.113) To EX13D31EUA001.ant.amazon.com (10.43.165.15) Sender: linux-block-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Hello Roger, Sorry if I'm disturbing your vacation. If you are already came back to work, may I ask your opinion about this patch? On Wed, 18 Dec 2019 19:37:16 +0100 SeongJae Park wrote: > From: SeongJae Park > > Each `blkif` has a free pages pool for the grant mapping. The size of > the pool starts from zero and is increased on demand while processing > the I/O requests. If current I/O requests handling is finished or 100 > milliseconds has passed since last I/O requests handling, it checks and > shrinks the pool to not exceed the size limit, `max_buffer_pages`. > > Therefore, host administrators can cause memory pressure in blkback by > attaching a large number of block devices and inducing I/O. Such > problematic situations can be avoided by limiting the maximum number of > devices that can be attached, but finding the optimal limit is not so > easy. Improper set of the limit can results in memory pressure or a > resource underutilization. This commit avoids such problematic > situations by squeezing the pools (returns every free page in the pool > to the system) for a while (users can set this duration via a module > parameter) if memory pressure is detected. > > Discussions > =========== > > The `blkback`'s original shrinking mechanism returns only pages in the > pool which are not currently be used by `blkback` to the system. In > other words, the pages that are not mapped with granted pages. Because > this commit is changing only the shrink limit but still uses the same > freeing mechanism it does not touch pages which are currently mapping > grants. > > Once memory pressure is detected, this commit keeps the squeezing limit > for a user-specified time duration. The duration should be neither too > long nor too short. If it is too long, the squeezing incurring overhead > can reduce the I/O performance. If it is too short, `blkback` will not > free enough pages to reduce the memory pressure. This commit sets the > value as `10 milliseconds` by default because it is a short time in > terms of I/O while it is a long time in terms of memory operations. > Also, as the original shrinking mechanism works for at least every 100 > milliseconds, this could be a somewhat reasonable choice. I also tested > other durations (refer to the below section for more details) and > confirmed that 10 milliseconds is the one that works best with the test. > That said, the proper duration depends on actual configurations and > workloads. That's why this commit allows users to set the duration as a > module parameter. > > Memory Pressure Test > ==================== > > To show how this commit fixes the memory pressure situation well, I > configured a test environment on a xen-running virtualization system. > On the `blkfront` running guest instances, I attach a large number of > network-backed volume devices and induce I/O to those. Meanwhile, I > measure the number of pages that swapped in (pswpin) and out (pswpout) > on the `blkback` running guest. The test ran twice, once for the > `blkback` before this commit and once for that after this commit. As > shown below, this commit has dramatically reduced the memory pressure: > > pswpin pswpout > before 76,672 185,799 > after 867 3,967 > > Optimal Aggressive Shrinking Duration > ------------------------------------- > > To find a best squeezing duration, I repeated the test with three > different durations (1ms, 10ms, and 100ms). The results are as below: > > duration pswpin pswpout > 1 707 5,095 > 10 867 3,967 > 100 362 3,348 > > As expected, the memory pressure decreases as the duration increases, > but the reduction become slow from the `10ms`. Based on this results, I > chose the default duration as 10ms. > > Performance Overhead Test > ========================= > > This commit could incur I/O performance degradation under severe memory > pressure because the squeezing will require more page allocations per > I/O. To show the overhead, I artificially made a worst-case squeezing > situation and measured the I/O performance of a `blkfront` running > guest. > > For the artificial squeezing, I set the `blkback.max_buffer_pages` using > the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this > test, I set the value to `1024` and `0`. The `1024` is the default > value. Setting the value as `0` is same to a situation doing the > squeezing always (worst-case). > > If the underlying block device is slow enough, the squeezing overhead > could be hidden. For the reason, I use a fast block device, namely the > rbd[1]: > > # xl block-attach guest phy:/dev/ram0 xvdb w > > For the I/O performance measurement, I run a simple `dd` command 5 times > directly to the device as below and collect the 'MB/s' results. > > $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \ > bs=4k count=$((256*512)); sync; done > > The results are as below. 'max_pgs' represents the value of the > `blkback.max_buffer_pages` parameter. > > max_pgs Min Max Median Avg Stddev > 0 417 423 420 419.4 2.5099801 > 1024 414 425 416 417.8 4.4384682 > No difference proven at 95.0% confidence > > In short, even worst case squeezing on ramdisk based fast block device > makes no visible performance degradation. Please note that this is just > a very simple and minimal test. On systems using super-fast block > devices and a special I/O workload, the results might be different. If > you have any doubt, test on your machine with your workload to find the > optimal squeezing duration for you. > > [1] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html > > Signed-off-by: SeongJae Park > --- > .../ABI/testing/sysfs-driver-xen-blkback | 10 ++++++++ > drivers/block/xen-blkback/blkback.c | 7 ++++-- > drivers/block/xen-blkback/common.h | 1 + > drivers/block/xen-blkback/xenbus.c | 23 ++++++++++++++++++- > 4 files changed, 38 insertions(+), 3 deletions(-) > > diff --git a/Documentation/ABI/testing/sysfs-driver-xen-blkback b/Documentation/ABI/testing/sysfs-driver-xen-blkback > index 4e7babb3ba1f..f01224231f3f 100644 > --- a/Documentation/ABI/testing/sysfs-driver-xen-blkback > +++ b/Documentation/ABI/testing/sysfs-driver-xen-blkback > @@ -25,3 +25,13 @@ Description: > allocated without being in use. The time is in > seconds, 0 means indefinitely long. > The default is 60 seconds. > + > +What: /sys/module/xen_blkback/parameters/buffer_squeeze_duration_ms > +Date: December 2019 > +KernelVersion: 5.5 > +Contact: SeongJae Park > +Description: > + When memory pressure is reported to blkback this option > + controls the duration in milliseconds that blkback will not > + cache any page not backed by a grant mapping. > + The default is 10ms. > diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > index fd1e19f1a49f..79f677aeb5cc 100644 > --- a/drivers/block/xen-blkback/blkback.c > +++ b/drivers/block/xen-blkback/blkback.c > @@ -656,8 +656,11 @@ int xen_blkif_schedule(void *arg) > ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL); > } > > - /* Shrink if we have more than xen_blkif_max_buffer_pages */ > - shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); > + /* Shrink the free pages pool if it is too large. */ > + if (time_before(jiffies, blkif->buffer_squeeze_end)) > + shrink_free_pagepool(ring, 0); > + else > + shrink_free_pagepool(ring, xen_blkif_max_buffer_pages); > > if (log_stats && time_after(jiffies, ring->st_print)) > print_stats(ring); > diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h > index 1d3002d773f7..536c84f61fed 100644 > --- a/drivers/block/xen-blkback/common.h > +++ b/drivers/block/xen-blkback/common.h > @@ -319,6 +319,7 @@ struct xen_blkif { > /* All rings for this device. */ > struct xen_blkif_ring *rings; > unsigned int nr_rings; > + unsigned long buffer_squeeze_end; > }; > > struct seg_buf { > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c > index b90dbcd99c03..24172c180f5f 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -824,6 +824,26 @@ static void frontend_changed(struct xenbus_device *dev, > } > > > +/* Once a memory pressure is detected, squeeze free page pools for a while. */ > +static unsigned int buffer_squeeze_duration_ms = 10; > +module_param_named(buffer_squeeze_duration_ms, > + buffer_squeeze_duration_ms, int, 0644); > +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); > + > +/* > + * Callback received when the memory pressure is detected. > + */ > +static void reclaim_memory(struct xenbus_device *dev) > +{ > + struct backend_info *be = dev_get_drvdata(&dev->dev); > + > + if (!be) > + return; This null check is the only one change from the version (https://lore.kernel.org/xen-devel/20191216093755.GJ11756@Air-de-Roger/) you gave me the 'Reviewed-by' before. This check is necessary because 'reclaim_memory()' can be called before 'probe' or after 'remove' callback. Thanks, SeongJae Park > + be->blkif->buffer_squeeze_end = jiffies + > + msecs_to_jiffies(buffer_squeeze_duration_ms); > +} > + > /* ** Connection ** */ > > > @@ -1115,7 +1135,8 @@ static struct xenbus_driver xen_blkbk_driver = { > .ids = xen_blkbk_ids, > .probe = xen_blkbk_probe, > .remove = xen_blkbk_remove, > - .otherend_changed = frontend_changed > + .otherend_changed = frontend_changed, > + .reclaim_memory = reclaim_memory, > }; > > int xen_blkif_xenbus_init(void) > -- > 2.17.1