* [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
@ 2019-12-09  8:58 SeongJae Park
  2019-12-09  8:58 ` [PATCH v3 1/1] xen/blkback: Squeeze page pools if a memory pressure is detected SeongJae Park
  2019-12-09  9:39 ` [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure Jürgen Groß
  0 siblings, 2 replies; 10+ messages in thread
From: SeongJae Park @ 2019-12-09  8:58 UTC (permalink / raw)
  To: axboe, konrad.wilk, roger.pau
  Cc: linux-block, linux-kernel, pdurrant, sj38.park, xen-devel, SeongJae Park

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O request handling is finished or
100 milliseconds have passed since the last I/O request handling, the
pool is checked and shrunk so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, guests running `blkfront` can cause memory pressure in the
guest running `blkback` by attaching a large number of block devices
and inducing I/O on them.  System administrators can avoid such
problematic situations by limiting the maximum number of devices each
guest can attach.  However, finding the optimal limit is not easy.  An
improperly set limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set the duration via a module parameter)
if memory pressure is detected.


Base Version
------------

This patch is based on v5.4.  A complete tree is also available at my
public git repo:
https://github.com/sjp38/linux/tree/blkback_aggressive_shrinking_v3


Patch History
-------------

Changes from v2 (https://lore.kernel.org/linux-block/af195033-23d5-38ed-b73b-f6e2e3b34541@amazon.com)
 - Rename the module parameter and variables for brevity (aggressive
   shrinking -> squeezing)

Changes from v1 (https://lore.kernel.org/xen-devel/20191204113419.2298-1-sjpark@amazon.com/)
 - Adjust the description to not use the term `arbitrarily` (suggested
   by Paul Durrant)
 - Specify the time unit of the duration in the parameter description
   (suggested by Maximilian Heyne)
 - Change default aggressive shrinking duration from 1ms to 10ms
 - Merge two patches into one single patch

SeongJae Park (1):
  xen/blkback: Squeeze page pools if a memory pressure is detected

 drivers/block/xen-blkback/blkback.c | 35 +++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

-- 
2.17.1



* [PATCH v3 1/1] xen/blkback: Squeeze page pools if a memory pressure is detected
  2019-12-09  8:58 [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure SeongJae Park
@ 2019-12-09  8:58 ` SeongJae Park
  2019-12-09  9:39 ` [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure Jürgen Groß
  1 sibling, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2019-12-09  8:58 UTC (permalink / raw)
  To: axboe, konrad.wilk, roger.pau
  Cc: linux-block, linux-kernel, pdurrant, sj38.park, xen-devel, SeongJae Park

From: SeongJae Park <sjpark@amazon.de>

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If the current I/O request handling is finished or
100 milliseconds have passed since the last I/O request handling, the
pool is checked and shrunk so that it does not exceed the size limit,
`max_buffer_pages`.

Therefore, guests running `blkfront` can cause memory pressure in the
guest running `blkback` by attaching a large number of block devices
and inducing I/O on them.  System administrators can avoid such
problematic situations by limiting the maximum number of devices each
guest can attach.  However, finding the optimal limit is not easy.  An
improperly set limit can result in memory pressure or resource
underutilization.  This commit avoids such problematic situations by
squeezing the pools (returning every free page in the pool to the
system) for a while (users can set the duration via a module parameter)
if memory pressure is detected.

Discussions
===========

The original shrinking mechanism of `blkback` returns only the pages in
the pool which are not currently used by `blkback` (in other words,
pages that are not mapped to foreign pages) to the system.  Because
this commit changes only the shrink limit and uses the mechanism as is,
it does not introduce security issues related to improper mappings.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short,
`blkback` will not free enough pages to reduce the memory pressure.
This commit sets the default value to 10 milliseconds because it is a
short time in terms of I/O while it is a long time in terms of memory
operations.  Also, as the original shrinking mechanism runs at least
once every 100 milliseconds, this seems a reasonable choice.  I also
tested other durations (refer to the section below for more details)
and confirmed that 10 milliseconds works best with the test.  That
said, the proper duration depends on the actual configuration and
workload.  That is why this commit allows users to set their optimal
value via the module parameter.
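
For example, because the parameter is declared with
`module_param_named()` and mode `0644` in this patch, the duration
could be adjusted at runtime via sysfs as below (a usage sketch; the
path simply follows from the parameter declaration):

    [dom0]# cat /sys/module/xen_blkback/parameters/buffer_squeeze_duration
    10
    [dom0]# echo 50 > /sys/module/xen_blkback/parameters/buffer_squeeze_duration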

Memory Pressure Test
====================

To show how well this commit handles the memory pressure situation, I
configured a test environment on a Xen-based virtualization system.  On
the guest instances running `blkfront`, I attached a large number of
network-backed volume devices and induced I/O to them.  Meanwhile, I
measured the number of pages swapped in and out on the guest running
`blkback`.  The test ran twice, once with `blkback` before this commit
and once with `blkback` after this commit.  As shown below, this commit
has dramatically reduced the memory pressure:

                pswpin  pswpout
    before      76,672  185,799
    after          212    3,325

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as
follows:

    duration    pswpin  pswpout
    1           852     6,424
    10          212     3,325
    100         203     3,340

As expected, the memory pressure decreased as the duration increased,
but the reduction stopped at 10ms.  Based on these results, I chose
10ms as the default duration.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a guest running
`blkfront`.

For the artificial squeezing, I set `blkback.max_buffer_pages` via the
`/sys/module/xen_blkback/parameters/max_buffer_pages` file.  I set the
value to `1024` (the default) and to `0`.  Setting the value to `0` is
the same as squeezing all the time (the worst case).

For the I/O performance measurement, I used a simple `dd` command.

Default Performance
-------------------

    [dom0]# echo 1024 > /sys/module/xen_blkback/parameters/max_buffer_pages
    [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8827 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8781 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8737 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8702 s, 38.7 MB/s

Worst-case Performance
----------------------

    [dom0]# echo 0 > /sys/module/xen_blkback/parameters/max_buffer_pages
    [instance]$ for i in {1..5}; do dd if=/dev/zero of=file bs=4k count=$((256*512)); sync; done
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 11.7257 s, 45.8 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.878 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8746 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8786 s, 38.7 MB/s
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 13.8749 s, 38.7 MB/s

In short, even worst-case squeezing causes no visible performance
degradation.  I think this is because of the slow speed of the I/O.  In
other words, the additional page allocation overhead is hidden under
the much slower I/O latency.

Nevertheless, please note that this is just a very simple and minimal
test.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
---
 drivers/block/xen-blkback/blkback.c | 35 +++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..587061fd06fc 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -142,6 +142,31 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
 		HZ * xen_blkif_pgrant_timeout);
 }
 
+/*
+ * Once a memory pressure is detected, squeeze free page pools
+ * this time (milliseconds)
+ */
+static int xen_blkif_buffer_squeeze_duration = 10;
+module_param_named(buffer_squeeze_duration,
+		xen_blkif_buffer_squeeze_duration, int, 0644);
+MODULE_PARM_DESC(buffer_squeeze_duration,
+"Duration in ms to squeeze pages buffer when a memory pressure is detected");
+
+static unsigned long xen_blk_buffer_squeeze_end;
+
+static unsigned long blkif_shrink_count(struct shrinker *shrinker,
+				struct shrink_control *sc)
+{
+	xen_blk_buffer_squeeze_end = jiffies +
+		msecs_to_jiffies(xen_blkif_buffer_squeeze_duration);
+	return 0;
+}
+
+static struct shrinker blkif_shrinker = {
+	.count_objects = blkif_shrink_count,
+	.seeks = DEFAULT_SEEKS,
+};
+
 static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
 	unsigned long flags;
@@ -656,8 +681,11 @@ int xen_blkif_schedule(void *arg)
 			ring->next_lru = jiffies + msecs_to_jiffies(LRU_INTERVAL);
 		}
 
-		/* Shrink if we have more than xen_blkif_max_buffer_pages */
-		shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
+		/* Shrink the free pages pool if it is too large. */
+		if (time_before(jiffies, xen_blk_buffer_squeeze_end))
+			shrink_free_pagepool(ring, 0);
+		else
+			shrink_free_pagepool(ring, xen_blkif_max_buffer_pages);
 
 		if (log_stats && time_after(jiffies, ring->st_print))
 			print_stats(ring);
@@ -1498,6 +1526,9 @@ static int __init xen_blkif_init(void)
 	if (rc)
 		goto failed_init;
 
+	if (register_shrinker(&blkif_shrinker))
+		pr_warn("shrinker registration failed\n");
+
  failed_init:
 	return rc;
 }
-- 
2.17.1



* Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09  8:58 [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure SeongJae Park
  2019-12-09  8:58 ` [PATCH v3 1/1] xen/blkback: Squeeze page pools if a memory pressure is detected SeongJae Park
@ 2019-12-09  9:39 ` Jürgen Groß
  2019-12-09  9:46   ` Durrant, Paul
  2019-12-09 10:23   ` SeongJae Park
  1 sibling, 2 replies; 10+ messages in thread
From: Jürgen Groß @ 2019-12-09  9:39 UTC (permalink / raw)
  To: SeongJae Park, axboe, konrad.wilk, roger.pau
  Cc: linux-block, linux-kernel, pdurrant, sj38.park, xen-devel

On 09.12.19 09:58, SeongJae Park wrote:
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and be increased on demand while processing
> the I/O requests.  If current I/O requests handling is finished or 100
> milliseconds has passed since last I/O requests handling, it checks and
> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> 
> Therefore, `blkfront` running guests can cause a memory pressure in the
> `blkback` running guest by attaching a large number of block devices and
> inducing I/O.

I'm having problems understanding how a guest can attach a large number
of block devices without those having been configured by the host admin
beforehand.

If those devices have been configured, dom0 should be ready for that
number of devices, e.g. by having enough spare memory area for ballooned
pages.

So either I'm missing something here or your reasoning for the need of
the patch is wrong.


Juergen


* RE: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09  9:39 ` [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure Jürgen Groß
@ 2019-12-09  9:46   ` Durrant, Paul
  2019-12-09 10:15     ` Jürgen Groß
  2019-12-09 10:23   ` SeongJae Park
  1 sibling, 1 reply; 10+ messages in thread
From: Durrant, Paul @ 2019-12-09  9:46 UTC (permalink / raw)
  To: Jürgen Groß, Park, Seongjae, axboe, konrad.wilk, roger.pau
  Cc: linux-block, linux-kernel, sj38.park, xen-devel

> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 09 December 2019 09:39
> To: Park, Seongjae <sjpark@amazon.com>; axboe@kernel.dk;
> konrad.wilk@oracle.com; roger.pau@citrix.com
> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; Durrant,
> Paul <pdurrant@amazon.com>; sj38.park@gmail.com; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
> pressure
> 
> On 09.12.19 09:58, SeongJae Park wrote:
> > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > the pool starts from zero and be increased on demand while processing
> > the I/O requests.  If current I/O requests handling is finished or 100
> > milliseconds has passed since last I/O requests handling, it checks and
> > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> >
> > Therefore, `blkfront` running guests can cause a memory pressure in the
> > `blkback` running guest by attaching a large number of block devices and
> > inducing I/O.
> 
> I'm having problems to understand how a guest can attach a large number
> of block devices without those having been configured by the host admin
> before.
> 
> If those devices have been configured, dom0 should be ready for that
> number of devices, e.g. by having enough spare memory area for ballooned
> pages.
> 
> So either I'm missing something here or your reasoning for the need of
> the patch is wrong.
> 

I think the underlying issue is that persistent grant support is hogging memory in the backends, thereby compromising scalability. IIUC this patch is essentially a band-aid to get back to the scalability that was possible before persistent grant support was added. Ultimately the right answer should be to get rid of persistent grants support and use grant copy, but such a change is clearly more invasive and would need far more testing.

  Paul


* Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09  9:46   ` Durrant, Paul
@ 2019-12-09 10:15     ` Jürgen Groß
  2019-12-09 10:52       ` SeongJae Park
  0 siblings, 1 reply; 10+ messages in thread
From: Jürgen Groß @ 2019-12-09 10:15 UTC (permalink / raw)
  To: Durrant, Paul, Park, Seongjae, axboe, konrad.wilk, roger.pau
  Cc: linux-block, linux-kernel, sj38.park, xen-devel

On 09.12.19 10:46, Durrant, Paul wrote:
>> -----Original Message-----
>> From: Jürgen Groß <jgross@suse.com>
>> Sent: 09 December 2019 09:39
>> To: Park, Seongjae <sjpark@amazon.com>; axboe@kernel.dk;
>> konrad.wilk@oracle.com; roger.pau@citrix.com
>> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; Durrant,
>> Paul <pdurrant@amazon.com>; sj38.park@gmail.com; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>> pressure
>>
>> On 09.12.19 09:58, SeongJae Park wrote:
>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>> the pool starts from zero and be increased on demand while processing
>>> the I/O requests.  If current I/O requests handling is finished or 100
>>> milliseconds has passed since last I/O requests handling, it checks and
>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>
>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>> `blkback` running guest by attaching a large number of block devices and
>>> inducing I/O.
>>
>> I'm having problems to understand how a guest can attach a large number
>> of block devices without those having been configured by the host admin
>> before.
>>
>> If those devices have been configured, dom0 should be ready for that
>> number of devices, e.g. by having enough spare memory area for ballooned
>> pages.
>>
>> So either I'm missing something here or your reasoning for the need of
>> the patch is wrong.
>>
> 
> I think the underlying issue is that persistent grant support is hogging memory in the backends, thereby compromising scalability. IIUC this patch is essentially a band-aid to get back to the scalability that was possible before persistent grant support was added. Ultimately the right answer should be to get rid of persistent grants support and use grant copy, but such a change is clearly more invasive and would need far more testing.

Persistent grants hog ballooned pages, which are equivalent to memory
only when the backend domain's memory is equal to, or rather close to,
its max memory size.

So configuring the backend domain with enough spare area for ballooned
pages should make this problem much less serious.

Another problem in this area is the number of maptrack frames configured
for a driver domain, which limits the number of concurrent foreign
mappings of that domain.

So instead of having a blkback-specific solution I'd rather have a
common callback for backends to release foreign mappings in order to
enable global resource management.


Juergen


* Re: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09  9:39 ` [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure Jürgen Groß
  2019-12-09  9:46   ` Durrant, Paul
@ 2019-12-09 10:23   ` SeongJae Park
  2019-12-09 10:29     ` Jürgen Groß
  1 sibling, 1 reply; 10+ messages in thread
From: SeongJae Park @ 2019-12-09 10:23 UTC (permalink / raw)
  To: jgross; +Cc: linux-block, linux-kernel, pdurrant, sj38.park, xen-devel

On Mon, 9 Dec 2019 10:39:02 +0100 Juergen <jgross@suse.com> wrote:

>On 09.12.19 09:58, SeongJae Park wrote:
>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>> the pool starts from zero and be increased on demand while processing
>> the I/O requests.  If current I/O requests handling is finished or 100
>> milliseconds has passed since last I/O requests handling, it checks and
>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>
>> Therefore, `blkfront` running guests can cause a memory pressure in the
>> `blkback` running guest by attaching a large number of block devices and
>> inducing I/O.
>
>I'm having problems to understand how a guest can attach a large number
>of block devices without those having been configured by the host admin
>before.
>
>If those devices have been configured, dom0 should be ready for that
>number of devices, e.g. by having enough spare memory area for ballooned
>pages.

As mentioned in the original message, quoted below, administrators _can_
avoid this problem, but finding the optimal configuration is hard,
especially when the number of guests is large.

	System administrators can avoid such problematic situations by limiting
	the maximum number of devices each guest can attach.  However, finding
	the optimal limit is not so easy.  Improper set of the limit can
	results in the memory pressure or a resource underutilization.


Thanks,
SeongJae Park

>
>So either I'm missing something here or your reasoning for the need of
>the patch is wrong.
>
>
>Juergen
>


* Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09 10:23   ` SeongJae Park
@ 2019-12-09 10:29     ` Jürgen Groß
  0 siblings, 0 replies; 10+ messages in thread
From: Jürgen Groß @ 2019-12-09 10:29 UTC (permalink / raw)
  To: SeongJae Park; +Cc: linux-block, linux-kernel, pdurrant, sj38.park, xen-devel

On 09.12.19 11:23, SeongJae Park wrote:
> On   Mon, 9 Dec 2019 10:39:02 +0100  Juergen <jgross@suse.com> wrote:
> 
>> On 09.12.19 09:58, SeongJae Park wrote:
>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>> the pool starts from zero and be increased on demand while processing
>>> the I/O requests.  If current I/O requests handling is finished or 100
>>> milliseconds has passed since last I/O requests handling, it checks and
>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>
>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>> `blkback` running guest by attaching a large number of block devices and
>>> inducing I/O.
>>
>> I'm having problems to understand how a guest can attach a large number
>> of block devices without those having been configured by the host admin
>> before.
>>
>> If those devices have been configured, dom0 should be ready for that
>> number of devices, e.g. by having enough spare memory area for ballooned
>> pages.
> 
> As mentioned in the original message as below, administrators _can_ avoid this
> problem, but finding the optimal configuration is hard, especially if the
> number of the guests is large.
> 
> 	System administrators can avoid such problematic situations by limiting
> 	the maximum number of devices each guest can attach.  However, finding
> 	the optimal limit is not so easy.  Improper set of the limit can
> 	results in the memory pressure or a resource underutilization.

This sounds as if the admin would set a device limit. But it is the
other way round: the admin needs to configure each possible device
with all its parameters (e.g. the backing dom0 resource) to enable the
frontend to use it.


Juergen


* Re: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09 10:15     ` Jürgen Groß
@ 2019-12-09 10:52       ` SeongJae Park
  2019-12-09 11:08         ` Jürgen Groß
  0 siblings, 1 reply; 10+ messages in thread
From: SeongJae Park @ 2019-12-09 10:52 UTC (permalink / raw)
  To: jgross; +Cc: linux-block, linux-kernel, sj38.park, xen-devel

On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß" <jgross@suse.com> wrote:

>On 09.12.19 10:46, Durrant, Paul wrote:
>>> -----Original Message-----
>>> From: Jürgen Groß <jgross@suse.com>
>>> Sent: 09 December 2019 09:39
>>> To: Park, Seongjae <sjpark@amazon.com>; axboe@kernel.dk;
>>> konrad.wilk@oracle.com; roger.pau@citrix.com
>>> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; Durrant,
>>> Paul <pdurrant@amazon.com>; sj38.park@gmail.com; xen-
>>> devel@lists.xenproject.org
>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>>> pressure
>>>
>>> On 09.12.19 09:58, SeongJae Park wrote:
>>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>>> the pool starts from zero and be increased on demand while processing
>>>> the I/O requests.  If current I/O requests handling is finished or 100
>>>> milliseconds has passed since last I/O requests handling, it checks and
>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>>
>>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>>> `blkback` running guest by attaching a large number of block devices and
>>>> inducing I/O.
>>>
>>> I'm having problems to understand how a guest can attach a large number
>>> of block devices without those having been configured by the host admin
>>> before.
>>>
>>> If those devices have been configured, dom0 should be ready for that
>>> number of devices, e.g. by having enough spare memory area for ballooned
>>> pages.
>>>
>>> So either I'm missing something here or your reasoning for the need of
>>> the patch is wrong.
>>>
>>
>> I think the underlying issue is that persistent grant support is hogging memory in the backends, thereby compromising scalability. IIUC this patch is essentially a band-aid to get back to the scalability that was possible before persistent grant support was added. Ultimately the right answer should be to get rid of persistent grants support and use grant copy, but such a change is clearly more invasive and would need far more testing.
>
>Persistent grants are hogging ballooned pages, which is equivalent to
>memory only in case of the backend's domain memory being equal or
>rather near to its max memory size.
>
>So configuring the backend domain with enough spare area for ballooned
>pages should make this problem much less serious.
>
>Another problem in this area is the amount of maptrack frames configured
>for a driver domain, which will limit the number of concurrent foreign
>mappings of that domain.

Right, similar problems from other backends are possible.

>
>So instead of having a blkback specific solution I'd rather have a
>common callback for backends to release foreign mappings in order to
>enable a global resource management.

This patch is also based on a common callback, namely the shrinker
callback system.  As the shrinker callback is designed for general
memory pressure handling, I thought it is the right one to use.  Other
backends having similar problems can use it in their own way.
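
For example, another backend could reuse the pattern with only a few
lines.  A minimal sketch (the `foo_back` names are placeholders; the
shape mirrors the blkback patch above):

    /* A sketch only -- "foo_back" is a placeholder backend, not real code. */
    static unsigned long foo_back_squeeze_end;

    /*
     * count_objects callback, invoked by the MM core under memory pressure.
     * It only opens a squeezing window; returning 0 tells the core there is
     * nothing to scan here, the actual freeing is done by the backend itself.
     */
    static unsigned long foo_back_shrink_count(struct shrinker *shrinker,
                                               struct shrink_control *sc)
    {
            foo_back_squeeze_end = jiffies + msecs_to_jiffies(10);
            return 0;
    }

    static struct shrinker foo_back_shrinker = {
            .count_objects = foo_back_shrink_count,
            .seeks = DEFAULT_SEEKS,
    };

    /*
     * In the backend's init path:  register_shrinker(&foo_back_shrinker);
     * In its worker loop: while time_before(jiffies, foo_back_squeeze_end),
     * release the backend's cached/free pages aggressively.
     */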


Thanks,
SeongJae Park


>
>
>Juergen
>


* Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09 10:52       ` SeongJae Park
@ 2019-12-09 11:08         ` Jürgen Groß
  2019-12-09 11:32           ` SeongJae Park
  0 siblings, 1 reply; 10+ messages in thread
From: Jürgen Groß @ 2019-12-09 11:08 UTC (permalink / raw)
  To: SeongJae Park; +Cc: linux-block, linux-kernel, sj38.park, xen-devel

On 09.12.19 11:52, SeongJae Park wrote:
> On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß" <jgross@suse.com> wrote:
> 
>> On 09.12.19 10:46, Durrant, Paul wrote:
>>>> -----Original Message-----
>>>> From: Jürgen Groß <jgross@suse.com>
>>>> Sent: 09 December 2019 09:39
>>>> To: Park, Seongjae <sjpark@amazon.com>; axboe@kernel.dk;
>>>> konrad.wilk@oracle.com; roger.pau@citrix.com
>>>> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; Durrant,
>>>> Paul <pdurrant@amazon.com>; sj38.park@gmail.com; xen-
>>>> devel@lists.xenproject.org
>>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>>>> pressure
>>>>
>>>> On 09.12.19 09:58, SeongJae Park wrote:
>>>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>>>> the pool starts from zero and be increased on demand while processing
>>>>> the I/O requests.  If current I/O requests handling is finished or 100
>>>>> milliseconds has passed since last I/O requests handling, it checks and
>>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>>>
>>>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>>>> `blkback` running guest by attaching a large number of block devices and
>>>>> inducing I/O.
>>>>
>>>> I'm having problems to understand how a guest can attach a large number
>>>> of block devices without those having been configured by the host admin
>>>> before.
>>>>
>>>> If those devices have been configured, dom0 should be ready for that
>>>> number of devices, e.g. by having enough spare memory area for ballooned
>>>> pages.
>>>>
>>>> So either I'm missing something here or your reasoning for the need of
>>>> the patch is wrong.
>>>>
>>>
>>> I think the underlying issue is that persistent grant support is hogging memory in the backends, thereby compromising scalability. IIUC this patch is essentially a band-aid to get back to the scalability that was possible before persistent grant support was added. Ultimately the right answer should be to get rid of persistent grants support and use grant copy, but such a change is clearly more invasive and would need far more testing.
>>
>> Persistent grants are hogging ballooned pages, which is equivalent to
>> memory only in case of the backend's domain memory being equal or
>> rather near to its max memory size.
>>
>> So configuring the backend domain with enough spare area for ballooned
>> pages should make this problem much less serious.
>>
>> Another problem in this area is the amount of maptrack frames configured
>> for a driver domain, which will limit the number of concurrent foreign
>> mappings of that domain.
> 
> Right, similar problems from other backends are possible.
> 
>>
>> So instead of having a blkback specific solution I'd rather have a
>> common callback for backends to release foreign mappings in order to
>> enable a global resource management.
> 
> This patch is also based on a common callback, namely the shrinker callback
> system.  As the shrinker callback is designed for the general memory pressure
> handling, I thought this is a right one to use.  Other backends having similar
> problems can use this in their way.

But this is addressing memory shortage only and it is acting globally.

What I'd like to have in some (maybe distant) future is a way to control
resource usage per guest. Why would you want to throttle performance of
all guests instead of only the one causing the pain by hogging lots of
resources?

The new backend callback should (IMO) have a domid as parameter for
specifying which guest should be taken away resources (including the
possibility to select "any domain").

It might be reasonable to have your shrinker hook in e.g. xenbus for
calling the backend callbacks. And you could have another agent in the
grant driver reacting on shortage of possible grant mappings.

I don't expect you to implement all of that at once, but I think having
that idea in mind when addressing current issues would be nice. So as a
starting point you could move the shrinker hook to xenbus, add the
generic callback to struct xenbus_driver, populate that callback in
blkback and call it in the shrinker hook with "any domain". This would
enable a future extension to other backends and a dynamic resource
management in a natural way.
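
Just to illustrate the shape I have in mind, something like the sketch
below (completely untested; the callback name, the use of DOMID_INVALID
for "any domain" and the helper names are only placeholders):

    /*
     * Illustrative sketch only.  A new, optional member of struct
     * xenbus_driver (other members omitted):
     *
     *     void (*reclaim_memory)(struct xenbus_device *dev, domid_t domid);
     */

    /* Per-device helper used by a xenbus-side shrinker hook. */
    static int xenbus_backend_reclaim(struct device *dev, void *data)
    {
            struct xenbus_driver *drv;

            if (!dev->driver)
                    return 0;
            drv = to_xenbus_driver(dev->driver);
            if (drv->reclaim_memory)
                    drv->reclaim_memory(to_xenbus_device(dev), DOMID_INVALID);
            return 0;
    }

    /*
     * Shrinker hook living in xenbus instead of blkback: ask every
     * registered backend to release what it can, for any domain.
     */
    static unsigned long xenbus_backend_shrink_count(struct shrinker *shrinker,
                                                     struct shrink_control *sc)
    {
            bus_for_each_dev(&xenbus_backend.bus, NULL, NULL,
                             xenbus_backend_reclaim);
            return 0;
    }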


Juergen


* Re: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure
  2019-12-09 11:08         ` Jürgen Groß
@ 2019-12-09 11:32           ` SeongJae Park
  0 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2019-12-09 11:32 UTC (permalink / raw)
  To: jgross; +Cc: linux-block, linux-kernel, sj38.park, xen-devel

On Mon, 9 Dec 2019 12:08:10 +0100 "Jürgen Groß" <jgross@suse.com> wrote:

>On 09.12.19 11:52, SeongJae Park wrote:
>> On Mon, 9 Dec 2019 11:15:22 +0100 "Jürgen Groß" <jgross@suse.com> wrote:
>>
>>> On 09.12.19 10:46, Durrant, Paul wrote:
>>>>> -----Original Message-----
>>>>> From: Jürgen Groß <jgross@suse.com>
>>>>> Sent: 09 December 2019 09:39
>>>>> To: Park, Seongjae <sjpark@amazon.com>; axboe@kernel.dk;
>>>>> konrad.wilk@oracle.com; roger.pau@citrix.com
>>>>> Cc: linux-block@vger.kernel.org; linux-kernel@vger.kernel.org; Durrant,
>>>>> Paul <pdurrant@amazon.com>; sj38.park@gmail.com; xen-
>>>>> devel@lists.xenproject.org
>>>>> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
>>>>> pressure
>>>>>
>>>>> On 09.12.19 09:58, SeongJae Park wrote:
>>>>>> Each `blkif` has a free pages pool for the grant mapping.  The size of
>>>>>> the pool starts from zero and be increased on demand while processing
>>>>>> the I/O requests.  If current I/O requests handling is finished or 100
>>>>>> milliseconds has passed since last I/O requests handling, it checks and
>>>>>> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
>>>>>>
>>>>>> Therefore, `blkfront` running guests can cause a memory pressure in the
>>>>>> `blkback` running guest by attaching a large number of block devices and
>>>>>> inducing I/O.
>>>>>
>>>>> I'm having problems to understand how a guest can attach a large number
>>>>> of block devices without those having been configured by the host admin
>>>>> before.
>>>>>
>>>>> If those devices have been configured, dom0 should be ready for that
>>>>> number of devices, e.g. by having enough spare memory area for ballooned
>>>>> pages.
>>>>>
>>>>> So either I'm missing something here or your reasoning for the need of
>>>>> the patch is wrong.
>>>>>
>>>>
>>>> I think the underlying issue is that persistent grant support is hogging memory in the backends, thereby compromising scalability. IIUC this patch is essentially a band-aid to get back to the scalability that was possible before persistent grant support was added. Ultimately the right answer should be to get rid of persistent grants support and use grant copy, but such a change is clearly more invasive and would need far more testing.
>>>
>>> Persistent grants are hogging ballooned pages, which is equivalent to
>>> memory only in case of the backend's domain memory being equal or
>>> rather near to its max memory size.
>>>
>>> So configuring the backend domain with enough spare area for ballooned
>>> pages should make this problem much less serious.
>>>
>>> Another problem in this area is the amount of maptrack frames configured
>>> for a driver domain, which will limit the number of concurrent foreign
>>> mappings of that domain.
>>
>> Right, similar problems from other backends are possible.
>>
>>>
>>> So instead of having a blkback specific solution I'd rather have a
>>> common callback for backends to release foreign mappings in order to
>>> enable a global resource management.
>>
>> This patch is also based on a common callback, namely the shrinker callback
>> system.  As the shrinker callback is designed for the general memory pressure
>> handling, I thought this is a right one to use.  Other backends having similar
>> problems can use this in their way.
>
> But this is addressing memory shortage only and it is acting globally.
>
> What I'd like to have in some (maybe distant) future is a way to control
> resource usage per guest. Why would you want to throttle performance of
> all guests instead of only the one causing the pain by hogging lots of
> resources?

Good point.  I was also concerned about performance fairness at first,
but settled on this ugly but simple solution mainly because my
worst-case performance test (detailed in the first patch's commit
message) shows no visible performance degradation, though it is a
minimal test in my test environment.

Anyway, I agree with your future direction.

>
> The new backend callback should (IMO) have a domid as parameter for
> specifying which guest should be taken away resources (including the
> possibility to select "any domain").
>
> It might be reasonable to have your shrinker hook in e.g. xenbus for
> calling the backend callbacks. And you could have another agent in the
> grant driver reacting on shortage of possible grant mappings.
>
> I don't expect you to implement all of that at once, but I think having
> that idea in mind when addressing current issues would be nice. So as a
> starting point you could move the shrinker hook to xenbus, add the
> generic callback to struct xenbus_driver, populate that callback in
> blkback and call it in the shrinker hook with "any domain". This would
> enable a future extension to other backends and a dynamic resource
> management in a natural way.

I appreciate this kind and detailed advice.  I will post the next
version applying your comments soon.


Thanks,
SeongJae Park

>
>
>Juergen
>


