linux-kernel.vger.kernel.org archive mirror
* [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
@ 2023-01-15  3:57 Sudarshan Rajagopalan
  2023-01-17 15:33 ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Sudarshan Rajagopalan @ 2023-01-15  3:57 UTC (permalink / raw)
  To: David Hildenbrand, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

Hello all,

We’re from the Linux memory team here at Qualcomm. We are currently 
devising a VM memory resizing feature where we dynamically inflate or 
deflate the Linux VM based on ongoing memory demands in the VM. We 
wanted to propose a few details about this userspace daemon in the form 
of an RFC and get upstream’s opinion. Here are the details –

1. This will be a native userspace daemon running only in the Linux VM, 
which will use the virtio-mem driver (which in turn uses memory hotplug) 
to add/remove memory. The VM (aka Secondary VM, SVM) will request memory 
from the host, which is the Primary VM (PVM), via the backend hypervisor 
which takes care of cross-VM communication.

2. This will be guest driven. The daemon will use the PSI mechanism to 
monitor memory pressure and keep track of memory demands in the system. 
It will register for a few memory pressure events and make an educated 
guess about when the demand for memory in the system is increasing.

3. Currently, the minimum PSI window size is 500ms, so the PSI monitor 
sampling period is 50ms. In order to get a quick response from PSI, we’ve 
reduced the minimum window size to 50ms, so that an increase in memory 
pressure as small as 5ms of stall can be reported to userspace by PSI.

/* PSI trigger definitions */
-#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
+#define WINDOW_MIN_US 50000    /* Min window size is 50ms */

4. Detecting an increase in memory demand – when a usecase that does 
memory allocations starts in the VM, it will stall, causing the PSI 
mechanism to generate a memory pressure event to userspace. Simply put, 
when pressure rises above a certain set threshold, the daemon can make 
an educated guess that a memory-demanding usecase has run and that the 
VM needs memory to be added.

5. Detecting a decrease in memory pressure – the reverse part, where we 
give memory back to the PVM when it is no longer needed, is a bit 
tricky. We look for pressure decay, i.e. whether the PSI averages 
(avg10, avg60, avg300) go down, and along with other memory stats (such 
as free memory, etc.) we make an educated guess that the usecase has 
ended and its memory has been freed, so this memory can be given back 
to the PVM.
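
As a rough illustration only (the thresholds below are placeholders, not 
the daemon's actual values), the deflate-side check boils down to reading 
the PSI averages and free memory and applying a heuristic like:

#include <stdio.h>
#include <string.h>

/* Returns 1 if pressure has decayed enough that unplugging looks safe.
 * Thresholds are purely illustrative. */
static int deflate_candidate(void)
{
        float avg10 = 0, avg60 = 0, avg300 = 0;
        unsigned long memfree_kb = 0;
        char line[256];
        FILE *f;

        f = fopen("/proc/pressure/memory", "r");
        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                /* "some avg10=0.00 avg60=0.00 avg300=0.00 total=..." */
                if (!strncmp(line, "some", 4))
                        sscanf(line, "some avg10=%f avg60=%f avg300=%f",
                               &avg10, &avg60, &avg300);
        }
        fclose(f);

        f = fopen("/proc/meminfo", "r");
        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, "MemFree:", 8))
                        sscanf(line, "MemFree: %lu kB", &memfree_kb);
        }
        fclose(f);

        /* Pressure has decayed and plenty of memory is free. */
        return avg10 < 0.1f && avg60 < 0.5f && memfree_kb > 256 * 1024;
}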

6. I’m skimming over much of the logic and intelligence here, but the 
daemon relies on the PSI mechanism to know when memory demand is going 
up or down, and communicates with the virtio-mem driver for 
hot-plugging/unplugging memory. We also factor in the latency of the 
SVM<->PVM roundtrips and size the memory chunk that needs to be plugged 
in accordingly.
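
Purely as an illustration of the kind of sizing heuristic meant here (the 
formula and constants are made up for this sketch, not the daemon's 
actual logic):

/* Size the plug request so it covers the demand we expect to build up
 * during one SVM<->PVM roundtrip. Inputs are estimates the daemon keeps:
 * the recent allocation rate and the measured roundtrip latency. */
static unsigned long plug_chunk_kb(unsigned long alloc_rate_kb_per_s,
                                   unsigned long roundtrip_ms,
                                   unsigned long min_chunk_kb)
{
        unsigned long need_kb = alloc_rate_kb_per_s * roundtrip_ms / 1000;

        return need_kb > min_chunk_kb ? need_kb : min_chunk_kb;
}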

7. The whole purpose of the daemon using the PSI mechanism is to make 
this scheme guest driven rather than host driven, which is currently 
mostly the case with virtio-mem users. The memory pressure and usage 
monitoring happens inside the SVM, and the SVM makes the decisions to 
request memory from the PVM. This avoids any intervention, such as an 
admin in the PVM monitoring and controlling the knobs. We have also set 
a max limit on how much SVMs can grow in terms of memory, so that a 
rogue VM cannot abuse this scheme.

This daemon is currently just in the beta stage and we have basic 
functionality running. We are yet to add more flesh to this scheme to 
make sure any potential risks or security concerns are taken care of as well.

We would be happy to know your opinions on such a scheme.

Thanks and Regards,
Sudarshan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
  2023-01-15  3:57 [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory Sudarshan Rajagopalan
@ 2023-01-17 15:33 ` David Hildenbrand
  2023-01-17 23:45   ` Sudarshan Rajagopalan
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2023-01-17 15:33 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

On 15.01.23 04:57, Sudarshan Rajagopalan wrote:
> Hello all,
> 

Hi,

I'll focus on the virtio-mem side of things :)

> We’re from the Linux memory team here at Qualcomm. We are currently
> devising a VM memory resizing feature where we dynamically inflate or
> deflate the Linux VM based on ongoing memory demands in the VM. We
> wanted to propose a few details about this userspace daemon in the form
> of an RFC and get upstream’s opinion. Here are the details –

I'd avoid using the terminology of inflating/deflating VM memory when 
talking about virtio-mem. Just call it "dynamically resizing VM memory". 
virtio-mem is one way of doing it using memory devices.

Inflation/deflation, in contrast, reminds one of a traditional balloon 
driver, along the lines of virtio-balloon.

> 
> 1. This will be a native userspace daemon running only in the Linux VM,
> which will use the virtio-mem driver (which in turn uses memory hotplug)
> to add/remove memory. The VM (aka Secondary VM, SVM) will request memory
> from the host, which is the Primary VM (PVM), via the backend hypervisor
> which takes care of cross-VM communication.
> 
> 2. This will be guest driven. The daemon will use the PSI mechanism to
> monitor memory pressure and keep track of memory demands in the system.
> It will register for a few memory pressure events and make an educated
> guess about when the demand for memory in the system is increasing.

Is that running in the primary or the secondary VM?

> 
> 3. Currently, the minimum PSI window size is 500ms, so the PSI monitor
> sampling period is 50ms. In order to get a quick response from PSI, we’ve
> reduced the minimum window size to 50ms, so that an increase in memory
> pressure as small as 5ms of stall can be reported to userspace by PSI.
> 
> /* PSI trigger definitions */
> -#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
> +#define WINDOW_MIN_US 50000    /* Min window size is 50ms */
> 
> 4. Detecting an increase in memory demand – when a usecase that does
> memory allocations starts in the VM, it will stall, causing the PSI
> mechanism to generate a memory pressure event to userspace. Simply put,
> when pressure rises above a certain set threshold, the daemon can make
> an educated guess that a memory-demanding usecase has run and that the
> VM needs memory to be added.
> 
> 5. Detecting a decrease in memory pressure – the reverse part, where we
> give memory back to the PVM when it is no longer needed, is a bit
> tricky. We look for pressure decay, i.e. whether the PSI averages
> (avg10, avg60, avg300) go down, and along with other memory stats (such
> as free memory, etc.) we make an educated guess that the usecase has
> ended and its memory has been freed, so this memory can be given back
> to the PVM.
> 
> 6. I’m skimming over much of the logic and intelligence here, but the
> daemon relies on the PSI mechanism to know when memory demand is going
> up or down, and communicates with the virtio-mem driver for
> hot-plugging/unplugging memory.

For now, the hypervisor is in charge of triggering a virtio-mem device 
resize request. Will the Linux VM expose a virtio-mem device to the SVM 
and request to resize the SVM memory via that virtio-mem device?

> We also factor in the latency of the SVM<->PVM roundtrips and size the
> memory chunk that needs to be plugged in accordingly.
> 
> 7. The whole purpose of the daemon using the PSI mechanism is to make
> this scheme guest driven rather than host driven, which is currently
> mostly the case with virtio-mem users. The memory pressure and usage
> monitoring happens inside the SVM, and the SVM makes the decisions to
> request memory from the PVM. This avoids any intervention, such as an
> admin in the PVM monitoring and controlling the knobs. We have also set
> a max limit on how much SVMs can grow in terms of memory, so that a
> rogue VM cannot abuse this scheme.

Something I envisioned at some point is to
1) Have a virtio-mem guest driver to request a size change. The
    hypervisor will react accordingly by adjusting the requested size.

    Such a driver<->device request could be communicated via any other
    communication mechanism to the hypervisor, but it already came up a
    couple of times to do it via the virtio-mem protocol directly.

2) Configure the hypervisor to have a lower/upper range. Within that
    range, resize requests by the driver can be granted. The current
    values of these properties can be exposed via the device to the
    driver as well.

Is that what you also proposing here? If so, great.

> 
> This daemon is currently just in the beta stage and we have basic
> functionality running. We are yet to add more flesh to this scheme to

Good to hear that the basics are running with virtio-mem (I assume :) ).

> make sure any potential risks or security concerns are taken care of as well.

It would be great to draw/explain the architecture in more detail.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
  2023-01-17 15:33 ` David Hildenbrand
@ 2023-01-17 23:45   ` Sudarshan Rajagopalan
  2023-01-23  9:58     ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Sudarshan Rajagopalan @ 2023-01-17 23:45 UTC (permalink / raw)
  To: David Hildenbrand, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)


Hello David, thanks for your comments.


On 1/17/2023 7:33 AM, David Hildenbrand wrote:
> On 15.01.23 04:57, Sudarshan Rajagopalan wrote:
>> Hello all,
>>
>
> Hi,
>
> I'll focus on the virtio-mem side of things :)
>
>> We’re from the Linux memory team here at Qualcomm. We are currently
>> devising a VM memory resizing feature where we dynamically inflate or
>> deflate the Linux VM based on ongoing memory demands in the VM. We
>> wanted to propose a few details about this userspace daemon in the form
>> of an RFC and get upstream’s opinion. Here are the details –
>
> I'd avoid using the terminology of inflating/deflating VM memory when 
> talking about virtio-mem. Just call it "dynamically resizing VM 
> memory". virtio-mem is one way of doing it using memory devices.
>
> Inflation/deflation, in contrast, reminds one of a traditional balloon 
> driver, along the lines of virtio-balloon.

Ok sure, duly noted :). "dynamically resizing VM memory" makes more 
sense when using virtio-mem.

>
>>
>> 1. This will be a native userspace daemon running only in the Linux VM,
>> which will use the virtio-mem driver (which in turn uses memory hotplug)
>> to add/remove memory. The VM (aka Secondary VM, SVM) will request memory
>> from the host, which is the Primary VM (PVM), via the backend hypervisor
>> which takes care of cross-VM communication.
>>
>> 2. This will be guest driven. The daemon will use the PSI mechanism to
>> monitor memory pressure and keep track of memory demands in the system.
>> It will register for a few memory pressure events and make an educated
>> guess about when the demand for memory in the system is increasing.
>
> Is that running in the primary or the secondary VM?

The userspace PSI daemon will be running on the secondary VM. It will 
talk to a kernel driver (running on the secondary VM itself) via ioctl. 
This kernel driver will talk to a slightly modified version of the 
virtio-mem driver, where it can call the 
virtio_mem_config_changed(virtiomem_device) function for resizing the 
secondary VM. So it's mainly "guest driven" now.
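
Roughly, the daemon<->kernel-driver plumbing is just a small ioctl 
interface, something along these lines (the names and numbers here are 
illustrative, not the actual UAPI):

/* Illustrative only - not the real header. The daemon computes a size
 * delta from PSI data and hands it to the in-guest resizer driver, which
 * forwards it to the (modified) virtio-mem driver. */
#include <linux/ioctl.h>
#include <linux/types.h>

struct vmem_resize_request {
        __s64 size_delta_bytes;   /* > 0: plug memory, < 0: unplug memory */
};

#define VMEM_RESIZER_MAGIC      'M'
#define VMEM_IOC_RESIZE         _IOW(VMEM_RESIZER_MAGIC, 0x01, \
                                     struct vmem_resize_request)
#define VMEM_IOC_GET_SIZE       _IOR(VMEM_RESIZER_MAGIC, 0x02, __u64)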

>
>>
>> 3. Currently, the minimum PSI window size is 500ms, so the PSI monitor
>> sampling period is 50ms. In order to get a quick response from PSI, we’ve
>> reduced the minimum window size to 50ms, so that an increase in memory
>> pressure as small as 5ms of stall can be reported to userspace by PSI.
>>
>> /* PSI trigger definitions */
>> -#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
>> +#define WINDOW_MIN_US 50000    /* Min window size is 50ms */
>>
>> 4. Detecting an increase in memory demand – when a usecase that does
>> memory allocations starts in the VM, it will stall, causing the PSI
>> mechanism to generate a memory pressure event to userspace. Simply put,
>> when pressure rises above a certain set threshold, the daemon can make
>> an educated guess that a memory-demanding usecase has run and that the
>> VM needs memory to be added.
>>
>> 5. Detecting a decrease in memory pressure – the reverse part, where we
>> give memory back to the PVM when it is no longer needed, is a bit
>> tricky. We look for pressure decay, i.e. whether the PSI averages
>> (avg10, avg60, avg300) go down, and along with other memory stats (such
>> as free memory, etc.) we make an educated guess that the usecase has
>> ended and its memory has been freed, so this memory can be given back
>> to the PVM.
>>
>> 6. I’m skimming over much of the logic and intelligence here, but the
>> daemon relies on the PSI mechanism to know when memory demand is going
>> up or down, and communicates with the virtio-mem driver for
>> hot-plugging/unplugging memory.
>
> For now, the hypervisor is in charge of triggering a virtio-mem device 
> resize request. Will the Linux VM expose a virtio-mem device to the 
> SVM and request to resize the SVM memory via that virtio-mem device?

Yes, the Linux VM will expose a virtio-mem device through which the 
Linux VM itself can ask to resize its VM memory.


>
>> We also factor in the latency of the SVM<->PVM roundtrips and size the
>> memory chunk that needs to be plugged in accordingly.
>>
>> 7. The whole purpose of the daemon using the PSI mechanism is to make
>> this scheme guest driven rather than host driven, which is currently
>> mostly the case with virtio-mem users. The memory pressure and usage
>> monitoring happens inside the SVM, and the SVM makes the decisions to
>> request memory from the PVM. This avoids any intervention, such as an
>> admin in the PVM monitoring and controlling the knobs. We have also set
>> a max limit on how much SVMs can grow in terms of memory, so that a
>> rogue VM cannot abuse this scheme.
>
> Something I envisioned at some point is to
> 1) Have a virtio-mem guest driver to request a size change. The
>    hypervisor will react accordingly by adjusting the requested size.
>
>    Such a driver<->device request could be communicated via any other
>    communication mechanism to the hypervisor, but it already came up a
>    couple of times to do it via the virtio-mem protocol directly.
>
> 2) Configure the hypervisor to have a lower/upper range. Within that
>    range, resize requests by the driver can be granted. The current
>    values of these properties can be exposed via the device to the
>    driver as well.
>
> Is that what you also proposing here? If so, great.

Actually, this is exactly what we are doing here. The virtio-mem guest 
driver requests a size change, and the hypervisor reacts to it by 
adding/removing the requested amount of memory to/from the VM's IPA 
space. The virtio-mem guest driver then plugs this memory in/out via 
memory hotplug. I think the driver communicates with the hypervisor via 
the virtio protocol itself.

Currently we're setting the min/max limits on how much the VM memory can 
be resized within the virtio-mem guest driver itself. This limit can of 
course be set in the hypervisor for security reasons, but we're still in 
the experimentation stage now.

>
>>
>> This daemon is currently just in the beta stage and we have basic
>> functionality running. We are yet to add more flesh to this scheme to
>
> Good to hear that the basics are running with virtio-mem (I assume :) ).
>
>> make sure any potential risks or security concerns are taken care of
>> as well.
>
> It would be great to draw/explain the architecture in more detail.

We will be looking into solving any potential security concerns, where 
the hypervisor would restrict a few of the memory resizing actions. 
Right now, we are experimenting to see if the PSI mechanism itself can 
be used to detect memory pressure in the system and add memory to the 
secondary VM when memory is needed, taking into account all the 
latencies involved in the PSI scheme (i.e. from the time one does a 
malloc call until the extra memory gets added to the SVM). We wanted to 
know upstream's opinion on such a scheme that uses the PSI mechanism 
for detecting memory pressure and resizing the SVM accordingly.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
  2023-01-17 23:45   ` Sudarshan Rajagopalan
@ 2023-01-23  9:58     ` David Hildenbrand
  2023-01-23 23:04       ` Sudarshan Rajagopalan
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2023-01-23  9:58 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

>>>
>>> 1. This will be a native userspace daemon running only in the Linux VM,
>>> which will use the virtio-mem driver (which in turn uses memory hotplug)
>>> to add/remove memory. The VM (aka Secondary VM, SVM) will request memory
>>> from the host, which is the Primary VM (PVM), via the backend hypervisor
>>> which takes care of cross-VM communication.
>>>
>>> 2. This will be guest driven. The daemon will use the PSI mechanism to
>>> monitor memory pressure and keep track of memory demands in the system.
>>> It will register for a few memory pressure events and make an educated
>>> guess about when the demand for memory in the system is increasing.
>>
>> Is that running in the primary or the secondary VM?
> 
> The userspace PSI daemon will be running on the secondary VM. It will
> talk to a kernel driver (running on the secondary VM itself) via ioctl.
> This kernel driver will talk to a slightly modified version of the
> virtio-mem driver, where it can call the
> virtio_mem_config_changed(virtiomem_device) function for resizing the
> secondary VM. So it's mainly "guest driven" now.

Okay, thanks.

[...]

>>>
>>> This daemon is currently just in the beta stage and we have basic
>>> functionality running. We are yet to add more flesh to this scheme to
>>
>> Good to hear that the basics are running with virtio-mem (I assume :) ).
>>
>>> make sure any potential risks or security concerns are taken care of
>>> as well.
>>
>> It would be great to draw/explain the architecture in more detail.
> 
> We will be looking into solving any potential security concerns, where
> the hypervisor would restrict a few of the memory resizing actions.
> Right now, we are experimenting to see if the PSI mechanism itself can
> be used to detect memory pressure in the system and add memory to the
> secondary VM when memory is needed, taking into account all the
> latencies involved in the PSI scheme (i.e. from the time one does a
> malloc call until the extra memory gets added to the SVM). We wanted to
> know upstream's opinion on such a scheme that uses the PSI mechanism
> for detecting memory pressure and resizing the SVM accordingly.

One problematic thing is that adding memory to Linux by virtio-mem 
eventually consumes memory (e.g., the memmap), especially when having to 
add a completely new memory block to Linux.

So if you're already under severe memory pressure, these allocations to 
bring up new memory can fail. The question is, if PSI can notify "early" 
enough such that this barely happens in practice.

There are some possible ways to mitigate:

1) Always keep spare memory blocks by virtio-mem added to Linux, that
    don't expose any memory yet. Memory from these blocks can be handed
    over to Linux without additional Linux allocations. Of course, they
    consume metadata, so one might want to limit them.

2) Implement memmap_on_memory support for virtio-mem. This might help in
    some setups, where the device block size is suitable.

Did you run into that scenario already during your experiments, and how 
did you deal with that?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
  2023-01-23  9:58     ` David Hildenbrand
@ 2023-01-23 23:04       ` Sudarshan Rajagopalan
  2023-01-24 15:20         ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Sudarshan Rajagopalan @ 2023-01-23 23:04 UTC (permalink / raw)
  To: David Hildenbrand, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)


On 1/23/2023 1:58 AM, David Hildenbrand wrote:
>>>>
>>>> 1. This will be a native userspace daemon running only in the Linux
>>>> VM, which will use the virtio-mem driver (which in turn uses memory
>>>> hotplug) to add/remove memory. The VM (aka Secondary VM, SVM) will
>>>> request memory from the host, which is the Primary VM (PVM), via the
>>>> backend hypervisor which takes care of cross-VM communication.
>>>>
>>>> 2. This will be guest driven. The daemon will use the PSI mechanism to
>>>> monitor memory pressure and keep track of memory demands in the system.
>>>> It will register for a few memory pressure events and make an educated
>>>> guess about when the demand for memory in the system is increasing.
>>>
>>> Is that running in the primary or the secondary VM?
>>
>> The userspace PSI daemon will be running on the secondary VM. It will
>> talk to a kernel driver (running on the secondary VM itself) via ioctl.
>> This kernel driver will talk to a slightly modified version of the
>> virtio-mem driver, where it can call the
>> virtio_mem_config_changed(virtiomem_device) function for resizing the
>> secondary VM. So it's mainly "guest driven" now.
>
> Okay, thanks.
>
> [...]
>
>>>>
>>>> This daemon is currently just in the beta stage and we have basic
>>>> functionality running. We are yet to add more flesh to this scheme to
>>>
>>> Good to hear that the basics are running with virtio-mem (I assume 
>>> :) ).
>>>
>>>> make sure any potential risks or security concerns are taken care of
>>>> as well.
>>>
>>> It would be great to draw/explain the architecture in more detail.
>>
>> We will be looking into solving any potential security concerns, where
>> the hypervisor would restrict a few of the memory resizing actions.
>> Right now, we are experimenting to see if the PSI mechanism itself can
>> be used to detect memory pressure in the system and add memory to the
>> secondary VM when memory is needed, taking into account all the
>> latencies involved in the PSI scheme (i.e. from the time one does a
>> malloc call until the extra memory gets added to the SVM). We wanted to
>> know upstream's opinion on such a scheme that uses the PSI mechanism
>> for detecting memory pressure and resizing the SVM accordingly.
>
> One problematic thing is that adding memory to Linux by virtio-mem 
> eventually consumes memory (e.g., the memmap), especially when having 
> to add a completely new memory block to Linux.
>
Yes, we have thought about this issue as well: when the system is under 
heavy memory pressure, adding memory would require some memory for the 
memmap metadata, and there are also a few other places in memory hotplug 
where it needs to alloc_pages for hot-plugging. I think this path in 
memory_hotplug could be fixed so that it doesn't rely on allocating some 
small portion of memory for hotplugging. But then, the purpose of 
memory_hotplug itself wasn't to plug in memory while the system is under 
memory pressure :).


> So if you're already under severe memory pressure, these allocations 
> to bring up new memory can fail. The question is, if PSI can notify 
> "early" enough such that this barely happens in practice.
>
> There are some possible ways to mitigate:
>
> 1) Always keep spare memory blocks by virtio-mem added to Linux, that
>     don't expose any memory yet. Memory from these blocks can be handed
>     over to Linux without additional Linux allocations. Of course, they
>     consume metadata, so one might want to limit them.
>
> 2) Implement memmap_on_memory support for virtio-mem. This might help in
>     some setups, where the device block size is suitable.
>
> Did you run into that scenario already during your experiments, and 
> how did you deal with that?
>
We are implementing exactly the 2) you mentioned, i.e. enabling 
memmap_on_memory support for virtio-mem. This always guarantees that 
free memory is present for the memmap metadata while hotplugging. But 
this required us to increase the memory block size to 256MB (from 
128MB) to meet memory hotplug's alignment requirement for enabling 
memmap_on_memory, for the 4K page size configuration. Option 1) you 
mentioned also seems interesting - it's good to have some spare memory 
at hand when the system is heavily under memory pressure, so that this 
memory can be handed over immediately on PSI pressure without having to 
wait for the memory plug-in request roundtrip from the Primary VM.

Do you think having memmap_on_memory support for virtio-mem would be 
useful? If so, we can send the patch that supports this in virtio-mem.

Also, we are looking into ways of having memmap_on_memory enabled 
without requiring an increase in the memory block size. This might 
require some core changes in memory_hotplug, but we haven't explored it 
much.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
  2023-01-23 23:04       ` Sudarshan Rajagopalan
@ 2023-01-24 15:20         ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2023-01-24 15:20 UTC (permalink / raw)
  To: Sudarshan Rajagopalan, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

On 24.01.23 00:04, Sudarshan Rajagopalan wrote:
[...]
>> One problematic thing is that adding memory to Linux by virtio-mem
>> eventually consumes memory (e.g., the memmap), especially when having
>> to add a completely new memory block to Linux.
>>
> Yes, we have thought about this issue as well: when the system is under
> heavy memory pressure, adding memory would require some memory for the
> memmap metadata, and there are also a few other places in memory hotplug
> where it needs to alloc_pages for hot-plugging. I think this path in
> memory_hotplug could be fixed so that it doesn't rely on allocating some
> small portion of memory for hotplugging. But then, the purpose of
> memory_hotplug itself wasn't to plug in memory while the system is under
> memory pressure :).

Some small allocations might be classified as "urgent" and go to atomic 
reserves (e.g., resource tree node, memory device node). The big 
allocations (memmap, page-ext if enabled, eventually page tables for 
direct map when not mapping huge pages) are the problematic "memory 
consumers" I think.

> 
> 
>> So if you're already under severe memory pressure, these allocations
>> to bring up new memory can fail. The question is, if PSI can notify
>> "early" enough such that this barely happens in practice.
>>
>> There are some possible ways to mitigate:
>>
>> 1) Always keep spare memory blocks by virtio-mem added to Linux, that
>>     don't expose any memory yet. Memory from these blocks can be handed
>>     over to Linux without additional Linux allocations. Of course, they
>>     consume metadata, so one might want to limit them.
>>
>> 2) Implement memmap_on_memory support for virtio-mem. This might help in
>>     some setups, where the device block size is suitable.
>>
>> Did you run into that scenario already during your experiments, and
>> how did you deal with that?
>>
> We are implementing exactly the 2) you mentioned, i.e. enabling
> memmap_on_memory support for virtio-mem. This always guarantees that
> free memory is present for the memmap metadata while hotplugging. But
> this required us to increase the memory block size to 256MB (from
> 128MB) to meet memory hotplug's alignment requirement for enabling
> memmap_on_memory, for the 4K page size configuration. Option 1) you
> mentioned also seems

The memmap of 128 MiB is 2 MiB. Assuming the pageblock size is 2 MiB, 
and virtio-mem supports a device block size of 2 MiB, it should "in 
theory" also work with 128 MiB memory blocks.
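
(For reference, the arithmetic behind that figure, assuming 4k base pages 
and the usual 64-byte struct page:)

/*
 * memmap cost of one 128 MiB Linux memory block:
 *
 *   pages  = 128 MiB / 4 KiB           = 32768
 *   memmap = 32768 * sizeof(struct page)
 *          = 32768 * 64 bytes          = 2 MiB
 *
 * i.e. exactly one 2 MiB pageblock worth of memory per 128 MiB block.
 */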

So I'd be curious why the change to 256 MiB was required. Maybe that 
kernel config ends up with a pageblock size of 4 MiB (IIRC that can 
happen without CONFIG_HUGETLB -- which we should most probably change to 
also be PMD_ORDER due to THP).

> interesting - it's good to have some spare memory at hand when the
> system is heavily under memory pressure, so that this memory can be
> handed over immediately on PSI pressure without having to wait for the
> memory plug-in request roundtrip from the Primary VM.

The idea was that you'd still do the roundtrip to request plugging of 
device memory blocks, but that you could immediately expose memory to 
the system (without requiring allocations), and then immediately 
prepare the next Linux memory block while "fresh" memory is available.

This way you could handle most allocations that happen when adding a 
Linux memory block.

The main idea was to always have at least one spare block lying around, 
and as soon as you start exposing memory from one of them to the page 
allocator, immediately prepare the next one.

> 
> Do you think having memmap_on_memory support for virtio-mem would be
> useful? If so, we can send the patch that supports this in virtio-mem.
> 

I think yes. However, last time I thought about adding support, I 
realized that there are some ugly corner cases to handle cleanly.

You have to make sure that the device memory blocks to-be-used as memmap 
are "plugged" even before calling add_memory_driver_managed(). And you 
can only "unplug" these device memory blocks after the memory block was 
removed via offline_and_remove_memory().

So the whole order of events and management of plugged device blocks 
changes quite a bit ...

... and what to do if the device block size is, say 4MiB, but the memmap 
is 2 MiB? Of course, one could simply skip the optimization then.

Having that said, if you managed to get it running and it's not too 
ugly, please share.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
       [not found]   ` <50f979aa-37a6-db4b-465d-1dc0a27c2dfc@quicinc.com>
@ 2023-08-01 21:20     ` Sudarshan Rajagopalan
  0 siblings, 0 replies; 8+ messages in thread
From: Sudarshan Rajagopalan @ 2023-08-01 21:20 UTC (permalink / raw)
  To: T.J. Alumbaugh, David Hildenbrand, Johannes Weiner,
	Suren Baghdasaryan, Mike Rapoport, Oscar Salvador,
	Anshuman Khandual, mark.rutland, will, virtualization, linux-mm,
	linux-kernel, linux-arm-kernel, linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)


On 1/23/2023 3:47 PM, Sudarshan Rajagopalan wrote:
>
> On 1/23/2023 1:26 PM, T.J. Alumbaugh wrote:
>> Hi Sudarshan,
>>
>> I had a question about the setup and another about the use of PSI.
> Thanks for your comments Alumbaugh.
>>> 1. This will be a native userspace daemon running only in the Linux 
>>> VM, which will use the virtio-mem driver (which in turn uses memory 
>>> hotplug) to add/remove memory. The VM (aka Secondary VM, SVM) will 
>>> request memory from the host, which is the Primary VM (PVM), via the 
>>> backend hypervisor which takes care of cross-VM communication.
>>>
>> In regards to the "PVM/SVM" nomenclature, is the implied setup one of
>> fault tolerance (i.e. the secondary is there to take over in case of
>> failure of the primary VM)? Generally speaking, are the PVM and SVM
>> part of a defined system running some workload? The context seems to
>> be that the situation is more intricate than "two virtual machines
>> running on a host", but I'm not clear how it is different from that
>> general notion.
>
> Here the Primary VM (PVM) is actually the host, and we run a VM from 
> this host. We simply call this newly launched VM the Secondary VM 
> (SVM). Sorry for the confusion here. The secondary VM runs in a secure 
> environment.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
       [not found] <DS0PR02MB90787835F5B9CB9771A20329C4C09@DS0PR02MB9078.namprd02.prod.outlook.com>
@ 2023-01-23 21:26 ` T.J. Alumbaugh
       [not found]   ` <50f979aa-37a6-db4b-465d-1dc0a27c2dfc@quicinc.com>
  0 siblings, 1 reply; 8+ messages in thread
From: T.J. Alumbaugh @ 2023-01-23 21:26 UTC (permalink / raw)
  To: Sudarshan Rajagopalan (QUIC)
  Cc: David Hildenbrand, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm, Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

Hi Sudarshan,

I had a question about the setup and another about the use of PSI.

>
> 1. This will be a native userspace daemon running only in the Linux VM, which will use the virtio-mem driver (which in turn uses memory hotplug) to add/remove memory. The VM (aka Secondary VM, SVM) will request memory from the host, which is the Primary VM (PVM), via the backend hypervisor which takes care of cross-VM communication.
>

In regards to the "PVM/SVM" nomenclature, is the implied setup one of
fault tolerance (i.e. the secondary is there to take over in case of
failure of the primary VM)? Generally speaking, are the PVM and SVM
part of a defined system running some workload? The context seems to
be that the situation is more intricate than "two virtual machines
running on a host", but I'm not clear how it is different from that
general notion.

>
> 5. Detecting a decrease in memory pressure – the reverse part, where we give memory back to the PVM when it is no longer needed, is a bit tricky. We look for pressure decay, i.e. whether the PSI averages (avg10, avg60, avg300) go down, and along with other memory stats (such as free memory, etc.) we make an educated guess that the usecase has ended and its memory has been freed, so this memory can be given back to the PVM.
>

This is also very interesting to me. Detecting a decrease in pressure
using PSI seems difficult. IIUC, the approach taken in
OOMD/senpai from Meta is to continually apply pressure or back off,
and then see the outcome of that decision on the pressure metric,
feeding it back into the next decision (see links
below). Is your approach similar? Do you check the metric periodically
or only when receiving PSI memory events in userspace?

https://github.com/facebookincubator/senpai/blob/main/senpai.py#L117-L148
https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L529-L538

Very interesting proposal. Thanks for sending,

-T.J.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-08-01 21:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-15  3:57 [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory Sudarshan Rajagopalan
2023-01-17 15:33 ` David Hildenbrand
2023-01-17 23:45   ` Sudarshan Rajagopalan
2023-01-23  9:58     ` David Hildenbrand
2023-01-23 23:04       ` Sudarshan Rajagopalan
2023-01-24 15:20         ` David Hildenbrand
     [not found] <DS0PR02MB90787835F5B9CB9771A20329C4C09@DS0PR02MB9078.namprd02.prod.outlook.com>
2023-01-23 21:26 ` T.J. Alumbaugh
     [not found]   ` <50f979aa-37a6-db4b-465d-1dc0a27c2dfc@quicinc.com>
2023-08-01 21:20     ` Sudarshan Rajagopalan
