[RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory

* [RFC] memory pressure detection in VMs using PSI mechanism for dynamically inflating/deflating VM memory
@ 2023-01-15  3:57 Sudarshan Rajagopalan
  2023-01-17 15:33 ` David Hildenbrand
  0 siblings, 1 reply; 8+ messages in thread
From: Sudarshan Rajagopalan @ 2023-01-15  3:57 UTC (permalink / raw)
  To: David Hildenbrand, Johannes Weiner, Suren Baghdasaryan,
	Mike Rapoport, Oscar Salvador, Anshuman Khandual, mark.rutland,
	will, virtualization, linux-mm, linux-kernel, linux-arm-kernel,
	linux-arm-msm
  Cc: Trilok Soni (QUIC), Sukadev Bhattiprolu (QUIC),
	Srivatsa Vaddagiri (QUIC), Patrick Daly (QUIC)

Hello all,

We’re from the Linux memory team here at Qualcomm. We are currently 
devising a VM memory resizing feature where we dynamically inflate or 
deflate the Linux VM based on ongoing memory demands in the VM. We 
wanted to propose few details about this userspace daemon in form of RFC 
and wanted to know the upstream’s opinion. Here are few details –

1. This will be a native userspace daemon that will be running only in 
the Linux VM which will use virtio-mem driver that uses memory hotplug 
to add/remove memory. The VM (aka Secondary VM, SVM) will request for 
memory from the host which is Primary VM, PVM via the backend hypervisor 
which takes care of cross-VM communication.

2. This will be guest driver. This daemon will use PSI mechanism to 
monitor memory pressure to keep track of memory demands in the system. 
It will register to few memory pressure events and make an educated 
guess on when demand for memory in system is increasing.

3. Currently, min PSI window size is 500ms, so PSI monitor sampling 
period will be 50ms. In order to get quick response time from PSI, we’ve 
reduced the min window size to 50ms so that as small as 5ms increase in 
memory pressure can be reported to userspace by PSI.

/* PSI trigger definitions */
-#define WINDOW_MIN_US 500000   /* Min window size is 500ms */
+#define WINDOW_MIN_US 50000    /* Min window size is 50ms */

4. Detecting increase in memory demand – when a certain usecase starts 
in VM that does memory allocations, it will stall causing PSI mechanism 
to generate a memory pressure event to userspace. To simply put, when 
pressure increases certain set threshold, it can make educated guess 
that a memory requiring usecase has ran and VM system needs memory to be 
added.

5. Detecting decrease in memory pressure – the reverse part where we 
give back memory to PVM when memory is no longer needed is bit tricky. 
We look for pressure decay and see if PSI averages (avg10, avg60, 
avg300) go down, and along with other memory stats (such as free memory 
etc) we make an educated guess that usecase has ended and memory has 
been free’ed by the usecase, and this memory can be given back to PVM 
when its no longer needed.

6. I’m skimming much on the logic and intelligence but the daemon relies 
on PSI mechanism to know when memory demand is going up and down, and 
communicates with virtio-mem driver for hot-plugging/unplugging memory. 
We also factor in the latency involved with roundtrips between SVM<->PVM 
so we size the memory chuck that needs to be plugged-in accordingly.

7. The whole purpose of daemon using PSI mechanism is to make this si 
guest driven rather than host driven, which currently is the case mostly 
with virtio-mem users. The memory pressure and usage monitoring happens 
inside the SVM and the SVM makes the decisions to request for memory 
from PVM. This avoids any intervention such as admin in PVM to monitor 
and control the knobs. We have also set max limit of how much SVMs can 
grow interms of memory, so that a rouge VM would not abuse this scheme.

This daemon is currently in just Beta stage now and we have basic 
functionality running. We are yet to add more flesh to this scheme to 
make sure any potential risks or security concerns are taken care as well.

We would happy to know your opinions on such a scheme.

Thanks and Regards,
Sudarshan

^ permalink raw reply	[flat|nested] 8+ messages in thread