* [PATCH 0/3] Improve memory statistics for virtio balloon
@ 2024-04-18  6:25 zhenwei pi
  2024-04-18  6:26 ` [PATCH 1/3] virtio_balloon: introduce oom-kill invocations zhenwei pi
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: zhenwei pi @ 2024-04-18  6:25 UTC
  To: linux-kernel, linux-mm, virtualization
  Cc: mst, david, jasowang, xuanzhuo, akpm, zhenwei pi

RFC -> v1:
- several text changes: oom-kill -> oom-kills, SCAN_ASYNC -> ASYNC_SCAN.
- move the vm event code into '#ifdef CONFIG_VM_EVENT_COUNTERS'

RFC version:
Link: https://lore.kernel.org/lkml/20240415084113.1203428-1-pizhenwei@bytedance.com/T/#m1898963b3c27a989b1123db475135c3ca687ca84

zhenwei pi (3):
  virtio_balloon: introduce oom-kill invocations
  virtio_balloon: introduce memory allocation stall counter
  virtio_balloon: introduce memory scan/reclaim info

 drivers/virtio/virtio_balloon.c     | 30 ++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_balloon.h | 16 +++++++++++++--
 2 files changed, 43 insertions(+), 3 deletions(-)

-- 
2.34.1



* [PATCH 1/3] virtio_balloon: introduce oom-kill invocations
  2024-04-18  6:25 [PATCH 0/3] Improve memory statistics for virtio balloon zhenwei pi
@ 2024-04-18  6:26 ` zhenwei pi
  2024-04-18 10:58   ` David Hildenbrand
  2024-04-18  6:26 ` [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter zhenwei pi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2024-04-18  6:26 UTC
  To: linux-kernel, linux-mm, virtualization
  Cc: mst, david, jasowang, xuanzhuo, akpm, zhenwei pi

When the guest OS runs under critical memory pressure, the guest
starts to kill processes. A guest monitor agent may scan 'oom_kill'
from /proc/vmstat and report the OOM-kill event. However, the agent
itself may be killed, and then we lose this critical event (and any
later ones).

For now we can also grep for magic words in the guest kernel log from
the host side. Rather than rely on this unstable approach, have virtio
balloon report OOM-kill invocations.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 drivers/virtio/virtio_balloon.c     | 1 +
 include/uapi/linux/virtio_balloon.h | 6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1f5b3dd31fcf..fd19934a847f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -337,6 +337,7 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
 				pages_to_bytes(events[PSWPOUT]));
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_OOM_KILL, events[OOM_KILL]);
 #ifdef CONFIG_HUGETLB_PAGE
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
 		    events[HTLB_BUDDY_PGALLOC]);
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index ddaa45e723c4..b17bbe033697 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -71,7 +71,8 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_CACHES   7   /* Disk caches */
 #define VIRTIO_BALLOON_S_HTLB_PGALLOC  8  /* Hugetlb page allocations */
 #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
-#define VIRTIO_BALLOON_S_NR       10
+#define VIRTIO_BALLOON_S_OOM_KILL      10 /* OOM killer invocations */
+#define VIRTIO_BALLOON_S_NR       11
 
 #define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
 	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
@@ -83,7 +84,8 @@ struct virtio_balloon_config {
 	VIRTIO_BALLOON_S_NAMES_prefix "available-memory", \
 	VIRTIO_BALLOON_S_NAMES_prefix "disk-caches", \
 	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
-	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures" \
+	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures", \
+	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills" \
 }
 
 #define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")
-- 
2.34.1
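
For comparison, the in-guest approach that the commit message above
describes (an agent scanning 'oom_kill' in /proc/vmstat) might look
roughly like the sketch below. The function names vmstat_lookup() and
read_oom_kill() are hypothetical and not part of this series; the only
assumption about the file is the usual "name value" line format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in /proc/vmstat-style text ("name value" per line).
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
int vmstat_lookup(const char *text, const char *name,
		  unsigned long long *val)
{
	const char *line = text;
	size_t len;

	assert(text && name && val);
	len = strlen(name);

	while (*line) {
		/* match "name " at the start of a line only */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}

/* Read the live counter inside the guest; returns -1 on any failure. */
int read_oom_kill(unsigned long long *val)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return vmstat_lookup(buf, "oom_kill", val);
}
```

Such an agent is itself a candidate for the OOM killer, which is the
weakness the patch avoids by reporting the counter from the driver.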



* [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter
  2024-04-18  6:25 [PATCH 0/3] Improve memory statistics for virtio balloon zhenwei pi
  2024-04-18  6:26 ` [PATCH 1/3] virtio_balloon: introduce oom-kill invocations zhenwei pi
@ 2024-04-18  6:26 ` zhenwei pi
  2024-04-18 11:49   ` David Hildenbrand
  2024-04-21  3:43   ` kernel test robot
  2024-04-18  6:26 ` [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info zhenwei pi
  2024-04-22 20:57 ` [PATCH 0/3] Improve memory statistics for virtio balloon Michael S. Tsirkin
  3 siblings, 2 replies; 9+ messages in thread
From: zhenwei pi @ 2024-04-18  6:26 UTC
  To: linux-kernel, linux-mm, virtualization
  Cc: mst, david, jasowang, xuanzhuo, akpm, zhenwei pi

The memory allocation stall counter reflects the performance/latency
of memory allocation. Expose this counter to the host side through the
virtio balloon device, i.e. out of band.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 drivers/virtio/virtio_balloon.c     | 20 +++++++++++++++++++-
 include/uapi/linux/virtio_balloon.h |  6 ++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index fd19934a847f..e88e6573afa5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -321,7 +321,7 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
 	unsigned long events[NR_VM_EVENT_ITEMS];
 	struct sysinfo i;
 	unsigned int idx = 0;
-	long available;
+	long available, stall = 0;
 	unsigned long caches;
 
 	all_vm_events(events);
@@ -338,6 +338,24 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_OOM_KILL, events[OOM_KILL]);
+
+	/* sum all the stall events */
+#ifdef CONFIG_ZONE_DMA
+	stall += events[ALLOCSTALL_DMA];
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	stall += events[ALLOCSTALL_DMA32];
+#endif
+#ifdef CONFIG_HIGHMEM
+	stall += events[ALLOCSTALL_HIGH];
+#endif
+#ifdef CONFIG_ZONE_DEVICE
+	stall += events[ALLOCSTALL_DEVICE];
+#endif
+	stall += events[ALLOCSTALL_NORMAL];
+	stall += events[ALLOCSTALL_MOVABLE];
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
+
 #ifdef CONFIG_HUGETLB_PAGE
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
 		    events[HTLB_BUDDY_PGALLOC]);
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index b17bbe033697..487b893a160e 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -72,7 +72,8 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_HTLB_PGALLOC  8  /* Hugetlb page allocations */
 #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
 #define VIRTIO_BALLOON_S_OOM_KILL      10 /* OOM killer invocations */
-#define VIRTIO_BALLOON_S_NR       11
+#define VIRTIO_BALLOON_S_ALLOC_STALL   11 /* Stall count of memory allocatoin */
+#define VIRTIO_BALLOON_S_NR       12
 
 #define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
 	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
@@ -85,7 +86,8 @@ struct virtio_balloon_config {
 	VIRTIO_BALLOON_S_NAMES_prefix "disk-caches", \
 	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
 	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures", \
-	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills" \
+	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills", \
+	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls" \
 }
 
 #define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")
-- 
2.34.1



* [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info
  2024-04-18  6:25 [PATCH 0/3] Improve memory statistics for virtio balloon zhenwei pi
  2024-04-18  6:26 ` [PATCH 1/3] virtio_balloon: introduce oom-kill invocations zhenwei pi
  2024-04-18  6:26 ` [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter zhenwei pi
@ 2024-04-18  6:26 ` zhenwei pi
  2024-04-18 11:51   ` David Hildenbrand
  2024-04-22 20:57 ` [PATCH 0/3] Improve memory statistics for virtio balloon Michael S. Tsirkin
  3 siblings, 1 reply; 9+ messages in thread
From: zhenwei pi @ 2024-04-18  6:26 UTC
  To: linux-kernel, linux-mm, virtualization
  Cc: mst, david, jasowang, xuanzhuo, akpm, zhenwei pi

Expose memory scan/reclaim information to the host side via virtio
balloon device.

Now we have a set of metrics to analyze memory performance:

y: counter increases
n: counter does not change
h: the rate of counter change is high
l: the rate of counter change is low

OOM: VIRTIO_BALLOON_S_OOM_KILL
STALL: VIRTIO_BALLOON_S_ALLOC_STALL
ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM

- OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
  the guest runs under really critical memory pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
  memory allocation stalls due to a cgroup, not global memory
  pressure.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
  memory allocation stalls due to global memory pressure, and
  performance suffers badly. A high DRCLM/DSCAN ratio shows that
  memory reclaim is quite effective.

- OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
  memory allocation stalls due to global memory pressure. The
  DRCLM/DSCAN ratio is low, so the guest OS is thrashing heavily. In
  serious cases this leads to poor performance and difficult
  troubleshooting. E.g., sshd may block on memory allocation when
  accepting new connections, so a user can't log in to the VM over ssh.

- OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
  the low ARCLM/ASCAN ratio shows that the guest tries to reclaim more
  memory, but can't. Once more memory is required in the future, it
  will struggle to reclaim memory.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 drivers/virtio/virtio_balloon.c     |  9 +++++++++
 include/uapi/linux/virtio_balloon.h | 12 ++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index e88e6573afa5..bc9332c1ae85 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -356,6 +356,15 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
 	stall += events[ALLOCSTALL_MOVABLE];
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
 
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_SCAN,
+			pages_to_bytes(events[PGSCAN_KSWAPD]));
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_SCAN,
+			pages_to_bytes(events[PGSCAN_DIRECT]));
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_RECLAIM,
+			pages_to_bytes(events[PGSTEAL_KSWAPD]));
+	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
+			pages_to_bytes(events[PGSTEAL_DIRECT]));
+
 #ifdef CONFIG_HUGETLB_PAGE
 	update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
 		    events[HTLB_BUDDY_PGALLOC]);
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 487b893a160e..ee35a372805d 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -73,7 +73,11 @@ struct virtio_balloon_config {
 #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
 #define VIRTIO_BALLOON_S_OOM_KILL      10 /* OOM killer invocations */
 #define VIRTIO_BALLOON_S_ALLOC_STALL   11 /* Stall count of memory allocatoin */
-#define VIRTIO_BALLOON_S_NR       12
+#define VIRTIO_BALLOON_S_ASYNC_SCAN    12 /* Amount of memory scanned asynchronously */
+#define VIRTIO_BALLOON_S_DIRECT_SCAN   13 /* Amount of memory scanned directly */
+#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14 /* Amount of memory reclaimed asynchronously */
+#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15 /* Amount of memory reclaimed directly */
+#define VIRTIO_BALLOON_S_NR       16
 
 #define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
 	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
@@ -87,7 +91,11 @@ struct virtio_balloon_config {
 	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
 	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures", \
 	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills", \
-	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls" \
+	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls", \
+	VIRTIO_BALLOON_S_NAMES_prefix "async-scans", \
+	VIRTIO_BALLOON_S_NAMES_prefix "direct-scans", \
+	VIRTIO_BALLOON_S_NAMES_prefix "async-reclaims", \
+	VIRTIO_BALLOON_S_NAMES_prefix "direct-reclaims" \
 }
 
 #define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")
-- 
2.34.1
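
The decision table in the commit message above could be applied on the
host side roughly as sketched below. This is only an illustration: the
struct, the enum, classify() and all three thresholds are hypothetical,
and what counts as a "high" or "low" rate is left open by the patch.
The sketch works on the difference between two samples and uses the
reclaimed/scanned ratio to tell effective reclaim from thrashing.

```c
#include <assert.h>
#include <stdint.h>

/* One sample of the six counters named in the commit message. */
struct balloon_sample {
	uint64_t oom_kill, alloc_stall;
	uint64_t async_scan, direct_scan;       /* bytes */
	uint64_t async_reclaim, direct_reclaim; /* bytes */
};

enum pressure {
	PRESSURE_NONE,
	PRESSURE_OOM,               /* OOM[y] */
	PRESSURE_CGROUP_STALL,      /* STALL[h], DSCAN[l] */
	PRESSURE_GLOBAL_RECLAIM,    /* STALL[h], DSCAN[h], DRCLM[h] */
	PRESSURE_THRASHING,         /* STALL[h], DSCAN[h], DRCLM[l] */
	PRESSURE_ASYNC_INEFFECTIVE, /* STALL[n], ASCAN[h], ARCLM[l] */
};

/* Hypothetical thresholds, per sampling interval. */
#define STALL_HIGH      100u
#define SCAN_HIGH       (64u << 20) /* 64 MiB scanned */
#define RECLAIM_LOW_PCT 10u         /* reclaimed/scanned below 10% is "low" */

enum pressure classify(const struct balloon_sample *prev,
		       const struct balloon_sample *cur)
{
	uint64_t oom, stall, ascan, dscan, arclm, drclm;

	assert(prev && cur);
	oom = cur->oom_kill - prev->oom_kill;
	stall = cur->alloc_stall - prev->alloc_stall;
	ascan = cur->async_scan - prev->async_scan;
	dscan = cur->direct_scan - prev->direct_scan;
	arclm = cur->async_reclaim - prev->async_reclaim;
	drclm = cur->direct_reclaim - prev->direct_reclaim;

	if (oom > 0)
		return PRESSURE_OOM;
	if (stall >= STALL_HIGH) {
		if (dscan < SCAN_HIGH)
			return PRESSURE_CGROUP_STALL;
		/* dscan is nonzero here, so the division is safe */
		return drclm * 100 / dscan >= RECLAIM_LOW_PCT ?
			PRESSURE_GLOBAL_RECLAIM : PRESSURE_THRASHING;
	}
	if (stall == 0 && dscan == 0 && ascan >= SCAN_HIGH &&
	    arclm * 100 / ascan < RECLAIM_LOW_PCT)
		return PRESSURE_ASYNC_INEFFECTIVE;
	return PRESSURE_NONE;
}
```

Working on deltas rather than absolute values matters because all six
counters only ever grow over the lifetime of the guest.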



* Re: [PATCH 1/3] virtio_balloon: introduce oom-kill invocations
  2024-04-18  6:26 ` [PATCH 1/3] virtio_balloon: introduce oom-kill invocations zhenwei pi
@ 2024-04-18 10:58   ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-04-18 10:58 UTC
  To: zhenwei pi, linux-kernel, linux-mm, virtualization
  Cc: mst, jasowang, xuanzhuo, akpm

On 18.04.24 08:26, zhenwei pi wrote:
> When the guest OS runs under critical memory pressure, the guest
> starts to kill processes. A guest monitor agent may scan 'oom_kill'
> from /proc/vmstat and report the OOM-kill event. However, the agent
> itself may be killed, and then we lose this critical event (and any
> later ones).
> 
> For now we can also grep for magic words in the guest kernel log from
> the host side. Rather than rely on this unstable approach, have virtio
> balloon report OOM-kill invocations.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter
  2024-04-18  6:26 ` [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter zhenwei pi
@ 2024-04-18 11:49   ` David Hildenbrand
  2024-04-21  3:43   ` kernel test robot
  1 sibling, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-04-18 11:49 UTC
  To: zhenwei pi, linux-kernel, linux-mm, virtualization
  Cc: mst, jasowang, xuanzhuo, akpm

On 18.04.24 08:26, zhenwei pi wrote:
> The memory allocation stall counter reflects the performance/latency
> of memory allocation. Expose this counter to the host side through the
> virtio balloon device, i.e. out of band.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>   drivers/virtio/virtio_balloon.c     | 20 +++++++++++++++++++-
>   include/uapi/linux/virtio_balloon.h |  6 ++++--
>   2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index fd19934a847f..e88e6573afa5 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -321,7 +321,7 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>   	unsigned long events[NR_VM_EVENT_ITEMS];
>   	struct sysinfo i;
>   	unsigned int idx = 0;
> -	long available;
> +	long available, stall = 0;
>   	unsigned long caches;
>   
>   	all_vm_events(events);
> @@ -338,6 +338,24 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_OOM_KILL, events[OOM_KILL]);
> +
> +	/* sum all the stall events */
> +#ifdef CONFIG_ZONE_DMA
> +	stall += events[ALLOCSTALL_DMA];
> +#endif
> +#ifdef CONFIG_ZONE_DMA32
> +	stall += events[ALLOCSTALL_DMA32];
> +#endif
> +#ifdef CONFIG_HIGHMEM
> +	stall += events[ALLOCSTALL_HIGH];
> +#endif
> +#ifdef CONFIG_ZONE_DEVICE
> +	stall += events[ALLOCSTALL_DEVICE];
> +#endif

Naive me would think that ALLOCSTALL_DEVICE is always 0. :)

Likely we should just do:

for (zid = 0; zid < MAX_NR_ZONES; zid++)
	stall += events[ALLOCSTALL_NORMAL - ZONE_NORMAL + zid];

(see isolate_lru_folios() -> __count_zid_vm_events(), where we rely on 
the same ordering)

Apart from that, LGTM.
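
To see why the loop suggested above gives the same sum as the #ifdef
chain, here is a minimal userspace sketch. The two enums are reduced,
hypothetical stand-ins for the kernel's zone_type and vm_event_item;
the only property the sketch depends on is the one the reply points
out, namely that the ALLOCSTALL_* items appear in zone order.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Hypothetical, reduced stand-in for enum vm_event_item. */
enum vm_event_item {
	PGFAULT,
	/* one ALLOCSTALL_* item per zone, in zone order */
	ALLOCSTALL_DMA,
	ALLOCSTALL_DMA32,
	ALLOCSTALL_NORMAL,
	ALLOCSTALL_MOVABLE,
	OOM_KILL,
	NR_VM_EVENT_ITEMS
};

unsigned long sum_alloc_stalls(const unsigned long *events)
{
	unsigned long stall = 0;
	int zid;

	assert(events);
	/* ALLOCSTALL_NORMAL - ZONE_NORMAL is the index of the first zone's item */
	for (zid = 0; zid < MAX_NR_ZONES; zid++)
		stall += events[ALLOCSTALL_NORMAL - ZONE_NORMAL + zid];
	return stall;
}
```

Because the loop bound is MAX_NR_ZONES, zones that are configured out
simply do not exist in either enum, so no per-zone #ifdef is needed.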

-- 
Cheers,

David / dhildenb



* Re: [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info
  2024-04-18  6:26 ` [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info zhenwei pi
@ 2024-04-18 11:51   ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2024-04-18 11:51 UTC
  To: zhenwei pi, linux-kernel, linux-mm, virtualization
  Cc: mst, jasowang, xuanzhuo, akpm

On 18.04.24 08:26, zhenwei pi wrote:
> Expose memory scan/reclaim information to the host side via virtio
> balloon device.
> 
> Now we have a set of metrics to analyze memory performance:
> 
> y: counter increases
> n: counter does not change
> h: the rate of counter change is high
> l: the rate of counter change is low
> 
> OOM: VIRTIO_BALLOON_S_OOM_KILL
> STALL: VIRTIO_BALLOON_S_ALLOC_STALL
> ASCAN: VIRTIO_BALLOON_S_ASYNC_SCAN
> DSCAN: VIRTIO_BALLOON_S_DIRECT_SCAN
> ARCLM: VIRTIO_BALLOON_S_ASYNC_RECLAIM
> DRCLM: VIRTIO_BALLOON_S_DIRECT_RECLAIM
> 
> - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
>    the guest runs under really critical memory pressure.
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
>    memory allocation stalls due to a cgroup, not global memory
>    pressure.
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
>    memory allocation stalls due to global memory pressure, and
>    performance suffers badly. A high DRCLM/DSCAN ratio shows that
>    memory reclaim is quite effective.
> 
> - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
>    memory allocation stalls due to global memory pressure. The
>    DRCLM/DSCAN ratio is low, so the guest OS is thrashing heavily. In
>    serious cases this leads to poor performance and difficult
>    troubleshooting. E.g., sshd may block on memory allocation when
>    accepting new connections, so a user can't log in to the VM over ssh.
> 
> - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
>    the low ARCLM/ASCAN ratio shows that the guest tries to reclaim more
>    memory, but can't. Once more memory is required in the future, it
>    will struggle to reclaim memory.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>   drivers/virtio/virtio_balloon.c     |  9 +++++++++
>   include/uapi/linux/virtio_balloon.h | 12 ++++++++++--
>   2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index e88e6573afa5..bc9332c1ae85 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -356,6 +356,15 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>   	stall += events[ALLOCSTALL_MOVABLE];
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
>   
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_SCAN,
> +			pages_to_bytes(events[PGSCAN_KSWAPD]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_SCAN,
> +			pages_to_bytes(events[PGSCAN_DIRECT]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_RECLAIM,
> +			pages_to_bytes(events[PGSTEAL_KSWAPD]));
> +	update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
> +			pages_to_bytes(events[PGSTEAL_DIRECT]));
> +
>   #ifdef CONFIG_HUGETLB_PAGE
>   	update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>   		    events[HTLB_BUDDY_PGALLOC]);
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index 487b893a160e..ee35a372805d 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -73,7 +73,11 @@ struct virtio_balloon_config {
>   #define VIRTIO_BALLOON_S_HTLB_PGFAIL   9  /* Hugetlb page allocation failures */
>   #define VIRTIO_BALLOON_S_OOM_KILL      10 /* OOM killer invocations */
>   #define VIRTIO_BALLOON_S_ALLOC_STALL   11 /* Stall count of memory allocatoin */
> -#define VIRTIO_BALLOON_S_NR       12
> +#define VIRTIO_BALLOON_S_ASYNC_SCAN    12 /* Amount of memory scanned asynchronously */
> +#define VIRTIO_BALLOON_S_DIRECT_SCAN   13 /* Amount of memory scanned directly */
> +#define VIRTIO_BALLOON_S_ASYNC_RECLAIM 14 /* Amount of memory reclaimed asynchronously */
> +#define VIRTIO_BALLOON_S_DIRECT_RECLAIM 15 /* Amount of memory reclaimed directly */
> +#define VIRTIO_BALLOON_S_NR       16
>   
>   #define VIRTIO_BALLOON_S_NAMES_WITH_PREFIX(VIRTIO_BALLOON_S_NAMES_prefix) { \
>   	VIRTIO_BALLOON_S_NAMES_prefix "swap-in", \
> @@ -87,7 +91,11 @@ struct virtio_balloon_config {
>   	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-allocations", \
>   	VIRTIO_BALLOON_S_NAMES_prefix "hugetlb-failures", \
>   	VIRTIO_BALLOON_S_NAMES_prefix "oom-kills", \
> -	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls" \
> +	VIRTIO_BALLOON_S_NAMES_prefix "alloc-stalls", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "async-scans", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "direct-scans", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "async-reclaims", \
> +	VIRTIO_BALLOON_S_NAMES_prefix "direct-reclaims" \
>   }
>   
>   #define VIRTIO_BALLOON_S_NAMES VIRTIO_BALLOON_S_NAMES_WITH_PREFIX("")

Not an expert on these counters/events, but LGTM

Acked-by: David Hildenbrand <david@redhat.com>
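
The tag numbers and the name table touched by this patch have to stay
in step: entry N of the name list describes tag N, and the list must
have exactly VIRTIO_BALLOON_S_NR entries. The sketch below illustrates
that relationship with local, hypothetical macros (S_*, stat_names,
stat_name()) rather than the real UAPI header. The first few names are
not visible in the quoted hunks and are filled in on the assumption
that they match the upstream header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copies of a few tag values as they stand after patch 3/3. */
#define S_OOM_KILL       10
#define S_ALLOC_STALL    11
#define S_DIRECT_RECLAIM 15
#define S_NR             16

/* Same shape as VIRTIO_BALLOON_S_NAMES_WITH_PREFIX: one name per tag. */
#define S_NAMES_WITH_PREFIX(p) { \
	p "swap-in", p "swap-out", p "major-faults", p "minor-faults", \
	p "free-memory", p "total-memory", p "available-memory", \
	p "disk-caches", p "hugetlb-allocations", p "hugetlb-failures", \
	p "oom-kills", p "alloc-stalls", p "async-scans", \
	p "direct-scans", p "async-reclaims", p "direct-reclaims" \
}

static const char *const stat_names[] = S_NAMES_WITH_PREFIX("stat-");

/* Catch a tag added without a name (or the reverse) at build time. */
_Static_assert(sizeof(stat_names) / sizeof(stat_names[0]) == S_NR,
	       "name table and S_NR are out of sync");

/* Map a tag to its name; unknown tags yield NULL. */
const char *stat_name(unsigned int tag)
{
	return tag < S_NR ? stat_names[tag] : NULL;
}
```

The compile-time check is an addition of this sketch, not something
the series does; it is one inexpensive way to catch a missed update of
either list.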

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter
  2024-04-18  6:26 ` [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter zhenwei pi
  2024-04-18 11:49   ` David Hildenbrand
@ 2024-04-21  3:43   ` kernel test robot
  1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2024-04-21  3:43 UTC
  To: zhenwei pi, linux-kernel, linux-mm, virtualization
  Cc: oe-kbuild-all, mst, david, jasowang, xuanzhuo, akpm, zhenwei pi

Hi zhenwei,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.9-rc4 next-20240419]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/zhenwei-pi/virtio_balloon-introduce-oom-kill-invocations/20240418-142934
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240418062602.1291391-3-pizhenwei%40bytedance.com
patch subject: [PATCH 2/3] virtio_balloon: introduce memory allocation stall counter
config: i386-randconfig-141-20240421 (https://download.01.org/0day-ci/archive/20240421/202404211106.B9pwuFqk-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240421/202404211106.B9pwuFqk-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404211106.B9pwuFqk-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_balloon.c:324:18: warning: unused variable 'stall' [-Wunused-variable]
     324 |         long available, stall = 0;
         |                         ^~~~~
   1 warning generated.


vim +/stall +324 drivers/virtio/virtio_balloon.c

   318	
   319	static unsigned int update_balloon_stats(struct virtio_balloon *vb)
   320	{
   321		unsigned long events[NR_VM_EVENT_ITEMS];
   322		struct sysinfo i;
   323		unsigned int idx = 0;
 > 324		long available, stall = 0;
   325		unsigned long caches;
   326	
   327		all_vm_events(events);
   328		si_meminfo(&i);
   329	
   330		available = si_mem_available();
   331		caches = global_node_page_state(NR_FILE_PAGES);
   332	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 0/3] Improve memory statistics for virtio balloon
  2024-04-18  6:25 [PATCH 0/3] Improve memory statistics for virtio balloon zhenwei pi
                   ` (2 preceding siblings ...)
  2024-04-18  6:26 ` [PATCH 3/3] virtio_balloon: introduce memory scan/reclaim info zhenwei pi
@ 2024-04-22 20:57 ` Michael S. Tsirkin
  3 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2024-04-22 20:57 UTC
  To: zhenwei pi
  Cc: linux-kernel, linux-mm, virtualization, david, jasowang, xuanzhuo, akpm

On Thu, Apr 18, 2024 at 02:25:59PM +0800, zhenwei pi wrote:
> RFC -> v1:
> - several text changes: oom-kill -> oom-kills, SCAN_ASYNC -> ASYNC_SCAN.
> - move the vm event code into '#ifdef CONFIG_VM_EVENT_COUNTERS'
> 
> RFC version:
> Link: https://lore.kernel.org/lkml/20240415084113.1203428-1-pizhenwei@bytedance.com/T/#m1898963b3c27a989b1123db475135c3ca687ca84


Please make sure this builds without introducing new warnings.

> zhenwei pi (3):
>   virtio_balloon: introduce oom-kill invocations
>   virtio_balloon: introduce memory allocation stall counter
>   virtio_balloon: introduce memory scan/reclaim info
> 
>  drivers/virtio/virtio_balloon.c     | 30 ++++++++++++++++++++++++++++-
>  include/uapi/linux/virtio_balloon.h | 16 +++++++++++++--
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> -- 
> 2.34.1


