linux-mm.kvack.org archive mirror
* [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
@ 2023-01-09  3:38 Sergey Senozhatsky
  2023-01-09  3:38 ` [PATCHv2 1/4] zsmalloc: rework zspage chain size selection Sergey Senozhatsky
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-09  3:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sergey Senozhatsky

Hi,

	This turns the hard-coded limit on the maximum number of
physical pages per zspage into a config option. It also increases
the default limit from 4 to 8.

Sergey Senozhatsky (4):
  zsmalloc: rework zspage chain size selection
  zsmalloc: skip chain size calculation for pow_of_2 classes
  zsmalloc: make zspage chain size configurable
  zsmalloc: set default zspage chain size to 8

 Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  19 ++++
 mm/zsmalloc.c                 |  72 +++++----------
 3 files changed, 212 insertions(+), 47 deletions(-)

-- 
2.39.0.314.g84b9a713c41-goog



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 1/4] zsmalloc: rework zspage chain size selection
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
@ 2023-01-09  3:38 ` Sergey Senozhatsky
  2023-01-13 17:32   ` Minchan Kim
  2023-01-09  3:38 ` [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes Sergey Senozhatsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-09  3:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sergey Senozhatsky

Computers are bad at division. We currently decide the best
zspage chain size (the max number of physical pages per zspage)
by looking at a `used percentage` value. This is not enough,
as we lose precision during the usage percentage calculation.
For example, let's look at size class 208:

pages per zspage       wasted bytes         used%
       1                   144               96
       2                    80               99
       3                    16               99
       4                   160               99

The current algorithm selects the 2 pages per zspage configuration,
as it is the first one to reach 99%. However, the 3 pages per zspage
configuration wastes less memory.

Change the algorithm to select the zspage configuration with the
lowest number of wasted bytes.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/zsmalloc.c | 56 +++++++++++++++++----------------------------------
 1 file changed, 19 insertions(+), 37 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 6aafacd664fc..effe10fe76e9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -802,42 +802,6 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
 	return newfg;
 }
 
-/*
- * We have to decide on how many pages to link together
- * to form a zspage for each size class. This is important
- * to reduce wastage due to unusable space left at end of
- * each zspage which is given as:
- *     wastage = Zp % class_size
- *     usage = Zp - wastage
- * where Zp = zspage size = k * PAGE_SIZE where k = 1, 2, ...
- *
- * For example, for size class of 3/8 * PAGE_SIZE, we should
- * link together 3 PAGE_SIZE sized pages to form a zspage
- * since then we can perfectly fit in 8 such objects.
- */
-static int get_pages_per_zspage(int class_size)
-{
-	int i, max_usedpc = 0;
-	/* zspage order which gives maximum used size per KB */
-	int max_usedpc_order = 1;
-
-	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
-		int zspage_size;
-		int waste, usedpc;
-
-		zspage_size = i * PAGE_SIZE;
-		waste = zspage_size % class_size;
-		usedpc = (zspage_size - waste) * 100 / zspage_size;
-
-		if (usedpc > max_usedpc) {
-			max_usedpc = usedpc;
-			max_usedpc_order = i;
-		}
-	}
-
-	return max_usedpc_order;
-}
-
 static struct zspage *get_zspage(struct page *page)
 {
 	struct zspage *zspage = (struct zspage *)page_private(page);
@@ -2318,6 +2282,24 @@ static int zs_register_shrinker(struct zs_pool *pool)
 				 pool->name);
 }
 
+static int calculate_zspage_chain_size(int class_size)
+{
+	int i, min_waste = INT_MAX;
+	int chain_size = 1;
+
+	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+		int waste;
+
+		waste = (i * PAGE_SIZE) % class_size;
+		if (waste < min_waste) {
+			min_waste = waste;
+			chain_size = i;
+		}
+	}
+
+	return chain_size;
+}
+
 /**
  * zs_create_pool - Creates an allocation pool to work from.
  * @name: pool name to be created
@@ -2362,7 +2344,7 @@ struct zs_pool *zs_create_pool(const char *name)
 		size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
 		if (size > ZS_MAX_ALLOC_SIZE)
 			size = ZS_MAX_ALLOC_SIZE;
-		pages_per_zspage = get_pages_per_zspage(size);
+		pages_per_zspage = calculate_zspage_chain_size(size);
 		objs_per_zspage = pages_per_zspage * PAGE_SIZE / size;
 
 		/*
-- 
2.39.0.314.g84b9a713c41-goog




* [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
  2023-01-09  3:38 ` [PATCHv2 1/4] zsmalloc: rework zspage chain size selection Sergey Senozhatsky
@ 2023-01-09  3:38 ` Sergey Senozhatsky
  2023-01-13 17:32   ` Minchan Kim
  2023-01-09  3:38 ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-09  3:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sergey Senozhatsky

If a class size is a power of 2 then it wastes no memory
and the best configuration is 1 physical page per zspage.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/zsmalloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index effe10fe76e9..ee8431784998 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -2287,6 +2287,9 @@ static int calculate_zspage_chain_size(int class_size)
 	int i, min_waste = INT_MAX;
 	int chain_size = 1;
 
+	if (is_power_of_2(class_size))
+		return chain_size;
+
 	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
 		int waste;
 
-- 
2.39.0.314.g84b9a713c41-goog




* [PATCHv2 3/4] zsmalloc: make zspage chain size configurable
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
  2023-01-09  3:38 ` [PATCHv2 1/4] zsmalloc: rework zspage chain size selection Sergey Senozhatsky
  2023-01-09  3:38 ` [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes Sergey Senozhatsky
@ 2023-01-09  3:38 ` Sergey Senozhatsky
  2023-01-12  7:11   ` Sergey Senozhatsky
  2023-01-13 19:02   ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Minchan Kim
  2023-01-09  3:38 ` [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8 Sergey Senozhatsky
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-09  3:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sergey Senozhatsky

Remove the hard-coded limit on the maximum number of physical
pages per zspage.

This allows tuning of the zsmalloc pool: the zspage chain size
changes the `pages per-zspage` and `objects per-zspage`
characteristics of size classes, which also affects size class
clustering (the way size classes are merged).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  19 ++++
 mm/zsmalloc.c                 |  15 +--
 3 files changed, 191 insertions(+), 11 deletions(-)

diff --git a/Documentation/mm/zsmalloc.rst b/Documentation/mm/zsmalloc.rst
index 6e79893d6132..40323c9b39d8 100644
--- a/Documentation/mm/zsmalloc.rst
+++ b/Documentation/mm/zsmalloc.rst
@@ -80,3 +80,171 @@ Similarly, we assign zspage to:
 * ZS_ALMOST_FULL  when n > N / f
 * ZS_EMPTY        when n == 0
 * ZS_FULL         when n == N
+
+
+Internals
+=========
+
+zsmalloc has 255 size classes, each of which can hold a number of zspages.
+Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
+The optimal zspage chain size for each size class is calculated during the
+creation of the zsmalloc pool (see calculate_zspage_chain_size()).
+
+As an optimization, zsmalloc merges size classes that have similar
+characteristics in terms of the number of pages per zspage and the number
+of objects that each zspage can store.
+
+For instance, consider the following size classes:::
+
+  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+  ...
+     94  1536           0            0             0          0          0                3        0
+    100  1632           0            0             0          0          0                2        0
+  ...
+
+
+Size classes #95-99 are merged with size class #100. This means that when we
+need to store an object of size, say, 1568 bytes, we end up using size class
+#100 instead of size class #96. Size class #100 is meant for objects of size
+1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
+
+Size class #100 consists of zspages with 2 physical pages each, which can
+hold a total of 5 objects. If we need to store 13 objects of size 1568, we
+end up allocating three zspages, or 6 physical pages.
+
+However, if we take a closer look at size class #96 (which is meant for
+objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
+find that the optimal zspage configuration for this class is a chain
+of 5 physical pages:::
+
+    pages per zspage      wasted bytes     used%
+           1                  960           76
+           2                  352           95
+           3                 1312           89
+           4                  704           95
+           5                   96           99
+
+This means that a class #96 configuration with 5 physical pages can store 13
+objects of size 1568 in a single zspage, using a total of 5 physical pages.
+This is more efficient than the class #100 configuration, which would use 6
+physical pages to store the same number of objects.
+
+As the zspage chain size for class #96 increases, its key characteristics
+such as pages per-zspage and objects per-zspage also change. This leads to
+fewer class mergers, resulting in a more compact grouping of classes, which
+reduces memory wastage.
+
+Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::
+
+  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+  ...
+    202  3264           0            0             0          0          0                4        0
+    254  4096           0            0             0          0          0                1        0
+  ...
+
+Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
+per zspage. Any object larger than 3264 bytes is considered huge and belongs
+to size class #254, which stores each object in its own physical page (objects
+in huge classes do not share pages).
+
+Increasing the size of the chain of zspages also results in a higher watermark
+for the huge size class and fewer huge classes overall. This allows for more
+efficient storage of large objects.
+
+For zspage chain size of 8, huge class watermark becomes 3632 bytes:::
+
+  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+  ...
+    202  3264           0            0             0          0          0                4        0
+    211  3408           0            0             0          0          0                5        0
+    217  3504           0            0             0          0          0                6        0
+    222  3584           0            0             0          0          0                7        0
+    225  3632           0            0             0          0          0                8        0
+    254  4096           0            0             0          0          0                1        0
+  ...
+
+For zspage chain size of 16, huge class watermark becomes 3840 bytes:::
+
+  class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+  ...
+    202  3264           0            0             0          0          0                4        0
+    206  3328           0            0             0          0          0               13        0
+    207  3344           0            0             0          0          0                9        0
+    208  3360           0            0             0          0          0               14        0
+    211  3408           0            0             0          0          0                5        0
+    212  3424           0            0             0          0          0               16        0
+    214  3456           0            0             0          0          0               11        0
+    217  3504           0            0             0          0          0                6        0
+    219  3536           0            0             0          0          0               13        0
+    222  3584           0            0             0          0          0                7        0
+    223  3600           0            0             0          0          0               15        0
+    225  3632           0            0             0          0          0                8        0
+    228  3680           0            0             0          0          0                9        0
+    230  3712           0            0             0          0          0               10        0
+    232  3744           0            0             0          0          0               11        0
+    234  3776           0            0             0          0          0               12        0
+    235  3792           0            0             0          0          0               13        0
+    236  3808           0            0             0          0          0               14        0
+    238  3840           0            0             0          0          0               15        0
+    254  4096           0            0             0          0          0                1        0
+  ...
+
+Overall, the effect of the zspage chain size on the zsmalloc pool configuration:::
+
+  pages per zspage   number of size classes (clusters)   huge size class watermark
+         4                        69                               3264
+         5                        86                               3408
+         6                        93                               3504
+         7                       112                               3584
+         8                       123                               3632
+         9                       140                               3680
+        10                       143                               3712
+        11                       159                               3744
+        12                       164                               3776
+        13                       180                               3792
+        14                       183                               3808
+        15                       188                               3840
+        16                       191                               3840
+
+
+A synthetic test
+----------------
+
+zram as a build artifacts storage (Linux kernel compilation).
+
+* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
+
+  zsmalloc classes stats:::
+
+    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+    ...
+    Total                13           51        413836     412973     159955                         3
+
+  zram mm_stat:::
+
+   1691783168 628083717 655175680        0 655175680       60        0    34048    34049
+
+
+* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
+
+  zsmalloc classes stats:::
+
+    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
+    ...
+    Total                18           87        414852     412978     156666                         0
+
+  zram mm_stat:::
+
+    1691803648 627793930 641703936        0 641703936       60        0    33591    33591
+
+Using larger zspage chains may result in using fewer physical pages, as seen
+in the example above, where the number of physical pages used decreased from
+159955 to 156666; at the same time, the maximum zsmalloc pool memory usage
+went down from 655175680 to 641703936 bytes.
+
+However, this advantage may be offset by the potential for increased system
+memory pressure (as some zspages have larger chain sizes) in cases where there
+is heavy internal fragmentation and zspool compaction is unable to relocate
+objects and release zspages. In these cases, it is recommended to decrease
+the limit on the size of the zspage chains (as specified by the
+CONFIG_ZSMALLOC_CHAIN_SIZE option).
diff --git a/mm/Kconfig b/mm/Kconfig
index 4eb4afa53e6d..5b2863de4be5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -191,6 +191,25 @@ config ZSMALLOC_STAT
 	  information to userspace via debugfs.
 	  If unsure, say N.
 
+config ZSMALLOC_CHAIN_SIZE
+	int "Maximum number of physical pages per-zspage"
+	default 4
+	range 1 16
+	depends on ZSMALLOC
+	help
+	  This option sets the upper limit on the number of physical pages
+	  that a zsmalloc page (zspage) can consist of. The optimal zspage
+	  chain size is calculated for each size class during the
+	  initialization of the pool.
+
+	  Changing this option can alter the characteristics of size classes,
+	  such as the number of pages per zspage and the number of objects
+	  per zspage. This can also result in different configurations of
+	  the pool, as zsmalloc merges size classes with similar
+	  characteristics.
+
+	  For more information, see zsmalloc documentation.
+
 menu "SLAB allocator options"
 
 choice
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ee8431784998..77a8746a453d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -73,13 +73,6 @@
  */
 #define ZS_ALIGN		8
 
-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
-
 #define ZS_HANDLE_SIZE (sizeof(unsigned long))
 
 /*
@@ -126,7 +119,7 @@
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+	MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
@@ -1078,7 +1071,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 					gfp_t gfp)
 {
 	int i;
-	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
+	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE];
 	struct zspage *zspage = cache_alloc_zspage(pool, gfp);
 
 	if (!zspage)
@@ -1910,7 +1903,7 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 				struct page *newpage, struct page *oldpage)
 {
 	struct page *page;
-	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
+	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE] = {NULL, };
 	int idx = 0;
 
 	page = get_first_page(zspage);
@@ -2290,7 +2283,7 @@ static int calculate_zspage_chain_size(int class_size)
 	if (is_power_of_2(class_size))
 		return chain_size;
 
-	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+	for (i = 1; i <= CONFIG_ZSMALLOC_CHAIN_SIZE; i++) {
 		int waste;
 
 		waste = (i * PAGE_SIZE) % class_size;
-- 
2.39.0.314.g84b9a713c41-goog




* [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2023-01-09  3:38 ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
@ 2023-01-09  3:38 ` Sergey Senozhatsky
  2023-01-13 19:02   ` Minchan Kim
  2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
  2023-01-16  3:15 ` Sergey Senozhatsky
  5 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-09  3:38 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sergey Senozhatsky

This changes the key characteristics (pages per-zspage and
objects per-zspage) of a number of size classes, which results
in a different pool configuration. With a zspage chain size of 8
we have more size class clusters (123) and a higher huge size
class watermark (3632 bytes).

Please read zsmalloc documentation for more details.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 5b2863de4be5..d854a421821b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -193,7 +193,7 @@ config ZSMALLOC_STAT
 
 config ZSMALLOC_CHAIN_SIZE
 	int "Maximum number of physical pages per-zspage"
-	default 4
+	default 8
 	range 1 16
 	depends on ZSMALLOC
 	help
-- 
2.39.0.314.g84b9a713c41-goog




* Re: [PATCHv2 3/4] zsmalloc: make zspage chain size configurable
  2023-01-09  3:38 ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
@ 2023-01-12  7:11   ` Sergey Senozhatsky
  2023-01-12  7:14     ` [PATCH] zsmalloc: turn chain size config option into UL constant Sergey Senozhatsky
  2023-01-13 19:02   ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Minchan Kim
  1 sibling, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-12  7:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, linux-mm, Sergey Senozhatsky

On (23/01/09 12:38), Sergey Senozhatsky wrote:
> Remove hard coded limit on the maximum number of physical
> pages per-zspage.
> 
> This will allow tuning of zsmalloc pool as zspage chain
> size changes `pages per-zspage` and `objects per-zspage`
> characteristics of size classes which also affects size
> classes clustering (the way size classes are merged).

Andrew, I have a small fixup patch (0day build bot failure on
parisc64). How would you prefer to handle this?



* [PATCH] zsmalloc: turn chain size config option into UL constant
  2023-01-12  7:11   ` Sergey Senozhatsky
@ 2023-01-12  7:14     ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-12  7:14 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Sergey Senozhatsky, kernel test robot

This fixes

>> mm/zsmalloc.c:122:59: warning: right shift count >= width of type [-Wshift-count-overflow]

and

>> mm/zsmalloc.c:224:28: error: variably modified 'size_class' at file scope
     224 |         struct size_class *size_class[ZS_SIZE_CLASSES];

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 mm/zsmalloc.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index febfe86d0b1b..290053e648b0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -133,9 +133,12 @@
 #define MAGIC_VAL_BITS	8
 
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))
+
+#define ZS_MAX_PAGES_PER_ZSPAGE	(_AC(CONFIG_ZSMALLOC_CHAIN_SIZE, UL))
+
 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
 #define ZS_MIN_ALLOC_SIZE \
-	MAX(32, (CONFIG_ZSMALLOC_CHAIN_SIZE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+	MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
 /* each chunk includes extra space to keep handle */
 #define ZS_MAX_ALLOC_SIZE	PAGE_SIZE
 
@@ -1119,7 +1122,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 					gfp_t gfp)
 {
 	int i;
-	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE];
+	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE];
 	struct zspage *zspage = cache_alloc_zspage(pool, gfp);
 
 	if (!zspage)
@@ -1986,7 +1989,7 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 				struct page *newpage, struct page *oldpage)
 {
 	struct page *page;
-	struct page *pages[CONFIG_ZSMALLOC_CHAIN_SIZE] = {NULL, };
+	struct page *pages[ZS_MAX_PAGES_PER_ZSPAGE] = {NULL, };
 	int idx = 0;
 
 	page = get_first_page(zspage);
@@ -2366,7 +2369,7 @@ static int calculate_zspage_chain_size(int class_size)
 	if (is_power_of_2(class_size))
 		return chain_size;
 
-	for (i = 1; i <= CONFIG_ZSMALLOC_CHAIN_SIZE; i++) {
+	for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
 		int waste;
 
 		waste = (i * PAGE_SIZE) % class_size;
-- 
2.39.0.314.g84b9a713c41-goog




* Re: [PATCHv2 1/4] zsmalloc: rework zspage chain size selection
  2023-01-09  3:38 ` [PATCHv2 1/4] zsmalloc: rework zspage chain size selection Sergey Senozhatsky
@ 2023-01-13 17:32   ` Minchan Kim
  0 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2023-01-13 17:32 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Jan 09, 2023 at 12:38:35PM +0900, Sergey Senozhatsky wrote:
> Computers are bad at division. We currently decide the best
> zspage chain size (max number of physical pages per-zspage)
> by looking at a `used percentage` value. This is not enough
> as we lose precision during usage percentage calculations
> For example, let's look at size class 208:
> 
> pages per zspage       wasted bytes         used%
>        1                   144               96
>        2                    80               99
>        3                    16               99
>        4                   160               99
> 
> Current algorithm will select 2 page per zspage configuration,
> as it's the first one to reach 99%. However, 3 pages per zspage
> waste less memory.
> 
> Change algorithm and select zspage configuration that has
> lowest wasted value.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>



* Re: [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes
  2023-01-09  3:38 ` [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes Sergey Senozhatsky
@ 2023-01-13 17:32   ` Minchan Kim
  0 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2023-01-13 17:32 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Jan 09, 2023 at 12:38:36PM +0900, Sergey Senozhatsky wrote:
> If a class size is power of 2 then it wastes no memory
> and the best configuration is 1 physical page per-zspage.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>



* Re: [PATCHv2 3/4] zsmalloc: make zspage chain size configurable
  2023-01-09  3:38 ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
  2023-01-12  7:11   ` Sergey Senozhatsky
@ 2023-01-13 19:02   ` Minchan Kim
  1 sibling, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2023-01-13 19:02 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Jan 09, 2023 at 12:38:37PM +0900, Sergey Senozhatsky wrote:
> Remove hard coded limit on the maximum number of physical
> pages per-zspage.
> 
> This will allow tuning of zsmalloc pool as zspage chain
> size changes `pages per-zspage` and `objects per-zspage`
> characteristics of size classes which also affects size
> classes clustering (the way size classes are merged).
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>

with the additional patch in the thread that fixes the UL constant.



* Re: [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8
  2023-01-09  3:38 ` [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8 Sergey Senozhatsky
@ 2023-01-13 19:02   ` Minchan Kim
  2023-01-14  7:28     ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Minchan Kim @ 2023-01-13 19:02 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Jan 09, 2023 at 12:38:38PM +0900, Sergey Senozhatsky wrote:
> This changes key characteristics (pages per-zspage and objects
> per-zspage) of a number of size classes which in results in
> different pool configuration. With zspage chain size of 8 we
> have more size clases clusters (123) and higher huge size class
> watermark (3632 bytes).
> 
> Please read zsmalloc documentation for more details.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>

Thanks for great work, Sergey!



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2023-01-09  3:38 ` [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8 Sergey Senozhatsky
@ 2023-01-13 19:57 ` Mike Kravetz
  2023-01-14  5:27   ` Sergey Senozhatsky
                     ` (2 more replies)
  2023-01-16  3:15 ` Sergey Senozhatsky
  5 siblings, 3 replies; 27+ messages in thread
From: Mike Kravetz @ 2023-01-13 19:57 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On 01/09/23 12:38, Sergey Senozhatsky wrote:
> Hi,
> 
> 	This turns hard coded limit on maximum number of physical
> pages per-zspage into a config option. It also increases the default
> limit from 4 to 8.
> 
> Sergey Senozhatsky (4):
>   zsmalloc: rework zspage chain size selection
>   zsmalloc: skip chain size calculation for pow_of_2 classes
>   zsmalloc: make zspage chain size configurable
>   zsmalloc: set default zspage chain size to 8
> 
>  Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
>  mm/Kconfig                    |  19 ++++
>  mm/zsmalloc.c                 |  72 +++++----------
>  3 files changed, 212 insertions(+), 47 deletions(-)

Hi Sergey,

The following BUG shows up after this series in linux-next.  I can easily
recreate by doing the following:

# echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
where 'large_value' is a so big that there could never possibly be that
many 2MB huge pages in the system.

-- 
Mike Kravetz

[   22.981684] ------------[ cut here ]------------
[   22.982990] kernel BUG at mm/zsmalloc.c:1982!
[   22.984204] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   22.985561] CPU: 0 PID: 41 Comm: kcompactd0 Not tainted 6.2.0-rc3+ #13
[   22.987430] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014
[   22.989728] RIP: 0010:zs_page_migrate+0x43c/0x490
[   22.991070] Code: c7 c6 c8 f6 21 82 e8 b3 73 f6 ff 0f 0b 0f 1f 44 00 00 e9 20 fd ff ff 0f 1f 44 00 00 e9 9e fd ff ff 48 83 ef 01 e9 6b fe ff ff <0f> 0b 48 8b 43 20 49 89 45 20 e9 ff fd ff ff 48 c7 c6 60 d3 1d 82
[   22.995900] RSP: 0018:ffffc9000121fb20 EFLAGS: 00010246
[   22.997364] RAX: 0000000000000002 RBX: ffffea0005b8b380 RCX: 0000000000000000
[   22.999299] RDX: 0000000000000002 RSI: ffffffff81e28a62 RDI: 00000000ffffffff
[   23.001236] RBP: ffff88816e2cf000 R08: ffffea0005b8b340 R09: 0000000000000008
[   23.003181] R10: ffff88827fffafe0 R11: 0000000000280000 R12: ffff88816e2cf400
[   23.005038] R13: ffffea0009e7f800 R14: ffff88817d783880 R15: ffff8881036a44d8
[   23.006921] FS:  0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[   23.009116] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.010732] CR2: 00007f8b14e20550 CR3: 0000000103026004 CR4: 0000000000370ef0
[   23.013978] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.015931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.017892] Call Trace:
[   23.018664]  <TASK>
[   23.019345]  move_to_new_folio+0x14d/0x1f0
[   23.020710]  migrate_pages+0xe36/0x1240
[   23.021895]  ? __pfx_compaction_alloc+0x10/0x10
[   23.023202]  ? _raw_write_lock+0x13/0x30
[   23.024335]  ? __pfx_compaction_free+0x10/0x10
[   23.025608]  ? isolate_movable_page+0xff/0x250
[   23.026880]  compact_zone+0x9da/0xdf0
[   23.027990]  kcompactd_do_work+0x1d2/0x2c0
[   23.029180]  kcompactd+0x220/0x3e0
[   23.030166]  ? __pfx_autoremove_wake_function+0x10/0x10
[   23.031612]  ? __pfx_kcompactd+0x10/0x10
[   23.032706]  kthread+0xe6/0x110
[   23.033648]  ? __pfx_kthread+0x10/0x10
[   23.034704]  ret_from_fork+0x29/0x50
[   23.035734]  </TASK>
[   23.036443] Modules linked in: rfkill ip6table_filter ip6_tables sunrpc snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device 9p netfs snd_pcm joydev 9pnet_virtio virtio_balloon snd_timer snd soundcore 9pnet virtio_blk virtio_net net_failover failover virtio_console crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring fuse
[   23.049869] ---[ end trace 0000000000000000 ]---
[   23.051154] RIP: 0010:zs_page_migrate+0x43c/0x490
[   23.052466] Code: c7 c6 c8 f6 21 82 e8 b3 73 f6 ff 0f 0b 0f 1f 44 00 00 e9 20 fd ff ff 0f 1f 44 00 00 e9 9e fd ff ff 48 83 ef 01 e9 6b fe ff ff <0f> 0b 48 8b 43 20 49 89 45 20 e9 ff fd ff ff 48 c7 c6 60 d3 1d 82
[   23.057413] RSP: 0018:ffffc9000121fb20 EFLAGS: 00010246
[   23.058892] RAX: 0000000000000002 RBX: ffffea0005b8b380 RCX: 0000000000000000
[   23.060867] RDX: 0000000000000002 RSI: ffffffff81e28a62 RDI: 00000000ffffffff
[   23.062835] RBP: ffff88816e2cf000 R08: ffffea0005b8b340 R09: 0000000000000008
[   23.064825] R10: ffff88827fffafe0 R11: 0000000000280000 R12: ffff88816e2cf400
[   23.066806] R13: ffffea0009e7f800 R14: ffff88817d783880 R15: ffff8881036a44d8
[   23.068738] FS:  0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[   23.071022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.072579] CR2: 00007f8b14e20550 CR3: 0000000103026004 CR4: 0000000000370ef0
[   23.076152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.078172] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.080134] note: kcompactd0[41] exited with preempt_count 1



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
@ 2023-01-14  5:27   ` Sergey Senozhatsky
  2023-01-14  6:34   ` Sergey Senozhatsky
  2023-01-14  7:08   ` Sergey Senozhatsky
  2 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-14  5:27 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On (23/01/13 11:57), Mike Kravetz wrote:
> > 	This turns hard coded limit on maximum number of physical
> > pages per-zspage into a config option. It also increases the default
> > limit from 4 to 8.
> > 
> > Sergey Senozhatsky (4):
> >   zsmalloc: rework zspage chain size selection
> >   zsmalloc: skip chain size calculation for pow_of_2 classes
> >   zsmalloc: make zspage chain size configurable
> >   zsmalloc: set default zspage chain size to 8
> > 
> >  Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> >  mm/Kconfig                    |  19 ++++
> >  mm/zsmalloc.c                 |  72 +++++----------
> >  3 files changed, 212 insertions(+), 47 deletions(-)
> 
> Hi Sergey,

Hi Mike,

> The following BUG shows up after this series in linux-next.  I can easily
> recreate by doing the following:
>
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

Hmm... Are we sure this is related? I really cannot see how chain-size
can have an effect on zspage ->isolate counter. What chain-size value
do you use? You don't see problems with chain size of 4?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
  2023-01-14  5:27   ` Sergey Senozhatsky
@ 2023-01-14  6:34   ` Sergey Senozhatsky
  2023-01-14  7:08   ` Sergey Senozhatsky
  2 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-14  6:34 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On (23/01/13 11:57), Mike Kravetz wrote:
> The following BUG shows up after this series in linux-next.  I can easily
> recreate by doing the following:
> 
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

Just to make sure. Do you have this patch applied?
https://lore.kernel.org/lkml/20230112071443.1933880-1-senozhatsky@chromium.org


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
  2023-01-14  5:27   ` Sergey Senozhatsky
  2023-01-14  6:34   ` Sergey Senozhatsky
@ 2023-01-14  7:08   ` Sergey Senozhatsky
  2023-01-14 21:34     ` Mike Kravetz
  2023-01-15  7:18     ` Sergey Senozhatsky
  2 siblings, 2 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-14  7:08 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On (23/01/13 11:57), Mike Kravetz wrote:
> Hi Sergey,
> 
> The following BUG shows up after this series in linux-next.  I can easily
> recreate by doing the following:
> 
> # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> where 'large_value' is so big that there could never possibly be that
> many 2MB huge pages in the system.

I get migration warnings with the zsmalloc series reverted.
I guess the problem is somewhere else. Can you double-check
on your side?


[   87.208255] ------------[ cut here ]------------
[   87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
[   87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
[   87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G                 N 6.2.0-rc3-next-20230113+ #385
[   87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[   87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
[   87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
[   87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
[   87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
[   87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
[   87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
[   87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
[   87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
[   87.239438] FS:  0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
[   87.241303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
[   87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   87.247559] PKRU: 55555554
[   87.248269] Call Trace:
[   87.248862]  <TASK>
[   87.249370]  ? lock_is_held_type+0xd9/0x130
[   87.250377]  migrate_pages_batch+0x553/0xc80
[   87.251513]  ? move_freelist_tail+0xc0/0xc0
[   87.252545]  ? isolate_freepages+0x290/0x290
[   87.253654]  ? trace_mm_migrate_pages+0xf0/0xf0
[   87.254901]  migrate_pages+0x1ae/0x330
[   87.255877]  ? isolate_freepages+0x290/0x290
[   87.257015]  ? move_freelist_tail+0xc0/0xc0
[   87.258213]  compact_zone+0x528/0x6a0
[   87.260911]  proactive_compact_node+0x87/0xd0
[   87.262090]  kcompactd+0x1ca/0x360
[   87.263018]  ? swake_up_all+0xe0/0xe0
[   87.264101]  ? kcompactd_do_work+0x240/0x240
[   87.265243]  kthread+0xec/0x110
[   87.266031]  ? kthread_complete_and_exit+0x20/0x20
[   87.267268]  ret_from_fork+0x1f/0x30
[   87.268243]  </TASK>
[   87.268984] irq event stamp: 311113
[   87.269930] hardirqs last  enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
[   87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
[   87.275707] softirqs last  enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
[   87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
[   87.280555] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8
  2023-01-13 19:02   ` Minchan Kim
@ 2023-01-14  7:28     ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-14  7:28 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Sergey Senozhatsky, Andrew Morton, linux-kernel, linux-mm

On (23/01/13 11:02), Minchan Kim wrote:
> Acked-by: Minchan Kim <minchan@kernel.org>
> 
> Thanks for great work, Sergey!

Thank you!


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-14  7:08   ` Sergey Senozhatsky
@ 2023-01-14 21:34     ` Mike Kravetz
  2023-01-15  4:21       ` Sergey Senozhatsky
  2023-01-15  7:18     ` Sergey Senozhatsky
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Kravetz @ 2023-01-14 21:34 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On 01/14/23 16:08, Sergey Senozhatsky wrote:
> On (23/01/13 11:57), Mike Kravetz wrote:
> > Hi Sergey,
> > 
> > The following BUG shows up after this series in linux-next.  I can easily
> > recreate by doing the following:
> > 
> > # echo large_value > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > where 'large_value' is so big that there could never possibly be that
> > many 2MB huge pages in the system.
> 
> I get migration warnings with the zsmalloc series reverted.
> I guess the problem is somewhere else. Can you double-check
> on your side?

I did the following:

- Start with clean v6.2-rc3
  Perform echo, did not see issue

- Applied your 5 patches (includes the zsmalloc: turn chain size config option
  into UL constant patch).  Took default value for ZSMALLOC_CHAIN_SIZE of 8.
  Performed echo, recreated issue.

- Changed ZSMALLOC_CHAIN_SIZE to 1.
  Perform echo, did not see issue

I have not looked into the details of your patches or elsewhere.  Just thought
it might be related to your series because of the above.  And, since your
series was fresh in your mind, this may trigger some thought/explanation.

It is certainly possible that root cause could be elsewhere and your series is
just exposing that.  I can take a closer look on Monday.

Thanks,
-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-14 21:34     ` Mike Kravetz
@ 2023-01-15  4:21       ` Sergey Senozhatsky
  2023-01-15  5:32         ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-15  4:21 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Sergey Senozhatsky, Minchan Kim, Andrew Morton, linux-kernel, linux-mm

On (23/01/14 13:34), Mike Kravetz wrote:
> I did the following:
> 
> - Start with clean v6.2-rc3
>   Perform echo, did not see issue
> 
> - Applied your 5 patches (includes the zsmalloc: turn chain size config option
>   into UL constant patch).  Took default value for ZSMALLOC_CHAIN_SIZE of 8.
>   Performed echo, recreated issue.
> 
> - Changed ZSMALLOC_CHAIN_SIZE to 1.
>   Perform echo, did not see issue

The patch set basically just adjusts $NUM in calculate_zspage_chain_size():

		for (i = 1; i <= $NUM; i++)

It changes default 4 to 8. Can't really see how this can cause problems.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-15  4:21       ` Sergey Senozhatsky
@ 2023-01-15  5:32         ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-15  5:32 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Minchan Kim, Andrew Morton, linux-kernel, linux-mm, Sergey Senozhatsky

On (23/01/15 13:21), Sergey Senozhatsky wrote:
> On (23/01/14 13:34), Mike Kravetz wrote:
> > I did the following:
> > 
> > - Start with clean v6.2-rc3
> >   Perform echo, did not see issue
> > 
> > - Applied your 5 patches (includes the zsmalloc: turn chain size config option
> >   into UL constant patch).  Took default value for ZSMALLOC_CHAIN_SIZE of 8.
> >   Performed echo, recreated issue.
> > 
> > - Changed ZSMALLOC_CHAIN_SIZE to 1.
> >   Perform echo, did not see issue
> 
> The patch set basically just adjusts $NUM in calculate_zspage_chain_size():
> 
> 		for (i = 1; i <= $NUM; i++)
> 
> It changes default 4 to 8. Can't really see how this can cause problems.

OK, I guess it overflows the zspage ->isolated counter, which is a 3-bit
integer, so the max chain size we can have is b111 == 7.

We probably need something like below (this should not increase sizeof
zspage):

---

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 290053e648b0..86b742a613ee 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -129,7 +129,7 @@
 #define HUGE_BITS      1
 #define FULLNESS_BITS  2
 #define CLASS_BITS     8
-#define ISOLATED_BITS  3
+#define ISOLATED_BITS  5
 #define MAGIC_VAL_BITS 8
 
 #define MAX(a, b) ((a) >= (b) ? (a) : (b))


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-14  7:08   ` Sergey Senozhatsky
  2023-01-14 21:34     ` Mike Kravetz
@ 2023-01-15  7:18     ` Sergey Senozhatsky
  2023-01-15  8:19       ` Sergey Senozhatsky
  2023-01-15 13:04       ` Matthew Wilcox
  1 sibling, 2 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-15  7:18 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton
  Cc: Mike Kravetz, Minchan Kim, linux-kernel, linux-mm, Sergey Senozhatsky

Cc-ing Matthew,

On (23/01/14 16:08), Sergey Senozhatsky wrote:
> [   87.208255] ------------[ cut here ]------------
> [   87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
> [   87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [   87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G                 N 6.2.0-rc3-next-20230113+ #385
> [   87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [   87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
> [   87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
> [   87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
> [   87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
> [   87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
> [   87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
> [   87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
> [   87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
> [   87.239438] FS:  0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
> [   87.241303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
> [   87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   87.247559] PKRU: 55555554
> [   87.248269] Call Trace:
> [   87.248862]  <TASK>
> [   87.249370]  ? lock_is_held_type+0xd9/0x130
> [   87.250377]  migrate_pages_batch+0x553/0xc80
> [   87.251513]  ? move_freelist_tail+0xc0/0xc0
> [   87.252545]  ? isolate_freepages+0x290/0x290
> [   87.253654]  ? trace_mm_migrate_pages+0xf0/0xf0
> [   87.254901]  migrate_pages+0x1ae/0x330
> [   87.255877]  ? isolate_freepages+0x290/0x290
> [   87.257015]  ? move_freelist_tail+0xc0/0xc0
> [   87.258213]  compact_zone+0x528/0x6a0
> [   87.260911]  proactive_compact_node+0x87/0xd0
> [   87.262090]  kcompactd+0x1ca/0x360
> [   87.263018]  ? swake_up_all+0xe0/0xe0
> [   87.264101]  ? kcompactd_do_work+0x240/0x240
> [   87.265243]  kthread+0xec/0x110
> [   87.266031]  ? kthread_complete_and_exit+0x20/0x20
> [   87.267268]  ret_from_fork+0x1f/0x30
> [   87.268243]  </TASK>
> [   87.268984] irq event stamp: 311113
> [   87.269930] hardirqs last  enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
> [   87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
> [   87.275707] softirqs last  enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
> [   87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
> [   87.280555] ---[ end trace 0000000000000000 ]---

So this warning is move_to_new_folio() being called on an un-isolated
src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
because it evaluates folio_test_isolated(src) one more time:

[   59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
[   59.503239] flags: 0x8000000000000001(locked|zone=2)
[   59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
[   59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
[   59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
[   59.511845] ------------[ cut here ]------------
[   59.513181] kernel BUG at mm/migrate.c:988!
[   59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI

[   59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
[   59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
[   59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
[   59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
[   59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
[   59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
[   59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
[   59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
[   59.537646] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
[   59.539484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
[   59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   59.545637] PKRU: 55555554
[   59.546261] Call Trace:
[   59.546833]  <TASK>
[   59.547371]  ? lock_is_held_type+0xd9/0x130
[   59.548331]  migrate_pages_batch+0x650/0xdc0
[   59.549326]  ? move_freelist_tail+0xc0/0xc0
[   59.550281]  ? isolate_freepages+0x290/0x290
[   59.551289]  ? folio_flags.constprop.0+0x50/0x50
[   59.552348]  migrate_pages+0x3fa/0x4d0
[   59.553224]  ? isolate_freepages+0x290/0x290
[   59.554214]  ? move_freelist_tail+0xc0/0xc0
[   59.555173]  compact_zone+0x51b/0x6a0
[   59.556031]  proactive_compact_node+0x8e/0xe0
[   59.557033]  kcompactd+0x1c3/0x350
[   59.557842]  ? swake_up_all+0xe0/0xe0
[   59.558699]  ? kcompactd_do_work+0x260/0x260
[   59.559703]  kthread+0xec/0x110
[   59.560450]  ? kthread_complete_and_exit+0x20/0x20
[   59.561582]  ret_from_fork+0x1f/0x30
[   59.562427]  </TASK>
[   59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
[   59.564591] ---[ end trace 0000000000000000 ]---
[   59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
[   59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
[   59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
[   59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
[   59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
[   59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
[   59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
[   59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
[   59.580593] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
[   59.582432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
[   59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   59.588738] PKRU: 55555554


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-15  7:18     ` Sergey Senozhatsky
@ 2023-01-15  8:19       ` Sergey Senozhatsky
  2023-01-16  1:27         ` Huang, Ying
  2023-01-15 13:04       ` Matthew Wilcox
  1 sibling, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-15  8:19 UTC (permalink / raw)
  To: Huang Ying
  Cc: Matthew Wilcox, Andrew Morton, Mike Kravetz, Minchan Kim,
	linux-kernel, linux-mm, Sergey Senozhatsky

+ Huang Ying,

> On (23/01/14 16:08), Sergey Senozhatsky wrote:
> > [   87.208255] ------------[ cut here ]------------
> > [   87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
> > [   87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> > [   87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G                 N 6.2.0-rc3-next-20230113+ #385
> > [   87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > [   87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
> > [   87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
> > [   87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
> > [   87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
> > [   87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
> > [   87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
> > [   87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
> > [   87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
> > [   87.239438] FS:  0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
> > [   87.241303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
> > [   87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   87.247559] PKRU: 55555554
> > [   87.248269] Call Trace:
> > [   87.248862]  <TASK>
> > [   87.249370]  ? lock_is_held_type+0xd9/0x130
> > [   87.250377]  migrate_pages_batch+0x553/0xc80
> > [   87.251513]  ? move_freelist_tail+0xc0/0xc0
> > [   87.252545]  ? isolate_freepages+0x290/0x290
> > [   87.253654]  ? trace_mm_migrate_pages+0xf0/0xf0
> > [   87.254901]  migrate_pages+0x1ae/0x330
> > [   87.255877]  ? isolate_freepages+0x290/0x290
> > [   87.257015]  ? move_freelist_tail+0xc0/0xc0
> > [   87.258213]  compact_zone+0x528/0x6a0
> > [   87.260911]  proactive_compact_node+0x87/0xd0
> > [   87.262090]  kcompactd+0x1ca/0x360
> > [   87.263018]  ? swake_up_all+0xe0/0xe0
> > [   87.264101]  ? kcompactd_do_work+0x240/0x240
> > [   87.265243]  kthread+0xec/0x110
> > [   87.266031]  ? kthread_complete_and_exit+0x20/0x20
> > [   87.267268]  ret_from_fork+0x1f/0x30
> > [   87.268243]  </TASK>
> > [   87.268984] irq event stamp: 311113
> > [   87.269930] hardirqs last  enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
> > [   87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
> > [   87.275707] softirqs last  enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
> > [   87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
> > [   87.280555] ---[ end trace 0000000000000000 ]---
> 
> So this warning is move_to_new_folio() being called on an un-isolated
> src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
> because it evaluates folio_test_isolated(src) one more time:
> 
> [   59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> [   59.503239] flags: 0x8000000000000001(locked|zone=2)
> [   59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> [   59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
> [   59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> [   59.511845] ------------[ cut here ]------------
> [   59.513181] kernel BUG at mm/migrate.c:988!
> [   59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> 
> [   59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [   59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [   59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [   59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [   59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [   59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [   59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [   59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [   59.537646] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [   59.539484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [   59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   59.545637] PKRU: 55555554
> [   59.546261] Call Trace:
> [   59.546833]  <TASK>
> [   59.547371]  ? lock_is_held_type+0xd9/0x130
> [   59.548331]  migrate_pages_batch+0x650/0xdc0
> [   59.549326]  ? move_freelist_tail+0xc0/0xc0
> [   59.550281]  ? isolate_freepages+0x290/0x290
> [   59.551289]  ? folio_flags.constprop.0+0x50/0x50
> [   59.552348]  migrate_pages+0x3fa/0x4d0
> [   59.553224]  ? isolate_freepages+0x290/0x290
> [   59.554214]  ? move_freelist_tail+0xc0/0xc0
> [   59.555173]  compact_zone+0x51b/0x6a0
> [   59.556031]  proactive_compact_node+0x8e/0xe0
> [   59.557033]  kcompactd+0x1c3/0x350
> [   59.557842]  ? swake_up_all+0xe0/0xe0
> [   59.558699]  ? kcompactd_do_work+0x260/0x260
> [   59.559703]  kthread+0xec/0x110
> [   59.560450]  ? kthread_complete_and_exit+0x20/0x20
> [   59.561582]  ret_from_fork+0x1f/0x30
> [   59.562427]  </TASK>
> [   59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [   59.564591] ---[ end trace 0000000000000000 ]---
> [   59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [   59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [   59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [   59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [   59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [   59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [   59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [   59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [   59.580593] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [   59.582432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [   59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   59.588738] PKRU: 55555554


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-15  7:18     ` Sergey Senozhatsky
  2023-01-15  8:19       ` Sergey Senozhatsky
@ 2023-01-15 13:04       ` Matthew Wilcox
  2023-01-15 14:55         ` Sergey Senozhatsky
  1 sibling, 1 reply; 27+ messages in thread
From: Matthew Wilcox @ 2023-01-15 13:04 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Mike Kravetz, Minchan Kim, linux-kernel, linux-mm

On Sun, Jan 15, 2023 at 04:18:55PM +0900, Sergey Senozhatsky wrote:
> So this warning is move_to_new_folio() being called on an un-isolated
> src folio. I had DEBUG_VM disabled, so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> did nothing; however, after mops->migrate_page() it would trigger the WARN_ON()
> because it evaluates folio_test_isolated(src) one more time:
> 
> [   59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> [   59.503239] flags: 0x8000000000000001(locked|zone=2)
> [   59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> [   59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000

That is quite the messed-up page.  mapcount is positive, but higher than
refcount.  And not just a little bit; 1665 vs 2.  But mapping is NULL,
so it's not anon or file memory.  Makes me think it belongs to a driver
that's using ->mapcount for its own purposes.  It's not PageSlab.

Given that you're working on zsmalloc, I took a look and:

static inline void set_first_obj_offset(struct page *page, unsigned int offset)
{
        page->page_type = offset;
}

(page_type aliases with mapcount).  So I'm pretty sure this is a
zsmalloc page.  But mapping should point to zsmalloc_mops.  Not
really sure what's going on here.  Can you bisect?

> [   59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> [   59.511845] ------------[ cut here ]------------
> [   59.513181] kernel BUG at mm/migrate.c:988!
> [   59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> 
> [   59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [   59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [   59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [   59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [   59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [   59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [   59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [   59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [   59.537646] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [   59.539484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [   59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   59.545637] PKRU: 55555554
> [   59.546261] Call Trace:
> [   59.546833]  <TASK>
> [   59.547371]  ? lock_is_held_type+0xd9/0x130
> [   59.548331]  migrate_pages_batch+0x650/0xdc0
> [   59.549326]  ? move_freelist_tail+0xc0/0xc0
> [   59.550281]  ? isolate_freepages+0x290/0x290
> [   59.551289]  ? folio_flags.constprop.0+0x50/0x50
> [   59.552348]  migrate_pages+0x3fa/0x4d0
> [   59.553224]  ? isolate_freepages+0x290/0x290
> [   59.554214]  ? move_freelist_tail+0xc0/0xc0
> [   59.555173]  compact_zone+0x51b/0x6a0
> [   59.556031]  proactive_compact_node+0x8e/0xe0
> [   59.557033]  kcompactd+0x1c3/0x350
> [   59.557842]  ? swake_up_all+0xe0/0xe0
> [   59.558699]  ? kcompactd_do_work+0x260/0x260
> [   59.559703]  kthread+0xec/0x110
> [   59.560450]  ? kthread_complete_and_exit+0x20/0x20
> [   59.561582]  ret_from_fork+0x1f/0x30
> [   59.562427]  </TASK>
> [   59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> [   59.564591] ---[ end trace 0000000000000000 ]---
> [   59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> [   59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
> [   59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
> [   59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
> [   59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
> [   59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
> [   59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
> [   59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
> [   59.580593] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
> [   59.582432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
> [   59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   59.588738] PKRU: 55555554



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-15 13:04       ` Matthew Wilcox
@ 2023-01-15 14:55         ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-15 14:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sergey Senozhatsky, Andrew Morton, Mike Kravetz, Minchan Kim,
	linux-kernel, linux-mm

On (23/01/15 13:04), Matthew Wilcox wrote:
> On Sun, Jan 15, 2023 at 04:18:55PM +0900, Sergey Senozhatsky wrote:
> > So this warning is move_to_new_folio() being called on un-isolated
> > src folio. I had DEBUG_VM disabled so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
> > did nothing, however after mops->migrate_page() it would trigger WARN_ON()
> > because it evaluates folio_test_isolated(src) one more time:
> > 
> > [   59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
> > [   59.503239] flags: 0x8000000000000001(locked|zone=2)
> > [   59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
> > [   59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
> 
> That is quite the messed-up page.  mapcount is positive, but higher than
> refcount.  And not just a little bit; 1665 vs 2.  But mapping is NULL,
> so it's not anon or file memory.  Makes me think it belongs to a driver
> that's using ->mapcount for its own purposes.  It's not PageSlab.
> 
> Given that you're working on zsmalloc, I took a look and:
> 
> static inline void set_first_obj_offset(struct page *page, unsigned int offset)
> {
>         page->page_type = offset;
> }
> 
> (page_type aliases with mapcount).  So I'm pretty sure this is a
> zsmalloc page.  But mapping should point to zsmalloc_mops.  Not
> really sure what's going on here.  Can you bisect?

Thanks.

Let me try bisecting. From what I can tell it seems that
tags/next-20221226 is the last good and tags/next-20230105
is the first bad kernel.

I'll try to narrow it down from here.



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-15  8:19       ` Sergey Senozhatsky
@ 2023-01-16  1:27         ` Huang, Ying
  2023-01-16  3:46           ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Huang, Ying @ 2023-01-16  1:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Matthew Wilcox, Andrew Morton, Mike Kravetz, Minchan Kim,
	linux-kernel, linux-mm

Hi, Sergey,

Sergey Senozhatsky <senozhatsky@chromium.org> writes:
> + Huang Ying,
>
>> On (23/01/14 16:08), Sergey Senozhatsky wrote:
>> > [   87.208255] ------------[ cut here ]------------
>> > [   87.209431] WARNING: CPU: 18 PID: 300 at mm/migrate.c:995 move_to_new_folio+0x1ef/0x260
>> > [   87.211993] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
>> > [   87.214287] CPU: 18 PID: 300 Comm: kcompactd0 Tainted: G                 N 6.2.0-rc3-next-20230113+ #385
>> > [   87.217529] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
>> > [   87.220131] RIP: 0010:move_to_new_folio+0x1ef/0x260
> >> > [   87.221892] Code: 84 c0 74 78 48 8b 43 18 44 89 ea 48 89 de 4c 89 e7 ff 50 06 85 c0 0f 85 a9 fe ff ff 48 8b 03 a9 00 00 04 00 0f 85 7a fe ff ff <0f> 0b e9 73 fe ff ff 48 8b 03 f6 c4 20 74 2a be c0 0c 00 00 48 89
>> > [   87.226514] RSP: 0018:ffffc90000b9fb08 EFLAGS: 00010246
>> > [   87.227879] RAX: 4000000000000021 RBX: ffffea0000890500 RCX: 0000000000000000
>> > [   87.230948] RDX: 0000000000000000 RSI: ffffffff81e6f950 RDI: ffffea0000890500
>> > [   87.233026] RBP: ffffea0000890500 R08: 0000001e82ec3c3e R09: 0000000000000001
>> > [   87.235517] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffffea00015a26c0
>> > [   87.237807] R13: 0000000000000001 R14: ffffea00015a2680 R15: ffffea00008904c0
>> > [   87.239438] FS:  0000000000000000(0000) GS:ffff888624200000(0000) knlGS:0000000000000000
>> > [   87.241303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [   87.242627] CR2: 00007fe537ebbdb8 CR3: 0000000110a0a004 CR4: 0000000000770ee0
>> > [   87.244283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > [   87.245913] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> > [   87.247559] PKRU: 55555554
>> > [   87.248269] Call Trace:
>> > [   87.248862]  <TASK>
>> > [   87.249370]  ? lock_is_held_type+0xd9/0x130
>> > [   87.250377]  migrate_pages_batch+0x553/0xc80
>> > [   87.251513]  ? move_freelist_tail+0xc0/0xc0
>> > [   87.252545]  ? isolate_freepages+0x290/0x290
>> > [   87.253654]  ? trace_mm_migrate_pages+0xf0/0xf0
>> > [   87.254901]  migrate_pages+0x1ae/0x330
>> > [   87.255877]  ? isolate_freepages+0x290/0x290
>> > [   87.257015]  ? move_freelist_tail+0xc0/0xc0
>> > [   87.258213]  compact_zone+0x528/0x6a0
>> > [   87.260911]  proactive_compact_node+0x87/0xd0
>> > [   87.262090]  kcompactd+0x1ca/0x360
>> > [   87.263018]  ? swake_up_all+0xe0/0xe0
>> > [   87.264101]  ? kcompactd_do_work+0x240/0x240
>> > [   87.265243]  kthread+0xec/0x110
>> > [   87.266031]  ? kthread_complete_and_exit+0x20/0x20
>> > [   87.267268]  ret_from_fork+0x1f/0x30
>> > [   87.268243]  </TASK>
>> > [   87.268984] irq event stamp: 311113
>> > [   87.269930] hardirqs last  enabled at (311125): [<ffffffff810da6c2>] __up_console_sem+0x52/0x60
>> > [   87.272235] hardirqs last disabled at (311134): [<ffffffff810da6a7>] __up_console_sem+0x37/0x60
>> > [   87.275707] softirqs last  enabled at (311088): [<ffffffff819d2b2c>] __do_softirq+0x21c/0x31f
>> > [   87.278450] softirqs last disabled at (311083): [<ffffffff81070b8d>] __irq_exit_rcu+0xad/0x120
>> > [   87.280555] ---[ end trace 0000000000000000 ]---
>> 
>> So this warning is move_to_new_folio() being called on un-isolated
>> src folio. I had DEBUG_VM disabled so VM_BUG_ON_FOLIO(!folio_test_isolated(src))
>> did nothing, however after mops->migrate_page() it would trigger WARN_ON()
>> because it evaluates folio_test_isolated(src) one more time:
>> 
>> [   59.500580] page:0000000097d97a42 refcount:2 mapcount:1665 mapping:0000000000000000 index:0xffffea00185ce940 pfn:0x113dc4
>> [   59.503239] flags: 0x8000000000000001(locked|zone=2)
>> [   59.505060] raw: 8000000000000001 ffffea00044f70c8 ffffc90000ba7c20 ffffffff81c22582
>> [   59.507288] raw: ffffea00185ce940 ffff88809183fdb0 0000000200000680 0000000000000000
>> [   59.509622] page dumped because: VM_BUG_ON_FOLIO(!folio_test_isolated(src))
>> [   59.511845] ------------[ cut here ]------------
>> [   59.513181] kernel BUG at mm/migrate.c:988!
>> [   59.514821] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> 
>> [   59.523018] RIP: 0010:move_to_new_folio+0x362/0x3b0
> >> [   59.524160] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
>> [   59.528349] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
>> [   59.529551] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
>> [   59.531186] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
>> [   59.532790] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
>> [   59.534392] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
>> [   59.536026] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
>> [   59.537646] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
>> [   59.539484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   59.540785] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
>> [   59.542412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   59.544030] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [   59.545637] PKRU: 55555554
>> [   59.546261] Call Trace:
>> [   59.546833]  <TASK>
>> [   59.547371]  ? lock_is_held_type+0xd9/0x130
>> [   59.548331]  migrate_pages_batch+0x650/0xdc0
>> [   59.549326]  ? move_freelist_tail+0xc0/0xc0
>> [   59.550281]  ? isolate_freepages+0x290/0x290
>> [   59.551289]  ? folio_flags.constprop.0+0x50/0x50
>> [   59.552348]  migrate_pages+0x3fa/0x4d0
>> [   59.553224]  ? isolate_freepages+0x290/0x290
>> [   59.554214]  ? move_freelist_tail+0xc0/0xc0
>> [   59.555173]  compact_zone+0x51b/0x6a0
>> [   59.556031]  proactive_compact_node+0x8e/0xe0
>> [   59.557033]  kcompactd+0x1c3/0x350
>> [   59.557842]  ? swake_up_all+0xe0/0xe0
>> [   59.558699]  ? kcompactd_do_work+0x260/0x260
>> [   59.559703]  kthread+0xec/0x110
>> [   59.560450]  ? kthread_complete_and_exit+0x20/0x20
>> [   59.561582]  ret_from_fork+0x1f/0x30
>> [   59.562427]  </TASK>
>> [   59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
>> [   59.564591] ---[ end trace 0000000000000000 ]---
>> [   59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> >> [   59.566802] Code: ff ff e9 55 fd ff ff 48 89 df e8 69 d8 ff ff f0 80 60 02 fb 31 c0 e9 65 fd ff ff 48 c7 c6 00 f5 e9 81 48 89 df e8 be c0 f9 ff <0f> 0b 48 c7 c6 00 f5 e9 81 48 89 df e8 ad c0 f9 ff 0f 0b b8 f5 ff
>> [   59.571048] RSP: 0018:ffffc90000ba7af8 EFLAGS: 00010246
>> [   59.572257] RAX: 000000000000003f RBX: ffffea00044f7100 RCX: 0000000000000000
>> [   59.573906] RDX: 0000000000000000 RSI: ffffffff81e8dcf1 RDI: 00000000ffffffff
>> [   59.575544] RBP: ffffea00184f1140 R08: 00000000ffffbfff R09: 00000000ffffbfff
>> [   59.577236] R10: ffff888621ca0000 R11: ffff888621ca0000 R12: 8000000000000001
>> [   59.578893] R13: 0000000000000001 R14: 0000000000000000 R15: ffffea00184f1140
>> [   59.580593] FS:  0000000000000000(0000) GS:ffff888626a00000(0000) knlGS:0000000000000000
>> [   59.582432] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   59.583767] CR2: 00007ff7fbed8000 CR3: 0000000101a26001 CR4: 0000000000770ee0
>> [   59.585437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [   59.587082] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [   59.588738] PKRU: 55555554

Thanks for reporting.  We have just fixed a ZRAM-related bug in the
migrate_pages() batching series with Mike's help.

https://lore.kernel.org/linux-mm/Y8DizzvFXBSEPzI4@monkey/

I will send out a new version today or tomorrow to fix it.  Please try
that.

Best Regards,
Huang, Ying



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
@ 2023-01-16  3:15 ` Sergey Senozhatsky
  2023-01-16 18:34   ` Mike Kravetz
  5 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-16  3:15 UTC (permalink / raw)
  To: Andrew Morton, Mike Kravetz
  Cc: Minchan Kim, linux-kernel, linux-mm, Sergey Senozhatsky

On (23/01/09 12:38), Sergey Senozhatsky wrote:
> 	This turns hard coded limit on maximum number of physical
> pages per-zspage into a config option. It also increases the default
> limit from 4 to 8.
> 
> Sergey Senozhatsky (4):
>   zsmalloc: rework zspage chain size selection
>   zsmalloc: skip chain size calculation for pow_of_2 classes
>   zsmalloc: make zspage chain size configurable
>   zsmalloc: set default zspage chain size to 8
> 
>  Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
>  mm/Kconfig                    |  19 ++++
>  mm/zsmalloc.c                 |  72 +++++----------
>  3 files changed, 212 insertions(+), 47 deletions(-)

Andrew,

Can you please drop this series? We have two fixup patches for it (the
hppa64 build failure and the ->isolated bit-field overflow reported by
Mike), and at this point I'd rather send out v3 with all fixups squashed.

Mike, would it be OK with you if I squash the ->isolated fixup?



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-16  1:27         ` Huang, Ying
@ 2023-01-16  3:46           ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2023-01-16  3:46 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Sergey Senozhatsky, Matthew Wilcox, Andrew Morton, Mike Kravetz,
	Minchan Kim, linux-kernel, linux-mm

Hi,

On (23/01/16 09:27), Huang, Ying wrote:
> >> [   59.546261] Call Trace:
> >> [   59.546833]  <TASK>
> >> [   59.547371]  ? lock_is_held_type+0xd9/0x130
> >> [   59.548331]  migrate_pages_batch+0x650/0xdc0
> >> [   59.549326]  ? move_freelist_tail+0xc0/0xc0
> >> [   59.550281]  ? isolate_freepages+0x290/0x290
> >> [   59.551289]  ? folio_flags.constprop.0+0x50/0x50
> >> [   59.552348]  migrate_pages+0x3fa/0x4d0
> >> [   59.553224]  ? isolate_freepages+0x290/0x290
> >> [   59.554214]  ? move_freelist_tail+0xc0/0xc0
> >> [   59.555173]  compact_zone+0x51b/0x6a0
> >> [   59.556031]  proactive_compact_node+0x8e/0xe0
> >> [   59.557033]  kcompactd+0x1c3/0x350
> >> [   59.557842]  ? swake_up_all+0xe0/0xe0
> >> [   59.558699]  ? kcompactd_do_work+0x260/0x260
> >> [   59.559703]  kthread+0xec/0x110
> >> [   59.560450]  ? kthread_complete_and_exit+0x20/0x20
> >> [   59.561582]  ret_from_fork+0x1f/0x30
> >> [   59.562427]  </TASK>
> >> [   59.562966] Modules linked in: deflate zlib_deflate zstd zstd_compress zram
> >> [   59.564591] ---[ end trace 0000000000000000 ]---
> >> [   59.565661] RIP: 0010:move_to_new_folio+0x362/0x3b0
> 
> Thanks for reporting.  We have just fixed a ZRAM related bug in
> migrate_pages() batching series with the help of Mike.

Oh, great. Yeah, I narrowed it down to that series as well.

> https://lore.kernel.org/linux-mm/Y8DizzvFXBSEPzI4@monkey/

That fixes it!



* Re: [PATCHv2 0/4] zsmalloc: make zspage chain size configurable
  2023-01-16  3:15 ` Sergey Senozhatsky
@ 2023-01-16 18:34   ` Mike Kravetz
  0 siblings, 0 replies; 27+ messages in thread
From: Mike Kravetz @ 2023-01-16 18:34 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, Minchan Kim, linux-kernel, linux-mm

On 01/16/23 12:15, Sergey Senozhatsky wrote:
> On (23/01/09 12:38), Sergey Senozhatsky wrote:
> > 	This turns hard coded limit on maximum number of physical
> > pages per-zspage into a config option. It also increases the default
> > limit from 4 to 8.
> > 
> > Sergey Senozhatsky (4):
> >   zsmalloc: rework zspage chain size selection
> >   zsmalloc: skip chain size calculation for pow_of_2 classes
> >   zsmalloc: make zspage chain size configurable
> >   zsmalloc: set default zspage chain size to 8
> > 
> >  Documentation/mm/zsmalloc.rst | 168 ++++++++++++++++++++++++++++++++++
> >  mm/Kconfig                    |  19 ++++
> >  mm/zsmalloc.c                 |  72 +++++----------
> >  3 files changed, 212 insertions(+), 47 deletions(-)
> 
> Andrew,
> 
> Can you please drop this series? We have two fixup patches (hppa64 build
> failure and isolated bit-field overflow reported by Mike) for this series
> and at this point I probably want to send out v3 with all fixups squashed.
> 
> Mike, would that be OK with you if I squash ->isolated fixup?

I'm OK with however you want to address it.  Thanks!
-- 
Mike Kravetz



end of thread, other threads:[~2023-01-16 18:34 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-09  3:38 [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
2023-01-09  3:38 ` [PATCHv2 1/4] zsmalloc: rework zspage chain size selection Sergey Senozhatsky
2023-01-13 17:32   ` Minchan Kim
2023-01-09  3:38 ` [PATCHv2 2/4] zsmalloc: skip chain size calculation for pow_of_2 classes Sergey Senozhatsky
2023-01-13 17:32   ` Minchan Kim
2023-01-09  3:38 ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Sergey Senozhatsky
2023-01-12  7:11   ` Sergey Senozhatsky
2023-01-12  7:14     ` [PATCH] zsmalloc: turn chain size config option into UL constant Sergey Senozhatsky
2023-01-13 19:02   ` [PATCHv2 3/4] zsmalloc: make zspage chain size configurable Minchan Kim
2023-01-09  3:38 ` [PATCHv2 4/4] zsmalloc: set default zspage chain size to 8 Sergey Senozhatsky
2023-01-13 19:02   ` Minchan Kim
2023-01-14  7:28     ` Sergey Senozhatsky
2023-01-13 19:57 ` [PATCHv2 0/4] zsmalloc: make zspage chain size configurable Mike Kravetz
2023-01-14  5:27   ` Sergey Senozhatsky
2023-01-14  6:34   ` Sergey Senozhatsky
2023-01-14  7:08   ` Sergey Senozhatsky
2023-01-14 21:34     ` Mike Kravetz
2023-01-15  4:21       ` Sergey Senozhatsky
2023-01-15  5:32         ` Sergey Senozhatsky
2023-01-15  7:18     ` Sergey Senozhatsky
2023-01-15  8:19       ` Sergey Senozhatsky
2023-01-16  1:27         ` Huang, Ying
2023-01-16  3:46           ` Sergey Senozhatsky
2023-01-15 13:04       ` Matthew Wilcox
2023-01-15 14:55         ` Sergey Senozhatsky
2023-01-16  3:15 ` Sergey Senozhatsky
2023-01-16 18:34   ` Mike Kravetz
