* [PATCHv2 0/9] zsmalloc/zram: configurable zspage size
@ 2022-10-26 11:29 Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 1/9] zram: add size class equals check into recompression Sergey Senozhatsky
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
Hello,
Some use-cases and/or data patterns may benefit from
larger zspages. Currently the limit on the number of physical
pages that are linked into a zspage is hardcoded to 4. A higher
limit changes key characteristics of a number of the size
classes, improving compactness of the pool and reducing the
amount of memory the zsmalloc pool uses. More on this in the
0002 commit message.
v2:
-- Cherry picked a patch from Alexey (minor code tweaks to move
it ahead of this series)
-- zsmalloc no longer requires the pages-per-zspage limit to be a
power of 2 value, and overall doesn't use "order" any longer
-- zram no longer takes a "zspage order" (power of 2) value and
instead accepts an integer in the [1,16] range
-- There is no global huge_class_size in zsmalloc anymore.
huge_class_size is per-pool, since it depends on pages-per-zspage,
which can be different for different pools.
-- There is no global huge_class_size in zram anymore. It is now
per-pool (per-device).
-- Updated documentation
-- Fixed documentation htmldocs warning (Stephen)
-- Dropped get_pages_per_zspage() patch
-- Renamed zram sysfs knob (device attribute)
-- Re-worked "synthetic test" section in the first commit: more numbers,
objects distribution analysis, etc.
Alexey Romanov (1):
zram: add size class equals check into recompression
Sergey Senozhatsky (8):
zsmalloc: turn zspage order into runtime variable
zsmalloc: move away from page order defines
zsmalloc: make huge class watermark zs_pool member
zram: huge size watermark cannot be global
zsmalloc: pass limit on pages per-zspage to zs_create_pool()
zram: add pages_per_pool_page device attribute
Documentation: document zram pages_per_pool_page attribute
zsmalloc: break out of loop when found perfect zspage order
Documentation/admin-guide/blockdev/zram.rst | 38 +++++--
drivers/block/zram/zram_drv.c | 63 +++++++++--
drivers/block/zram/zram_drv.h | 7 ++
include/linux/zsmalloc.h | 14 ++-
mm/zsmalloc.c | 114 +++++++++++++-------
5 files changed, 178 insertions(+), 58 deletions(-)
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 1/9] zram: add size class equals check into recompression
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 2/9] zsmalloc: turn zspage order into runtime variable Sergey Senozhatsky
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Alexey Romanov, Sergey Senozhatsky
From: Alexey Romanov <avromanov@sberdevices.ru>
It makes no sense for us to recompress the object if it will
end up in the same size class: we don't get any memory gain,
but we do pay a CPU time overhead for inserting this object
into a zspage and decompressing it afterwards.
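For illustration, a small userspace sketch (not code from this series;
the 32-byte minimum allocation size and 16-byte class step below mirror
the values visible in this series for 4K pages) showing that two
different compressed lengths can land in the same size class and hence
cost the same amount of pool memory:

  #include <stdio.h>

  /* Roughly mimics get_size_class_index(): sizes above 32 bytes are
   * grouped into 16-byte steps (values assume PAGE_SIZE == 4096). */
  static unsigned int class_index(unsigned int size)
  {
          if (size <= 32)
                  return 0;
          return (size - 32 + 15) / 16;
  }

  int main(void)
  {
          /* Both print 96: recompressing a 1568-byte object down to
           * 1560 bytes keeps it in the same class and frees nothing. */
          printf("%u\n", class_index(1568));
          printf("%u\n", class_index(1560));
          return 0;
  }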
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 5 +++++
include/linux/zsmalloc.h | 2 ++
mm/zsmalloc.c | 21 +++++++++++++++++++++
3 files changed, 28 insertions(+)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 364323713393..fd31beb6491a 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1632,6 +1632,8 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page,
unsigned long handle_next;
unsigned int comp_len_next;
unsigned int comp_len_prev;
+ unsigned int class_index_prev;
+ unsigned int class_index_next;
struct zcomp_strm *zstrm;
void *src, *dst;
int ret;
@@ -1656,6 +1658,8 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page,
ret = zcomp_compress(zstrm, src, &comp_len_next);
kunmap_atomic(src);
+ class_index_prev = zs_lookup_class_index(zram->mem_pool, comp_len_prev);
+ class_index_next = zs_lookup_class_index(zram->mem_pool, comp_len_next);
/*
* Either a compression error or we failed to compressed the object
* in a way that will save us memory. Mark the object so that we
@@ -1663,6 +1667,7 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page,
*/
if (comp_len_next >= huge_class_size ||
comp_len_next >= comp_len_prev ||
+ class_index_next >= class_index_prev ||
ret) {
zram_set_flag(zram, index, ZRAM_RECOMP_SKIP);
zram_clear_flag(zram, index, ZRAM_IDLE);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 2a430e713ce5..a48cd0ffe57d 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -55,5 +55,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
unsigned long zs_get_total_pages(struct zs_pool *pool);
unsigned long zs_compact(struct zs_pool *pool);
+unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
+
void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
#endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d03941cace2c..065744b7e9d8 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1205,6 +1205,27 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
return get_zspage_inuse(zspage) == class->objs_per_zspage;
}
+/**
+ * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
+ * that hold objects of the provided size.
+ * @pool: zsmalloc pool to use
+ * @size: object size
+ *
+ * Context: Any context.
+ *
+ * Return: the index of the zsmalloc &size_class that hold objects of the
+ * provided size.
+ */
+unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size)
+{
+ struct size_class *class;
+
+ class = pool->size_class[get_size_class_index(size)];
+
+ return class->index;
+}
+EXPORT_SYMBOL_GPL(zs_lookup_class_index);
+
unsigned long zs_get_total_pages(struct zs_pool *pool)
{
return atomic_long_read(&pool->pages_allocated);
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 2/9] zsmalloc: turn zspage order into runtime variable
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 1/9] zram: add size class equals check into recompression Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 3/9] zsmalloc: move away from page order defines Sergey Senozhatsky
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
zsmalloc has 255 size classes. Size classes contain a number of zspages,
which store objects of the same size. A zspage can consist of up to four
physical pages. The exact (most optimal) zspage size is calculated for
each size class during zsmalloc pool creation.
As a reasonable optimization, zsmalloc merges size classes that have
similar characteristics: the number of pages per zspage and the number
of objects a zspage can store.
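The merge criterion itself is simple; roughly (a sketch of the existing
check, shown here for context only):

  /* Two size classes can share one struct size_class when their
   * zspages consist of the same number of physical pages and hold
   * the same number of objects. */
  static bool can_merge(struct size_class *prev, int pages_per_zspage,
                        int objs_per_zspage)
  {
          return prev->pages_per_zspage == pages_per_zspage &&
                 prev->objs_per_zspage == objs_per_zspage;
  }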
For example, let's look at the following size classes:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
..
94 1536 0 0 0 0 0 3 0
100 1632 0 0 0 0 0 2 0
..
Size classes #95-99 are merged with size class #100. That is, each time
we store an object of size, say, 1568 bytes, instead of using class #96
we end up storing it in size class #100. Class #100 is for objects of
1632 bytes in size, hence every 1568-byte object wastes 1632 - 1568 = 64
bytes. Class #100 zspages consist of 2 physical pages and can hold 5
objects. When we need to store, say, 13 objects of size 1568 we end up
allocating three zspages; in other words, 6 physical pages.
However, if we look closer at size class #96 (which should hold objects
of size 1568 bytes) and trace get_pages_per_zspage():
pages per zspage wasted bytes used%
1 960 76
2 352 95
3 1312 89
4 704 95
5 96 99
We'd notice that the most optimal zspage configuration for this class is
when it consists of 5 physical pages, but currently we never let zspages
consist of more than 4 pages. A 5-page class #96 configuration would
store 13 objects of size 1568 in a single zspage, allocating 5 physical
pages, as opposed to the 6 physical pages that class #100 would allocate.
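The table above can be reproduced with a trivial userspace sketch of
the same wasted-bytes/used% calculation (assuming 4K PAGE_SIZE; this is
an illustration, not the kernel function):

  #include <stdio.h>

  #define PAGE_SIZE 4096U

  int main(void)
  {
          unsigned int size = 1568;       /* class #96 object size */
          unsigned int pages;

          for (pages = 1; pages <= 5; pages++) {
                  unsigned int zspage_size = pages * PAGE_SIZE;
                  unsigned int waste = zspage_size % size;
                  unsigned int usedpc =
                          (zspage_size - waste) * 100 / zspage_size;

                  printf("%u\t%u\t%u\n", pages, waste, usedpc);
          }
          return 0;
  }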
A higher order zspage for class #96 also changes its key characteristics:
pages per-zspage and objects per-zspage. As a result, classes #96 and #100
are not merged anymore, which gives us a more compact zsmalloc pool.
Of course the described effect does not apply only to size classes #96 and
#100. We still merge classes, but less often. In other words, classes are
grouped in a more compact way, which decreases memory wastage:
zspage order # unique size classes
2 69
3 123
4 191
Let's take a closer look at the bottom of /sys/kernel/debug/zsmalloc/zram0/classes:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
202 3264 0 0 0 0 0 4 0
254 4096 0 0 0 0 0 1 0
...
For exactly the same reason - maximum 4 pages per zspage - the last non-huge
size class is #202, which stores objects of size 3264 bytes. Any object
larger than 3264 bytes, hence, is considered to be huge and lands in size
class #254, which uses a whole physical page to store every object. To put
it slightly differently - objects in huge classes don't share physical pages.
3264 bytes is too low of a watermark and we have too many huge classes:
classes from #203 to #254. Similarly to size class #96 above, higher order
zspages change key characteristics for some of those huge size classes and
thus those classes become normal classes, where stored objects share physical
pages.
Hence yet another consequence of higher order zspages: we move the huge
size class watermark with higher order zspages, have fewer huge classes
and store large objects in a more compact way.
For order 3, the huge class watermark becomes 3632 bytes:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
202 3264 0 0 0 0 0 4 0
211 3408 0 0 0 0 0 5 0
217 3504 0 0 0 0 0 6 0
222 3584 0 0 0 0 0 7 0
225 3632 0 0 0 0 0 8 0
254 4096 0 0 0 0 0 1 0
...
For order 4, the huge class watermark becomes 3840 bytes:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
202 3264 0 0 0 0 0 4 0
206 3328 0 0 0 0 0 13 0
207 3344 0 0 0 0 0 9 0
208 3360 0 0 0 0 0 14 0
211 3408 0 0 0 0 0 5 0
212 3424 0 0 0 0 0 16 0
214 3456 0 0 0 0 0 11 0
217 3504 0 0 0 0 0 6 0
219 3536 0 0 0 0 0 13 0
222 3584 0 0 0 0 0 7 0
223 3600 0 0 0 0 0 15 0
225 3632 0 0 0 0 0 8 0
228 3680 0 0 0 0 0 9 0
230 3712 0 0 0 0 0 10 0
232 3744 0 0 0 0 0 11 0
234 3776 0 0 0 0 0 12 0
235 3792 0 0 0 0 0 13 0
236 3808 0 0 0 0 0 14 0
238 3840 0 0 0 0 0 15 0
254 4096 0 0 0 0 0 1 0
...
TESTS
=====
1) ChromeOS memory pressure test
=============================================================================
Our standard memory pressure test, which is designed with reproducibility
in mind.
zram is configured as a swap device with the lzo-rle compression
algorithm. We captured /sys/block/zram0/mm_stat after every test and
rebooted the device.
Columns as per Documentation/admin-guide/blockdev/zram.rst:
orig_data_size mem_used_total mem_used_max pages_compacted
compr_data_size mem_limit same_pages huge_pages
ORDER 2 (BASE) zspage
10353639424 2981711944 3166896128 0 3543158784 579494 825135 123707
10168573952 2932288347 3106541568 0 3499085824 565187 853137 126153
9950461952 2815911234 3035693056 0 3441090560 586696 748054 122103
9892335616 2779566152 2943459328 0 3514736640 591541 650696 119621
9993949184 2814279212 3021357056 0 3336421376 582488 711744 121273
9953226752 2856382009 3025649664 0 3512893440 564559 787861 123034
9838448640 2785481728 2997575680 0 3367219200 573282 777099 122739
ORDER 3 zspage
9509138432 2706941227 2823393280 0 3389587456 535856 1011472 90223
10105245696 2882368370 3013095424 0 3296165888 563896 1059033 94808
9531236352 2666125512 2867650560 0 3396173824 567117 1126396 88807
9561812992 2714536764 2956652544 0 3310505984 548223 827322 90992
9807470592 2790315707 2908053504 0 3378315264 563670 1020933 93725
10178371584 2948838782 3071209472 0 3329548288 548533 954546 90730
9925165056 2849839413 2958274560 0 3336978432 551464 1058302 89381
ORDER 4 zspage
9444515840 2613362645 2668232704 0 3396759552 573735 1162207 83475
10129108992 2925888488 3038351360 0 3499597824 555634 1231542 84525
9876594688 2786692282 2897006592 0 3469463552 584835 1290535 84133
10012909568 2649711847 2801512448 0 3171323904 675405 750728 80424
10120966144 2866742402 2978639872 0 3257815040 587435 1093981 83587
9578790912 2671245225 2802270208 0 3376353280 545548 1047930 80895
10108588032 2888433523 2983960576 0 3316641792 571445 1290640 81402
First, we establish that order 3 and 4 don't cause any statistically
significant change in `orig_data_size` (the number of bytes we store
during the test); in other words, larger zspages don't cause regressions.
T-test for order 3:
x order-2-stored
+ order-3-stored
+-----------------------------------------------------------------------------+
|+ + + + x x + x x + x+ x|
| |________________________AM__|_________M_____A____|__________| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+ 7 9.5091384e+09 1.0178372e+10 9.8074706e+09 9.8026344e+09 2.7856206e+08
No difference proven at 95.0% confidence
T-test for order 4:
x order-2-stored
+ order-4-stored
+-----------------------------------------------------------------------------+
| + |
|+ + x +x xx x + ++ x x|
| |__________________|____A____M____M____________|_| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 9.8384486e+09 1.0353639e+10 9.9532268e+09 1.0021519e+10 1.7916718e+08
+ 7 9.4445158e+09 1.0129109e+10 1.001291e+10 9.8959249e+09 2.7947784e+08
No difference proven at 95.0% confidence
Next we establish that there is a statistically significant improvement
in `mem_used_total` metrics.
T-test for order 3:
x order-2-usedmem
+ order-3-usedmem
+-----------------------------------------------------------------------------+
|+ + + x ++ x + xx x + x x|
| |_________________A__M__|____________|__A________________| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09 73235062
+ 7 2.8233933e+09 3.0712095e+09 2.9566525e+09 2.9426185e+09 84630851
Difference at 95.0% confidence
-9.98347e+07 +/- 9.21744e+07
-3.28139% +/- 3.02961%
(Student's t, pooled s = 7.91383e+07)
T-test for order 4:
x order-2-usedmem
+ order-4-usedmem
+-----------------------------------------------------------------------------+
| + x |
|+ + + x ++ x x * x x|
| |__________________A__M__________|_____|_M__A__________| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 2.9434593e+09 3.1668961e+09 3.0256497e+09 3.0424532e+09 73235062
+ 7 2.6682327e+09 3.0383514e+09 2.8970066e+09 2.8814248e+09 1.3098053e+08
Difference at 95.0% confidence
-1.61028e+08 +/- 1.23591e+08
-5.29272% +/- 4.0622%
(Student's t, pooled s = 1.06111e+08)
Order 3 zspages also show statistically significant improvement in
`mem_used_max` metrics.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
+-----------------------------------------------------------------------------+
|+ + + x+ x + + + x x x x|
| |________M__A_________|_|_____________________A___________M____________| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09 80073158
+ 7 3.2961659e+09 3.3961738e+09 3.3369784e+09 3.3481822e+09 39840377
Difference at 95.0% confidence
-1.11047e+08 +/- 7.36589e+07
-3.21017% +/- 2.12934%
(Student's t, pooled s = 6.32415e+07)
Order 4 zspages, on the other hand, do not show any statistically significant
improvement in `mem_used_max` metrics.
T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
+-----------------------------------------------------------------------------+
|+ + + x x + + x + * x x|
| |_______________________A___M________________A_|_____M_______| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 7 3.3364214e+09 3.5431588e+09 3.4990858e+09 3.4592294e+09 80073158
+ 7 3.1713239e+09 3.4995978e+09 3.3763533e+09 3.3554221e+09 1.1609062e+08
No difference proven at 95.0% confidence
Overall, with a sufficient level of confidence, order 3 zspages appear to
be beneficial for this particular use-case and its data patterns.
Rather expectedly, we also observed lower numbers of huge pages when
zsmalloc is configured with order 3 and order 4 zspages, for the reason
already explained.
2) Synthetic test
=============================================================================
The test untars linux-6.0.tar.xz and compiles the kernel.
zram is configured as a block device with an ext4 file system and the
lzo-rle compression algorithm. We captured /sys/block/zram0/mm_stat
after every test and rebooted the VM.
orig_data_size mem_used_total mem_used_max pages_compacted
compr_data_size mem_limit same_pages huge_pages
ORDER 2 (BASE) zspage
1691791360 628086729 655171584 0 655171584 60 0 34043
1691787264 628089196 655175680 0 655175680 60 0 34046
1691803648 628098840 655187968 0 655187968 59 0 34047
1691795456 628091503 655183872 0 655183872 60 0 34044
1691799552 628086877 655183872 0 655183872 60 0 34047
ORDER 3 zspage
1691803648 627792993 641794048 0 641794048 60 0 33591
1691787264 627779342 641708032 0 641708032 59 0 33591
1691811840 627786616 641769472 0 641769472 60 0 33591
1691803648 627794468 641818624 0 641818624 59 0 33592
1691783168 627780882 641794048 0 641794048 61 0 33591
ORDER 4 zspage
1691803648 627726635 639655936 0 639655936 60 0 33435
1691811840 627733348 639643648 0 639643648 61 0 33434
1691795456 627726290 639614976 0 639614976 60 0 33435
1691803648 627730458 639688704 0 639688704 60 0 33434
1691811840 627727771 639688704 0 639688704 60 0 33434
Order 3 and order 4 show statistically significant improvement in
`mem_used_max` metrics.
T-test for order 3:
x order-2-maxmem
+ order-3-maxmem
+--------------------------------------------------------------------------+
|+ x|
|+ x|
|+ x|
|++ x|
|A| A|
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157
+ 5 6.4170803e+08 6.4181862e+08 6.4179405e+08 6.4177684e+08 42210.666
Difference at 95.0% confidence
-1.34038e+07 +/- 44080.7
-2.04581% +/- 0.00672802%
(Student's t, pooled s = 30224.5)
T-test for order 4:
x order-2-maxmem
+ order-4-maxmem
+--------------------------------------------------------------------------+
|+ x|
|+ x|
|+ x|
|+ x|
|+ x|
|A A|
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 6.5517158e+08 6.5518797e+08 6.5518387e+08 6.551806e+08 6730.4157
+ 5 6.3961498e+08 6.396887e+08 6.3965594e+08 6.3965839e+08 31408.602
Difference at 95.0% confidence
-1.55222e+07 +/- 33126.2
-2.36915% +/- 0.00505604%
(Student's t, pooled s = 22713.4)
This test tends to benefit more from order 4 zspages, due to the test's
data patterns.
zsmalloc object distribution analysis
=============================================================================
Order 2 (4 pages per zspage) tends to put many objects in size class 2048,
which is merged with size classes #112-#125:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
71 1168 0 0 6146 6146 1756 2 0
74 1216 0 1 4560 4552 1368 3 0
76 1248 0 1 2938 2934 904 4 0
83 1360 0 0 10971 10971 3657 1 0
91 1488 0 0 16126 16126 5864 4 0
94 1536 0 1 5912 5908 2217 3 0
100 1632 0 0 11990 11990 4796 2 0
107 1744 0 1 15771 15768 6759 3 0
111 1808 0 1 10386 10380 4616 4 0
126 2048 0 0 45444 45444 22722 1 0
144 2336 0 0 47446 47446 27112 4 0
151 2448 1 0 10760 10759 6456 3 0
168 2720 0 0 10173 10173 6782 2 0
190 3072 0 1 1700 1697 1275 3 0
202 3264 0 1 290 286 232 4 0
254 4096 0 0 34051 34051 34051 1 0
Order 3 (8 pages per zspage) changed pool characteristics and unmerged
some of the size classes, which resulted in fewer objects being put into
size class 2048, because lower size classes are now available for more
compact object storage:
class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage freeable
...
71 1168 0 1 2996 2994 856 2 0
72 1184 0 1 1632 1609 476 7 0
73 1200 1 0 1445 1442 425 5 0
74 1216 0 0 1510 1510 453 3 0
75 1232 0 1 1495 1479 455 7 0
76 1248 0 1 1456 1451 448 4 0
78 1280 0 1 3040 3033 950 5 0
79 1296 0 1 1584 1571 504 7 0
83 1360 0 0 6375 6375 2125 1 0
84 1376 0 1 1817 1796 632 8 0
87 1424 0 1 6020 6006 2107 7 0
88 1440 0 1 2108 2101 744 6 0
89 1456 0 1 2072 2064 740 5 0
91 1488 0 1 4169 4159 1516 4 0
92 1504 0 1 2014 2007 742 7 0
94 1536 0 1 3904 3900 1464 3 0
95 1552 0 1 1890 1873 720 8 0
96 1568 0 1 1963 1958 755 5 0
97 1584 0 1 1980 1974 770 7 0
100 1632 0 1 6190 6187 2476 2 0
103 1680 0 0 6477 6477 2667 7 0
104 1696 0 1 2256 2253 940 5 0
105 1712 0 1 2356 2340 992 8 0
107 1744 1 0 4697 4696 2013 3 0
110 1792 0 1 7744 7734 3388 7 0
111 1808 0 1 2655 2649 1180 4 0
114 1856 0 1 8371 8365 3805 5 0
116 1888 1 0 5863 5862 2706 6 0
117 1904 0 1 2955 2942 1379 7 0
118 1920 0 1 3009 2997 1416 8 0
126 2048 0 0 25276 25276 12638 1 0
128 2080 0 1 6060 6052 3232 8 0
129 2096 1 0 3081 3080 1659 7 0
134 2176 0 1 14835 14830 7912 8 0
135 2192 0 1 2769 2758 1491 7 0
137 2224 0 1 5082 5077 2772 6 0
140 2272 0 1 7236 7232 4020 5 0
144 2336 0 1 8428 8423 4816 4 0
147 2384 0 1 5316 5313 3101 7 0
151 2448 0 1 5445 5443 3267 3 0
155 2512 0 0 4121 4121 2536 8 0
158 2560 0 1 2208 2205 1380 5 0
160 2592 0 0 1133 1133 721 7 0
168 2720 0 0 2712 2712 1808 2 0
177 2864 1 0 1100 1098 770 7 0
180 2912 0 1 189 183 135 5 0
184 2976 0 1 176 166 128 8 0
190 3072 0 0 252 252 189 3 0
197 3184 0 1 198 192 154 7 0
202 3264 0 1 100 96 80 4 0
211 3408 0 1 210 208 175 5 0
217 3504 0 1 98 94 84 6 0
222 3584 0 0 104 104 91 7 0
225 3632 0 1 54 50 48 8 0
254 4096 0 0 33591 33591 33591 1 0
Note that the huge size watermark is now above 3632 bytes and a number
of new normal classes are available that previously were merged with the
huge class. For instance, size class #211 holds 210 objects of size 3408
and uses 175 physical pages, while previously those objects would have
used 210 physical pages.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
include/linux/zsmalloc.h | 12 +++++++
mm/zsmalloc.c | 75 +++++++++++++++++++++++-----------------
2 files changed, 56 insertions(+), 31 deletions(-)
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index a48cd0ffe57d..6cd1d95b928a 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -33,6 +33,18 @@ enum zs_mapmode {
*/
};
+#define ZS_PAGE_ORDER_2 2
+#define ZS_PAGE_ORDER_4 4
+
+/*
+ * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
+ * pages. ZS_MAX_PAGE_ORDER defines upper limit on N, ZS_MIN_PAGE_ORDER
+ * defines lower limit on N. ZS_DEFAULT_PAGE_ORDER is recommended value.
+ */
+#define ZS_MIN_PAGE_ORDER ZS_PAGE_ORDER_2
+#define ZS_MAX_PAGE_ORDER ZS_PAGE_ORDER_4
+#define ZS_DEFAULT_PAGE_ORDER ZS_PAGE_ORDER_2
+
struct zs_pool_stats {
/* How many pages were migrated (freed) */
atomic_long_t pages_compacted;
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 065744b7e9d8..bc377e5d3417 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -74,12 +74,7 @@
*/
#define ZS_ALIGN 8
-/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_ZSPAGE_ORDER defines upper limit on N.
- */
-#define ZS_MAX_ZSPAGE_ORDER 2
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)
+#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_PAGE_ORDER)
#define ZS_HANDLE_SIZE (sizeof(unsigned long))
@@ -124,10 +119,8 @@
#define ISOLATED_BITS 3
#define MAGIC_VAL_BITS 8
-#define MAX(a, b) ((a) >= (b) ? (a) : (b))
-/* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */
-#define ZS_MIN_ALLOC_SIZE \
- MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS))
+#define ZS_MIN_ALLOC_SIZE 32U
+
/* each chunk includes extra space to keep handle */
#define ZS_MAX_ALLOC_SIZE PAGE_SIZE
@@ -141,12 +134,10 @@
* determined). NOTE: all those class sizes must be set as multiple of
* ZS_ALIGN to make sure link_free itself never has to span 2 pages.
*
- * ZS_MIN_ALLOC_SIZE and ZS_SIZE_CLASS_DELTA must be multiple of ZS_ALIGN
- * (reason above)
+ * pool->min_alloc_size (ZS_MIN_ALLOC_SIZE) and ZS_SIZE_CLASS_DELTA must
+ * be multiple of ZS_ALIGN (reason above)
*/
#define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS)
-#define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \
- ZS_SIZE_CLASS_DELTA) + 1)
enum fullness_group {
ZS_EMPTY,
@@ -230,12 +221,15 @@ struct link_free {
struct zs_pool {
const char *name;
- struct size_class *size_class[ZS_SIZE_CLASSES];
+ struct size_class **size_class;
struct kmem_cache *handle_cachep;
struct kmem_cache *zspage_cachep;
atomic_long_t pages_allocated;
+ u32 num_size_classes;
+ u32 min_alloc_size;
+
struct zs_pool_stats stats;
/* Compact classes */
@@ -523,15 +517,15 @@ static void set_zspage_mapping(struct zspage *zspage,
* classes depending on its size. This function returns index of the
* size class which has chunk size big enough to hold the given size.
*/
-static int get_size_class_index(int size)
+static int get_size_class_index(struct zs_pool *pool, int size)
{
int idx = 0;
- if (likely(size > ZS_MIN_ALLOC_SIZE))
- idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE,
+ if (likely(size > pool->min_alloc_size))
+ idx = DIV_ROUND_UP(size - pool->min_alloc_size,
ZS_SIZE_CLASS_DELTA);
- return min_t(int, ZS_SIZE_CLASSES - 1, idx);
+ return min_t(int, pool->num_size_classes - 1, idx);
}
/* type can be of enum type class_stat_type or fullness_group */
@@ -591,7 +585,7 @@ static int zs_stats_size_show(struct seq_file *s, void *v)
"obj_allocated", "obj_used", "pages_used",
"pages_per_zspage", "freeable");
- for (i = 0; i < ZS_SIZE_CLASSES; i++) {
+ for (i = 0; i < pool->num_size_classes; i++) {
class = pool->size_class[i];
if (class->index != i)
@@ -777,13 +771,13 @@ static enum fullness_group fix_fullness_group(struct size_class *class,
* link together 3 PAGE_SIZE sized pages to form a zspage
* since then we can perfectly fit in 8 such objects.
*/
-static int get_pages_per_zspage(int class_size)
+static int get_pages_per_zspage(u32 class_size, u32 max_pages_per_zspage)
{
int i, max_usedpc = 0;
/* zspage order which gives maximum used size per KB */
int max_usedpc_order = 1;
- for (i = 1; i <= ZS_MAX_PAGES_PER_ZSPAGE; i++) {
+ for (i = 1; i <= max_pages_per_zspage; i++) {
int zspage_size;
int waste, usedpc;
@@ -1220,7 +1214,7 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size)
{
struct size_class *class;
- class = pool->size_class[get_size_class_index(size)];
+ class = pool->size_class[get_size_class_index(pool, size)];
return class->index;
}
@@ -1431,7 +1425,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
/* extra space in chunk to keep the handle */
size += ZS_HANDLE_SIZE;
- class = pool->size_class[get_size_class_index(size)];
+ class = pool->size_class[get_size_class_index(pool, size)];
/* class->lock effectively protects the zpage migration */
spin_lock(&class->lock);
@@ -1980,7 +1974,7 @@ static void async_free_zspage(struct work_struct *work)
struct zs_pool *pool = container_of(work, struct zs_pool,
free_work);
- for (i = 0; i < ZS_SIZE_CLASSES; i++) {
+ for (i = 0; i < pool->num_size_classes; i++) {
class = pool->size_class[i];
if (class->index != i)
continue;
@@ -2129,7 +2123,7 @@ unsigned long zs_compact(struct zs_pool *pool)
struct size_class *class;
unsigned long pages_freed = 0;
- for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
+ for (i = pool->num_size_classes - 1; i >= 0; i--) {
class = pool->size_class[i];
if (class->index != i)
continue;
@@ -2173,7 +2167,7 @@ static unsigned long zs_shrinker_count(struct shrinker *shrinker,
struct zs_pool *pool = container_of(shrinker, struct zs_pool,
shrinker);
- for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
+ for (i = pool->num_size_classes - 1; i >= 0; i--) {
class = pool->size_class[i];
if (class->index != i)
continue;
@@ -2215,11 +2209,28 @@ struct zs_pool *zs_create_pool(const char *name)
int i;
struct zs_pool *pool;
struct size_class *prev_class = NULL;
+ u32 max_pages_per_zspage;
pool = kzalloc(sizeof(*pool), GFP_KERNEL);
if (!pool)
return NULL;
+ max_pages_per_zspage = 1U << ZS_DEFAULT_PAGE_ORDER;
+ /* min_alloc_size must be multiple of ZS_ALIGN */
+ pool->min_alloc_size = (max_pages_per_zspage << PAGE_SHIFT) >>
+ OBJ_INDEX_BITS;
+ pool->min_alloc_size = max(pool->min_alloc_size, ZS_MIN_ALLOC_SIZE);
+
+ pool->num_size_classes =
+ DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - pool->min_alloc_size,
+ ZS_SIZE_CLASS_DELTA) + 1;
+
+ pool->size_class = kmalloc_array(pool->num_size_classes,
+ sizeof(struct size_class *),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!pool->size_class)
+ goto err;
+
init_deferred_free(pool);
rwlock_init(&pool->migrate_lock);
@@ -2234,17 +2245,18 @@ struct zs_pool *zs_create_pool(const char *name)
* Iterate reversely, because, size of size_class that we want to use
* for merging should be larger or equal to current size.
*/
- for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) {
+ for (i = pool->num_size_classes - 1; i >= 0; i--) {
int size;
int pages_per_zspage;
int objs_per_zspage;
struct size_class *class;
int fullness = 0;
- size = ZS_MIN_ALLOC_SIZE + i * ZS_SIZE_CLASS_DELTA;
+ size = pool->min_alloc_size + i * ZS_SIZE_CLASS_DELTA;
if (size > ZS_MAX_ALLOC_SIZE)
size = ZS_MAX_ALLOC_SIZE;
- pages_per_zspage = get_pages_per_zspage(size);
+ pages_per_zspage = get_pages_per_zspage(size,
+ max_pages_per_zspage);
objs_per_zspage = pages_per_zspage * PAGE_SIZE / size;
/*
@@ -2328,7 +2340,7 @@ void zs_destroy_pool(struct zs_pool *pool)
zs_flush_migration(pool);
zs_pool_stat_destroy(pool);
- for (i = 0; i < ZS_SIZE_CLASSES; i++) {
+ for (i = 0; i < pool->num_size_classes; i++) {
int fg;
struct size_class *class = pool->size_class[i];
@@ -2348,6 +2360,7 @@ void zs_destroy_pool(struct zs_pool *pool)
}
destroy_cache(pool);
+ kfree(pool->size_class);
kfree(pool->name);
kfree(pool);
}
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 3/9] zsmalloc: move away from page order defines
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 1/9] zram: add size class equals check into recompression Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 2/9] zsmalloc: turn zspage order into runtime variable Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 4/9] zsmalloc: make huge class watermark zs_pool member Sergey Senozhatsky
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
There is no reason for us to require the pages per-zspage limit to be
a power of two. Rename the macros and use plain limit numbers there
instead of 2^N values. This will let us have more tunable limits.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
include/linux/zsmalloc.h | 16 +++++++---------
mm/zsmalloc.c | 4 +---
2 files changed, 8 insertions(+), 12 deletions(-)
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 6cd1d95b928a..b6b8654a2d45 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -33,17 +33,15 @@ enum zs_mapmode {
*/
};
-#define ZS_PAGE_ORDER_2 2
-#define ZS_PAGE_ORDER_4 4
-
/*
- * A single 'zspage' is composed of up to 2^N discontiguous 0-order (single)
- * pages. ZS_MAX_PAGE_ORDER defines upper limit on N, ZS_MIN_PAGE_ORDER
- * defines lower limit on N. ZS_DEFAULT_PAGE_ORDER is recommended value.
+ * A single 'zspage' is composed of up to N discontiguous 0-order
+ * (single) pages. ZS_MAX_PAGES_PER_ZSPAGE defines upper limit on N,
+ * ZS_MIN_PAGES_PER_ZSPAGE defines lower limit on N.
+ * ZS_DEFAULT_PAGES_PER_ZSPAGE is a recommended value.
*/
-#define ZS_MIN_PAGE_ORDER ZS_PAGE_ORDER_2
-#define ZS_MAX_PAGE_ORDER ZS_PAGE_ORDER_4
-#define ZS_DEFAULT_PAGE_ORDER ZS_PAGE_ORDER_2
+#define ZS_MIN_PAGES_PER_ZSPAGE 1
+#define ZS_MAX_PAGES_PER_ZSPAGE 16
+#define ZS_DEFAULT_PAGES_PER_ZSPAGE 4
struct zs_pool_stats {
/* How many pages were migrated (freed) */
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index bc377e5d3417..d4f68fba746b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -74,8 +74,6 @@
*/
#define ZS_ALIGN 8
-#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_PAGE_ORDER)
-
#define ZS_HANDLE_SIZE (sizeof(unsigned long))
/*
@@ -2215,7 +2213,7 @@ struct zs_pool *zs_create_pool(const char *name)
if (!pool)
return NULL;
- max_pages_per_zspage = 1U << ZS_DEFAULT_PAGE_ORDER;
+ max_pages_per_zspage = ZS_DEFAULT_PAGES_PER_ZSPAGE;
/* min_alloc_size must be multiple of ZS_ALIGN */
pool->min_alloc_size = (max_pages_per_zspage << PAGE_SHIFT) >>
OBJ_INDEX_BITS;
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 4/9] zsmalloc: make huge class watermark zs_pool member
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (2 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 3/9] zsmalloc: move away from page order defines Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 5/9] zram: huge size watermark cannot be global Sergey Senozhatsky
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
We will permit per-pool configuration of the pages per-zspage value,
which changes the characteristics of the classes and moves the huge
class size watermark around. Thus huge_class_size needs to be a
per-pool variable.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
mm/zsmalloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d4f68fba746b..cab8080e2fc6 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -178,7 +178,6 @@ static struct dentry *zs_stat_root;
* (see: fix_fullness_group())
*/
static const int fullness_threshold_frac = 4;
-static size_t huge_class_size;
struct size_class {
spinlock_t lock;
@@ -227,6 +226,7 @@ struct zs_pool {
u32 num_size_classes;
u32 min_alloc_size;
+ size_t huge_class_size;
struct zs_pool_stats stats;
@@ -1350,7 +1350,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
*/
size_t zs_huge_class_size(struct zs_pool *pool)
{
- return huge_class_size;
+ return pool->huge_class_size;
}
EXPORT_SYMBOL_GPL(zs_huge_class_size);
@@ -2264,8 +2264,8 @@ struct zs_pool *zs_create_pool(const char *name)
* endup in the huge class.
*/
if (pages_per_zspage != 1 && objs_per_zspage != 1 &&
- !huge_class_size) {
- huge_class_size = size;
+ !pool->huge_class_size) {
+ pool->huge_class_size = size;
/*
* The object uses ZS_HANDLE_SIZE bytes to store the
* handle. We need to subtract it, because zs_malloc()
@@ -2275,7 +2275,7 @@ struct zs_pool *zs_create_pool(const char *name)
* class because it grows by ZS_HANDLE_SIZE extra bytes
* right before class lookup.
*/
- huge_class_size -= (ZS_HANDLE_SIZE - 1);
+ pool->huge_class_size -= (ZS_HANDLE_SIZE - 1);
}
/*
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 5/9] zram: huge size watermark cannot be global
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (3 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 4/9] zsmalloc: make huge class watermark zs_pool member Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 6/9] zsmalloc: pass limit on pages per-zspage to zs_create_pool() Sergey Senozhatsky
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
ZRAM will pass a pool-specific limit on the number of pages
per-zspage, which will affect the pool's characteristics, namely
the huge size class watermark value. Move huge_class_size to
struct zram, because this value can now be unique to each
pool (zram device).
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 13 +++----------
drivers/block/zram/zram_drv.h | 5 +++++
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fd31beb6491a..90b0c66bbd5b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -50,12 +50,6 @@ static const char *default_comp_algs[ZRAM_MAX_ZCOMPS] = {
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
-/*
- * Pages that compress to sizes equals or greater than this are stored
- * uncompressed in memory.
- */
-static size_t huge_class_size;
-
static const struct block_device_operations zram_devops;
static void zram_free_page(struct zram *zram, size_t index);
@@ -1259,8 +1253,7 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
return false;
}
- if (!huge_class_size)
- huge_class_size = zs_huge_class_size(zram->mem_pool);
+ zram->huge_class_size = zs_huge_class_size(zram->mem_pool);
return true;
}
@@ -1488,7 +1481,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
return ret;
}
- if (comp_len >= huge_class_size)
+ if (comp_len >= zram->huge_class_size)
comp_len = PAGE_SIZE;
/*
* handle allocation has 2 paths:
@@ -1665,7 +1658,7 @@ static int zram_recompress(struct zram *zram, u32 index, struct page *page,
* in a way that will save us memory. Mark the object so that we
* don't attempt to re-compress it again (RECOMP_SKIP).
*/
- if (comp_len_next >= huge_class_size ||
+ if (comp_len_next >= zram->huge_class_size ||
comp_len_next >= comp_len_prev ||
class_index_next >= class_index_prev ||
ret) {
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 09b9ceb5dfa3..9d6fcfdf7aa7 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -120,6 +120,11 @@ struct zram {
*/
u64 disksize; /* bytes */
const char *comp_algs[ZRAM_MAX_ZCOMPS];
+ /*
+ * Pages that compress to sizes equal or greater than this are stored
+ * uncompressed in memory.
+ */
+ size_t huge_class_size;
/*
* zram is claimed so open request will be failed
*/
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 6/9] zsmalloc: pass limit on pages per-zspage to zs_create_pool()
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (4 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 5/9] zram: huge size watermark cannot be global Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 7/9] zram: add pages_per_pool_page device attribute Sergey Senozhatsky
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
Allow the zsmalloc pool owner to specify the maximum number of pages
per-zspage (during pool creation), so that different pools can have
different characteristics.
By default we pass ZS_DEFAULT_PAGES_PER_ZSPAGE, which is 4 (matching
the current order 2 zspage limit).
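A minimal usage sketch of the changed API (the pool name and the limit
of 8 below are purely illustrative):

  #include <linux/zsmalloc.h>

  static struct zs_pool *example_create_pool(void)
  {
          /* Passing ZS_DEFAULT_PAGES_PER_ZSPAGE (4) keeps the old
           * behaviour; values outside the
           * [ZS_MIN_PAGES_PER_ZSPAGE, ZS_MAX_PAGES_PER_ZSPAGE] range
           * make zs_create_pool() return NULL. */
          return zs_create_pool("example", 8);
  }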
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 3 ++-
include/linux/zsmalloc.h | 2 +-
mm/zsmalloc.c | 11 +++++++----
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 90b0c66bbd5b..bec02f636bce 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1247,7 +1247,8 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
if (!zram->table)
return false;
- zram->mem_pool = zs_create_pool(zram->disk->disk_name);
+ zram->mem_pool = zs_create_pool(zram->disk->disk_name,
+ ZS_DEFAULT_PAGES_PER_ZSPAGE);
if (!zram->mem_pool) {
vfree(zram->table);
return false;
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index b6b8654a2d45..ebd1ff3e8115 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -50,7 +50,7 @@ struct zs_pool_stats {
struct zs_pool;
-struct zs_pool *zs_create_pool(const char *name);
+struct zs_pool *zs_create_pool(const char *name, u32 max_pages_per_zspage);
void zs_destroy_pool(struct zs_pool *pool);
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags);
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index cab8080e2fc6..7465a45495bf 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -366,7 +366,7 @@ static void *zs_zpool_create(const char *name, gfp_t gfp,
* different contexts and its caller must provide a valid
* gfp mask.
*/
- return zs_create_pool(name);
+ return zs_create_pool(name, ZS_DEFAULT_PAGES_PER_ZSPAGE);
}
static void zs_zpool_destroy(void *pool)
@@ -2195,6 +2195,7 @@ static int zs_register_shrinker(struct zs_pool *pool)
/**
* zs_create_pool - Creates an allocation pool to work from.
* @name: pool name to be created
+ * @max_pages_per_zspage: maximum number of pages per-zspage
*
* This function must be called before anything when using
* the zsmalloc allocator.
@@ -2202,18 +2203,20 @@ static int zs_register_shrinker(struct zs_pool *pool)
* On success, a pointer to the newly created pool is returned,
* otherwise NULL.
*/
-struct zs_pool *zs_create_pool(const char *name)
+struct zs_pool *zs_create_pool(const char *name, u32 max_pages_per_zspage)
{
int i;
struct zs_pool *pool;
struct size_class *prev_class = NULL;
- u32 max_pages_per_zspage;
+
+ if (WARN_ON(max_pages_per_zspage < ZS_MIN_PAGES_PER_ZSPAGE ||
+ max_pages_per_zspage > ZS_MAX_PAGES_PER_ZSPAGE))
+ return NULL;
pool = kzalloc(sizeof(*pool), GFP_KERNEL);
if (!pool)
return NULL;
- max_pages_per_zspage = ZS_DEFAULT_PAGES_PER_ZSPAGE;
/* min_alloc_size must be multiple of ZS_ALIGN */
pool->min_alloc_size = (max_pages_per_zspage << PAGE_SHIFT) >>
OBJ_INDEX_BITS;
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 7/9] zram: add pages_per_pool_page device attribute
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (5 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 6/9] zsmalloc: pass limit on pages per-zspage to zs_create_pool() Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 8/9] Documentation: document zram pages_per_pool_page attribute Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 9/9] zsmalloc: break out of loop when found perfect zspage order Sergey Senozhatsky
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
Add a new sysfs knob that allows user-space to set the
zsmalloc pages per-zspage limit on a per-device basis.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
drivers/block/zram/zram_drv.c | 44 ++++++++++++++++++++++++++++++++++-
drivers/block/zram/zram_drv.h | 2 ++
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bec02f636bce..cf9d3474b80c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1180,6 +1180,45 @@ static ssize_t mm_stat_show(struct device *dev,
return ret;
}
+static ssize_t pages_per_pool_page_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u32 val;
+ struct zram *zram = dev_to_zram(dev);
+
+ down_read(&zram->init_lock);
+ val = zram->pages_per_pool_page;
+ up_read(&zram->init_lock);
+
+ return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t pages_per_pool_page_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ u32 val;
+
+ if (kstrtou32(buf, 10, &val))
+ return -EINVAL;
+
+ if (val < ZS_MIN_PAGES_PER_ZSPAGE || val > ZS_MAX_PAGES_PER_ZSPAGE)
+ return -EINVAL;
+
+ down_read(&zram->init_lock);
+ if (init_done(zram)) {
+ up_read(&zram->init_lock);
+ return -EINVAL;
+ }
+
+ zram->pages_per_pool_page = val;
+ up_read(&zram->init_lock);
+
+ return len;
+}
+
#ifdef CONFIG_ZRAM_WRITEBACK
#define FOUR_K(x) ((x) * (1 << (PAGE_SHIFT - 12)))
static ssize_t bd_stat_show(struct device *dev,
@@ -1248,7 +1287,7 @@ static bool zram_meta_alloc(struct zram *zram, u64 disksize)
return false;
zram->mem_pool = zs_create_pool(zram->disk->disk_name,
- ZS_DEFAULT_PAGES_PER_ZSPAGE);
+ zram->pages_per_pool_page);
if (!zram->mem_pool) {
vfree(zram->table);
return false;
@@ -2174,6 +2213,7 @@ static DEVICE_ATTR_RW(writeback_limit_enable);
static DEVICE_ATTR_RW(recomp_algorithm);
static DEVICE_ATTR_WO(recompress);
#endif
+static DEVICE_ATTR_RW(pages_per_pool_page);
static struct attribute *zram_disk_attrs[] = {
&dev_attr_disksize.attr,
@@ -2201,6 +2241,7 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_recomp_algorithm.attr,
&dev_attr_recompress.attr,
#endif
+ &dev_attr_pages_per_pool_page.attr,
NULL,
};
@@ -2238,6 +2279,7 @@ static int zram_add(void)
goto out_free_idr;
}
+ zram->pages_per_pool_page = ZS_DEFAULT_PAGES_PER_ZSPAGE;
zram->disk->major = zram_major;
zram->disk->first_minor = device_id;
zram->disk->minors = 1;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 9d6fcfdf7aa7..bdfc9bf0bdd5 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -120,6 +120,8 @@ struct zram {
*/
u64 disksize; /* bytes */
const char *comp_algs[ZRAM_MAX_ZCOMPS];
+
+ u32 pages_per_pool_page;
/*
* Pages that compress to sizes equal or greater than this are stored
* uncompressed in memory.
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 8/9] Documentation: document zram pages_per_pool_page attribute
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (6 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 7/9] zram: add pages_per_pool_page device attribute Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
2022-10-26 11:29 ` [PATCHv2 9/9] zsmalloc: break out of loop when found perfect zspage order Sergey Senozhatsky
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
Provide simple documentation for the pages_per_pool_page ZRAM
device attribute.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
Documentation/admin-guide/blockdev/zram.rst | 38 ++++++++++++++++-----
1 file changed, 30 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index 010fb05a5999..4cb287520d45 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -112,7 +112,29 @@ to list all of them using, for instance, /proc/crypto or any other
method. This, however, has an advantage of permitting the usage of
custom crypto compression modules (implementing S/W or H/W compression).
-4) Set Disksize
+4) Set pages per-pool page limit: Optional
+==========================================
+
+zsmalloc pages can consist of up to ZS_DEFAULT_PAGES_PER_ZSPAGE (single)
+physical pages. The exact number is calculated for each zsmalloc size
+class during zsmalloc pool creation. ZRAM provides pages_per_pool_page
+device attribute that lets one adjust that limit (maximum possible value
+is ZS_MAX_PAGES_PER_ZSPAGE). The default limit is considered to be good
+enough, so tweak this value only when the changes in zsmalloc size classes
+characteristics are beneficial for your data patterns. The limit on the
+pages per zspages (currently) should be in [1,16] range; default value
+is 4.
+
+Examples::
+
+ #show current zsmalloc pages per-pool page limit
+ cat /sys/block/zramX/pages_per_pool_page
+ 4
+
+ #set zsmalloc pages per-pool page limit
+ echo 8 > /sys/block/zramX/pages_per_pool_page
+
+5) Set Disksize
===============
Set disk size by writing the value to sysfs node 'disksize'.
@@ -132,7 +154,7 @@ There is little point creating a zram of greater than twice the size of memory
since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.
-5) Set memory limit: Optional
+6) Set memory limit: Optional
=============================
Set memory limit by writing the value to sysfs node 'mem_limit'.
@@ -151,7 +173,7 @@ Examples::
# To disable memory limit
echo 0 > /sys/block/zram0/mem_limit
-6) Activate
+7) Activate
===========
::
@@ -162,7 +184,7 @@ Examples::
mkfs.ext4 /dev/zram1
mount /dev/zram1 /tmp
-7) Add/remove zram devices
+8) Add/remove zram devices
==========================
zram provides a control interface, which enables dynamic (on-demand) device
@@ -182,7 +204,7 @@ execute::
echo X > /sys/class/zram-control/hot_remove
-8) Stats
+9) Stats
========
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
@@ -283,15 +305,15 @@ a single line of text and contains the following stats separated by whitespace:
Unit: 4K bytes
============== =============================================================
-9) Deactivate
-=============
+10) Deactivate
+==============
::
swapoff /dev/zram0
umount /dev/zram1
-10) Reset
+11) Reset
=========
Write any positive value to 'reset' sysfs node::
--
2.38.0.135.g90850a2211-goog
* [PATCHv2 9/9] zsmalloc: break out of loop when found perfect zspage order
2022-10-26 11:29 [PATCHv2 0/9] zsmalloc/zram: configurable zspage size Sergey Senozhatsky
` (7 preceding siblings ...)
2022-10-26 11:29 ` [PATCHv2 8/9] Documentation: document zram pages_per_pool_page attribute Sergey Senozhatsky
@ 2022-10-26 11:29 ` Sergey Senozhatsky
8 siblings, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2022-10-26 11:29 UTC (permalink / raw)
To: Andrew Morton, Minchan Kim
Cc: Nitin Gupta, linux-kernel, linux-mm, Sergey Senozhatsky
If we find a zspage configuration that gives us a perfect
100% used percentage (zero wasted space), then there is
no point in trying any other configurations.
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
mm/zsmalloc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7465a45495bf..0bc9ed1f1a5d 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -787,6 +787,9 @@ static int get_pages_per_zspage(u32 class_size, u32 max_pages_per_zspage)
max_usedpc = usedpc;
max_usedpc_order = i;
}
+
+ if (usedpc == 100)
+ break;
}
return max_usedpc_order;
--
2.38.0.135.g90850a2211-goog