linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 0/4] zram memory control enhance
@ 2014-08-25  0:05 Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 1/4] zsmalloc: move pages_allocated to zs_pool Minchan Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-25  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Currently, zram has no feature to limit memory usage, so in theory
zram can deplete system memory.
Users have asked for a limit several times because, even without
exhaustion, zram makes it hard to control the platform's memory usage.
This patchset adds the feature.

Patch 1 makes zs_get_total_size_bytes faster because it will be
called frequently by later patches for the new feature.

Patch 2 changes the return unit of zs_get_total_size_bytes from bytes
to pages so that zsmalloc doesn't need an unnecessary operation
(i.e., << PAGE_SHIFT).

Patch 3 adds the new feature. I added it to the zram layer, not
zsmalloc, because the limitation is zram's requirement, not
zsmalloc's, so other zsmalloc users (e.g., zpool) shouldn't be
affected by an unnecessary branch in zsmalloc. In the future, if
every zsmalloc user wants the feature, we could easily move it from
the client side into zsmalloc, but the reverse would be painful.

Patch 4 adds a new facility to report the maximum memory usage of
zram. This avoids users having to poll /sys/block/zram0/mem_used_total
frequently and ensures transient peaks are not missed.
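
For illustration only (not part of the patchset), a minimal userspace
sketch of reading the peak and then rearming it, assuming the sysfs
node added by patch 4 (writing "0" resets the watermark to the
current usage):

#include <stdio.h>

int main(void)
{
	unsigned long long peak;
	FILE *f = fopen("/sys/block/zram0/mem_used_max", "r");

	if (!f || fscanf(f, "%llu", &peak) != 1)
		return 1;
	fclose(f);
	printf("peak zram usage: %llu bytes\n", peak);

	/* reset the watermark; any value other than "0" gives -EINVAL */
	f = fopen("/sys/block/zram0/mem_used_max", "w");
	if (!f)
		return 1;
	fprintf(f, "0\n");
	fclose(f);
	return 0;
}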

* From v4
 * Add Reviewed-by - Dan
 * Clean up document of mem_limit - David

* From v3
 * zs_get_total_size_bytes function name change - Dan
 * clarification of the document - Dan
 * atomic account instead of introducing new lock in zsmalloc - David
 * remove unnecessary atomic instruction in updating max - David
 
* From v2
 * introduce helper function to update max_used_pages
   for readability - David
 * avoid unnecessary zs_get_total_size call in updating loop
   for max_used_pages - David

* From v1
 * rebased on next-20140815
 * fix up race problem - David, Dan
 * reset mem_used_max as current total_bytes, rather than 0 - David
 * resetting works with only "0" write for extensibility - David, Dan

Minchan Kim (4):
  zsmalloc: move pages_allocated to zs_pool
  zsmalloc: change return value unit of  zs_get_total_size_bytes
  zram: zram memory size limitation
  zram: report maximum used memory

 Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
 Documentation/blockdev/zram.txt            |  25 +++++--
 drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |   6 ++
 include/linux/zsmalloc.h                   |   2 +-
 mm/zsmalloc.c                              |  30 ++++-----
 6 files changed, 158 insertions(+), 26 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 1/4] zsmalloc: move pages_allocated to zs_pool
  2014-08-25  0:05 [PATCH v5 0/4] zram memory control enhance Minchan Kim
@ 2014-08-25  0:05 ` Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes Minchan Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-25  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

pages_allocated is counted per size_class, so when a zsmalloc user
wants to see total_size_bytes, it has to gather the count from every
size_class and report the sum.

That's not bad if the value is read rarely, but once it is read
frequently it becomes a poor deal from a performance point of view.

This patch moves the count from size_class to zs_pool, which also
reduces the memory footprint (from [255 * 8 bytes] to
[sizeof(atomic_long_t)]).

Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 94f38fac5e81..2a4acf400846 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -199,9 +199,6 @@ struct size_class {
 
 	spinlock_t lock;
 
-	/* stats */
-	u64 pages_allocated;
-
 	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
 };
 
@@ -220,6 +217,7 @@ struct zs_pool {
 	struct size_class size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
+	atomic_long_t pages_allocated;
 };
 
 /*
@@ -1028,8 +1026,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 			return 0;
 
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
+		atomic_long_add(class->pages_per_zspage,
+					&pool->pages_allocated);
 		spin_lock(&class->lock);
-		class->pages_allocated += class->pages_per_zspage;
 	}
 
 	obj = (unsigned long)first_page->freelist;
@@ -1082,14 +1081,13 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	first_page->inuse--;
 	fullness = fix_fullness_group(pool, first_page);
-
-	if (fullness == ZS_EMPTY)
-		class->pages_allocated -= class->pages_per_zspage;
-
 	spin_unlock(&class->lock);
 
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_EMPTY) {
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
 		free_zspage(first_page);
+	}
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
@@ -1185,12 +1183,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool)
 {
-	int i;
-	u64 npages = 0;
-
-	for (i = 0; i < ZS_SIZE_CLASSES; i++)
-		npages += pool->size_class[i].pages_allocated;
-
+	u64 npages = atomic_long_read(&pool->pages_allocated);
 	return npages << PAGE_SHIFT;
 }
 EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 2/4] zsmalloc: change return value unit of  zs_get_total_size_bytes
  2014-08-25  0:05 [PATCH v5 0/4] zram memory control enhance Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 1/4] zsmalloc: move pages_allocated to zs_pool Minchan Kim
@ 2014-08-25  0:05 ` Minchan Kim
  2014-08-25  4:08   ` David Horner
  2014-08-25  0:05 ` [PATCH v5 3/4] zram: zram memory size limitation Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 4/4] zram: report maximum used memory Minchan Kim
  3 siblings, 1 reply; 27+ messages in thread
From: Minchan Kim @ 2014-08-25  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

zs_get_total_size_bytes returns the amount of memory zsmalloc has
consumed in *bytes*, but zsmalloc operates in *pages* rather than
bytes, so let's change the API. The benefit is that we remove
unnecessary overhead (i.e., converting pages to bytes) inside
zsmalloc.

Since the return unit is pages, "zs_get_total_pages" is a better
name than "zs_get_total_size_bytes".

Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 include/linux/zsmalloc.h      | 2 +-
 mm/zsmalloc.c                 | 9 ++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d00831c3d731..f0b8b30a7128 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -103,10 +103,10 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 	down_read(&zram->init_lock);
 	if (init_done(zram))
-		val = zs_get_total_size_bytes(meta->mem_pool);
+		val = zs_get_total_pages(meta->mem_pool);
 	up_read(&zram->init_lock);
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
 static ssize_t max_comp_streams_show(struct device *dev,
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index e44d634e7fb7..05c214760977 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -46,6 +46,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool);
+unsigned long zs_get_total_pages(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2a4acf400846..c4a91578dc96 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -297,7 +297,7 @@ static void zs_zpool_unmap(void *pool, unsigned long handle)
 
 static u64 zs_zpool_total_size(void *pool)
 {
-	return zs_get_total_size_bytes(pool);
+	return zs_get_total_pages(pool) << PAGE_SHIFT;
 }
 
 static struct zpool_driver zs_zpool_driver = {
@@ -1181,12 +1181,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool)
+unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
-	u64 npages = atomic_long_read(&pool->pages_allocated);
-	return npages << PAGE_SHIFT;
+	return atomic_long_read(&pool->pages_allocated);
 }
-EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
+EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
 module_init(zs_init);
 module_exit(zs_exit);
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-25  0:05 [PATCH v5 0/4] zram memory control enhance Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 1/4] zsmalloc: move pages_allocated to zs_pool Minchan Kim
  2014-08-25  0:05 ` [PATCH v5 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes Minchan Kim
@ 2014-08-25  0:05 ` Minchan Kim
  2014-08-25 11:09   ` Sergey Senozhatsky
  2014-08-26  7:37   ` Joonsoo Kim
  2014-08-25  0:05 ` [PATCH v5 4/4] zram: report maximum used memory Minchan Kim
  3 siblings, 2 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-25  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Since zram has no feature to limit its memory usage, it is hard to
manage system memory.

This patch adds a new sysfs knob, "mem_limit", to set up a limit so
that zram fails the allocation once it reaches the limit.

In addition, the user can change the limit at runtime and so manage
memory more dynamically.

The initial state is no limit, so old behavior is not changed.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
 Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
 drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h              |  5 ++++
 4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 70ec992514d0..dbe643775ec1 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -119,3 +119,13 @@ Description:
 		efficiency can be calculated using compr_data_size and this
 		statistic.
 		Unit: bytes
+
+What:		/sys/block/zram<id>/mem_limit
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_limit file is read/write and specifies the amount
+		of memory to be able to consume memory to store store
+		compressed data. The limit could be changed in run time
+		and "0" means disable the limit. No limit is the initial state.
+		Unit: bytes
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 0595c3f56ccf..82c6a41116db 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.
 
-5) Activate:
+5) Set memory limit: Optional
+	Set memory limit by writing the value to sysfs node 'mem_limit'.
+	The value can be either in bytes or you can use mem suffixes.
+	In addition, you could change the value in runtime.
+	Examples:
+	    # limit /dev/zram0 with 50MB memory
+	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	    # Using mem suffixes
+	    echo 256K > /sys/block/zram0/mem_limit
+	    echo 512M > /sys/block/zram0/mem_limit
+	    echo 1G > /sys/block/zram0/mem_limit
+
+	    # To disable memory limit
+	    echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate:
 	mkswap /dev/zram0
 	swapon /dev/zram0
 
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-6) Stats:
+7) Stats:
 	Per-device statistics are exported as various nodes under
 	/sys/block/zram<id>/
 		disksize
@@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
 		compr_data_size
 		mem_used_total
 
-7) Deactivate:
+8) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-8) Reset:
+9) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0b8b30a7128..370c355eb127 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static ssize_t mem_limit_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->limit_pages;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_limit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	u64 limit;
+	struct zram *zram = dev_to_zram(dev);
+
+	limit = memparse(buf, NULL);
+	down_write(&zram->init_lock);
+	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
+
+	if (zram->limit_pages &&
+		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+		zs_free(meta->mem_pool, handle);
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	struct zram_meta *meta;
 
 	down_write(&zram->init_lock);
+
+	zram->limit_pages = 0;
+
 	if (!init_done(zram)) {
 		up_write(&zram->init_lock);
 		return;
@@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
+		mem_limit_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
+	&dev_attr_mem_limit.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e0f725c87cc6..b7aa9c21553f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -112,6 +112,11 @@ struct zram {
 	u64 disksize;	/* bytes */
 	int max_comp_streams;
 	struct zram_stats stats;
+	/*
+	 * the number of pages zram can consume for storing compressed data
+	 */
+	unsigned long limit_pages;
+
 	char compressor[10];
 };
 #endif
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 4/4] zram: report maximum used memory
  2014-08-25  0:05 [PATCH v5 0/4] zram memory control enhance Minchan Kim
                   ` (2 preceding siblings ...)
  2014-08-25  0:05 ` [PATCH v5 3/4] zram: zram memory size limitation Minchan Kim
@ 2014-08-25  0:05 ` Minchan Kim
  2014-08-25  4:05   ` David Horner
  3 siblings, 1 reply; 27+ messages in thread
From: Minchan Kim @ 2014-08-25  0:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Normally, a zram user can get the maximum memory usage zram has
consumed by polling mem_used_total via sysfs from userspace.

But that has a critical problem: the user can miss the peak memory
usage that occurs between polling intervals. To avoid that, the user
would have to poll with a very short interval (e.g., 0.0000000001s)
and mlock the poller to avoid page-fault delay when memory pressure
is heavy. That would be troublesome.

This patch adds a new knob, "mem_used_max", so the user can easily
see the maximum memory usage by reading the knob, and reset it via
"echo 0 > /sys/block/zram0/mem_used_max".

Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 +++++
 Documentation/blockdev/zram.txt            |  1 +
 drivers/block/zram/zram_drv.c              | 60 +++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |  1 +
 4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index dbe643775ec1..01a38eaf1552 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -120,6 +120,16 @@ Description:
 		statistic.
 		Unit: bytes
 
+What:		/sys/block/zram<id>/mem_used_max
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_used_max file is read/write and specifies the amount
+		of maximum memory zram have consumed to store compressed data.
+		For resetting the value, you should write "0". Otherwise,
+		you could see -EINVAL.
+		Unit: bytes
+
 What:		/sys/block/zram<id>/mem_limit
 Date:		August 2014
 Contact:	Minchan Kim <minchan@kernel.org>
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 82c6a41116db..7fcf9c6592ec 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -111,6 +111,7 @@ size of the disk when not in use so a huge zram is wasteful.
 		orig_data_size
 		compr_data_size
 		mem_used_total
+		mem_used_max
 
 8) Deactivate:
 	swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c355eb127..1a2b3e320ea5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -149,6 +149,41 @@ static ssize_t mem_limit_store(struct device *dev,
 	return len;
 }
 
+static ssize_t mem_used_max_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val = 0;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		val = atomic_long_read(&zram->stats.max_used_pages);
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_used_max_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int err;
+	unsigned long val;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta = zram->meta;
+
+	err = kstrtoul(buf, 10, &val);
+	if (err || val != 0)
+		return -EINVAL;
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		atomic_long_set(&zram->stats.max_used_pages,
+				zs_get_total_pages(meta->mem_pool));
+	up_read(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -461,6 +496,21 @@ out_cleanup:
 	return ret;
 }
 
+static inline void update_used_max(struct zram *zram,
+					const unsigned long pages)
+{
+	int old_max, cur_max;
+
+	old_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		cur_max = old_max;
+		if (pages > cur_max)
+			old_max = atomic_long_cmpxchg(
+				&zram->stats.max_used_pages, cur_max, pages);
+	} while (old_max != cur_max);
+}
+
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			   int offset)
 {
@@ -472,6 +522,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	struct zram_meta *meta = zram->meta;
 	struct zcomp_strm *zstrm;
 	bool locked = false;
+	unsigned long alloced_pages;
 
 	page = bvec->bv_page;
 	if (is_partial_io(bvec)) {
@@ -541,13 +592,15 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		goto out;
 	}
 
-	if (zram->limit_pages &&
-		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+	alloced_pages = zs_get_total_pages(meta->mem_pool);
+	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
 		zs_free(meta->mem_pool, handle);
 		ret = -ENOMEM;
 		goto out;
 	}
 
+	update_used_max(zram, alloced_pages);
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -897,6 +950,8 @@ static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
 static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
 		mem_limit_store);
+static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
+		mem_used_max_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -926,6 +981,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
 	&dev_attr_mem_limit.attr,
+	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b7aa9c21553f..c6ee271317f5 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -90,6 +90,7 @@ struct zram_stats {
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
 	atomic64_t zero_pages;		/* no. of zero filled pages */
 	atomic64_t pages_stored;	/* no. of pages currently stored */
+	atomic_long_t max_used_pages;	/* no. of maximum pages stored */
 };
 
 struct zram_meta {
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 4/4] zram: report maximum used memory
  2014-08-25  0:05 ` [PATCH v5 4/4] zram: report maximum used memory Minchan Kim
@ 2014-08-25  4:05   ` David Horner
  0 siblings, 0 replies; 27+ messages in thread
From: David Horner @ 2014-08-25  4:05 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 8:05 PM, Minchan Kim <minchan@kernel.org> wrote:
> Normally, zram user could get maximum memory usage zram consumed
> via polling mem_used_total with sysfs in userspace.
>
> But it has a critical problem because user can miss peak memory
> usage during update inverval of polling. For avoiding that,
> user should poll it with shorter interval(ie, 0.0000000001s)
> with mlocking to avoid page fault delay when memory pressure
> is heavy. It would be troublesome.
>
> This patch adds new knob "mem_used_max" so user could see
> the maximum memory usage easily via reading the knob and reset
> it via "echo 0 > /sys/block/zram0/mem_used_max".
>
> Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Reviewed-by: David Horner <ds2horner@gmail.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-block-zram | 10 +++++
>  Documentation/blockdev/zram.txt            |  1 +
>  drivers/block/zram/zram_drv.c              | 60 +++++++++++++++++++++++++++++-
>  drivers/block/zram/zram_drv.h              |  1 +
>  4 files changed, 70 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> index dbe643775ec1..01a38eaf1552 100644
> --- a/Documentation/ABI/testing/sysfs-block-zram
> +++ b/Documentation/ABI/testing/sysfs-block-zram
> @@ -120,6 +120,16 @@ Description:
>                 statistic.
>                 Unit: bytes
>
> +What:          /sys/block/zram<id>/mem_used_max
> +Date:          August 2014
> +Contact:       Minchan Kim <minchan@kernel.org>
> +Description:
> +               The mem_used_max file is read/write and specifies the amount
> +               of maximum memory zram have consumed to store compressed data.
> +               For resetting the value, you should write "0". Otherwise,
> +               you could see -EINVAL.
> +               Unit: bytes
> +
>  What:          /sys/block/zram<id>/mem_limit
>  Date:          August 2014
>  Contact:       Minchan Kim <minchan@kernel.org>
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 82c6a41116db..7fcf9c6592ec 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -111,6 +111,7 @@ size of the disk when not in use so a huge zram is wasteful.
>                 orig_data_size
>                 compr_data_size
>                 mem_used_total
> +               mem_used_max
>
>  8) Deactivate:
>         swapoff /dev/zram0
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 370c355eb127..1a2b3e320ea5 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -149,6 +149,41 @@ static ssize_t mem_limit_store(struct device *dev,
>         return len;
>  }
>
> +static ssize_t mem_used_max_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       u64 val = 0;
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       down_read(&zram->init_lock);
> +       if (init_done(zram))
> +               val = atomic_long_read(&zram->stats.max_used_pages);
> +       up_read(&zram->init_lock);
> +
> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> +}
> +
> +static ssize_t mem_used_max_store(struct device *dev,
> +               struct device_attribute *attr, const char *buf, size_t len)
> +{
> +       int err;
> +       unsigned long val;
> +       struct zram *zram = dev_to_zram(dev);
> +       struct zram_meta *meta = zram->meta;
> +
> +       err = kstrtoul(buf, 10, &val);
> +       if (err || val != 0)
> +               return -EINVAL;
> +
> +       down_read(&zram->init_lock);
> +       if (init_done(zram))
> +               atomic_long_set(&zram->stats.max_used_pages,
> +                               zs_get_total_pages(meta->mem_pool));
> +       up_read(&zram->init_lock);
> +
> +       return len;
> +}
> +
>  static ssize_t max_comp_streams_store(struct device *dev,
>                 struct device_attribute *attr, const char *buf, size_t len)
>  {
> @@ -461,6 +496,21 @@ out_cleanup:
>         return ret;
>  }
>
> +static inline void update_used_max(struct zram *zram,
> +                                       const unsigned long pages)
> +{
> +       int old_max, cur_max;
> +
> +       old_max = atomic_long_read(&zram->stats.max_used_pages);
> +
> +       do {
> +               cur_max = old_max;
> +               if (pages > cur_max)
> +                       old_max = atomic_long_cmpxchg(
> +                               &zram->stats.max_used_pages, cur_max, pages);
> +       } while (old_max != cur_max);
> +}
> +
>  static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>                            int offset)
>  {
> @@ -472,6 +522,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>         struct zram_meta *meta = zram->meta;
>         struct zcomp_strm *zstrm;
>         bool locked = false;
> +       unsigned long alloced_pages;
>
>         page = bvec->bv_page;
>         if (is_partial_io(bvec)) {
> @@ -541,13 +592,15 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>                 goto out;
>         }
>
> -       if (zram->limit_pages &&
> -               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +       alloced_pages = zs_get_total_pages(meta->mem_pool);
> +       if (zram->limit_pages && alloced_pages > zram->limit_pages) {
>                 zs_free(meta->mem_pool, handle);
>                 ret = -ENOMEM;
>                 goto out;
>         }
>
> +       update_used_max(zram, alloced_pages);
> +
>         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>
>         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> @@ -897,6 +950,8 @@ static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>  static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>                 mem_limit_store);
> +static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
> +               mem_used_max_store);
>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>                 max_comp_streams_show, max_comp_streams_store);
>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> @@ -926,6 +981,7 @@ static struct attribute *zram_disk_attrs[] = {
>         &dev_attr_compr_data_size.attr,
>         &dev_attr_mem_used_total.attr,
>         &dev_attr_mem_limit.attr,
> +       &dev_attr_mem_used_max.attr,
>         &dev_attr_max_comp_streams.attr,
>         &dev_attr_comp_algorithm.attr,
>         NULL,
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index b7aa9c21553f..c6ee271317f5 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -90,6 +90,7 @@ struct zram_stats {
>         atomic64_t notify_free; /* no. of swap slot free notifications */
>         atomic64_t zero_pages;          /* no. of zero filled pages */
>         atomic64_t pages_stored;        /* no. of pages currently stored */
> +       atomic_long_t max_used_pages;   /* no. of maximum pages stored */
>  };
>
>  struct zram_meta {
> --
> 2.0.0
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes
  2014-08-25  0:05 ` [PATCH v5 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes Minchan Kim
@ 2014-08-25  4:08   ` David Horner
  0 siblings, 0 replies; 27+ messages in thread
From: David Horner @ 2014-08-25  4:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 8:05 PM, Minchan Kim <minchan@kernel.org> wrote:
> zs_get_total_size_bytes returns a amount of memory zsmalloc
> consumed with *byte unit* but zsmalloc operates *page unit*
> rather than byte unit so let's change the API so benefit
> we could get is that reduce unnecessary overhead
> (ie, change page unit with byte unit) in zsmalloc.
>
> Since return type is pages, "zs_get_total_pages" is better than
> "zs_get_total_size_bytes".
>
> Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Reviewed-by: David Horner <ds2horner@gmail.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 4 ++--
>  include/linux/zsmalloc.h      | 2 +-
>  mm/zsmalloc.c                 | 9 ++++-----
>  3 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index d00831c3d731..f0b8b30a7128 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -103,10 +103,10 @@ static ssize_t mem_used_total_show(struct device *dev,
>
>         down_read(&zram->init_lock);
>         if (init_done(zram))
> -               val = zs_get_total_size_bytes(meta->mem_pool);
> +               val = zs_get_total_pages(meta->mem_pool);
>         up_read(&zram->init_lock);
>
> -       return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>  }
>
>  static ssize_t max_comp_streams_show(struct device *dev,
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index e44d634e7fb7..05c214760977 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -46,6 +46,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
>                         enum zs_mapmode mm);
>  void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
>
> -u64 zs_get_total_size_bytes(struct zs_pool *pool);
> +unsigned long zs_get_total_pages(struct zs_pool *pool);
>
>  #endif
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 2a4acf400846..c4a91578dc96 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -297,7 +297,7 @@ static void zs_zpool_unmap(void *pool, unsigned long handle)
>
>  static u64 zs_zpool_total_size(void *pool)
>  {
> -       return zs_get_total_size_bytes(pool);
> +       return zs_get_total_pages(pool) << PAGE_SHIFT;
>  }
>
>  static struct zpool_driver zs_zpool_driver = {
> @@ -1181,12 +1181,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
>  }
>  EXPORT_SYMBOL_GPL(zs_unmap_object);
>
> -u64 zs_get_total_size_bytes(struct zs_pool *pool)
> +unsigned long zs_get_total_pages(struct zs_pool *pool)
>  {
> -       u64 npages = atomic_long_read(&pool->pages_allocated);
> -       return npages << PAGE_SHIFT;
> +       return atomic_long_read(&pool->pages_allocated);
>  }
> -EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
> +EXPORT_SYMBOL_GPL(zs_get_total_pages);
>
>  module_init(zs_init);
>  module_exit(zs_exit);
> --
> 2.0.0
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-25  0:05 ` [PATCH v5 3/4] zram: zram memory size limitation Minchan Kim
@ 2014-08-25 11:09   ` Sergey Senozhatsky
  2014-08-26  4:52     ` Minchan Kim
  2014-08-26  7:37   ` Joonsoo Kim
  1 sibling, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2014-08-25 11:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman, ds2horner

On (08/25/14 09:05), Minchan Kim wrote:
> Since zram has no control feature to limit memory usage,
> it makes hard to manage system memrory.
> 
> This patch adds new knob "mem_limit" via sysfs to set up the
> a limit so that zram could fail allocation once it reaches
> the limit.
> 
> In addition, user could change the limit in runtime so that
> he could manage the memory more dynamically.
> 
> Initial state is no limit so it doesn't break old behavior.
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>  drivers/block/zram/zram_drv.h              |  5 ++++
>  4 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> index 70ec992514d0..dbe643775ec1 100644
> --- a/Documentation/ABI/testing/sysfs-block-zram
> +++ b/Documentation/ABI/testing/sysfs-block-zram
> @@ -119,3 +119,13 @@ Description:
>  		efficiency can be calculated using compr_data_size and this
>  		statistic.
>  		Unit: bytes
> +
> +What:		/sys/block/zram<id>/mem_limit
> +Date:		August 2014
> +Contact:	Minchan Kim <minchan@kernel.org>
> +Description:
> +		The mem_limit file is read/write and specifies the amount
> +		of memory to be able to consume memory to store store
> +		compressed data. The limit could be changed in run time
> +		and "0" means disable the limit. No limit is the initial state.

just a nitpick, sorry.
"the amount of memory to be able to consume memory to store store compressed data"
							^^^^^^^

"the maximum amount of memory ZRAM can use to store the compressed data"?

	-ss

> +		Unit: bytes
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 0595c3f56ccf..82c6a41116db 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>  size of the disk when not in use so a huge zram is wasteful.
>  
> -5) Activate:
> +5) Set memory limit: Optional
> +	Set memory limit by writing the value to sysfs node 'mem_limit'.
> +	The value can be either in bytes or you can use mem suffixes.
> +	In addition, you could change the value in runtime.
> +	Examples:
> +	    # limit /dev/zram0 with 50MB memory
> +	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> +
> +	    # Using mem suffixes
> +	    echo 256K > /sys/block/zram0/mem_limit
> +	    echo 512M > /sys/block/zram0/mem_limit
> +	    echo 1G > /sys/block/zram0/mem_limit
> +
> +	    # To disable memory limit
> +	    echo 0 > /sys/block/zram0/mem_limit
> +
> +6) Activate:
>  	mkswap /dev/zram0
>  	swapon /dev/zram0
>  
>  	mkfs.ext4 /dev/zram1
>  	mount /dev/zram1 /tmp
>  
> -6) Stats:
> +7) Stats:
>  	Per-device statistics are exported as various nodes under
>  	/sys/block/zram<id>/
>  		disksize
> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>  		compr_data_size
>  		mem_used_total
>  
> -7) Deactivate:
> +8) Deactivate:
>  	swapoff /dev/zram0
>  	umount /dev/zram1
>  
> -8) Reset:
> +9) Reset:
>  	Write any positive value to 'reset' sysfs node
>  	echo 1 > /sys/block/zram0/reset
>  	echo 1 > /sys/block/zram1/reset
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f0b8b30a7128..370c355eb127 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>  	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>  }
>  
> +static ssize_t mem_limit_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	u64 val;
> +	struct zram *zram = dev_to_zram(dev);
> +
> +	down_read(&zram->init_lock);
> +	val = zram->limit_pages;
> +	up_read(&zram->init_lock);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> +}
> +
> +static ssize_t mem_limit_store(struct device *dev,
> +		struct device_attribute *attr, const char *buf, size_t len)
> +{
> +	u64 limit;
> +	struct zram *zram = dev_to_zram(dev);
> +
> +	limit = memparse(buf, NULL);
> +	down_write(&zram->init_lock);
> +	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> +	up_write(&zram->init_lock);
> +
> +	return len;
> +}
> +
>  static ssize_t max_comp_streams_store(struct device *dev,
>  		struct device_attribute *attr, const char *buf, size_t len)
>  {
> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> +
> +	if (zram->limit_pages &&
> +		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +		zs_free(meta->mem_pool, handle);
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
>  	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>  
>  	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	struct zram_meta *meta;
>  
>  	down_write(&zram->init_lock);
> +
> +	zram->limit_pages = 0;
> +
>  	if (!init_done(zram)) {
>  		up_write(&zram->init_lock);
>  		return;
> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> +		mem_limit_store);
>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>  		max_comp_streams_show, max_comp_streams_store);
>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>  	&dev_attr_orig_data_size.attr,
>  	&dev_attr_compr_data_size.attr,
>  	&dev_attr_mem_used_total.attr,
> +	&dev_attr_mem_limit.attr,
>  	&dev_attr_max_comp_streams.attr,
>  	&dev_attr_comp_algorithm.attr,
>  	NULL,
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index e0f725c87cc6..b7aa9c21553f 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -112,6 +112,11 @@ struct zram {
>  	u64 disksize;	/* bytes */
>  	int max_comp_streams;
>  	struct zram_stats stats;
> +	/*
> +	 * the number of pages zram can consume for storing compressed data
> +	 */
> +	unsigned long limit_pages;
> +
>  	char compressor[10];
>  };
>  #endif
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-25 11:09   ` Sergey Senozhatsky
@ 2014-08-26  4:52     ` Minchan Kim
  0 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-26  4:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, linux-mm, linux-kernel, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner

Hey Sergey,

On Mon, Aug 25, 2014 at 08:09:27PM +0900, Sergey Senozhatsky wrote:
> On (08/25/14 09:05), Minchan Kim wrote:
> > Since zram has no control feature to limit memory usage,
> > it makes hard to manage system memrory.
> > 
> > This patch adds new knob "mem_limit" via sysfs to set up the
> > a limit so that zram could fail allocation once it reaches
> > the limit.
> > 
> > In addition, user could change the limit in runtime so that
> > he could manage the memory more dynamically.
> > 
> > Initial state is no limit so it doesn't break old behavior.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >  4 files changed, 76 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> > index 70ec992514d0..dbe643775ec1 100644
> > --- a/Documentation/ABI/testing/sysfs-block-zram
> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> > @@ -119,3 +119,13 @@ Description:
> >  		efficiency can be calculated using compr_data_size and this
> >  		statistic.
> >  		Unit: bytes
> > +
> > +What:		/sys/block/zram<id>/mem_limit
> > +Date:		August 2014
> > +Contact:	Minchan Kim <minchan@kernel.org>
> > +Description:
> > +		The mem_limit file is read/write and specifies the amount
> > +		of memory to be able to consume memory to store store
> > +		compressed data. The limit could be changed in run time
> > +		and "0" means disable the limit. No limit is the initial state.
> 
> just a nitpick, sorry.
> "the amount of memory to be able to consume memory to store store compressed data"
> 							^^^^^^^
> 
> "the maximum amount of memory ZRAM can use to store the compressed data"?

Will fix.
Thanks.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-25  0:05 ` [PATCH v5 3/4] zram: zram memory size limitation Minchan Kim
  2014-08-25 11:09   ` Sergey Senozhatsky
@ 2014-08-26  7:37   ` Joonsoo Kim
  2014-08-26  7:55     ` Minchan Kim
  1 sibling, 1 reply; 27+ messages in thread
From: Joonsoo Kim @ 2014-08-26  7:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman, ds2horner

On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> +
> +	if (zram->limit_pages &&
> +		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +		zs_free(meta->mem_pool, handle);
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
>  	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);

Hello,

I haven't followed the previous discussion, so I could be wrong.
Why should this enforcement be here?

I think this has two problems.
1) alloc/free happens unnecessarily if we are already over the
limit.
2) Even if this request doesn't need a new allocation, it could fail
because of someone else's allocation. There is a time gap between
allocation and free, so a legitimate user who only wants to reuse
already-allocated zsmalloc memory could also see this condition as
true and then fail.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-26  7:37   ` Joonsoo Kim
@ 2014-08-26  7:55     ` Minchan Kim
  2014-08-26 12:22       ` David Horner
  0 siblings, 1 reply; 27+ messages in thread
From: Minchan Kim @ 2014-08-26  7:55 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-mm, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman, ds2horner

Hey Joonsoo,

On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >  		ret = -ENOMEM;
> >  		goto out;
> >  	}
> > +
> > +	if (zram->limit_pages &&
> > +		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > +		zs_free(meta->mem_pool, handle);
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> >  	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> 
> Hello,
> 
> I don't follow up previous discussion, so I could be wrong.
> Why this enforcement should be here?
> 
> I think that this has two problems.
> 1) alloc/free happens unnecessarilly if we have used memory over the
> limitation.

True, but at first I implemented the logic in zsmalloc, not zram;
as I described in the cover letter, it is zram's requirement, not
zsmalloc's, so it should live there. If every user wants it in the
future, then we could move the function into zsmalloc. That's what
we concluded in the previous discussion.

Another idea is that we could call zs_get_total_pages right before
zs_malloc, but the problem is that we cannot know in advance how
many pages zsmalloc will allocate.
IOW, zram should be blind to zsmalloc's internals.

As for the alloc/free cost once we are over the limit,
I don't think it is important to consider.
Do you have any scenario in mind where the alloc/free cost matters
when the limit is exceeded?

> 2) Even if this request doesn't do new allocation, it could be failed
> due to other's allocation. There is time gap between allocation and
> free, so legimate user who want to use preallocated zsmalloc memory
> could also see this condition true and then he will be failed.

Yep, we already discussed that. :)
Such a false positive shouldn't be a severe problem as long as we
keep the promise that a zram user cannot exceed mem_limit.

> 
> Thanks.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-26  7:55     ` Minchan Kim
@ 2014-08-26 12:22       ` David Horner
  2014-08-27  1:26         ` Joonsoo Kim
  0 siblings, 1 reply; 27+ messages in thread
From: David Horner @ 2014-08-26 12:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hey Joonsoo,
>
> On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >             ret = -ENOMEM;
>> >             goto out;
>> >     }
>> > +
>> > +   if (zram->limit_pages &&
>> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> > +           zs_free(meta->mem_pool, handle);
>> > +           ret = -ENOMEM;
>> > +           goto out;
>> > +   }
>> > +
>> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>
>> Hello,
>>
>> I don't follow up previous discussion, so I could be wrong.
>> Why this enforcement should be here?
>>
>> I think that this has two problems.
>> 1) alloc/free happens unnecessarilly if we have used memory over the
>> limitation.
>
> True but firstly, I implemented the logic in zsmalloc, not zram but
> as I described in cover-letter, it's not a requirement of zsmalloc
> but zram so it should be in there. If every user want it in future,
> then we could move the function into zsmalloc. That's what we
> concluded in previous discussion.
>
> Another idea is we could call zs_get_total_pages right before zs_malloc
> but the problem is we cannot know how many of pages are allocated
> by zsmalloc in advance.
> IOW, zram should be blind on zsmalloc's internal.
>

We did, however, suggest that we could check beforehand whether the
max was already exceeded, as an optimization (possibly with a guess
at the usage, but at least assuming a minimum of one page).
In the contested case the max may already be exceeded transiently,
so we know this request _could_ fail (it could also pass, but the
odds aren't good).
As Minchan mentions, this was discussed before, but not in great
detail. Testing should be done to determine the possible benefit.
And as he also mentions, the better place for it may be zsmalloc,
but that requires an API change.

Certainly a more detailed suggestion could happen on this thread,
and I'm also interested in your thoughts, but this patchset should
be able to go in as is. Memory-exhaustion avoidance probably trumps
the possible thrashing at the threshold.
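
A rough sketch of the pre-check described above, for discussion
only, reusing the names from patch 3 (the helper name and the
one-page headroom assumption are mine, not part of the series); the
existing post-allocation check would remain the authoritative one:

/* hypothetical helper: cheap early test before calling zs_malloc() */
static inline bool zram_above_limit(struct zram *zram,
				    struct zram_meta *meta)
{
	/*
	 * If the pool already sits at or above the limit, a write that
	 * needs at least one more page is very likely to fail anyway,
	 * so skip the zs_malloc()/zs_free() round trip.
	 */
	return zram->limit_pages &&
	       zs_get_total_pages(meta->mem_pool) >= zram->limit_pages;
}

and then in zram_bvec_write(), before the zs_malloc() call:

	if (zram_above_limit(zram, meta)) {
		ret = -ENOMEM;
		goto out;
	}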

> About alloc/free cost once if it is over the limit,
> I don't think it's important to consider.
> Do you have any scenario in your mind to consider alloc/free cost
> when the limit is over?
>
>> 2) Even if this request doesn't do new allocation, it could be failed
>> due to other's allocation. There is time gap between allocation and
>> free, so legimate user who want to use preallocated zsmalloc memory
>> could also see this condition true and then he will be failed.
>
> Yeb, we already discussed that. :)
> Such false positive shouldn't be a severe problem if we can keep a
> promise that zram user cannot exceed mem_limit.
>

And we cannot avoid the race, nor can we avoid transient
inconsistent states in a low-overhead, competitive, concurrent
process. Different views for different observers.
 They are a consequence of the theory of "Special Computational Relativity".
 I am working on a String Unification Theory of Quantum and General CR in LISP.
 ;-)



>>
>> Thanks.
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-26 12:22       ` David Horner
@ 2014-08-27  1:26         ` Joonsoo Kim
  2014-08-27  2:51           ` Minchan Kim
  0 siblings, 1 reply; 27+ messages in thread
From: Joonsoo Kim @ 2014-08-27  1:26 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

Hello, Minchan and David.

On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> > Hey Joonsoo,
> >
> > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >> >             ret = -ENOMEM;
> >> >             goto out;
> >> >     }
> >> > +
> >> > +   if (zram->limit_pages &&
> >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >> > +           zs_free(meta->mem_pool, handle);
> >> > +           ret = -ENOMEM;
> >> > +           goto out;
> >> > +   }
> >> > +
> >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >>
> >> Hello,
> >>
> >> I don't follow up previous discussion, so I could be wrong.
> >> Why this enforcement should be here?
> >>
> >> I think that this has two problems.
> >> 1) alloc/free happens unnecessarilly if we have used memory over the
> >> limitation.
> >
> > True but firstly, I implemented the logic in zsmalloc, not zram but
> > as I described in cover-letter, it's not a requirement of zsmalloc
> > but zram so it should be in there. If every user want it in future,
> > then we could move the function into zsmalloc. That's what we
> > concluded in previous discussion.

Hmm...
The problem is that we can't avoid this unnecessary overhead in this
implementation. If we could implement this feature in zram
efficiently it would be okay, but I don't think the current form is.

> >
> > Another idea is we could call zs_get_total_pages right before zs_malloc
> > but the problem is we cannot know how many of pages are allocated
> > by zsmalloc in advance.
> > IOW, zram should be blind on zsmalloc's internal.
> >
> 
> We did however suggest that we could check before hand to see if
> max was already exceeded as an optimization.
> (possibly with a guess on usage but at least using the minimum of 1 page)
> In the contested case, the max may already be exceeded transiently and
> therefore we know this one _could_ fail (it could also pass, but odds
> aren't good).
> As Minchan mentions this was discussed before - but not into great detail.
> Testing should be done to determine possible benefit. And as he also
> mentions, the better place for it may be in zsmalloc, but that
> requires an ABI change.

Why do we hesitate to change the zsmalloc API? It is an in-kernel
API and there are just two users now, zswap and zram, so we can
change it easily. I think we just need the following simple API
change in zsmalloc.c:

zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
=>
zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
*zpool_op)

It's a pool allocator, so there is no obstacle to limiting maximum
memory usage in zsmalloc. Limiting memory usage is a natural idea
for a pool allocator.
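
One hypothetical shape of that change, sketched against the
structures shown in patch 1 (the field name and the
zero-means-no-limit convention are illustrative only); the creators,
zs_create_pool()/zs_zpool_create(), would take the limit and store
it in the pool:

/* mm/zsmalloc.c: cap carried by the pool itself */
struct zs_pool {
	struct size_class size_class[ZS_SIZE_CLASSES];

	gfp_t flags;	/* allocation flags used when growing pool */
	atomic_long_t pages_allocated;
	unsigned long limit_pages;	/* 0 means no limit */
};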

> Certainly a detailed suggestion could happen on this thread and I'm
> also interested
> in your thoughts, but this patchset should be able to go in as is.
> Memory exhaustion avoidance probably trumps the possible thrashing at
> threshold.
> 
> > About alloc/free cost once if it is over the limit,
> > I don't think it's important to consider.
> > Do you have any scenario in your mind to consider alloc/free cost
> > when the limit is over?
> >
> >> 2) Even if this request doesn't do new allocation, it could be failed
> >> due to other's allocation. There is time gap between allocation and
> >> free, so legimate user who want to use preallocated zsmalloc memory
> >> could also see this condition true and then he will be failed.
> >
> > Yeb, we already discussed that. :)
> > Such false positive shouldn't be a severe problem if we can keep a
> > promise that zram user cannot exceed mem_limit.
> >

If we can keep such a promise, why do we need to limit memory usage at all?
I guess this limit feature is useful for users who can't keep such a promise,
so we should assume that this false positive happens frequently.

> And we cannot avoid the race, nor can we avoid in a low overhead competitive
> concurrent process transient inconsistent states.
> Different views for different observers.
>  They are a consequence of the theory of "Special Computational Relativity".
>  I am working on a String Unification Theory of Quantum and General CR in LISP.
>  ;-)

If we move the limit logic into zsmalloc, we can avoid the race by committing
the needed memory size before the actual allocation attempt. This commit step
serializes concurrent processes, so there is no race. An allocation may still
fail, but I think that is better than blindly allocating and freeing based on
inconsistent state.
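
Concretely, the commit step I have in mind would look roughly like the
sketch below. The field and function names (pages_committed, limit_pages,
zs_malloc_limited) are only illustrative, not existing zsmalloc code, and a
real version would also have to reconcile committed pages with the pages a
new zspage actually consumes:

/*
 * Illustrative only: reserve pages against the pool limit before
 * allocating, and roll back the reservation if zs_malloc() fails.
 */
static int zs_commit_pages(struct zs_pool *pool, unsigned long nr_pages)
{
	if (atomic_long_add_return(nr_pages, &pool->pages_committed) >
			pool->limit_pages) {
		atomic_long_sub(nr_pages, &pool->pages_committed);
		return -ENOMEM;		/* refuse before touching the pool */
	}
	return 0;
}

unsigned long zs_malloc_limited(struct zs_pool *pool, size_t size)
{
	unsigned long handle;

	/* Commit at least one page as a guess; the exact need isn't known. */
	if (pool->limit_pages && zs_commit_pages(pool, 1))
		return 0;

	handle = zs_malloc(pool, size);
	if (!handle && pool->limit_pages)
		atomic_long_sub(1, &pool->pages_committed);

	return handle;
}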

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27  1:26         ` Joonsoo Kim
@ 2014-08-27  2:51           ` Minchan Kim
  2014-08-27  5:04             ` Joonsoo Kim
  2014-08-27 14:03             ` Dan Streetman
  0 siblings, 2 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-27  2:51 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

Hey Joonsoo,

On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> Hello, Minchan and David.
> 
> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > Hey Joonsoo,
> > >
> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> > >> >             ret = -ENOMEM;
> > >> >             goto out;
> > >> >     }
> > >> > +
> > >> > +   if (zram->limit_pages &&
> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > >> > +           zs_free(meta->mem_pool, handle);
> > >> > +           ret = -ENOMEM;
> > >> > +           goto out;
> > >> > +   }
> > >> > +
> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> > >>
> > >> Hello,
> > >>
> > >> I don't follow up previous discussion, so I could be wrong.
> > >> Why this enforcement should be here?
> > >>
> > >> I think that this has two problems.
> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> > >> limitation.
> > >
> > > True but firstly, I implemented the logic in zsmalloc, not zram but
> > > as I described in cover-letter, it's not a requirement of zsmalloc
> > > but zram so it should be in there. If every user want it in future,
> > > then we could move the function into zsmalloc. That's what we
> > > concluded in previous discussion.
> 
> Hmm...
> Problem is that we can't avoid these unnecessary overhead in this
> implementation. If we can implement this feature in zram efficiently,
> it's okay. But, I think that current form isn't.


If we can add it in zsmalloc, it would be cleaner and more efficient
for zram, but as I said, at the moment I didn't want to put zram's
requirement into zsmalloc because, to me, it's weird to enforce a max
limit in an allocator. It's the client's role, I think.

If the current implementation were expensive or hard to follow,
that would be one reason to move the feature into zsmalloc, but
I don't think it causes critical trouble in the zram use case.
See below.

But I'm still open and will wait for others' opinions.
If other people think zsmalloc is the better place, I am willing to move
it into zsmalloc.

> 
> > >
> > > Another idea is we could call zs_get_total_pages right before zs_malloc
> > > but the problem is we cannot know how many of pages are allocated
> > > by zsmalloc in advance.
> > > IOW, zram should be blind on zsmalloc's internal.
> > >
> > 
> > We did however suggest that we could check before hand to see if
> > max was already exceeded as an optimization.
> > (possibly with a guess on usage but at least using the minimum of 1 page)
> > In the contested case, the max may already be exceeded transiently and
> > therefore we know this one _could_ fail (it could also pass, but odds
> > aren't good).
> > As Minchan mentions this was discussed before - but not into great detail.
> > Testing should be done to determine possible benefit. And as he also
> > mentions, the better place for it may be in zsmalloc, but that
> > requires an ABI change.
> 
> Why we hesitate to change zsmalloc API? It is in-kernel API and there
> are just two users now, zswap and zram. We can change it easily.
> I think that we just need following simple API change in zsmalloc.c.
> 
> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> =>
> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> *zpool_op)
> 
> It's pool allocator so there is no obstacle for us to limit maximum
> memory usage in zsmalloc. It's a natural idea to limit memory usage
> for pool allocator.
> 
> > Certainly a detailed suggestion could happen on this thread and I'm
> > also interested
> > in your thoughts, but this patchset should be able to go in as is.
> > Memory exhaustion avoidance probably trumps the possible thrashing at
> > threshold.
> > 
> > > About alloc/free cost once if it is over the limit,
> > > I don't think it's important to consider.
> > > Do you have any scenario in your mind to consider alloc/free cost
> > > when the limit is over?
> > >
> > >> 2) Even if this request doesn't do new allocation, it could be failed
> > >> due to other's allocation. There is time gap between allocation and
> > >> free, so legimate user who want to use preallocated zsmalloc memory
> > >> could also see this condition true and then he will be failed.
> > >
> > > Yeb, we already discussed that. :)
> > > Such false positive shouldn't be a severe problem if we can keep a
> > > promise that zram user cannot exceed mem_limit.
> > >
> 
> If we can keep such a promise, why we need to limit memory usage?
> I guess that this limit feature is useful for user who can't keep such promise.
> So, we should assume that this false positive happens frequently.


The goal is to keep memory usage within some threshold,
so a false positive shouldn't be harmful unless it pushes usage past the
threshold. In addition, if such false positives happen frequently, it means
zram is already in trouble: the user would see lots of write failure
messages, and sometimes a really slow system if zram is used for swap.
If we protect just one write from the race, how much does it help in that
situation? I don't think it's a critical problem.

> 
> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> > concurrent process transient inconsistent states.
> > Different views for different observers.
> >  They are a consequence of the theory of "Special Computational Relativity".
> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> >  ;-)
> 
> If we move limit logic to zsmalloc, we can avoid the race by commiting
> needed memory size before actual allocation attempt. This commiting makes
> concurrent process serialized so there is no race here. There is
> possibilty to fail to allocate, but I think this is better than alloc
> and free blindlessly depending on inconsistent states.

Normally, zs_malloc/zs_free allocates the object from the existing pool, so
it's not a big overhead, and if someone continues to try writing once the
limit is hit, the other overhead (VFS, filesystem, block layer) would be
bigger than zsmalloc's, so it's not a problem, I think.

> 
> Thanks.
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27  2:51           ` Minchan Kim
@ 2014-08-27  5:04             ` Joonsoo Kim
  2014-08-27  7:28               ` Minchan Kim
  2014-08-27 14:03             ` Dan Streetman
  1 sibling, 1 reply; 27+ messages in thread
From: Joonsoo Kim @ 2014-08-27  5:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

On Wed, Aug 27, 2014 at 11:51:32AM +0900, Minchan Kim wrote:
> Hey Joonsoo,
> 
> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> > Hello, Minchan and David.
> > 
> > On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> > > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > > Hey Joonsoo,
> > > >
> > > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> > > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> > > >> >             ret = -ENOMEM;
> > > >> >             goto out;
> > > >> >     }
> > > >> > +
> > > >> > +   if (zram->limit_pages &&
> > > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > > >> > +           zs_free(meta->mem_pool, handle);
> > > >> > +           ret = -ENOMEM;
> > > >> > +           goto out;
> > > >> > +   }
> > > >> > +
> > > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> > > >>
> > > >> Hello,
> > > >>
> > > >> I don't follow up previous discussion, so I could be wrong.
> > > >> Why this enforcement should be here?
> > > >>
> > > >> I think that this has two problems.
> > > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> > > >> limitation.
> > > >
> > > > True but firstly, I implemented the logic in zsmalloc, not zram but
> > > > as I described in cover-letter, it's not a requirement of zsmalloc
> > > > but zram so it should be in there. If every user want it in future,
> > > > then we could move the function into zsmalloc. That's what we
> > > > concluded in previous discussion.
> > 
> > Hmm...
> > Problem is that we can't avoid these unnecessary overhead in this
> > implementation. If we can implement this feature in zram efficiently,
> > it's okay. But, I think that current form isn't.
> 
> 
> If we can add it in zsmalloc, it would be more clean and efficient
> for zram but as I said, at the moment, I didn't want to put zram's
> requirement into zsmalloc because to me, it's weird to enforce max
> limit to allocator. It's client's role, I think.

AFAIK, many kinds of pools, such as thread pools or memory pools, have
their own limits. It doesn't seem weird to me.

> If current implementation is expensive and rather hard to follow,
> It would be one reason to move the feature into zsmalloc but
> I don't think it makes critical trobule in zram usecase.
> See below.
> 
> But I still open and will wait others's opinion.
> If other guys think zsmalloc is better place, I am willing to move
> it into zsmalloc.
> 
> > 
> > > >
> > > > Another idea is we could call zs_get_total_pages right before zs_malloc
> > > > but the problem is we cannot know how many of pages are allocated
> > > > by zsmalloc in advance.
> > > > IOW, zram should be blind on zsmalloc's internal.
> > > >
> > > 
> > > We did however suggest that we could check before hand to see if
> > > max was already exceeded as an optimization.
> > > (possibly with a guess on usage but at least using the minimum of 1 page)
> > > In the contested case, the max may already be exceeded transiently and
> > > therefore we know this one _could_ fail (it could also pass, but odds
> > > aren't good).
> > > As Minchan mentions this was discussed before - but not into great detail.
> > > Testing should be done to determine possible benefit. And as he also
> > > mentions, the better place for it may be in zsmalloc, but that
> > > requires an ABI change.
> > 
> > Why we hesitate to change zsmalloc API? It is in-kernel API and there
> > are just two users now, zswap and zram. We can change it easily.
> > I think that we just need following simple API change in zsmalloc.c.
> > 
> > zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> > =>
> > zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> > *zpool_op)
> > 
> > It's pool allocator so there is no obstacle for us to limit maximum
> > memory usage in zsmalloc. It's a natural idea to limit memory usage
> > for pool allocator.
> > 
> > > Certainly a detailed suggestion could happen on this thread and I'm
> > > also interested
> > > in your thoughts, but this patchset should be able to go in as is.
> > > Memory exhaustion avoidance probably trumps the possible thrashing at
> > > threshold.
> > > 
> > > > About alloc/free cost once if it is over the limit,
> > > > I don't think it's important to consider.
> > > > Do you have any scenario in your mind to consider alloc/free cost
> > > > when the limit is over?
> > > >
> > > >> 2) Even if this request doesn't do new allocation, it could be failed
> > > >> due to other's allocation. There is time gap between allocation and
> > > >> free, so legimate user who want to use preallocated zsmalloc memory
> > > >> could also see this condition true and then he will be failed.
> > > >
> > > > Yeb, we already discussed that. :)
> > > > Such false positive shouldn't be a severe problem if we can keep a
> > > > promise that zram user cannot exceed mem_limit.
> > > >
> > 
> > If we can keep such a promise, why we need to limit memory usage?
> > I guess that this limit feature is useful for user who can't keep such promise.
> > So, we should assume that this false positive happens frequently.
> 
> 
> The goal is to limit memory usage within some threshold.
> so false positive shouldn't be harmful unless it exceeds the threshold.
> In addition, If such false positive happens frequently, it means
> zram is very trobule so that user would see lots of write fail
> message, sometime really slow system if zram is used for swap.
> If we protect just one write from the race, how much does it help
> this situation? I don't think it's critical problem.

If memory usage exceeding the threshold is such a rare event,
why is this limit needed at all? Could you tell me when this limit is useful
under that assumption?

> 
> > 
> > > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> > > concurrent process transient inconsistent states.
> > > Different views for different observers.
> > >  They are a consequence of the theory of "Special Computational Relativity".
> > >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> > >  ;-)
> > 
> > If we move limit logic to zsmalloc, we can avoid the race by commiting
> > needed memory size before actual allocation attempt. This commiting makes
> > concurrent process serialized so there is no race here. There is
> > possibilty to fail to allocate, but I think this is better than alloc
> > and free blindlessly depending on inconsistent states.
> 
> Normally, zsmalloc/zsfree allocates object from existing pool so
> it's not big overhead and if someone continue to try writing  once limit is
> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> so it's not a problem, I think.

We should do our best regardless of what other subsystems do. If there is
a known possibility of suffering needless alloc/free, why should we ignore
it? :)

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27  5:04             ` Joonsoo Kim
@ 2014-08-27  7:28               ` Minchan Kim
  2014-08-28  8:21                 ` Joonsoo Kim
  0 siblings, 1 reply; 27+ messages in thread
From: Minchan Kim @ 2014-08-27  7:28 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

On Wed, Aug 27, 2014 at 02:04:38PM +0900, Joonsoo Kim wrote:
> On Wed, Aug 27, 2014 at 11:51:32AM +0900, Minchan Kim wrote:
> > Hey Joonsoo,
> > 
> > On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> > > Hello, Minchan and David.
> > > 
> > > On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> > > > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > > > Hey Joonsoo,
> > > > >
> > > > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> > > > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > > > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> > > > >> >             ret = -ENOMEM;
> > > > >> >             goto out;
> > > > >> >     }
> > > > >> > +
> > > > >> > +   if (zram->limit_pages &&
> > > > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > > > >> > +           zs_free(meta->mem_pool, handle);
> > > > >> > +           ret = -ENOMEM;
> > > > >> > +           goto out;
> > > > >> > +   }
> > > > >> > +
> > > > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> > > > >>
> > > > >> Hello,
> > > > >>
> > > > >> I don't follow up previous discussion, so I could be wrong.
> > > > >> Why this enforcement should be here?
> > > > >>
> > > > >> I think that this has two problems.
> > > > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> > > > >> limitation.
> > > > >
> > > > > True but firstly, I implemented the logic in zsmalloc, not zram but
> > > > > as I described in cover-letter, it's not a requirement of zsmalloc
> > > > > but zram so it should be in there. If every user want it in future,
> > > > > then we could move the function into zsmalloc. That's what we
> > > > > concluded in previous discussion.
> > > 
> > > Hmm...
> > > Problem is that we can't avoid these unnecessary overhead in this
> > > implementation. If we can implement this feature in zram efficiently,
> > > it's okay. But, I think that current form isn't.
> > 
> > 
> > If we can add it in zsmalloc, it would be more clean and efficient
> > for zram but as I said, at the moment, I didn't want to put zram's
> > requirement into zsmalloc because to me, it's weird to enforce max
> > limit to allocator. It's client's role, I think.
> 
> AFAIK, many kinds of pools such as thread-pool or memory-pool have
> their own limit. It's not weird for me.

Actually, I don't know exactly what a pool allocator is, but the things you
mentioned are basically used to guarantee *new* threads/memory, not to impose
a limit, although they could implement one.

Another question: why do you think zsmalloc is a pool allocator?
IOW, what logic makes you think it's a pool allocator?

> 
> > If current implementation is expensive and rather hard to follow,
> > It would be one reason to move the feature into zsmalloc but
> > I don't think it makes critical trobule in zram usecase.
> > See below.
> > 
> > But I still open and will wait others's opinion.
> > If other guys think zsmalloc is better place, I am willing to move
> > it into zsmalloc.
> > 
> > > 
> > > > >
> > > > > Another idea is we could call zs_get_total_pages right before zs_malloc
> > > > > but the problem is we cannot know how many of pages are allocated
> > > > > by zsmalloc in advance.
> > > > > IOW, zram should be blind on zsmalloc's internal.
> > > > >
> > > > 
> > > > We did however suggest that we could check before hand to see if
> > > > max was already exceeded as an optimization.
> > > > (possibly with a guess on usage but at least using the minimum of 1 page)
> > > > In the contested case, the max may already be exceeded transiently and
> > > > therefore we know this one _could_ fail (it could also pass, but odds
> > > > aren't good).
> > > > As Minchan mentions this was discussed before - but not into great detail.
> > > > Testing should be done to determine possible benefit. And as he also
> > > > mentions, the better place for it may be in zsmalloc, but that
> > > > requires an ABI change.
> > > 
> > > Why we hesitate to change zsmalloc API? It is in-kernel API and there
> > > are just two users now, zswap and zram. We can change it easily.
> > > I think that we just need following simple API change in zsmalloc.c.
> > > 
> > > zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> > > =>
> > > zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> > > *zpool_op)
> > > 
> > > It's pool allocator so there is no obstacle for us to limit maximum
> > > memory usage in zsmalloc. It's a natural idea to limit memory usage
> > > for pool allocator.
> > > 
> > > > Certainly a detailed suggestion could happen on this thread and I'm
> > > > also interested
> > > > in your thoughts, but this patchset should be able to go in as is.
> > > > Memory exhaustion avoidance probably trumps the possible thrashing at
> > > > threshold.
> > > > 
> > > > > About alloc/free cost once if it is over the limit,
> > > > > I don't think it's important to consider.
> > > > > Do you have any scenario in your mind to consider alloc/free cost
> > > > > when the limit is over?
> > > > >
> > > > >> 2) Even if this request doesn't do new allocation, it could be failed
> > > > >> due to other's allocation. There is time gap between allocation and
> > > > >> free, so legimate user who want to use preallocated zsmalloc memory
> > > > >> could also see this condition true and then he will be failed.
> > > > >
> > > > > Yeb, we already discussed that. :)
> > > > > Such false positive shouldn't be a severe problem if we can keep a
> > > > > promise that zram user cannot exceed mem_limit.
> > > > >
> > > 
> > > If we can keep such a promise, why we need to limit memory usage?
> > > I guess that this limit feature is useful for user who can't keep such promise.
> > > So, we should assume that this false positive happens frequently.
> > 
> > 
> > The goal is to limit memory usage within some threshold.
> > so false positive shouldn't be harmful unless it exceeds the threshold.
> > In addition, If such false positive happens frequently, it means
> > zram is very trobule so that user would see lots of write fail
> > message, sometime really slow system if zram is used for swap.
> > If we protect just one write from the race, how much does it help
> > this situation? I don't think it's critical problem.
> 
> If it is just rarely happend event that memory usage exceeds the threshold,
> why this limit is needed? Could you tell me when this limit is useful
> with such assumption?

Without this feature, zram can use up all of memory, so it's out of control.
Although we can control the virtual disksize, that's not always enough.
I expect userspace will have more flexibility to handle memory if zram supports
the feature. For example, the system can set the limit very low, and if it hits
the limit while there is still enough free memory in the system, it could raise
the limit toward a hard limit. If it hits the hard limit and another component
needs more memory, it could lower the limit again and kill an old process to
free memory. It's a simple example, but it might give more freedom to a
userspace memory manager.
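
In rough terms, such a userspace policy could be as small as the sketch
below (purely illustrative, not part of this patchset; it assumes mem_limit
accepts human-readable sizes as described in the documentation patch, and
the 512M free-memory threshold is made up):

/* Toy policy: raise zram's mem_limit while plenty of RAM is still free. */
#include <stdio.h>
#include <sys/sysinfo.h>

static int set_zram_limit(const char *limit)
{
	FILE *f = fopen("/sys/block/zram0/mem_limit", "w");

	if (!f)
		return -1;
	fputs(limit, f);
	return fclose(f);
}

int main(void)
{
	struct sysinfo si;

	set_zram_limit("256M");			/* conservative soft limit */
	if (sysinfo(&si) == 0 &&
	    (unsigned long long)si.freeram * si.mem_unit > (512ULL << 20))
		set_zram_limit("1G");		/* enough free memory: relax it */
	return 0;
}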

> 
> > 
> > > 
> > > > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> > > > concurrent process transient inconsistent states.
> > > > Different views for different observers.
> > > >  They are a consequence of the theory of "Special Computational Relativity".
> > > >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> > > >  ;-)
> > > 
> > > If we move limit logic to zsmalloc, we can avoid the race by commiting
> > > needed memory size before actual allocation attempt. This commiting makes
> > > concurrent process serialized so there is no race here. There is
> > > possibilty to fail to allocate, but I think this is better than alloc
> > > and free blindlessly depending on inconsistent states.
> > 
> > Normally, zsmalloc/zsfree allocates object from existing pool so
> > it's not big overhead and if someone continue to try writing  once limit is
> > full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> > so it's not a problem, I think.
> 
> We should do our best, regardless other subsystem does. If there is
> known possibility to suffer from needless alloc/free, why we ignore
> it? :)

Yeb, I'd like to do my best too; the gap between you and me is just a
disagreement about which approach is best.

> 
> Thanks.
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27  2:51           ` Minchan Kim
  2014-08-27  5:04             ` Joonsoo Kim
@ 2014-08-27 14:03             ` Dan Streetman
  2014-08-27 14:44               ` David Horner
  2014-08-28  2:52               ` Minchan Kim
  1 sibling, 2 replies; 27+ messages in thread
From: Dan Streetman @ 2014-08-27 14:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Joonsoo Kim, David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
> Hey Joonsoo,
>
> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>> Hello, Minchan and David.
>>
>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>> > > Hey Joonsoo,
>> > >
>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> > >> >             ret = -ENOMEM;
>> > >> >             goto out;
>> > >> >     }
>> > >> > +
>> > >> > +   if (zram->limit_pages &&
>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> > >> > +           zs_free(meta->mem_pool, handle);
>> > >> > +           ret = -ENOMEM;
>> > >> > +           goto out;
>> > >> > +   }
>> > >> > +
>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> > >>
>> > >> Hello,
>> > >>
>> > >> I don't follow up previous discussion, so I could be wrong.
>> > >> Why this enforcement should be here?
>> > >>
>> > >> I think that this has two problems.
>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>> > >> limitation.
>> > >
>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>> > > but zram so it should be in there. If every user want it in future,
>> > > then we could move the function into zsmalloc. That's what we
>> > > concluded in previous discussion.
>>
>> Hmm...
>> Problem is that we can't avoid these unnecessary overhead in this
>> implementation. If we can implement this feature in zram efficiently,
>> it's okay. But, I think that current form isn't.
>
>
> If we can add it in zsmalloc, it would be more clean and efficient
> for zram but as I said, at the moment, I didn't want to put zram's
> requirement into zsmalloc because to me, it's weird to enforce max
> limit to allocator. It's client's role, I think.
>
> If current implementation is expensive and rather hard to follow,
> It would be one reason to move the feature into zsmalloc but
> I don't think it makes critical trobule in zram usecase.
> See below.
>
> But I still open and will wait others's opinion.
> If other guys think zsmalloc is better place, I am willing to move
> it into zsmalloc.

Moving it into zsmalloc would allow rejecting new zsmallocs before
actually crossing the limit, since it can calculate that internally.
However, with the current patches the limit will only be briefly
crossed, and it should not be crossed by a large amount.  Now, if this
is happening repeatedly and quickly during extreme memory pressure,
the constant alloc/free will clearly be worse than a simple internal
calculation and failure.  But would it ever happen repeatedly once the
zram limit is reached?

Now that I'm thinking about the limit from the perspective of the zram
user, I wonder what really will happen.  If zram is being used for
swap space, then when swap starts getting errors trying to write
pages, how damaging will that be to the system?  I haven't checked
what swap does when it encounters disk errors.  Of course, with no
zram limit, continually writing to zram until memory is totally
consumed isn't good either.  But in any case, I would hope that swap
would not repeatedly hammer on a disk when it's getting write failures
from it.

Alternately, if zram was being used as a compressed ram disk for
regular file storage, it's entirely up to the application to handle
write failures, so it may continue to try to write to a full zram
disk.

As far as what the zsmalloc API would look like with the limit added,
it would need setter and getter functions (adding it as a param to
the create function would be optional, I think).  But more importantly,
it would need to handle multiple ways of specifying the limit.  In our
specific current use cases, zram and zswap, each handles their
internal limit differently - zswap currently uses a % of total ram as
its limit (defaulting to 20), while with these patches zram will use a
specific number of bytes as its limit (defaulting to no limit).  If
the limiting mechanism is moved into zsmalloc (and possibly zbud),
then either both users need to use the same units (bytes or %ram), or
zsmalloc/zbud need to be able to set their limit in either units.  It
seems to me like keeping the limit in zram/zswap is currently
preferable, at least without both using the same limit units.
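
For concreteness, the API shape being discussed might look roughly like the
sketch below; the function names, the limit_pages field, and the choice of
pages as the unit are illustrative, not an existing zsmalloc interface:

/*
 * Hypothetical API shape only.  Pages are the unit here, so a caller like
 * zswap would have to convert its percent-of-RAM policy into pages itself
 * (and keep that value up to date).
 */
void zs_set_limit_pages(struct zs_pool *pool, unsigned long limit_pages)
{
	pool->limit_pages = limit_pages;	/* assumed new field, 0 = no limit */
}

unsigned long zs_get_limit_pages(struct zs_pool *pool)
{
	return pool->limit_pages;
}

/*
 * zs_malloc() could then refuse an allocation that would cross the limit,
 * instead of the caller freeing the object after the fact as in this patch.
 */
static bool zs_over_limit(struct zs_pool *pool)
{
	return pool->limit_pages &&
	       zs_get_total_pages(pool) >= pool->limit_pages;
}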


>
>>
>> > >
>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>> > > but the problem is we cannot know how many of pages are allocated
>> > > by zsmalloc in advance.
>> > > IOW, zram should be blind on zsmalloc's internal.
>> > >
>> >
>> > We did however suggest that we could check before hand to see if
>> > max was already exceeded as an optimization.
>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>> > In the contested case, the max may already be exceeded transiently and
>> > therefore we know this one _could_ fail (it could also pass, but odds
>> > aren't good).
>> > As Minchan mentions this was discussed before - but not into great detail.
>> > Testing should be done to determine possible benefit. And as he also
>> > mentions, the better place for it may be in zsmalloc, but that
>> > requires an ABI change.
>>
>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>> are just two users now, zswap and zram. We can change it easily.
>> I think that we just need following simple API change in zsmalloc.c.
>>
>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>> =>
>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>> *zpool_op)
>>
>> It's pool allocator so there is no obstacle for us to limit maximum
>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>> for pool allocator.
>>
>> > Certainly a detailed suggestion could happen on this thread and I'm
>> > also interested
>> > in your thoughts, but this patchset should be able to go in as is.
>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>> > threshold.
>> >
>> > > About alloc/free cost once if it is over the limit,
>> > > I don't think it's important to consider.
>> > > Do you have any scenario in your mind to consider alloc/free cost
>> > > when the limit is over?
>> > >
>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>> > >> due to other's allocation. There is time gap between allocation and
>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>> > >> could also see this condition true and then he will be failed.
>> > >
>> > > Yeb, we already discussed that. :)
>> > > Such false positive shouldn't be a severe problem if we can keep a
>> > > promise that zram user cannot exceed mem_limit.
>> > >
>>
>> If we can keep such a promise, why we need to limit memory usage?
>> I guess that this limit feature is useful for user who can't keep such promise.
>> So, we should assume that this false positive happens frequently.
>
>
> The goal is to limit memory usage within some threshold.
> so false positive shouldn't be harmful unless it exceeds the threshold.
> In addition, If such false positive happens frequently, it means
> zram is very trobule so that user would see lots of write fail
> message, sometime really slow system if zram is used for swap.
> If we protect just one write from the race, how much does it help
> this situation? I don't think it's critical problem.
>
>>
>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>> > concurrent process transient inconsistent states.
>> > Different views for different observers.
>> >  They are a consequence of the theory of "Special Computational Relativity".
>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>> >  ;-)
>>
>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>> needed memory size before actual allocation attempt. This commiting makes
>> concurrent process serialized so there is no race here. There is
>> possibilty to fail to allocate, but I think this is better than alloc
>> and free blindlessly depending on inconsistent states.
>
> Normally, zsmalloc/zsfree allocates object from existing pool so
> it's not big overhead and if someone continue to try writing  once limit is
> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> so it's not a problem, I think.
>
>>
>> Thanks.
>>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 14:03             ` Dan Streetman
@ 2014-08-27 14:44               ` David Horner
  2014-08-27 15:14                 ` Dan Streetman
  2014-08-28  2:52               ` Minchan Kim
  1 sibling, 1 reply; 27+ messages in thread
From: David Horner @ 2014-08-27 14:44 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>> Hey Joonsoo,
>>
>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>> Hello, Minchan and David.
>>>
>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> > > Hey Joonsoo,
>>> > >
>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>> > >> >             ret = -ENOMEM;
>>> > >> >             goto out;
>>> > >> >     }
>>> > >> > +
>>> > >> > +   if (zram->limit_pages &&
>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>> > >> > +           zs_free(meta->mem_pool, handle);
>>> > >> > +           ret = -ENOMEM;
>>> > >> > +           goto out;
>>> > >> > +   }
>>> > >> > +
>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>> > >>
>>> > >> Hello,
>>> > >>
>>> > >> I don't follow up previous discussion, so I could be wrong.
>>> > >> Why this enforcement should be here?
>>> > >>
>>> > >> I think that this has two problems.
>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>> > >> limitation.
>>> > >
>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>> > > but zram so it should be in there. If every user want it in future,
>>> > > then we could move the function into zsmalloc. That's what we
>>> > > concluded in previous discussion.
>>>
>>> Hmm...
>>> Problem is that we can't avoid these unnecessary overhead in this
>>> implementation. If we can implement this feature in zram efficiently,
>>> it's okay. But, I think that current form isn't.
>>
>>
>> If we can add it in zsmalloc, it would be more clean and efficient
>> for zram but as I said, at the moment, I didn't want to put zram's
>> requirement into zsmalloc because to me, it's weird to enforce max
>> limit to allocator. It's client's role, I think.
>>
>> If current implementation is expensive and rather hard to follow,
>> It would be one reason to move the feature into zsmalloc but
>> I don't think it makes critical trobule in zram usecase.
>> See below.
>>
>> But I still open and will wait others's opinion.
>> If other guys think zsmalloc is better place, I am willing to move
>> it into zsmalloc.
>
> Moving it into zsmalloc would allow rejecting new zsmallocs before
> actually crossing the limit, since it can calculate that internally.
> However, with the current patches the limit will only be briefly
> crossed, and it should not be crossed by a large amount.  Now, if this
> is happening repeatedly and quickly during extreme memory pressure,
> the constant alloc/free will clearly be worse than a simple internal
> calculation and failure.  But would it ever happen repeatedly once the
> zram limit is reached?
>
> Now that I'm thinking about the limit from the perspective of the zram
> user, I wonder what really will happen.  If zram is being used for
> swap space, then when swap starts getting errors trying to write
> pages, how damaging will that be to the system?  I haven't checked
> what swap does when it encounters disk errors.  Of course, with no
> zram limit, continually writing to zram until memory is totally
> consumed isn't good either.  But in any case, I would hope that swap
> would not repeatedly hammer on a disk when it's getting write failures
> from it.
>
> Alternately, if zram was being used as a compressed ram disk for
> regular file storage, it's entirely up to the application to handle
> write failures, so it may continue to try to write to a full zram
> disk.
>
> As far as what the zsmalloc api would look like with the limit added,
> it would need a setter and getter function (adding it as a param to
> the create function would be optional i think).  But more importantly,
> it would need to handle multiple ways of specifying the limit.  In our
> specific current use cases, zram and zswap, each handles their
> internal limit differently - zswap currently uses a % of total ram as
> its limit (defaulting to 20), while with these patches zram will use a
> specific number of bytes as its limit (defaulting to no limit).  If
> the limiting mechanism is moved into zsmalloc (and possibly zbud),
> then either both users need to use the same units (bytes or %ram), or
> zsmalloc/zbud need to be able to set their limit in either units.  It
> seems to me like keeping the limit in zram/zswap is currently
> preferable, at least without both using the same limit units.
>

zswap knows what 20% (or whatever percentage it currently uses; perhaps it too
will become a tuning knob) of memory is in bytes.

So, if the interface to establish a limit for a pool (or pool set, or whatever
zsmalloc sets up for its allocation mechanism) is stipulated in bytes
(converted to pages internally, or vice versa), then both can use that
interface: zram with its native page stipulation, and zswap with its
calculated percentage of memory.

Both would need a mechanism to change the max as needs change,
so the API has to handle this.


Or am I way off base?


>
>>
>>>
>>> > >
>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>> > > but the problem is we cannot know how many of pages are allocated
>>> > > by zsmalloc in advance.
>>> > > IOW, zram should be blind on zsmalloc's internal.
>>> > >
>>> >
>>> > We did however suggest that we could check before hand to see if
>>> > max was already exceeded as an optimization.
>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>> > In the contested case, the max may already be exceeded transiently and
>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>> > aren't good).
>>> > As Minchan mentions this was discussed before - but not into great detail.
>>> > Testing should be done to determine possible benefit. And as he also
>>> > mentions, the better place for it may be in zsmalloc, but that
>>> > requires an ABI change.
>>>
>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>> are just two users now, zswap and zram. We can change it easily.
>>> I think that we just need following simple API change in zsmalloc.c.
>>>
>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>> =>
>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>> *zpool_op)
>>>
>>> It's pool allocator so there is no obstacle for us to limit maximum
>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>> for pool allocator.
>>>
>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>> > also interested
>>> > in your thoughts, but this patchset should be able to go in as is.
>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>> > threshold.
>>> >
>>> > > About alloc/free cost once if it is over the limit,
>>> > > I don't think it's important to consider.
>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>> > > when the limit is over?
>>> > >
>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>> > >> due to other's allocation. There is time gap between allocation and
>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>> > >> could also see this condition true and then he will be failed.
>>> > >
>>> > > Yeb, we already discussed that. :)
>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>> > > promise that zram user cannot exceed mem_limit.
>>> > >
>>>
>>> If we can keep such a promise, why we need to limit memory usage?
>>> I guess that this limit feature is useful for user who can't keep such promise.
>>> So, we should assume that this false positive happens frequently.
>>
>>
>> The goal is to limit memory usage within some threshold.
>> so false positive shouldn't be harmful unless it exceeds the threshold.
>> In addition, If such false positive happens frequently, it means
>> zram is very trobule so that user would see lots of write fail
>> message, sometime really slow system if zram is used for swap.
>> If we protect just one write from the race, how much does it help
>> this situation? I don't think it's critical problem.
>>
>>>
>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>> > concurrent process transient inconsistent states.
>>> > Different views for different observers.
>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>> >  ;-)
>>>
>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>> needed memory size before actual allocation attempt. This commiting makes
>>> concurrent process serialized so there is no race here. There is
>>> possibilty to fail to allocate, but I think this is better than alloc
>>> and free blindlessly depending on inconsistent states.
>>
>> Normally, zsmalloc/zsfree allocates object from existing pool so
>> it's not big overhead and if someone continue to try writing  once limit is
>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>> so it's not a problem, I think.
>>
>>>
>>> Thanks.
>>>
>>
>> --
>> Kind regards,
>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 14:44               ` David Horner
@ 2014-08-27 15:14                 ` Dan Streetman
  2014-08-27 15:35                   ` David Horner
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Streetman @ 2014-08-27 15:14 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> Hey Joonsoo,
>>>
>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>>> Hello, Minchan and David.
>>>>
>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > > Hey Joonsoo,
>>>> > >
>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> > >> >             ret = -ENOMEM;
>>>> > >> >             goto out;
>>>> > >> >     }
>>>> > >> > +
>>>> > >> > +   if (zram->limit_pages &&
>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> > >> > +           zs_free(meta->mem_pool, handle);
>>>> > >> > +           ret = -ENOMEM;
>>>> > >> > +           goto out;
>>>> > >> > +   }
>>>> > >> > +
>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> > >>
>>>> > >> Hello,
>>>> > >>
>>>> > >> I don't follow up previous discussion, so I could be wrong.
>>>> > >> Why this enforcement should be here?
>>>> > >>
>>>> > >> I think that this has two problems.
>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>>> > >> limitation.
>>>> > >
>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>>> > > but zram so it should be in there. If every user want it in future,
>>>> > > then we could move the function into zsmalloc. That's what we
>>>> > > concluded in previous discussion.
>>>>
>>>> Hmm...
>>>> Problem is that we can't avoid these unnecessary overhead in this
>>>> implementation. If we can implement this feature in zram efficiently,
>>>> it's okay. But, I think that current form isn't.
>>>
>>>
>>> If we can add it in zsmalloc, it would be more clean and efficient
>>> for zram but as I said, at the moment, I didn't want to put zram's
>>> requirement into zsmalloc because to me, it's weird to enforce max
>>> limit to allocator. It's client's role, I think.
>>>
>>> If current implementation is expensive and rather hard to follow,
>>> It would be one reason to move the feature into zsmalloc but
>>> I don't think it makes critical trobule in zram usecase.
>>> See below.
>>>
>>> But I still open and will wait others's opinion.
>>> If other guys think zsmalloc is better place, I am willing to move
>>> it into zsmalloc.
>>
>> Moving it into zsmalloc would allow rejecting new zsmallocs before
>> actually crossing the limit, since it can calculate that internally.
>> However, with the current patches the limit will only be briefly
>> crossed, and it should not be crossed by a large amount.  Now, if this
>> is happening repeatedly and quickly during extreme memory pressure,
>> the constant alloc/free will clearly be worse than a simple internal
>> calculation and failure.  But would it ever happen repeatedly once the
>> zram limit is reached?
>>
>> Now that I'm thinking about the limit from the perspective of the zram
>> user, I wonder what really will happen.  If zram is being used for
>> swap space, then when swap starts getting errors trying to write
>> pages, how damaging will that be to the system?  I haven't checked
>> what swap does when it encounters disk errors.  Of course, with no
>> zram limit, continually writing to zram until memory is totally
>> consumed isn't good either.  But in any case, I would hope that swap
>> would not repeatedly hammer on a disk when it's getting write failures
>> from it.
>>
>> Alternately, if zram was being used as a compressed ram disk for
>> regular file storage, it's entirely up to the application to handle
>> write failures, so it may continue to try to write to a full zram
>> disk.
>>
>> As far as what the zsmalloc api would look like with the limit added,
>> it would need a setter and getter function (adding it as a param to
>> the create function would be optional i think).  But more importantly,
>> it would need to handle multiple ways of specifying the limit.  In our
>> specific current use cases, zram and zswap, each handles their
>> internal limit differently - zswap currently uses a % of total ram as
>> its limit (defaulting to 20), while with these patches zram will use a
>> specific number of bytes as its limit (defaulting to no limit).  If
>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
>> then either both users need to use the same units (bytes or %ram), or
>> zsmalloc/zbud need to be able to set their limit in either units.  It
>> seems to me like keeping the limit in zram/zswap is currently
>> preferable, at least without both using the same limit units.
>>
>
> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
> will become a tuning knob) of memory is in bytes.
>
> So, if the interface to establish a limit for a pool (or pool set, or whatever
> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
> (to actually use pages internally, of visa-versa) , then both can use
> that interface.
> zram with its native page stipulation, and zswap with calculated % of memory).

No. Unless zswap monitors memory hotplug and updates the limit on each
hotplug event, 20% of the *current* total RAM at zswap initialization
is not the same as an actual 20%-of-RAM limit.  zswap checks its size
against totalram_pages for each new allocation. I don't think we would
want to add memory hotplug monitoring to zswap just to update the
zpool size limit.
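
Roughly, the check zswap already does on every store amounts to something
like the following (an approximation for illustration; the exact code in
mm/zswap.c differs in detail):

/*
 * Because totalram_pages is read at check time, a percent-of-RAM limit
 * automatically tracks memory hotplug without any extra notifier.
 */
static bool zswap_pool_is_full(unsigned long pool_pages)
{
	return pool_pages >= totalram_pages * zswap_max_pool_percent / 100;
}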

>
> Both would need a mechanism to change the max as need change,
>  so the API has to handle this.
>
>
> Or am I way off base?
>
>
>>
>>>
>>>>
>>>> > >
>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>>> > > but the problem is we cannot know how many of pages are allocated
>>>> > > by zsmalloc in advance.
>>>> > > IOW, zram should be blind on zsmalloc's internal.
>>>> > >
>>>> >
>>>> > We did however suggest that we could check before hand to see if
>>>> > max was already exceeded as an optimization.
>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>>> > In the contested case, the max may already be exceeded transiently and
>>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>>> > aren't good).
>>>> > As Minchan mentions this was discussed before - but not into great detail.
>>>> > Testing should be done to determine possible benefit. And as he also
>>>> > mentions, the better place for it may be in zsmalloc, but that
>>>> > requires an ABI change.
>>>>
>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>>> are just two users now, zswap and zram. We can change it easily.
>>>> I think that we just need following simple API change in zsmalloc.c.
>>>>
>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>>> =>
>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>>> *zpool_op)
>>>>
>>>> It's pool allocator so there is no obstacle for us to limit maximum
>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>>> for pool allocator.
>>>>
>>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>>> > also interested
>>>> > in your thoughts, but this patchset should be able to go in as is.
>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>>> > threshold.
>>>> >
>>>> > > About alloc/free cost once if it is over the limit,
>>>> > > I don't think it's important to consider.
>>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>>> > > when the limit is over?
>>>> > >
>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>>> > >> due to other's allocation. There is time gap between allocation and
>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>>> > >> could also see this condition true and then he will be failed.
>>>> > >
>>>> > > Yeb, we already discussed that. :)
>>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>>> > > promise that zram user cannot exceed mem_limit.
>>>> > >
>>>>
>>>> If we can keep such a promise, why we need to limit memory usage?
>>>> I guess that this limit feature is useful for user who can't keep such promise.
>>>> So, we should assume that this false positive happens frequently.
>>>
>>>
>>> The goal is to limit memory usage within some threshold.
>>> so false positive shouldn't be harmful unless it exceeds the threshold.
>>> In addition, If such false positive happens frequently, it means
>>> zram is very trobule so that user would see lots of write fail
>>> message, sometime really slow system if zram is used for swap.
>>> If we protect just one write from the race, how much does it help
>>> this situation? I don't think it's critical problem.
>>>
>>>>
>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>>> > concurrent process transient inconsistent states.
>>>> > Different views for different observers.
>>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>>> >  ;-)
>>>>
>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>>> needed memory size before actual allocation attempt. This commiting makes
>>>> concurrent process serialized so there is no race here. There is
>>>> possibilty to fail to allocate, but I think this is better than alloc
>>>> and free blindlessly depending on inconsistent states.
>>>
>>> Normally, zsmalloc/zsfree allocates object from existing pool so
>>> it's not big overhead and if someone continue to try writing  once limit is
>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>>> so it's not a problem, I think.
>>>
>>>>
>>>> Thanks.
>>>>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 15:14                 ` Dan Streetman
@ 2014-08-27 15:35                   ` David Horner
  2014-08-27 16:29                     ` Dan Streetman
  0 siblings, 1 reply; 27+ messages in thread
From: David Horner @ 2014-08-27 15:35 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> Hey Joonsoo,
>>>>
>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>>>> Hello, Minchan and David.
>>>>>
>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> > > Hey Joonsoo,
>>>>> > >
>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>> > >> >             ret = -ENOMEM;
>>>>> > >> >             goto out;
>>>>> > >> >     }
>>>>> > >> > +
>>>>> > >> > +   if (zram->limit_pages &&
>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>> > >> > +           zs_free(meta->mem_pool, handle);
>>>>> > >> > +           ret = -ENOMEM;
>>>>> > >> > +           goto out;
>>>>> > >> > +   }
>>>>> > >> > +
>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>> > >>
>>>>> > >> Hello,
>>>>> > >>
>>>>> > >> I don't follow up previous discussion, so I could be wrong.
>>>>> > >> Why this enforcement should be here?
>>>>> > >>
>>>>> > >> I think that this has two problems.
>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>>>> > >> limitation.
>>>>> > >
>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>>>> > > but zram so it should be in there. If every user want it in future,
>>>>> > > then we could move the function into zsmalloc. That's what we
>>>>> > > concluded in previous discussion.
>>>>>
>>>>> Hmm...
>>>>> Problem is that we can't avoid these unnecessary overhead in this
>>>>> implementation. If we can implement this feature in zram efficiently,
>>>>> it's okay. But, I think that current form isn't.
>>>>
>>>>
>>>> If we can add it in zsmalloc, it would be more clean and efficient
>>>> for zram but as I said, at the moment, I didn't want to put zram's
>>>> requirement into zsmalloc because to me, it's weird to enforce max
>>>> limit to allocator. It's client's role, I think.
>>>>
>>>> If current implementation is expensive and rather hard to follow,
>>>> It would be one reason to move the feature into zsmalloc but
>>>> I don't think it makes critical trobule in zram usecase.
>>>> See below.
>>>>
>>>> But I still open and will wait others's opinion.
>>>> If other guys think zsmalloc is better place, I am willing to move
>>>> it into zsmalloc.
>>>
>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
>>> actually crossing the limit, since it can calculate that internally.
>>> However, with the current patches the limit will only be briefly
>>> crossed, and it should not be crossed by a large amount.  Now, if this
>>> is happening repeatedly and quickly during extreme memory pressure,
>>> the constant alloc/free will clearly be worse than a simple internal
>>> calculation and failure.  But would it ever happen repeatedly once the
>>> zram limit is reached?
>>>
>>> Now that I'm thinking about the limit from the perspective of the zram
>>> user, I wonder what really will happen.  If zram is being used for
>>> swap space, then when swap starts getting errors trying to write
>>> pages, how damaging will that be to the system?  I haven't checked
>>> what swap does when it encounters disk errors.  Of course, with no
>>> zram limit, continually writing to zram until memory is totally
>>> consumed isn't good either.  But in any case, I would hope that swap
>>> would not repeatedly hammer on a disk when it's getting write failures
>>> from it.
>>>
>>> Alternately, if zram was being used as a compressed ram disk for
>>> regular file storage, it's entirely up to the application to handle
>>> write failures, so it may continue to try to write to a full zram
>>> disk.
>>>
>>> As far as what the zsmalloc api would look like with the limit added,
>>> it would need a setter and getter function (adding it as a param to
>>> the create function would be optional i think).  But more importantly,
>>> it would need to handle multiple ways of specifying the limit.  In our
>>> specific current use cases, zram and zswap, each handles their
>>> internal limit differently - zswap currently uses a % of total ram as
>>> its limit (defaulting to 20), while with these patches zram will use a
>>> specific number of bytes as its limit (defaulting to no limit).  If
>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
>>> then either both users need to use the same units (bytes or %ram), or
>>> zsmalloc/zbud need to be able to set their limit in either units.  It
>>> seems to me like keeping the limit in zram/zswap is currently
>>> preferable, at least without both using the same limit units.
>>>
>>
>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
>> will become a tuning knob) of memory is in bytes.
>>
>> So, if the interface to establish a limit for a pool (or pool set, or whatever
>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
>> (to actually use pages internally, of visa-versa) , then both can use
>> that interface.
>> zram with its native page stipulation, and zswap with calculated % of memory).
>
> No, unless zswap monitors memory hotplug and updates the limit on each
> hotplug event, 20% of the *current* total ram at zswap initialization
> is not equal to an actual 20% of ram limit.  zswap checks its size
> against totalram_pages for each new allocation. I don't think we would
> prefer adding memory hotplug monitoring to zswap just to update the
> zpool size limit.
>

OK - I see the need to retain the limits where they are, in the using
components, so that zsmalloc is not unnecessarily complicated
(keeping track of 2 limit methods).

So, zswap has the same race conditions and possible transient over-allocations?
It looks like I will have to check how zswap implements it.
But perhaps you can answer the question that is not in the code:
Have there been reports of thrashing behaviour around the 20% limit for zswap?

thanks.

>>
>> Both would need a mechanism to change the max as need change,
>>  so the API has to handle this.
>>
>>
>> Or am I way off base?
>>
>>
>>>
>>>>
>>>>>
>>>>> > >
>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>>>> > > but the problem is we cannot know how many of pages are allocated
>>>>> > > by zsmalloc in advance.
>>>>> > > IOW, zram should be blind on zsmalloc's internal.
>>>>> > >
>>>>> >
>>>>> > We did however suggest that we could check before hand to see if
>>>>> > max was already exceeded as an optimization.
>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>>>> > In the contested case, the max may already be exceeded transiently and
>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>>>> > aren't good).
>>>>> > As Minchan mentions this was discussed before - but not into great detail.
>>>>> > Testing should be done to determine possible benefit. And as he also
>>>>> > mentions, the better place for it may be in zsmalloc, but that
>>>>> > requires an ABI change.
>>>>>
>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>>>> are just two users now, zswap and zram. We can change it easily.
>>>>> I think that we just need following simple API change in zsmalloc.c.
>>>>>
>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>>>> =>
>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>>>> *zpool_op)
>>>>>
>>>>> It's pool allocator so there is no obstacle for us to limit maximum
>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>>>> for pool allocator.
>>>>>
>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>>>> > also interested
>>>>> > in your thoughts, but this patchset should be able to go in as is.
>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>>>> > threshold.
>>>>> >
>>>>> > > About alloc/free cost once if it is over the limit,
>>>>> > > I don't think it's important to consider.
>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>>>> > > when the limit is over?
>>>>> > >
>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>>>> > >> due to other's allocation. There is time gap between allocation and
>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>>>> > >> could also see this condition true and then he will be failed.
>>>>> > >
>>>>> > > Yeb, we already discussed that. :)
>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>>>> > > promise that zram user cannot exceed mem_limit.
>>>>> > >
>>>>>
>>>>> If we can keep such a promise, why we need to limit memory usage?
>>>>> I guess that this limit feature is useful for user who can't keep such promise.
>>>>> So, we should assume that this false positive happens frequently.
>>>>
>>>>
>>>> The goal is to limit memory usage within some threshold.
>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
>>>> In addition, If such false positive happens frequently, it means
>>>> zram is very trobule so that user would see lots of write fail
>>>> message, sometime really slow system if zram is used for swap.
>>>> If we protect just one write from the race, how much does it help
>>>> this situation? I don't think it's critical problem.
>>>>
>>>>>
>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>>>> > concurrent process transient inconsistent states.
>>>>> > Different views for different observers.
>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>>>> >  ;-)
>>>>>
>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>>>> needed memory size before actual allocation attempt. This commiting makes
>>>>> concurrent process serialized so there is no race here. There is
>>>>> possibilty to fail to allocate, but I think this is better than alloc
>>>>> and free blindlessly depending on inconsistent states.
>>>>
>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
>>>> it's not big overhead and if someone continue to try writing  once limit is
>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>>>> so it's not a problem, I think.
>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>> --
>>>> Kind regards,
>>>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 15:35                   ` David Horner
@ 2014-08-27 16:29                     ` Dan Streetman
  2014-08-27 16:59                       ` David Horner
  2014-08-28  2:59                       ` Minchan Kim
  0 siblings, 2 replies; 27+ messages in thread
From: Dan Streetman @ 2014-08-27 16:29 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 11:35 AM, David Horner <ds2horner@gmail.com> wrote:
> On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
>>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> Hey Joonsoo,
>>>>>
>>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>>>>> Hello, Minchan and David.
>>>>>>
>>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>>>> > > Hey Joonsoo,
>>>>>> > >
>>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>>> > >> >             ret = -ENOMEM;
>>>>>> > >> >             goto out;
>>>>>> > >> >     }
>>>>>> > >> > +
>>>>>> > >> > +   if (zram->limit_pages &&
>>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>>> > >> > +           zs_free(meta->mem_pool, handle);
>>>>>> > >> > +           ret = -ENOMEM;
>>>>>> > >> > +           goto out;
>>>>>> > >> > +   }
>>>>>> > >> > +
>>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>>> > >>
>>>>>> > >> Hello,
>>>>>> > >>
>>>>>> > >> I don't follow up previous discussion, so I could be wrong.
>>>>>> > >> Why this enforcement should be here?
>>>>>> > >>
>>>>>> > >> I think that this has two problems.
>>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>>>>> > >> limitation.
>>>>>> > >
>>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>>>>> > > but zram so it should be in there. If every user want it in future,
>>>>>> > > then we could move the function into zsmalloc. That's what we
>>>>>> > > concluded in previous discussion.
>>>>>>
>>>>>> Hmm...
>>>>>> Problem is that we can't avoid these unnecessary overhead in this
>>>>>> implementation. If we can implement this feature in zram efficiently,
>>>>>> it's okay. But, I think that current form isn't.
>>>>>
>>>>>
>>>>> If we can add it in zsmalloc, it would be more clean and efficient
>>>>> for zram but as I said, at the moment, I didn't want to put zram's
>>>>> requirement into zsmalloc because to me, it's weird to enforce max
>>>>> limit to allocator. It's client's role, I think.
>>>>>
>>>>> If current implementation is expensive and rather hard to follow,
>>>>> It would be one reason to move the feature into zsmalloc but
>>>>> I don't think it makes critical trobule in zram usecase.
>>>>> See below.
>>>>>
>>>>> But I still open and will wait others's opinion.
>>>>> If other guys think zsmalloc is better place, I am willing to move
>>>>> it into zsmalloc.
>>>>
>>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
>>>> actually crossing the limit, since it can calculate that internally.
>>>> However, with the current patches the limit will only be briefly
>>>> crossed, and it should not be crossed by a large amount.  Now, if this
>>>> is happening repeatedly and quickly during extreme memory pressure,
>>>> the constant alloc/free will clearly be worse than a simple internal
>>>> calculation and failure.  But would it ever happen repeatedly once the
>>>> zram limit is reached?
>>>>
>>>> Now that I'm thinking about the limit from the perspective of the zram
>>>> user, I wonder what really will happen.  If zram is being used for
>>>> swap space, then when swap starts getting errors trying to write
>>>> pages, how damaging will that be to the system?  I haven't checked
>>>> what swap does when it encounters disk errors.  Of course, with no
>>>> zram limit, continually writing to zram until memory is totally
>>>> consumed isn't good either.  But in any case, I would hope that swap
>>>> would not repeatedly hammer on a disk when it's getting write failures
>>>> from it.
>>>>
>>>> Alternately, if zram was being used as a compressed ram disk for
>>>> regular file storage, it's entirely up to the application to handle
>>>> write failures, so it may continue to try to write to a full zram
>>>> disk.
>>>>
>>>> As far as what the zsmalloc api would look like with the limit added,
>>>> it would need a setter and getter function (adding it as a param to
>>>> the create function would be optional i think).  But more importantly,
>>>> it would need to handle multiple ways of specifying the limit.  In our
>>>> specific current use cases, zram and zswap, each handles their
>>>> internal limit differently - zswap currently uses a % of total ram as
>>>> its limit (defaulting to 20), while with these patches zram will use a
>>>> specific number of bytes as its limit (defaulting to no limit).  If
>>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
>>>> then either both users need to use the same units (bytes or %ram), or
>>>> zsmalloc/zbud need to be able to set their limit in either units.  It
>>>> seems to me like keeping the limit in zram/zswap is currently
>>>> preferable, at least without both using the same limit units.
>>>>
>>>
>>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
>>> will become a tuning knob) of memory is in bytes.
>>>
>>> So, if the interface to establish a limit for a pool (or pool set, or whatever
>>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
>>> (to actually use pages internally, of visa-versa) , then both can use
>>> that interface.
>>> zram with its native page stipulation, and zswap with calculated % of memory).
>>
>> No, unless zswap monitors memory hotplug and updates the limit on each
>> hotplug event, 20% of the *current* total ram at zswap initialization
>> is not equal to an actual 20% of ram limit.  zswap checks its size
>> against totalram_pages for each new allocation. I don't think we would
>> prefer adding memory hotplug monitoring to zswap just to update the
>> zpool size limit.
>>
>
> OK - I see the need to retain the limits where they are in the using
> components so that
> zsmalloc is not unnecessarily complicated (keeping track of 2 limit methods).
>
> So, zswap has the same race conditions and possible transient over-allocations?
> It looks like I will have to check on how zswap implements it.
> But perhaps you can answer the question that is not in the code:
> Have there been reported thrashing behaviour around the 20% limit for zswap?

zswap does a simple over-allocation check before allocating anything.
So during page store, it checks if (total_ram * 0.20) < used.  This
actually places the effective limit higher than the specified limit,
but only by a single allocation.  This approach could be taken with
zram as well.
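
Roughly, the check looks like this (a sketch from memory -- the names may
not match the exact zswap source of any given kernel version):

/*
 * Illustrative sketch of zswap's "is the pool full?" test.
 * zswap_max_pool_percent defaults to 20; zswap_pool_pages tracks the
 * number of pages currently backing the compressed pool.
 */
static bool zswap_is_full(void)
{
        return totalram_pages * zswap_max_pool_percent / 100 <
                        zswap_pool_pages;
}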

The amount of over-allocation (past the specified limit) would vary
between zsmalloc and zbud.  Since zbud increases itself in page
increments, any over-allocation past the zswap limit would be by only
1 page.  However, zsmalloc is variable in its allocation increments,
as it depends on which class needs to be grown; zsmalloc is divided
into many "classes", each of which contains some number of "zspages"
that try to precisely pack some number of N-sized areas; e.g. one
class might use zspages that are 2 pages long to store 3 separate
areas that are each 2/3 of a page in size; if that class needed to be
grown, it would add one zspage of 2 pages.  The maximum number of
actual pages per zspage is defined by ZS_MAX_PAGES_PER_ZSPAGE, which
is currently set to 1 << 2, so 4.
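
For reference, the relevant definitions look roughly like this
(abbreviated from mm/zsmalloc.c; surrounding context omitted):

/* A single zspage is limited to at most 1 << 2 = 4 physical pages. */
#define ZS_MAX_ZSPAGE_ORDER 2
#define ZS_MAX_PAGES_PER_ZSPAGE (_AC(1, UL) << ZS_MAX_ZSPAGE_ORDER)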

So with zswap, it will over-allocate memory past its specified limit,
up to 1 page (with zbud) or up to 4 pages (with zsmalloc).  zram could
do the same, simply check if its size > limit before each write, and
fail if so; that would remove the alloc/free issue, and would only
over-allocate by at most 4 pages (with the current zsmalloc settings).
Alternatively, zram could check whether its (current_size + 4 pages) > limit,
which would then stop it short of the limit by up to 4 pages.  Really
though, 4 pages either above or under the limit probably doesn't
matter.
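
As a sketch of that pre-check alternative (illustrative only, not the
posted patch; size_to_alloc stands in for the compressed length used in
zram_bvec_write()):

/*
 * Fail the write up front when the pool is already at or over the
 * limit, instead of allocating and then freeing.  Because zsmalloc may
 * still grow the pool by one zspage, this can overshoot the limit by a
 * few pages (currently at most 4), as discussed above.
 */
if (zram->limit_pages &&
    zs_get_total_pages(meta->mem_pool) >= zram->limit_pages) {
        ret = -ENOMEM;
        goto out;
}

handle = zs_malloc(meta->mem_pool, size_to_alloc);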

>
> thanks.
>
>>>
>>> Both would need a mechanism to change the max as need change,
>>>  so the API has to handle this.
>>>
>>>
>>> Or am I way off base?
>>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> > >
>>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>>>>> > > but the problem is we cannot know how many of pages are allocated
>>>>>> > > by zsmalloc in advance.
>>>>>> > > IOW, zram should be blind on zsmalloc's internal.
>>>>>> > >
>>>>>> >
>>>>>> > We did however suggest that we could check before hand to see if
>>>>>> > max was already exceeded as an optimization.
>>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>>>>> > In the contested case, the max may already be exceeded transiently and
>>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>>>>> > aren't good).
>>>>>> > As Minchan mentions this was discussed before - but not into great detail.
>>>>>> > Testing should be done to determine possible benefit. And as he also
>>>>>> > mentions, the better place for it may be in zsmalloc, but that
>>>>>> > requires an ABI change.
>>>>>>
>>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>>>>> are just two users now, zswap and zram. We can change it easily.
>>>>>> I think that we just need following simple API change in zsmalloc.c.
>>>>>>
>>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>>>>> =>
>>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>>>>> *zpool_op)
>>>>>>
>>>>>> It's pool allocator so there is no obstacle for us to limit maximum
>>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>>>>> for pool allocator.
>>>>>>
>>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>>>>> > also interested
>>>>>> > in your thoughts, but this patchset should be able to go in as is.
>>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>>>>> > threshold.
>>>>>> >
>>>>>> > > About alloc/free cost once if it is over the limit,
>>>>>> > > I don't think it's important to consider.
>>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>>>>> > > when the limit is over?
>>>>>> > >
>>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>>>>> > >> due to other's allocation. There is time gap between allocation and
>>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>>>>> > >> could also see this condition true and then he will be failed.
>>>>>> > >
>>>>>> > > Yeb, we already discussed that. :)
>>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>>>>> > > promise that zram user cannot exceed mem_limit.
>>>>>> > >
>>>>>>
>>>>>> If we can keep such a promise, why we need to limit memory usage?
>>>>>> I guess that this limit feature is useful for user who can't keep such promise.
>>>>>> So, we should assume that this false positive happens frequently.
>>>>>
>>>>>
>>>>> The goal is to limit memory usage within some threshold.
>>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
>>>>> In addition, If such false positive happens frequently, it means
>>>>> zram is very trobule so that user would see lots of write fail
>>>>> message, sometime really slow system if zram is used for swap.
>>>>> If we protect just one write from the race, how much does it help
>>>>> this situation? I don't think it's critical problem.
>>>>>
>>>>>>
>>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>>>>> > concurrent process transient inconsistent states.
>>>>>> > Different views for different observers.
>>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>>>>> >  ;-)
>>>>>>
>>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>>>>> needed memory size before actual allocation attempt. This commiting makes
>>>>>> concurrent process serialized so there is no race here. There is
>>>>>> possibilty to fail to allocate, but I think this is better than alloc
>>>>>> and free blindlessly depending on inconsistent states.
>>>>>
>>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
>>>>> it's not big overhead and if someone continue to try writing  once limit is
>>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>>>>> so it's not a problem, I think.
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>>> see: http://www.linux-mm.org/ .
>>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 16:29                     ` Dan Streetman
@ 2014-08-27 16:59                       ` David Horner
  2014-08-27 19:04                         ` Dan Streetman
  2014-08-28  2:59                       ` Minchan Kim
  1 sibling, 1 reply; 27+ messages in thread
From: David Horner @ 2014-08-27 16:59 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 12:29 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Wed, Aug 27, 2014 at 11:35 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
>>>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>>> Hey Joonsoo,
>>>>>>
>>>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>>>>>> Hello, Minchan and David.
>>>>>>>
>>>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>>>>> > > Hey Joonsoo,
>>>>>>> > >
>>>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>>>> > >> >             ret = -ENOMEM;
>>>>>>> > >> >             goto out;
>>>>>>> > >> >     }
>>>>>>> > >> > +
>>>>>>> > >> > +   if (zram->limit_pages &&
>>>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>>>> > >> > +           zs_free(meta->mem_pool, handle);
>>>>>>> > >> > +           ret = -ENOMEM;
>>>>>>> > >> > +           goto out;
>>>>>>> > >> > +   }
>>>>>>> > >> > +
>>>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>>>> > >>
>>>>>>> > >> Hello,
>>>>>>> > >>
>>>>>>> > >> I don't follow up previous discussion, so I could be wrong.
>>>>>>> > >> Why this enforcement should be here?
>>>>>>> > >>
>>>>>>> > >> I think that this has two problems.
>>>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>>>>>> > >> limitation.
>>>>>>> > >
>>>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>>>>>> > > but zram so it should be in there. If every user want it in future,
>>>>>>> > > then we could move the function into zsmalloc. That's what we
>>>>>>> > > concluded in previous discussion.
>>>>>>>
>>>>>>> Hmm...
>>>>>>> Problem is that we can't avoid these unnecessary overhead in this
>>>>>>> implementation. If we can implement this feature in zram efficiently,
>>>>>>> it's okay. But, I think that current form isn't.
>>>>>>
>>>>>>
>>>>>> If we can add it in zsmalloc, it would be more clean and efficient
>>>>>> for zram but as I said, at the moment, I didn't want to put zram's
>>>>>> requirement into zsmalloc because to me, it's weird to enforce max
>>>>>> limit to allocator. It's client's role, I think.
>>>>>>
>>>>>> If current implementation is expensive and rather hard to follow,
>>>>>> It would be one reason to move the feature into zsmalloc but
>>>>>> I don't think it makes critical trobule in zram usecase.
>>>>>> See below.
>>>>>>
>>>>>> But I still open and will wait others's opinion.
>>>>>> If other guys think zsmalloc is better place, I am willing to move
>>>>>> it into zsmalloc.
>>>>>
>>>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
>>>>> actually crossing the limit, since it can calculate that internally.
>>>>> However, with the current patches the limit will only be briefly
>>>>> crossed, and it should not be crossed by a large amount.  Now, if this
>>>>> is happening repeatedly and quickly during extreme memory pressure,
>>>>> the constant alloc/free will clearly be worse than a simple internal
>>>>> calculation and failure.  But would it ever happen repeatedly once the
>>>>> zram limit is reached?
>>>>>
>>>>> Now that I'm thinking about the limit from the perspective of the zram
>>>>> user, I wonder what really will happen.  If zram is being used for
>>>>> swap space, then when swap starts getting errors trying to write
>>>>> pages, how damaging will that be to the system?  I haven't checked
>>>>> what swap does when it encounters disk errors.  Of course, with no
>>>>> zram limit, continually writing to zram until memory is totally
>>>>> consumed isn't good either.  But in any case, I would hope that swap
>>>>> would not repeatedly hammer on a disk when it's getting write failures
>>>>> from it.
>>>>>
>>>>> Alternately, if zram was being used as a compressed ram disk for
>>>>> regular file storage, it's entirely up to the application to handle
>>>>> write failures, so it may continue to try to write to a full zram
>>>>> disk.
>>>>>
>>>>> As far as what the zsmalloc api would look like with the limit added,
>>>>> it would need a setter and getter function (adding it as a param to
>>>>> the create function would be optional i think).  But more importantly,
>>>>> it would need to handle multiple ways of specifying the limit.  In our
>>>>> specific current use cases, zram and zswap, each handles their
>>>>> internal limit differently - zswap currently uses a % of total ram as
>>>>> its limit (defaulting to 20), while with these patches zram will use a
>>>>> specific number of bytes as its limit (defaulting to no limit).  If
>>>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
>>>>> then either both users need to use the same units (bytes or %ram), or
>>>>> zsmalloc/zbud need to be able to set their limit in either units.  It
>>>>> seems to me like keeping the limit in zram/zswap is currently
>>>>> preferable, at least without both using the same limit units.
>>>>>
>>>>
>>>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
>>>> will become a tuning knob) of memory is in bytes.
>>>>
>>>> So, if the interface to establish a limit for a pool (or pool set, or whatever
>>>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
>>>> (to actually use pages internally, of visa-versa) , then both can use
>>>> that interface.
>>>> zram with its native page stipulation, and zswap with calculated % of memory).
>>>
>>> No, unless zswap monitors memory hotplug and updates the limit on each
>>> hotplug event, 20% of the *current* total ram at zswap initialization
>>> is not equal to an actual 20% of ram limit.  zswap checks its size
>>> against totalram_pages for each new allocation. I don't think we would
>>> prefer adding memory hotplug monitoring to zswap just to update the
>>> zpool size limit.
>>>
>>
>> OK - I see the need to retain the limits where they are in the using
>> components so that
>> zsmalloc is not unnecessarily complicated (keeping track of 2 limit methods).
>>
>> So, zswap has the same race conditions and possible transient over-allocations?
>> It looks like I will have to check on how zswap implements it.
>> But perhaps you can answer the question that is not in the code:
>> Have there been reported thrashing behaviour around the 20% limit for zswap?
>
> zswap does a simple over-allocation check before allocating anything.
> So during page store, it checks if (total_ram * 0.20) < used.  This
> actually places the effective limit higher than the specified limit,
> but only by a single allocation.  This approach could be taken with
> zram as well.
>
> The amount of over-allocation (past the specified limit) would vary
> between zsmalloc and zbud.  Since zbud increases itself in page
> increments, any over-allocation past the zswap limit would be by only
> 1 page.  However, zsmalloc is variable in its allocation increments,
> as it depends on which class needs to be grown; zsmalloc is divided
> into many "classes", each of contains some number of "zspages" which
> try to precisely contain some number of N-sized areas; e.g. one class
> might use zspages that are 2 pages to store 3 separate areas which are
> each 2/3 of a page number of bytes; if that class needed to be grown,
> it would add one zspage that is 2 pages.  The max number of actual
> pages per zspage is defined by ZS_MAX_PAGES_PER_ZSPAGE which is
> currently set to 1<<2, so 4.
>
> So with zswap, it will over-allocate memory past its specified limit,
> up to 1 page (with zbud) or up to 4 pages (with zsmalloc).  zram could
> do the same, simply check if its size > limit before each write, and
> fail if so; that would remove the alloc/free issue, and would only
> over-allocate by at most 4 pages (with the current zsmalloc settings).
> Alternately, zram could check if its (current_size + 4pages > limit),
> which would then stop it short of the limit by up to 4 pages.  Really
> though, 4 pages either above or under the limit probably doesn't
> matter.

I agree that it isn't the over-allocation of a few pages, per se, but if
there is potential for concurrent allocation through either zram or zswap
there is the possibility of high-contention thrashing as the processes
jockey for those final pages. Perhaps not a thundering herd, but
it could readily become a hot spot as the pressure is on and there is
no explicit mechanism to throttle back retries and new attempts.

I agree that a pre-check before the zsmalloc request in zram may be a
valuable optimization, but only if there is a real possibility of
threshold thrashing.
I suggested before that it might be best to wait and see...
Do you have any suggestions for a stress-test approach to empirically
assess the risk?

Thanks again.


>
>>
>> thanks.
>>
>>>>
>>>> Both would need a mechanism to change the max as need change,
>>>>  so the API has to handle this.
>>>>
>>>>
>>>> Or am I way off base?
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> > >
>>>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>>>>>> > > but the problem is we cannot know how many of pages are allocated
>>>>>>> > > by zsmalloc in advance.
>>>>>>> > > IOW, zram should be blind on zsmalloc's internal.
>>>>>>> > >
>>>>>>> >
>>>>>>> > We did however suggest that we could check before hand to see if
>>>>>>> > max was already exceeded as an optimization.
>>>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>>>>>> > In the contested case, the max may already be exceeded transiently and
>>>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>>>>>> > aren't good).
>>>>>>> > As Minchan mentions this was discussed before - but not into great detail.
>>>>>>> > Testing should be done to determine possible benefit. And as he also
>>>>>>> > mentions, the better place for it may be in zsmalloc, but that
>>>>>>> > requires an ABI change.
>>>>>>>
>>>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>>>>>> are just two users now, zswap and zram. We can change it easily.
>>>>>>> I think that we just need following simple API change in zsmalloc.c.
>>>>>>>
>>>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>>>>>> =>
>>>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>>>>>> *zpool_op)
>>>>>>>
>>>>>>> It's pool allocator so there is no obstacle for us to limit maximum
>>>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>>>>>> for pool allocator.
>>>>>>>
>>>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>>>>>> > also interested
>>>>>>> > in your thoughts, but this patchset should be able to go in as is.
>>>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>>>>>> > threshold.
>>>>>>> >
>>>>>>> > > About alloc/free cost once if it is over the limit,
>>>>>>> > > I don't think it's important to consider.
>>>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>>>>>> > > when the limit is over?
>>>>>>> > >
>>>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>>>>>> > >> due to other's allocation. There is time gap between allocation and
>>>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>>>>>> > >> could also see this condition true and then he will be failed.
>>>>>>> > >
>>>>>>> > > Yeb, we already discussed that. :)
>>>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>>>>>> > > promise that zram user cannot exceed mem_limit.
>>>>>>> > >
>>>>>>>
>>>>>>> If we can keep such a promise, why we need to limit memory usage?
>>>>>>> I guess that this limit feature is useful for user who can't keep such promise.
>>>>>>> So, we should assume that this false positive happens frequently.
>>>>>>
>>>>>>
>>>>>> The goal is to limit memory usage within some threshold.
>>>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
>>>>>> In addition, If such false positive happens frequently, it means
>>>>>> zram is very trobule so that user would see lots of write fail
>>>>>> message, sometime really slow system if zram is used for swap.
>>>>>> If we protect just one write from the race, how much does it help
>>>>>> this situation? I don't think it's critical problem.
>>>>>>
>>>>>>>
>>>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>>>>>> > concurrent process transient inconsistent states.
>>>>>>> > Different views for different observers.
>>>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>>>>>> >  ;-)
>>>>>>>
>>>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>>>>>> needed memory size before actual allocation attempt. This commiting makes
>>>>>>> concurrent process serialized so there is no race here. There is
>>>>>>> possibilty to fail to allocate, but I think this is better than alloc
>>>>>>> and free blindlessly depending on inconsistent states.
>>>>>>
>>>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
>>>>>> it's not big overhead and if someone continue to try writing  once limit is
>>>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>>>>>> so it's not a problem, I think.
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>>>> see: http://www.linux-mm.org/ .
>>>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>>>
>>>>>> --
>>>>>> Kind regards,
>>>>>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 16:59                       ` David Horner
@ 2014-08-27 19:04                         ` Dan Streetman
  2014-08-28  3:04                           ` Minchan Kim
  0 siblings, 1 reply; 27+ messages in thread
From: Dan Streetman @ 2014-08-27 19:04 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 12:59 PM, David Horner <ds2horner@gmail.com> wrote:
> On Wed, Aug 27, 2014 at 12:29 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Wed, Aug 27, 2014 at 11:35 AM, David Horner <ds2horner@gmail.com> wrote:
>>> On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
>>>>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
>>>>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>>>> Hey Joonsoo,
>>>>>>>
>>>>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>>>>>>> Hello, Minchan and David.
>>>>>>>>
>>>>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>>>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>>>>>> > > Hey Joonsoo,
>>>>>>>> > >
>>>>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>>>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>>>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>>>>> > >> >             ret = -ENOMEM;
>>>>>>>> > >> >             goto out;
>>>>>>>> > >> >     }
>>>>>>>> > >> > +
>>>>>>>> > >> > +   if (zram->limit_pages &&
>>>>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>>>>> > >> > +           zs_free(meta->mem_pool, handle);
>>>>>>>> > >> > +           ret = -ENOMEM;
>>>>>>>> > >> > +           goto out;
>>>>>>>> > >> > +   }
>>>>>>>> > >> > +
>>>>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>>>>> > >>
>>>>>>>> > >> Hello,
>>>>>>>> > >>
>>>>>>>> > >> I don't follow up previous discussion, so I could be wrong.
>>>>>>>> > >> Why this enforcement should be here?
>>>>>>>> > >>
>>>>>>>> > >> I think that this has two problems.
>>>>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
>>>>>>>> > >> limitation.
>>>>>>>> > >
>>>>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
>>>>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
>>>>>>>> > > but zram so it should be in there. If every user want it in future,
>>>>>>>> > > then we could move the function into zsmalloc. That's what we
>>>>>>>> > > concluded in previous discussion.
>>>>>>>>
>>>>>>>> Hmm...
>>>>>>>> Problem is that we can't avoid these unnecessary overhead in this
>>>>>>>> implementation. If we can implement this feature in zram efficiently,
>>>>>>>> it's okay. But, I think that current form isn't.
>>>>>>>
>>>>>>>
>>>>>>> If we can add it in zsmalloc, it would be more clean and efficient
>>>>>>> for zram but as I said, at the moment, I didn't want to put zram's
>>>>>>> requirement into zsmalloc because to me, it's weird to enforce max
>>>>>>> limit to allocator. It's client's role, I think.
>>>>>>>
>>>>>>> If current implementation is expensive and rather hard to follow,
>>>>>>> It would be one reason to move the feature into zsmalloc but
>>>>>>> I don't think it makes critical trobule in zram usecase.
>>>>>>> See below.
>>>>>>>
>>>>>>> But I still open and will wait others's opinion.
>>>>>>> If other guys think zsmalloc is better place, I am willing to move
>>>>>>> it into zsmalloc.
>>>>>>
>>>>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
>>>>>> actually crossing the limit, since it can calculate that internally.
>>>>>> However, with the current patches the limit will only be briefly
>>>>>> crossed, and it should not be crossed by a large amount.  Now, if this
>>>>>> is happening repeatedly and quickly during extreme memory pressure,
>>>>>> the constant alloc/free will clearly be worse than a simple internal
>>>>>> calculation and failure.  But would it ever happen repeatedly once the
>>>>>> zram limit is reached?
>>>>>>
>>>>>> Now that I'm thinking about the limit from the perspective of the zram
>>>>>> user, I wonder what really will happen.  If zram is being used for
>>>>>> swap space, then when swap starts getting errors trying to write
>>>>>> pages, how damaging will that be to the system?  I haven't checked
>>>>>> what swap does when it encounters disk errors.  Of course, with no
>>>>>> zram limit, continually writing to zram until memory is totally
>>>>>> consumed isn't good either.  But in any case, I would hope that swap
>>>>>> would not repeatedly hammer on a disk when it's getting write failures
>>>>>> from it.
>>>>>>
>>>>>> Alternately, if zram was being used as a compressed ram disk for
>>>>>> regular file storage, it's entirely up to the application to handle
>>>>>> write failures, so it may continue to try to write to a full zram
>>>>>> disk.
>>>>>>
>>>>>> As far as what the zsmalloc api would look like with the limit added,
>>>>>> it would need a setter and getter function (adding it as a param to
>>>>>> the create function would be optional i think).  But more importantly,
>>>>>> it would need to handle multiple ways of specifying the limit.  In our
>>>>>> specific current use cases, zram and zswap, each handles their
>>>>>> internal limit differently - zswap currently uses a % of total ram as
>>>>>> its limit (defaulting to 20), while with these patches zram will use a
>>>>>> specific number of bytes as its limit (defaulting to no limit).  If
>>>>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
>>>>>> then either both users need to use the same units (bytes or %ram), or
>>>>>> zsmalloc/zbud need to be able to set their limit in either units.  It
>>>>>> seems to me like keeping the limit in zram/zswap is currently
>>>>>> preferable, at least without both using the same limit units.
>>>>>>
>>>>>
>>>>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
>>>>> will become a tuning knob) of memory is in bytes.
>>>>>
>>>>> So, if the interface to establish a limit for a pool (or pool set, or whatever
>>>>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
>>>>> (to actually use pages internally, of visa-versa) , then both can use
>>>>> that interface.
>>>>> zram with its native page stipulation, and zswap with calculated % of memory).
>>>>
>>>> No, unless zswap monitors memory hotplug and updates the limit on each
>>>> hotplug event, 20% of the *current* total ram at zswap initialization
>>>> is not equal to an actual 20% of ram limit.  zswap checks its size
>>>> against totalram_pages for each new allocation. I don't think we would
>>>> prefer adding memory hotplug monitoring to zswap just to update the
>>>> zpool size limit.
>>>>
>>>
>>> OK - I see the need to retain the limits where they are in the using
>>> components so that
>>> zsmalloc is not unnecessarily complicated (keeping track of 2 limit methods).
>>>
>>> So, zswap has the same race conditions and possible transient over-allocations?
>>> It looks like I will have to check on how zswap implements it.
>>> But perhaps you can answer the question that is not in the code:
>>> Have there been reported thrashing behaviour around the 20% limit for zswap?
>>
>> zswap does a simple over-allocation check before allocating anything.
>> So during page store, it checks if (total_ram * 0.20) < used.  This
>> actually places the effective limit higher than the specified limit,
>> but only by a single allocation.  This approach could be taken with
>> zram as well.
>>
>> The amount of over-allocation (past the specified limit) would vary
>> between zsmalloc and zbud.  Since zbud increases itself in page
>> increments, any over-allocation past the zswap limit would be by only
>> 1 page.  However, zsmalloc is variable in its allocation increments,
>> as it depends on which class needs to be grown; zsmalloc is divided
>> into many "classes", each of contains some number of "zspages" which
>> try to precisely contain some number of N-sized areas; e.g. one class
>> might use zspages that are 2 pages to store 3 separate areas which are
>> each 2/3 of a page number of bytes; if that class needed to be grown,
>> it would add one zspage that is 2 pages.  The max number of actual
>> pages per zspage is defined by ZS_MAX_PAGES_PER_ZSPAGE which is
>> currently set to 1<<2, so 4.
>>
>> So with zswap, it will over-allocate memory past its specified limit,
>> up to 1 page (with zbud) or up to 4 pages (with zsmalloc).  zram could
>> do the same, simply check if its size > limit before each write, and
>> fail if so; that would remove the alloc/free issue, and would only
>> over-allocate by at most 4 pages (with the current zsmalloc settings).
>> Alternately, zram could check if its (current_size + 4pages > limit),
>> which would then stop it short of the limit by up to 4 pages.  Really
>> though, 4 pages either above or under the limit probably doesn't
>> matter.
>
> I agree that it isn't the over allocation of a few pages, per se, but if
> there is potential for concurrent allocation through either zram of zswap
> there is the possibility of high contention thrashing as the processes
> jocky for those final pages. Perhaps not a thundering herd, but
>  it could readily become a hot spot as the pressure is on and there is
>  no explicit mechanism to throttle back retries and new efforts.
>
> I agree that a precheck before zsmalloc request in zram may be a valuable
> optimization, but only if there is a real possibility of threshold thrashing.
> I suggested before that it might be best to wait and see....
> Do you have any suggestions on a stress test approach to empirically assess
> the risk?

Minchan is the expert here so he may have suggestions and/or existing
tests, but I think just creating a zram device with a limit and any
filesystem and mounting it, then filling up the device, would be a
good basic test.  Maybe once it gets close to the limit, try writing
to it with multiple threads concurrently.  Also, while near/at the
limit, try concurrently writing to it and deleting from it - probably
writing many small files as well as deleting many small files would be
best to keep it right at its limit.  However, I think
zram_bvec_write() is already protected inside the init_lock semaphore
(via zram_make_request), so it seems like there will never be any
concurrency issue.

Since the threads writing to files at the limit will be getting
(expected) errors during write, what the test is really looking at is
whether there is any significant performance difference between an
alloc/free-on-full approach and a pre-check/fail-on-full approach.
That might not become apparent until there is significant memory
pressure, so you might set the zram limit high enough, or you could
otherwise artificially use up system memory (e.g. stress -m).

Also, as a general test of the new zram limit, it would be interesting
to see how various filesystem drivers handle unexpected write
failures...I'm no fs expert, but I would expect that this is different
than a typical ENOSPC condition, which I assume comes from the
filesystem, not the block layer.  This error would be more like a
hardware error.

I'd also be interested in what happens when a zram-backed swap device
reaches its limit...I would hope swap device write errors would lead
to a normal OOM condition and invoke the oom-killer, but from
page_io.c end_swap_bio_write:

 * We failed to write the page out to swap-space.
 * Re-dirty the page in order to avoid it being reclaimed.
 * Also print a dire warning that things will go BAD (tm)
 * very quickly.
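
For context, the failure path around that comment is roughly as follows
(simplified and paraphrased; the exact code varies by kernel version, and
write_failed here stands in for the bio status check):

if (write_failed) {
        /* keep the data: mark the page dirty again so it is retried */
        set_page_dirty(page);
        /* the "dire warning" printk goes here */
        ClearPageReclaim(page);
}
end_page_writeback(page);

so, as the comment suggests, a failed swap write does not immediately
lead to OOM; the page is just re-dirtied and reclaim has to try again
later.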



>
> Thanks again.
>
>
>>
>>>
>>> thanks.
>>>
>>>>>
>>>>> Both would need a mechanism to change the max as need change,
>>>>>  so the API has to handle this.
>>>>>
>>>>>
>>>>> Or am I way off base?
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> > >
>>>>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
>>>>>>>> > > but the problem is we cannot know how many of pages are allocated
>>>>>>>> > > by zsmalloc in advance.
>>>>>>>> > > IOW, zram should be blind on zsmalloc's internal.
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> > We did however suggest that we could check before hand to see if
>>>>>>>> > max was already exceeded as an optimization.
>>>>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
>>>>>>>> > In the contested case, the max may already be exceeded transiently and
>>>>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>>>>>>> > aren't good).
>>>>>>>> > As Minchan mentions this was discussed before - but not into great detail.
>>>>>>>> > Testing should be done to determine possible benefit. And as he also
>>>>>>>> > mentions, the better place for it may be in zsmalloc, but that
>>>>>>>> > requires an ABI change.
>>>>>>>>
>>>>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
>>>>>>>> are just two users now, zswap and zram. We can change it easily.
>>>>>>>> I think that we just need following simple API change in zsmalloc.c.
>>>>>>>>
>>>>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>>>>>>> =>
>>>>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
>>>>>>>> *zpool_op)
>>>>>>>>
>>>>>>>> It's pool allocator so there is no obstacle for us to limit maximum
>>>>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>>>>>>> for pool allocator.
>>>>>>>>
>>>>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>>>>>>> > also interested
>>>>>>>> > in your thoughts, but this patchset should be able to go in as is.
>>>>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>>>>>>> > threshold.
>>>>>>>> >
>>>>>>>> > > About alloc/free cost once if it is over the limit,
>>>>>>>> > > I don't think it's important to consider.
>>>>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
>>>>>>>> > > when the limit is over?
>>>>>>>> > >
>>>>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
>>>>>>>> > >> due to other's allocation. There is time gap between allocation and
>>>>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
>>>>>>>> > >> could also see this condition true and then he will be failed.
>>>>>>>> > >
>>>>>>>> > > Yeb, we already discussed that. :)
>>>>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>>>>>>> > > promise that zram user cannot exceed mem_limit.
>>>>>>>> > >
>>>>>>>>
>>>>>>>> If we can keep such a promise, why we need to limit memory usage?
>>>>>>>> I guess that this limit feature is useful for user who can't keep such promise.
>>>>>>>> So, we should assume that this false positive happens frequently.
>>>>>>>
>>>>>>>
>>>>>>> The goal is to limit memory usage within some threshold.
>>>>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
>>>>>>> In addition, If such false positive happens frequently, it means
>>>>>>> zram is very trobule so that user would see lots of write fail
>>>>>>> message, sometime really slow system if zram is used for swap.
>>>>>>> If we protect just one write from the race, how much does it help
>>>>>>> this situation? I don't think it's critical problem.
>>>>>>>
>>>>>>>>
>>>>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
>>>>>>>> > concurrent process transient inconsistent states.
>>>>>>>> > Different views for different observers.
>>>>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
>>>>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
>>>>>>>> >  ;-)
>>>>>>>>
>>>>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
>>>>>>>> needed memory size before actual allocation attempt. This commiting makes
>>>>>>>> concurrent process serialized so there is no race here. There is
>>>>>>>> possibilty to fail to allocate, but I think this is better than alloc
>>>>>>>> and free blindlessly depending on inconsistent states.
>>>>>>>
>>>>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
>>>>>>> it's not big overhead and if someone continue to try writing  once limit is
>>>>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
>>>>>>> so it's not a problem, I think.
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>>>>> see: http://www.linux-mm.org/ .
>>>>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>>>>
>>>>>>> --
>>>>>>> Kind regards,
>>>>>>> Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 14:03             ` Dan Streetman
  2014-08-27 14:44               ` David Horner
@ 2014-08-28  2:52               ` Minchan Kim
  1 sibling, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-28  2:52 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Joonsoo Kim, David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

Hello,

On Wed, Aug 27, 2014 at 10:03:45AM -0400, Dan Streetman wrote:
> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Hey Joonsoo,
> >
> > On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> >> Hello, Minchan and David.
> >>
> >> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> >> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> >> > > Hey Joonsoo,
> >> > >
> >> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> >> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> >> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >> > >> >             ret = -ENOMEM;
> >> > >> >             goto out;
> >> > >> >     }
> >> > >> > +
> >> > >> > +   if (zram->limit_pages &&
> >> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >> > >> > +           zs_free(meta->mem_pool, handle);
> >> > >> > +           ret = -ENOMEM;
> >> > >> > +           goto out;
> >> > >> > +   }
> >> > >> > +
> >> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >> > >>
> >> > >> Hello,
> >> > >>
> >> > >> I don't follow up previous discussion, so I could be wrong.
> >> > >> Why this enforcement should be here?
> >> > >>
> >> > >> I think that this has two problems.
> >> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> >> > >> limitation.
> >> > >
> >> > > True but firstly, I implemented the logic in zsmalloc, not zram but
> >> > > as I described in cover-letter, it's not a requirement of zsmalloc
> >> > > but zram so it should be in there. If every user want it in future,
> >> > > then we could move the function into zsmalloc. That's what we
> >> > > concluded in previous discussion.
> >>
> >> Hmm...
> >> Problem is that we can't avoid these unnecessary overhead in this
> >> implementation. If we can implement this feature in zram efficiently,
> >> it's okay. But, I think that current form isn't.
> >
> >
> > If we can add it in zsmalloc, it would be more clean and efficient
> > for zram but as I said, at the moment, I didn't want to put zram's
> > requirement into zsmalloc because to me, it's weird to enforce max
> > limit to allocator. It's client's role, I think.
> >
> > If current implementation is expensive and rather hard to follow,
> > It would be one reason to move the feature into zsmalloc but
> > I don't think it makes critical trobule in zram usecase.
> > See below.
> >
> > But I still open and will wait others's opinion.
> > If other guys think zsmalloc is better place, I am willing to move
> > it into zsmalloc.
> 
> Moving it into zsmalloc would allow rejecting new zsmallocs before
> actually crossing the limit, since it can calculate that internally.
> However, with the current patches the limit will only be briefly
> crossed, and it should not be crossed by a large amount.  Now, if this
> is happening repeatedly and quickly during extreme memory pressure,
> the constant alloc/free will clearly be worse than a simple internal
> calculation and failure.  But would it ever happen repeatedly once the
> zram limit is reached?

Right, it depends on the user.
If the user keeps writing without taking any action once the limit is
exceeded, it would be terrible, but by *terrible* I don't mean the
alloc/free cost. The zs_malloc/zs_free cost is really cheap compared
with what the mm reclaim, swap, vfs, fs and block layers add on the
same path.

By *terrible* I mean a slower system caused by swapout failures while
the VM keeps trying to reclaim anonymous pages.

> 
> Now that I'm thinking about the limit from the perspective of the zram
> user, I wonder what really will happen.  If zram is being used for
> swap space, then when swap starts getting errors trying to write
> pages, how damaging will that be to the system?  I haven't checked
> what swap does when it encounters disk errors.  Of course, with no
> zram limit, continually writing to zram until memory is totally
> consumed isn't good either.  But in any case, I would hope that swap
> would not repeatedly hammer on a disk when it's getting write failures
> from it.

Good point. Actually, that's my next step.
Currently, the VM doesn't handle congestion for anonymous pages, while
it does do something for file-backed pages (although it's really basic
at this moment), so we could improve it in several ways.
I'm looking at it now.

> 
> Alternately, if zram was being used as a compressed ram disk for
> regular file storage, it's entirely up to the application to handle
> write failures, so it may continue to try to write to a full zram
> disk.
> 
> As far as what the zsmalloc api would look like with the limit added,
> it would need a setter and getter function (adding it as a param to
> the create function would be optional i think).  But more importantly,
> it would need to handle multiple ways of specifying the limit.  In our
> specific current use cases, zram and zswap, each handles their
> internal limit differently - zswap currently uses a % of total ram as
> its limit (defaulting to 20), while with these patches zram will use a
> specific number of bytes as its limit (defaulting to no limit).  If
> the limiting mechanism is moved into zsmalloc (and possibly zbud),
> then either both users need to use the same units (bytes or %ram), or
> zsmalloc/zbud need to be able to set their limit in either units.  It
> seems to me like keeping the limit in zram/zswap is currently
> preferable, at least without both using the same limit units.

Actually, I hadn't thought about memory hotplug for zswap.
Thanks for pointing it out, Dan.
I agree with you.
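
(For reference, the zswap check you describe is roughly the sketch
below -- simplified from memory, so treat it as illustrative rather
than the exact source.  Because it re-reads totalram_pages on every
store, hotplug is picked up implicitly; a byte limit pushed down into
the allocator would lose that.)

	static bool zswap_is_full(void)
	{
		/* compare current pool size against a % of *current* total ram */
		return totalram_pages * zswap_max_pool_percent / 100 <
			DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
	}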

Thanks!

> 
> 
> >
> >>
> >> > >
> >> > > Another idea is we could call zs_get_total_pages right before zs_malloc
> >> > > but the problem is we cannot know how many of pages are allocated
> >> > > by zsmalloc in advance.
> >> > > IOW, zram should be blind on zsmalloc's internal.
> >> > >
> >> >
> >> > We did however suggest that we could check before hand to see if
> >> > max was already exceeded as an optimization.
> >> > (possibly with a guess on usage but at least using the minimum of 1 page)
> >> > In the contested case, the max may already be exceeded transiently and
> >> > therefore we know this one _could_ fail (it could also pass, but odds
> >> > aren't good).
> >> > As Minchan mentions this was discussed before - but not into great detail.
> >> > Testing should be done to determine possible benefit. And as he also
> >> > mentions, the better place for it may be in zsmalloc, but that
> >> > requires an ABI change.
> >>
> >> Why we hesitate to change zsmalloc API? It is in-kernel API and there
> >> are just two users now, zswap and zram. We can change it easily.
> >> I think that we just need following simple API change in zsmalloc.c.
> >>
> >> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> >> =>
> >> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> >> *zpool_op)
> >>
> >> It's pool allocator so there is no obstacle for us to limit maximum
> >> memory usage in zsmalloc. It's a natural idea to limit memory usage
> >> for pool allocator.
> >>
> >> > Certainly a detailed suggestion could happen on this thread and I'm
> >> > also interested
> >> > in your thoughts, but this patchset should be able to go in as is.
> >> > Memory exhaustion avoidance probably trumps the possible thrashing at
> >> > threshold.
> >> >
> >> > > About alloc/free cost once if it is over the limit,
> >> > > I don't think it's important to consider.
> >> > > Do you have any scenario in your mind to consider alloc/free cost
> >> > > when the limit is over?
> >> > >
> >> > >> 2) Even if this request doesn't do new allocation, it could be failed
> >> > >> due to other's allocation. There is time gap between allocation and
> >> > >> free, so legimate user who want to use preallocated zsmalloc memory
> >> > >> could also see this condition true and then he will be failed.
> >> > >
> >> > > Yeb, we already discussed that. :)
> >> > > Such false positive shouldn't be a severe problem if we can keep a
> >> > > promise that zram user cannot exceed mem_limit.
> >> > >
> >>
> >> If we can keep such a promise, why we need to limit memory usage?
> >> I guess that this limit feature is useful for user who can't keep such promise.
> >> So, we should assume that this false positive happens frequently.
> >
> >
> > The goal is to limit memory usage within some threshold.
> > so false positive shouldn't be harmful unless it exceeds the threshold.
> > In addition, If such false positive happens frequently, it means
> > zram is very trobule so that user would see lots of write fail
> > message, sometime really slow system if zram is used for swap.
> > If we protect just one write from the race, how much does it help
> > this situation? I don't think it's critical problem.
> >
> >>
> >> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> >> > concurrent process transient inconsistent states.
> >> > Different views for different observers.
> >> >  They are a consequence of the theory of "Special Computational Relativity".
> >> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> >> >  ;-)
> >>
> >> If we move limit logic to zsmalloc, we can avoid the race by commiting
> >> needed memory size before actual allocation attempt. This commiting makes
> >> concurrent process serialized so there is no race here. There is
> >> possibilty to fail to allocate, but I think this is better than alloc
> >> and free blindlessly depending on inconsistent states.
> >
> > Normally, zsmalloc/zsfree allocates object from existing pool so
> > it's not big overhead and if someone continue to try writing  once limit is
> > full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> > so it's not a problem, I think.
> >
> >>
> >> Thanks.
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 16:29                     ` Dan Streetman
  2014-08-27 16:59                       ` David Horner
@ 2014-08-28  2:59                       ` Minchan Kim
  1 sibling, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-28  2:59 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David Horner, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 12:29:22PM -0400, Dan Streetman wrote:
> On Wed, Aug 27, 2014 at 11:35 AM, David Horner <ds2horner@gmail.com> wrote:
> > On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> >> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
> >>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> >>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>>> Hey Joonsoo,
> >>>>>
> >>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> >>>>>> Hello, Minchan and David.
> >>>>>>
> >>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> >>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> >>>>>> > > Hey Joonsoo,
> >>>>>> > >
> >>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> >>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> >>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >>>>>> > >> >             ret = -ENOMEM;
> >>>>>> > >> >             goto out;
> >>>>>> > >> >     }
> >>>>>> > >> > +
> >>>>>> > >> > +   if (zram->limit_pages &&
> >>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >>>>>> > >> > +           zs_free(meta->mem_pool, handle);
> >>>>>> > >> > +           ret = -ENOMEM;
> >>>>>> > >> > +           goto out;
> >>>>>> > >> > +   }
> >>>>>> > >> > +
> >>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >>>>>> > >>
> >>>>>> > >> Hello,
> >>>>>> > >>
> >>>>>> > >> I don't follow up previous discussion, so I could be wrong.
> >>>>>> > >> Why this enforcement should be here?
> >>>>>> > >>
> >>>>>> > >> I think that this has two problems.
> >>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> >>>>>> > >> limitation.
> >>>>>> > >
> >>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
> >>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
> >>>>>> > > but zram so it should be in there. If every user want it in future,
> >>>>>> > > then we could move the function into zsmalloc. That's what we
> >>>>>> > > concluded in previous discussion.
> >>>>>>
> >>>>>> Hmm...
> >>>>>> Problem is that we can't avoid these unnecessary overhead in this
> >>>>>> implementation. If we can implement this feature in zram efficiently,
> >>>>>> it's okay. But, I think that current form isn't.
> >>>>>
> >>>>>
> >>>>> If we can add it in zsmalloc, it would be more clean and efficient
> >>>>> for zram but as I said, at the moment, I didn't want to put zram's
> >>>>> requirement into zsmalloc because to me, it's weird to enforce max
> >>>>> limit to allocator. It's client's role, I think.
> >>>>>
> >>>>> If current implementation is expensive and rather hard to follow,
> >>>>> It would be one reason to move the feature into zsmalloc but
> >>>>> I don't think it makes critical trobule in zram usecase.
> >>>>> See below.
> >>>>>
> >>>>> But I still open and will wait others's opinion.
> >>>>> If other guys think zsmalloc is better place, I am willing to move
> >>>>> it into zsmalloc.
> >>>>
> >>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
> >>>> actually crossing the limit, since it can calculate that internally.
> >>>> However, with the current patches the limit will only be briefly
> >>>> crossed, and it should not be crossed by a large amount.  Now, if this
> >>>> is happening repeatedly and quickly during extreme memory pressure,
> >>>> the constant alloc/free will clearly be worse than a simple internal
> >>>> calculation and failure.  But would it ever happen repeatedly once the
> >>>> zram limit is reached?
> >>>>
> >>>> Now that I'm thinking about the limit from the perspective of the zram
> >>>> user, I wonder what really will happen.  If zram is being used for
> >>>> swap space, then when swap starts getting errors trying to write
> >>>> pages, how damaging will that be to the system?  I haven't checked
> >>>> what swap does when it encounters disk errors.  Of course, with no
> >>>> zram limit, continually writing to zram until memory is totally
> >>>> consumed isn't good either.  But in any case, I would hope that swap
> >>>> would not repeatedly hammer on a disk when it's getting write failures
> >>>> from it.
> >>>>
> >>>> Alternately, if zram was being used as a compressed ram disk for
> >>>> regular file storage, it's entirely up to the application to handle
> >>>> write failures, so it may continue to try to write to a full zram
> >>>> disk.
> >>>>
> >>>> As far as what the zsmalloc api would look like with the limit added,
> >>>> it would need a setter and getter function (adding it as a param to
> >>>> the create function would be optional i think).  But more importantly,
> >>>> it would need to handle multiple ways of specifying the limit.  In our
> >>>> specific current use cases, zram and zswap, each handles their
> >>>> internal limit differently - zswap currently uses a % of total ram as
> >>>> its limit (defaulting to 20), while with these patches zram will use a
> >>>> specific number of bytes as its limit (defaulting to no limit).  If
> >>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
> >>>> then either both users need to use the same units (bytes or %ram), or
> >>>> zsmalloc/zbud need to be able to set their limit in either units.  It
> >>>> seems to me like keeping the limit in zram/zswap is currently
> >>>> preferable, at least without both using the same limit units.
> >>>>
> >>>
> >>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
> >>> will become a tuning knob) of memory is in bytes.
> >>>
> >>> So, if the interface to establish a limit for a pool (or pool set, or whatever
> >>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
> >>> (to actually use pages internally, of visa-versa) , then both can use
> >>> that interface.
> >>> zram with its native page stipulation, and zswap with calculated % of memory).
> >>
> >> No, unless zswap monitors memory hotplug and updates the limit on each
> >> hotplug event, 20% of the *current* total ram at zswap initialization
> >> is not equal to an actual 20% of ram limit.  zswap checks its size
> >> against totalram_pages for each new allocation. I don't think we would
> >> prefer adding memory hotplug monitoring to zswap just to update the
> >> zpool size limit.
> >>
> >
> > OK - I see the need to retain the limits where they are in the using
> > components so that
> > zsmalloc is not unnecessarily complicated (keeping track of 2 limit methods).
> >
> > So, zswap has the same race conditions and possible transient over-allocations?
> > It looks like I will have to check on how zswap implements it.
> > But perhaps you can answer the question that is not in the code:
> > Have there been reported thrashing behaviour around the 20% limit for zswap?
> 
> zswap does a simple over-allocation check before allocating anything.
> So during page store, it checks if (total_ram * 0.20) < used.  This
> actually places the effective limit higher than the specified limit,
> but only by a single allocation.  This approach could be taken with
> zram as well.
> 
> The amount of over-allocation (past the specified limit) would vary
> between zsmalloc and zbud.  Since zbud increases itself in page
> increments, any over-allocation past the zswap limit would be by only
> 1 page.  However, zsmalloc is variable in its allocation increments,
> as it depends on which class needs to be grown; zsmalloc is divided
> into many "classes", each of contains some number of "zspages" which
> try to precisely contain some number of N-sized areas; e.g. one class
> might use zspages that are 2 pages to store 3 separate areas which are
> each 2/3 of a page number of bytes; if that class needed to be grown,
> it would add one zspage that is 2 pages.  The max number of actual
> pages per zspage is defined by ZS_MAX_PAGES_PER_ZSPAGE which is
> currently set to 1<<2, so 4.
> So with zswap, it will over-allocate memory past its specified limit,
> up to 1 page (with zbud) or up to 4 pages (with zsmalloc).  zram could
> do the same, simply check if its size > limit before each write, and
> fail if so; that would remove the alloc/free issue, and would only
> over-allocate by at most 4 pages (with the current zsmalloc settings).
> Alternately, zram could check if its (current_size + 4pages > limit),
> which would then stop it short of the limit by up to 4 pages.  Really
> though, 4 pages either above or under the limit probably doesn't
> matter.

It's doable, but the problem is that it exposes the allocator's
internals (ie, that zsmalloc grows by up to 4 pages at a time), and
that is the thing I wanted to avoid from the beginning.

If it turns out that the alloc/free cost really is high, against my
expectation, we could consider it as one of the solutions.
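
To make that concrete, the pre-check variant would look roughly like
the sketch below in zram_bvec_write(), reusing zs_get_total_pages()
and zram->limit_pages from this series; the hard-coded headroom is
exactly the zsmalloc internal (ZS_MAX_PAGES_PER_ZSPAGE) I would rather
not leak into zram:

	/* sketch only, not part of this series: pre-check before zs_malloc() */
	if (zram->limit_pages &&
	    zs_get_total_pages(meta->mem_pool) + 4 /* ZS_MAX_PAGES_PER_ZSPAGE */
			> zram->limit_pages) {
		ret = -ENOMEM;
		goto out;
	}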


> 
> >
> > thanks.
> >
> >>>
> >>> Both would need a mechanism to change the max as need change,
> >>>  so the API has to handle this.
> >>>
> >>>
> >>> Or am I way off base?
> >>>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> > >
> >>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
> >>>>>> > > but the problem is we cannot know how many of pages are allocated
> >>>>>> > > by zsmalloc in advance.
> >>>>>> > > IOW, zram should be blind on zsmalloc's internal.
> >>>>>> > >
> >>>>>> >
> >>>>>> > We did however suggest that we could check before hand to see if
> >>>>>> > max was already exceeded as an optimization.
> >>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
> >>>>>> > In the contested case, the max may already be exceeded transiently and
> >>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
> >>>>>> > aren't good).
> >>>>>> > As Minchan mentions this was discussed before - but not into great detail.
> >>>>>> > Testing should be done to determine possible benefit. And as he also
> >>>>>> > mentions, the better place for it may be in zsmalloc, but that
> >>>>>> > requires an ABI change.
> >>>>>>
> >>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
> >>>>>> are just two users now, zswap and zram. We can change it easily.
> >>>>>> I think that we just need following simple API change in zsmalloc.c.
> >>>>>>
> >>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> >>>>>> =>
> >>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> >>>>>> *zpool_op)
> >>>>>>
> >>>>>> It's pool allocator so there is no obstacle for us to limit maximum
> >>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
> >>>>>> for pool allocator.
> >>>>>>
> >>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
> >>>>>> > also interested
> >>>>>> > in your thoughts, but this patchset should be able to go in as is.
> >>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
> >>>>>> > threshold.
> >>>>>> >
> >>>>>> > > About alloc/free cost once if it is over the limit,
> >>>>>> > > I don't think it's important to consider.
> >>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
> >>>>>> > > when the limit is over?
> >>>>>> > >
> >>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
> >>>>>> > >> due to other's allocation. There is time gap between allocation and
> >>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
> >>>>>> > >> could also see this condition true and then he will be failed.
> >>>>>> > >
> >>>>>> > > Yeb, we already discussed that. :)
> >>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
> >>>>>> > > promise that zram user cannot exceed mem_limit.
> >>>>>> > >
> >>>>>>
> >>>>>> If we can keep such a promise, why we need to limit memory usage?
> >>>>>> I guess that this limit feature is useful for user who can't keep such promise.
> >>>>>> So, we should assume that this false positive happens frequently.
> >>>>>
> >>>>>
> >>>>> The goal is to limit memory usage within some threshold.
> >>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
> >>>>> In addition, If such false positive happens frequently, it means
> >>>>> zram is very trobule so that user would see lots of write fail
> >>>>> message, sometime really slow system if zram is used for swap.
> >>>>> If we protect just one write from the race, how much does it help
> >>>>> this situation? I don't think it's critical problem.
> >>>>>
> >>>>>>
> >>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> >>>>>> > concurrent process transient inconsistent states.
> >>>>>> > Different views for different observers.
> >>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
> >>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> >>>>>> >  ;-)
> >>>>>>
> >>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
> >>>>>> needed memory size before actual allocation attempt. This commiting makes
> >>>>>> concurrent process serialized so there is no race here. There is
> >>>>>> possibilty to fail to allocate, but I think this is better than alloc
> >>>>>> and free blindlessly depending on inconsistent states.
> >>>>>
> >>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
> >>>>> it's not big overhead and if someone continue to try writing  once limit is
> >>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> >>>>> so it's not a problem, I think.
> >>>>>
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> --
> >>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>>>> see: http://www.linux-mm.org/ .
> >>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>>>
> >>>>> --
> >>>>> Kind regards,
> >>>>> Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27 19:04                         ` Dan Streetman
@ 2014-08-28  3:04                           ` Minchan Kim
  0 siblings, 0 replies; 27+ messages in thread
From: Minchan Kim @ 2014-08-28  3:04 UTC (permalink / raw)
  To: Dan Streetman
  Cc: David Horner, Joonsoo Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Wed, Aug 27, 2014 at 03:04:32PM -0400, Dan Streetman wrote:
> On Wed, Aug 27, 2014 at 12:59 PM, David Horner <ds2horner@gmail.com> wrote:
> > On Wed, Aug 27, 2014 at 12:29 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> >> On Wed, Aug 27, 2014 at 11:35 AM, David Horner <ds2horner@gmail.com> wrote:
> >>> On Wed, Aug 27, 2014 at 11:14 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> >>>> On Wed, Aug 27, 2014 at 10:44 AM, David Horner <ds2horner@gmail.com> wrote:
> >>>>> On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@ieee.org> wrote:
> >>>>>> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>>>>> Hey Joonsoo,
> >>>>>>>
> >>>>>>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> >>>>>>>> Hello, Minchan and David.
> >>>>>>>>
> >>>>>>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> >>>>>>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> >>>>>>>> > > Hey Joonsoo,
> >>>>>>>> > >
> >>>>>>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> >>>>>>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> >>>>>>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >>>>>>>> > >> >             ret = -ENOMEM;
> >>>>>>>> > >> >             goto out;
> >>>>>>>> > >> >     }
> >>>>>>>> > >> > +
> >>>>>>>> > >> > +   if (zram->limit_pages &&
> >>>>>>>> > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >>>>>>>> > >> > +           zs_free(meta->mem_pool, handle);
> >>>>>>>> > >> > +           ret = -ENOMEM;
> >>>>>>>> > >> > +           goto out;
> >>>>>>>> > >> > +   }
> >>>>>>>> > >> > +
> >>>>>>>> > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >>>>>>>> > >>
> >>>>>>>> > >> Hello,
> >>>>>>>> > >>
> >>>>>>>> > >> I don't follow up previous discussion, so I could be wrong.
> >>>>>>>> > >> Why this enforcement should be here?
> >>>>>>>> > >>
> >>>>>>>> > >> I think that this has two problems.
> >>>>>>>> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> >>>>>>>> > >> limitation.
> >>>>>>>> > >
> >>>>>>>> > > True but firstly, I implemented the logic in zsmalloc, not zram but
> >>>>>>>> > > as I described in cover-letter, it's not a requirement of zsmalloc
> >>>>>>>> > > but zram so it should be in there. If every user want it in future,
> >>>>>>>> > > then we could move the function into zsmalloc. That's what we
> >>>>>>>> > > concluded in previous discussion.
> >>>>>>>>
> >>>>>>>> Hmm...
> >>>>>>>> Problem is that we can't avoid these unnecessary overhead in this
> >>>>>>>> implementation. If we can implement this feature in zram efficiently,
> >>>>>>>> it's okay. But, I think that current form isn't.
> >>>>>>>
> >>>>>>>
> >>>>>>> If we can add it in zsmalloc, it would be more clean and efficient
> >>>>>>> for zram but as I said, at the moment, I didn't want to put zram's
> >>>>>>> requirement into zsmalloc because to me, it's weird to enforce max
> >>>>>>> limit to allocator. It's client's role, I think.
> >>>>>>>
> >>>>>>> If current implementation is expensive and rather hard to follow,
> >>>>>>> It would be one reason to move the feature into zsmalloc but
> >>>>>>> I don't think it makes critical trobule in zram usecase.
> >>>>>>> See below.
> >>>>>>>
> >>>>>>> But I still open and will wait others's opinion.
> >>>>>>> If other guys think zsmalloc is better place, I am willing to move
> >>>>>>> it into zsmalloc.
> >>>>>>
> >>>>>> Moving it into zsmalloc would allow rejecting new zsmallocs before
> >>>>>> actually crossing the limit, since it can calculate that internally.
> >>>>>> However, with the current patches the limit will only be briefly
> >>>>>> crossed, and it should not be crossed by a large amount.  Now, if this
> >>>>>> is happening repeatedly and quickly during extreme memory pressure,
> >>>>>> the constant alloc/free will clearly be worse than a simple internal
> >>>>>> calculation and failure.  But would it ever happen repeatedly once the
> >>>>>> zram limit is reached?
> >>>>>>
> >>>>>> Now that I'm thinking about the limit from the perspective of the zram
> >>>>>> user, I wonder what really will happen.  If zram is being used for
> >>>>>> swap space, then when swap starts getting errors trying to write
> >>>>>> pages, how damaging will that be to the system?  I haven't checked
> >>>>>> what swap does when it encounters disk errors.  Of course, with no
> >>>>>> zram limit, continually writing to zram until memory is totally
> >>>>>> consumed isn't good either.  But in any case, I would hope that swap
> >>>>>> would not repeatedly hammer on a disk when it's getting write failures
> >>>>>> from it.
> >>>>>>
> >>>>>> Alternately, if zram was being used as a compressed ram disk for
> >>>>>> regular file storage, it's entirely up to the application to handle
> >>>>>> write failures, so it may continue to try to write to a full zram
> >>>>>> disk.
> >>>>>>
> >>>>>> As far as what the zsmalloc api would look like with the limit added,
> >>>>>> it would need a setter and getter function (adding it as a param to
> >>>>>> the create function would be optional i think).  But more importantly,
> >>>>>> it would need to handle multiple ways of specifying the limit.  In our
> >>>>>> specific current use cases, zram and zswap, each handles their
> >>>>>> internal limit differently - zswap currently uses a % of total ram as
> >>>>>> its limit (defaulting to 20), while with these patches zram will use a
> >>>>>> specific number of bytes as its limit (defaulting to no limit).  If
> >>>>>> the limiting mechanism is moved into zsmalloc (and possibly zbud),
> >>>>>> then either both users need to use the same units (bytes or %ram), or
> >>>>>> zsmalloc/zbud need to be able to set their limit in either units.  It
> >>>>>> seems to me like keeping the limit in zram/zswap is currently
> >>>>>> preferable, at least without both using the same limit units.
> >>>>>>
> >>>>>
> >>>>> zswap knows what 20% (or whatever % it currently uses , and perhaps it too
> >>>>> will become a tuning knob) of memory is in bytes.
> >>>>>
> >>>>> So, if the interface to establish a limit for a pool (or pool set, or whatever
> >>>>> zsmalloc sets up for its allocation mechanism) is stipulated in bytes
> >>>>> (to actually use pages internally, of visa-versa) , then both can use
> >>>>> that interface.
> >>>>> zram with its native page stipulation, and zswap with calculated % of memory).
> >>>>
> >>>> No, unless zswap monitors memory hotplug and updates the limit on each
> >>>> hotplug event, 20% of the *current* total ram at zswap initialization
> >>>> is not equal to an actual 20% of ram limit.  zswap checks its size
> >>>> against totalram_pages for each new allocation. I don't think we would
> >>>> prefer adding memory hotplug monitoring to zswap just to update the
> >>>> zpool size limit.
> >>>>
> >>>
> >>> OK - I see the need to retain the limits where they are in the using
> >>> components so that
> >>> zsmalloc is not unnecessarily complicated (keeping track of 2 limit methods).
> >>>
> >>> So, zswap has the same race conditions and possible transient over-allocations?
> >>> It looks like I will have to check on how zswap implements it.
> >>> But perhaps you can answer the question that is not in the code:
> >>> Have there been reported thrashing behaviour around the 20% limit for zswap?
> >>
> >> zswap does a simple over-allocation check before allocating anything.
> >> So during page store, it checks if (total_ram * 0.20) < used.  This
> >> actually places the effective limit higher than the specified limit,
> >> but only by a single allocation.  This approach could be taken with
> >> zram as well.
> >>
> >> The amount of over-allocation (past the specified limit) would vary
> >> between zsmalloc and zbud.  Since zbud increases itself in page
> >> increments, any over-allocation past the zswap limit would be by only
> >> 1 page.  However, zsmalloc is variable in its allocation increments,
> >> as it depends on which class needs to be grown; zsmalloc is divided
> >> into many "classes", each of contains some number of "zspages" which
> >> try to precisely contain some number of N-sized areas; e.g. one class
> >> might use zspages that are 2 pages to store 3 separate areas which are
> >> each 2/3 of a page number of bytes; if that class needed to be grown,
> >> it would add one zspage that is 2 pages.  The max number of actual
> >> pages per zspage is defined by ZS_MAX_PAGES_PER_ZSPAGE which is
> >> currently set to 1<<2, so 4.
> >>
> >> So with zswap, it will over-allocate memory past its specified limit,
> >> up to 1 page (with zbud) or up to 4 pages (with zsmalloc).  zram could
> >> do the same, simply check if its size > limit before each write, and
> >> fail if so; that would remove the alloc/free issue, and would only
> >> over-allocate by at most 4 pages (with the current zsmalloc settings).
> >> Alternately, zram could check if its (current_size + 4pages > limit),
> >> which would then stop it short of the limit by up to 4 pages.  Really
> >> though, 4 pages either above or under the limit probably doesn't
> >> matter.
> >
> > I agree that it isn't the over allocation of a few pages, per se, but if
> > there is potential for concurrent allocation through either zram of zswap
> > there is the possibility of high contention thrashing as the processes
> > jocky for those final pages. Perhaps not a thundering herd, but
> >  it could readily become a hot spot as the pressure is on and there is
> >  no explicit mechanism to throttle back retries and new efforts.
> >
> > I agree that a precheck before zsmalloc request in zram may be a valuable
> > optimization, but only if there is a real possibility of threshold thrashing.
> > I suggested before that it might be best to wait and see....
> > Do you have any suggestions on a stress test approach to empirically assess
> > the risk?
> 
> Minchan is the expert here so he may have suggestions and/or existing
> tests, but I think just creating a zram device with a limit and any
> filesystem and mounting it, then filling up the device, would be a
> good basic test.  Maybe once it gets close to the limit, try writing
> to it with multiple threads concurrently.  Also, while near/at the
> limit, try concurrently writing to it and deleting from it - probably
> writing many small files as well as deleting many small files would be
> best to keep it right at its limit.  However, I think
> zram_bvec_write() is already protected inside the init_lock semphore
> (via zram_make_request), so it seems like there will never be any
> concurrency issue.

Strictly speaking, it's a read-side lock, so the writes can still run
concurrently.
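
Roughly, the request path looks like the sketch below (simplified, not
the exact source); it is the down_read() that lets several writers run
zram_bvec_write() at the same time:

	static void zram_make_request(struct request_queue *queue, struct bio *bio)
	{
		struct zram *zram = queue->queuedata;

		down_read(&zram->init_lock);	/* read side: many holders allowed */
		if (unlikely(!init_done(zram)))
			goto error;

		__zram_make_request(zram, bio);
		up_read(&zram->init_lock);
		return;

	error:
		up_read(&zram->init_lock);
		bio_io_error(bio);
	}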

> 
> Since the threads writing to files at the limit will be getting
> (expected) errors during write, the test is really looking at is
> whether there is any significant performance difference between a
> alloc/free-on-full approach or a pre-check/fail-on-full approach.
> That might not become apparent until there is significant memory
> pressure, so you might set the zram limit high enough, or you could
> otherwise artificially use up system memory (e.g. stress -m).
> 
> Also, as a general test of the new zram limit, it would be interesting
> to see how various filesystem drivers handle unexpected write
> failures...I'm no fs expert, but I would expect that this is different
> than a typical ENOSPC condition, which I assume comes from the
> filesystem, not the block layer.  This error would be more like a
> hardware error.

True. Actually, I hit a BUG_ON in ext4 but haven't looked into it in
depth yet. Maybe I will report it.

> 
> I'd also be interested in what happens when a zram-backed swap device
> reaches its limit...I would hope swap device write errors would lead
> to a normal OOM condition and invoke the oom-killer, but from
> page_io.c end_swap_bio_write:
> 
>  * We failed to write the page out to swap-space.
>  * Re-dirty the page in order to avoid it being reclaimed.
>  * Also print a dire warning that things will go BAD (tm)
>  * very quickly.
> 

Yeb, bad. As I said, the VM doesn't have congestion control for swap
at the moment; it could be improved, and I am looking at that.

> 
> 
> >
> > Thanks again.
> >
> >
> >>
> >>>
> >>> thanks.
> >>>
> >>>>>
> >>>>> Both would need a mechanism to change the max as need change,
> >>>>>  so the API has to handle this.
> >>>>>
> >>>>>
> >>>>> Or am I way off base?
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> > >
> >>>>>>>> > > Another idea is we could call zs_get_total_pages right before zs_malloc
> >>>>>>>> > > but the problem is we cannot know how many of pages are allocated
> >>>>>>>> > > by zsmalloc in advance.
> >>>>>>>> > > IOW, zram should be blind on zsmalloc's internal.
> >>>>>>>> > >
> >>>>>>>> >
> >>>>>>>> > We did however suggest that we could check before hand to see if
> >>>>>>>> > max was already exceeded as an optimization.
> >>>>>>>> > (possibly with a guess on usage but at least using the minimum of 1 page)
> >>>>>>>> > In the contested case, the max may already be exceeded transiently and
> >>>>>>>> > therefore we know this one _could_ fail (it could also pass, but odds
> >>>>>>>> > aren't good).
> >>>>>>>> > As Minchan mentions this was discussed before - but not into great detail.
> >>>>>>>> > Testing should be done to determine possible benefit. And as he also
> >>>>>>>> > mentions, the better place for it may be in zsmalloc, but that
> >>>>>>>> > requires an ABI change.
> >>>>>>>>
> >>>>>>>> Why we hesitate to change zsmalloc API? It is in-kernel API and there
> >>>>>>>> are just two users now, zswap and zram. We can change it easily.
> >>>>>>>> I think that we just need following simple API change in zsmalloc.c.
> >>>>>>>>
> >>>>>>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> >>>>>>>> =>
> >>>>>>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> >>>>>>>> *zpool_op)
> >>>>>>>>
> >>>>>>>> It's pool allocator so there is no obstacle for us to limit maximum
> >>>>>>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
> >>>>>>>> for pool allocator.
> >>>>>>>>
> >>>>>>>> > Certainly a detailed suggestion could happen on this thread and I'm
> >>>>>>>> > also interested
> >>>>>>>> > in your thoughts, but this patchset should be able to go in as is.
> >>>>>>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
> >>>>>>>> > threshold.
> >>>>>>>> >
> >>>>>>>> > > About alloc/free cost once if it is over the limit,
> >>>>>>>> > > I don't think it's important to consider.
> >>>>>>>> > > Do you have any scenario in your mind to consider alloc/free cost
> >>>>>>>> > > when the limit is over?
> >>>>>>>> > >
> >>>>>>>> > >> 2) Even if this request doesn't do new allocation, it could be failed
> >>>>>>>> > >> due to other's allocation. There is time gap between allocation and
> >>>>>>>> > >> free, so legimate user who want to use preallocated zsmalloc memory
> >>>>>>>> > >> could also see this condition true and then he will be failed.
> >>>>>>>> > >
> >>>>>>>> > > Yeb, we already discussed that. :)
> >>>>>>>> > > Such false positive shouldn't be a severe problem if we can keep a
> >>>>>>>> > > promise that zram user cannot exceed mem_limit.
> >>>>>>>> > >
> >>>>>>>>
> >>>>>>>> If we can keep such a promise, why we need to limit memory usage?
> >>>>>>>> I guess that this limit feature is useful for user who can't keep such promise.
> >>>>>>>> So, we should assume that this false positive happens frequently.
> >>>>>>>
> >>>>>>>
> >>>>>>> The goal is to limit memory usage within some threshold.
> >>>>>>> so false positive shouldn't be harmful unless it exceeds the threshold.
> >>>>>>> In addition, If such false positive happens frequently, it means
> >>>>>>> zram is very trobule so that user would see lots of write fail
> >>>>>>> message, sometime really slow system if zram is used for swap.
> >>>>>>> If we protect just one write from the race, how much does it help
> >>>>>>> this situation? I don't think it's critical problem.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> >>>>>>>> > concurrent process transient inconsistent states.
> >>>>>>>> > Different views for different observers.
> >>>>>>>> >  They are a consequence of the theory of "Special Computational Relativity".
> >>>>>>>> >  I am working on a String Unification Theory of Quantum and General CR in LISP.
> >>>>>>>> >  ;-)
> >>>>>>>>
> >>>>>>>> If we move limit logic to zsmalloc, we can avoid the race by commiting
> >>>>>>>> needed memory size before actual allocation attempt. This commiting makes
> >>>>>>>> concurrent process serialized so there is no race here. There is
> >>>>>>>> possibilty to fail to allocate, but I think this is better than alloc
> >>>>>>>> and free blindlessly depending on inconsistent states.
> >>>>>>>
> >>>>>>> Normally, zsmalloc/zsfree allocates object from existing pool so
> >>>>>>> it's not big overhead and if someone continue to try writing  once limit is
> >>>>>>> full, another overhead (vfs, fs, block) would be bigger than zsmalloc
> >>>>>>> so it's not a problem, I think.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>>>>>> see: http://www.linux-mm.org/ .
> >>>>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Kind regards,
> >>>>>>> Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 3/4] zram: zram memory size limitation
  2014-08-27  7:28               ` Minchan Kim
@ 2014-08-28  8:21                 ` Joonsoo Kim
  0 siblings, 0 replies; 27+ messages in thread
From: Joonsoo Kim @ 2014-08-28  8:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

On Wed, Aug 27, 2014 at 04:28:19PM +0900, Minchan Kim wrote:
> On Wed, Aug 27, 2014 at 02:04:38PM +0900, Joonsoo Kim wrote:
> > On Wed, Aug 27, 2014 at 11:51:32AM +0900, Minchan Kim wrote:
> > > Hey Joonsoo,
> > > 
> > > On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> > > > Hello, Minchan and David.
> > > > 
> > > > On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> > > > > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@kernel.org> wrote:
> > > > > > Hey Joonsoo,
> > > > > >
> > > > > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> > > > > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > > > > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> > > > > >> >             ret = -ENOMEM;
> > > > > >> >             goto out;
> > > > > >> >     }
> > > > > >> > +
> > > > > >> > +   if (zram->limit_pages &&
> > > > > >> > +           zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > > > > >> > +           zs_free(meta->mem_pool, handle);
> > > > > >> > +           ret = -ENOMEM;
> > > > > >> > +           goto out;
> > > > > >> > +   }
> > > > > >> > +
> > > > > >> >     cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> > > > > >>
> > > > > >> Hello,
> > > > > >>
> > > > > >> I don't follow up previous discussion, so I could be wrong.
> > > > > >> Why this enforcement should be here?
> > > > > >>
> > > > > >> I think that this has two problems.
> > > > > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> > > > > >> limitation.
> > > > > >
> > > > > > True but firstly, I implemented the logic in zsmalloc, not zram but
> > > > > > as I described in cover-letter, it's not a requirement of zsmalloc
> > > > > > but zram so it should be in there. If every user want it in future,
> > > > > > then we could move the function into zsmalloc. That's what we
> > > > > > concluded in previous discussion.
> > > > 
> > > > Hmm...
> > > > Problem is that we can't avoid these unnecessary overhead in this
> > > > implementation. If we can implement this feature in zram efficiently,
> > > > it's okay. But, I think that current form isn't.
> > > 
> > > 
> > > If we can add it in zsmalloc, it would be more clean and efficient
> > > for zram but as I said, at the moment, I didn't want to put zram's
> > > requirement into zsmalloc because to me, it's weird to enforce max
> > > limit to allocator. It's client's role, I think.
> > 
> > AFAIK, many kinds of pools such as thread-pool or memory-pool have
> > their own limit. It's not weird for me.
> 
> Actually I don't know what is pool allocator but things you mentioned
> is basically used to gaurantee *new* thread/memory, not limit although
> it would implement limit.
> 
> Another question, why do you think zsmalloc is pool allocator?
> IOW, What logic makes you think it's pool allocator?

In fact, it is not a pool allocator for now, but it looks like one
because each instance is used by only one zram device. If there are
many zram devices, there are many zs_pools and their memory cannot be
shared; each one is totally isolated from the others. With this
property we could easily make it an actual pool allocator, or impose a
memory usage limit on it. This makes me think that zsmalloc is the
better place to limit memory usage.

Anyway, I don't have a strong objection to the current implementation.
You can fix it later if it turns out to be a real problem.
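
For what it's worth, the "commit before allocation" idea could be
sketched inside zsmalloc roughly as below; pool->limit_pages and the
helper name are only illustrative (they don't exist today), only
pages_allocated is the atomic counter from patch 1:

	/* illustrative sketch only, not existing zsmalloc code */
	static int zs_pool_charge(struct zs_pool *pool, unsigned long nr_pages)
	{
		long old, new;

		do {
			old = atomic_long_read(&pool->pages_allocated);
			new = old + nr_pages;
			if (pool->limit_pages && new > pool->limit_pages)
				return -ENOMEM;	/* reject before growing the pool */
		} while (atomic_long_cmpxchg(&pool->pages_allocated, old, new) != old);

		return 0;
	}

zs_malloc() would charge the new zspage size with it before allocating
pages and uncharge on failure, so concurrent writers serialize on the
counter instead of racing past the limit.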

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-08-28  8:20 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-25  0:05 [PATCH v5 0/4] zram memory control enhance Minchan Kim
2014-08-25  0:05 ` [PATCH v5 1/4] zsmalloc: move pages_allocated to zs_pool Minchan Kim
2014-08-25  0:05 ` [PATCH v5 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes Minchan Kim
2014-08-25  4:08   ` David Horner
2014-08-25  0:05 ` [PATCH v5 3/4] zram: zram memory size limitation Minchan Kim
2014-08-25 11:09   ` Sergey Senozhatsky
2014-08-26  4:52     ` Minchan Kim
2014-08-26  7:37   ` Joonsoo Kim
2014-08-26  7:55     ` Minchan Kim
2014-08-26 12:22       ` David Horner
2014-08-27  1:26         ` Joonsoo Kim
2014-08-27  2:51           ` Minchan Kim
2014-08-27  5:04             ` Joonsoo Kim
2014-08-27  7:28               ` Minchan Kim
2014-08-28  8:21                 ` Joonsoo Kim
2014-08-27 14:03             ` Dan Streetman
2014-08-27 14:44               ` David Horner
2014-08-27 15:14                 ` Dan Streetman
2014-08-27 15:35                   ` David Horner
2014-08-27 16:29                     ` Dan Streetman
2014-08-27 16:59                       ` David Horner
2014-08-27 19:04                         ` Dan Streetman
2014-08-28  3:04                           ` Minchan Kim
2014-08-28  2:59                       ` Minchan Kim
2014-08-28  2:52               ` Minchan Kim
2014-08-25  0:05 ` [PATCH v5 4/4] zram: report maximum used memory Minchan Kim
2014-08-25  4:05   ` David Horner
