All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] zram memory control enhance
@ 2014-08-22  0:42 ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Currently, zram has no feature to limit memory so theoretically
zram can deplete system memory.
Users have asked for a limit several times as even without exhaustion
zram makes it hard to control memory usage of the platform.
This patchset adds the feature.

Patch 1 makes zs_get_total_size_bytes faster because it would be
used frequently in later patches for the new feature.

Patch 2 changes zs_get_total_size_bytes's return unit from bytes
to page so that zsmalloc doesn't need unnecessary operation(ie,
<< PAGE_SHIFT).

Patch 3 adds new feature. I added the feature into zram layer,
not zsmalloc because limiation is zram's requirement, not zsmalloc
so any other user using zsmalloc(ie, zpool) shouldn't affected
by unnecessary branch of zsmalloc. In future, if every users
of zsmalloc want the feature, then, we could move the feature
from client side to zsmalloc easily but vice versa would be
painful.

Patch 4 adds news facility to report maximum memory usage of zram
so that this avoids user polling frequently via /sys/block/zram0/
mem_used_total and ensures transient max are not missed.

* From v3
 * get_zs_total_size_byte function name change - Dan
 * clarifiction of the document - Dan
 * atomic account instead of introducing new lock in zsmalloc - David
 * remove unnecessary atomic instruction in updating max - David
 
* From v2
 * introduce helper funcntion to update max_used_pages
   for readability - David
 * avoid unncessary zs_get_total_size call in updating loop
   for max_used_pages - David

* From v1
 * rebased on next-20140815
 * fix up race problem - David, Dan
 * reset mem_used_max as current total_bytes, rather than 0 - David
 * resetting works with only "0" write for extensiblilty - David, Dan

Minchan Kim (4):
  zsmalloc: move pages_allocated to zs_pool
  zsmalloc: change return value unit of  zs_get_total_size_bytes
  zram: zram memory size limitation
  zram: report maximum used memory

 Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
 Documentation/blockdev/zram.txt            |  25 +++++--
 drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |   6 ++
 include/linux/zsmalloc.h                   |   2 +-
 mm/zsmalloc.c                              |  30 ++++-----
 6 files changed, 158 insertions(+), 26 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v4 0/4] zram memory control enhance
@ 2014-08-22  0:42 ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Currently, zram has no feature to limit memory so theoretically
zram can deplete system memory.
Users have asked for a limit several times as even without exhaustion
zram makes it hard to control memory usage of the platform.
This patchset adds the feature.

Patch 1 makes zs_get_total_size_bytes faster because it would be
used frequently in later patches for the new feature.

Patch 2 changes zs_get_total_size_bytes's return unit from bytes
to page so that zsmalloc doesn't need unnecessary operation(ie,
<< PAGE_SHIFT).

Patch 3 adds new feature. I added the feature into zram layer,
not zsmalloc because limiation is zram's requirement, not zsmalloc
so any other user using zsmalloc(ie, zpool) shouldn't affected
by unnecessary branch of zsmalloc. In future, if every users
of zsmalloc want the feature, then, we could move the feature
from client side to zsmalloc easily but vice versa would be
painful.

Patch 4 adds news facility to report maximum memory usage of zram
so that this avoids user polling frequently via /sys/block/zram0/
mem_used_total and ensures transient max are not missed.

* From v3
 * get_zs_total_size_byte function name change - Dan
 * clarifiction of the document - Dan
 * atomic account instead of introducing new lock in zsmalloc - David
 * remove unnecessary atomic instruction in updating max - David
 
* From v2
 * introduce helper funcntion to update max_used_pages
   for readability - David
 * avoid unncessary zs_get_total_size call in updating loop
   for max_used_pages - David

* From v1
 * rebased on next-20140815
 * fix up race problem - David, Dan
 * reset mem_used_max as current total_bytes, rather than 0 - David
 * resetting works with only "0" write for extensiblilty - David, Dan

Minchan Kim (4):
  zsmalloc: move pages_allocated to zs_pool
  zsmalloc: change return value unit of  zs_get_total_size_bytes
  zram: zram memory size limitation
  zram: report maximum used memory

 Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
 Documentation/blockdev/zram.txt            |  25 +++++--
 drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |   6 ++
 include/linux/zsmalloc.h                   |   2 +-
 mm/zsmalloc.c                              |  30 ++++-----
 6 files changed, 158 insertions(+), 26 deletions(-)

-- 
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v4 1/4] zsmalloc: move pages_allocated to zs_pool
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

pages_allocated has counted in size_class structure and when user
of zsmalloc want to see total_size_bytes, it should gather all of
count from each size_class to report the sum.

it's not bad if user don't see the value often but if user start
to see the value frequently, it would be not a good deal for
performance pov.

This patch moves the count from size_class to zs_pool so it could
reduce memory footprint (from [255 * 8byte] to
[sizeof(atomic_long_t)]).

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 94f38fac5e81..2a4acf400846 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -199,9 +199,6 @@ struct size_class {
 
 	spinlock_t lock;
 
-	/* stats */
-	u64 pages_allocated;
-
 	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
 };
 
@@ -220,6 +217,7 @@ struct zs_pool {
 	struct size_class size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
+	atomic_long_t pages_allocated;
 };
 
 /*
@@ -1028,8 +1026,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 			return 0;
 
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
+		atomic_long_add(class->pages_per_zspage,
+					&pool->pages_allocated);
 		spin_lock(&class->lock);
-		class->pages_allocated += class->pages_per_zspage;
 	}
 
 	obj = (unsigned long)first_page->freelist;
@@ -1082,14 +1081,13 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	first_page->inuse--;
 	fullness = fix_fullness_group(pool, first_page);
-
-	if (fullness == ZS_EMPTY)
-		class->pages_allocated -= class->pages_per_zspage;
-
 	spin_unlock(&class->lock);
 
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_EMPTY) {
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
 		free_zspage(first_page);
+	}
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
@@ -1185,12 +1183,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool)
 {
-	int i;
-	u64 npages = 0;
-
-	for (i = 0; i < ZS_SIZE_CLASSES; i++)
-		npages += pool->size_class[i].pages_allocated;
-
+	u64 npages = atomic_long_read(&pool->pages_allocated);
 	return npages << PAGE_SHIFT;
 }
 EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 1/4] zsmalloc: move pages_allocated to zs_pool
@ 2014-08-22  0:42   ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

pages_allocated has counted in size_class structure and when user
of zsmalloc want to see total_size_bytes, it should gather all of
count from each size_class to report the sum.

it's not bad if user don't see the value often but if user start
to see the value frequently, it would be not a good deal for
performance pov.

This patch moves the count from size_class to zs_pool so it could
reduce memory footprint (from [255 * 8byte] to
[sizeof(atomic_long_t)]).

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 94f38fac5e81..2a4acf400846 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -199,9 +199,6 @@ struct size_class {
 
 	spinlock_t lock;
 
-	/* stats */
-	u64 pages_allocated;
-
 	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
 };
 
@@ -220,6 +217,7 @@ struct zs_pool {
 	struct size_class size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
+	atomic_long_t pages_allocated;
 };
 
 /*
@@ -1028,8 +1026,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 			return 0;
 
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
+		atomic_long_add(class->pages_per_zspage,
+					&pool->pages_allocated);
 		spin_lock(&class->lock);
-		class->pages_allocated += class->pages_per_zspage;
 	}
 
 	obj = (unsigned long)first_page->freelist;
@@ -1082,14 +1081,13 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	first_page->inuse--;
 	fullness = fix_fullness_group(pool, first_page);
-
-	if (fullness == ZS_EMPTY)
-		class->pages_allocated -= class->pages_per_zspage;
-
 	spin_unlock(&class->lock);
 
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_EMPTY) {
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
 		free_zspage(first_page);
+	}
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
@@ -1185,12 +1183,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool)
 {
-	int i;
-	u64 npages = 0;
-
-	for (i = 0; i < ZS_SIZE_CLASSES; i++)
-		npages += pool->size_class[i].pages_allocated;
-
+	u64 npages = atomic_long_read(&pool->pages_allocated);
 	return npages << PAGE_SHIFT;
 }
 EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
-- 
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 2/4] zsmalloc: change return value unit of  zs_get_total_size_bytes
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

zs_get_total_size_bytes returns a amount of memory zsmalloc
consumed with *byte unit* but zsmalloc operates *page unit*
rather than byte unit so let's change the API so benefit
we could get is that reduce unnecessary overhead
(ie, change page unit with byte unit) in zsmalloc.

Since return type is pages, "zs_get_total_pages" is better than
"zs_get_total_size_bytes".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 include/linux/zsmalloc.h      | 2 +-
 mm/zsmalloc.c                 | 9 ++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d00831c3d731..f0b8b30a7128 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -103,10 +103,10 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 	down_read(&zram->init_lock);
 	if (init_done(zram))
-		val = zs_get_total_size_bytes(meta->mem_pool);
+		val = zs_get_total_pages(meta->mem_pool);
 	up_read(&zram->init_lock);
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
 static ssize_t max_comp_streams_show(struct device *dev,
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index e44d634e7fb7..05c214760977 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -46,6 +46,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool);
+unsigned long zs_get_total_pages(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2a4acf400846..c4a91578dc96 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -297,7 +297,7 @@ static void zs_zpool_unmap(void *pool, unsigned long handle)
 
 static u64 zs_zpool_total_size(void *pool)
 {
-	return zs_get_total_size_bytes(pool);
+	return zs_get_total_pages(pool) << PAGE_SHIFT;
 }
 
 static struct zpool_driver zs_zpool_driver = {
@@ -1181,12 +1181,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool)
+unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
-	u64 npages = atomic_long_read(&pool->pages_allocated);
-	return npages << PAGE_SHIFT;
+	return atomic_long_read(&pool->pages_allocated);
 }
-EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
+EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
 module_init(zs_init);
 module_exit(zs_exit);
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 2/4] zsmalloc: change return value unit of  zs_get_total_size_bytes
@ 2014-08-22  0:42   ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

zs_get_total_size_bytes returns a amount of memory zsmalloc
consumed with *byte unit* but zsmalloc operates *page unit*
rather than byte unit so let's change the API so benefit
we could get is that reduce unnecessary overhead
(ie, change page unit with byte unit) in zsmalloc.

Since return type is pages, "zs_get_total_pages" is better than
"zs_get_total_size_bytes".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 include/linux/zsmalloc.h      | 2 +-
 mm/zsmalloc.c                 | 9 ++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d00831c3d731..f0b8b30a7128 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -103,10 +103,10 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 	down_read(&zram->init_lock);
 	if (init_done(zram))
-		val = zs_get_total_size_bytes(meta->mem_pool);
+		val = zs_get_total_pages(meta->mem_pool);
 	up_read(&zram->init_lock);
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
 static ssize_t max_comp_streams_show(struct device *dev,
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index e44d634e7fb7..05c214760977 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -46,6 +46,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool);
+unsigned long zs_get_total_pages(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2a4acf400846..c4a91578dc96 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -297,7 +297,7 @@ static void zs_zpool_unmap(void *pool, unsigned long handle)
 
 static u64 zs_zpool_total_size(void *pool)
 {
-	return zs_get_total_size_bytes(pool);
+	return zs_get_total_pages(pool) << PAGE_SHIFT;
 }
 
 static struct zpool_driver zs_zpool_driver = {
@@ -1181,12 +1181,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool)
+unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
-	u64 npages = atomic_long_read(&pool->pages_allocated);
-	return npages << PAGE_SHIFT;
+	return atomic_long_read(&pool->pages_allocated);
 }
-EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
+EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
 module_init(zs_init);
 module_exit(zs_exit);
-- 
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Since zram has no control feature to limit memory usage,
it makes hard to manage system memrory.

This patch adds new knob "mem_limit" via sysfs to set up the
a limit so that zram could fail allocation once it reaches
the limit.

In addition, user could change the limit in runtime so that
he could manage the memory more dynamically.

Default is no limit so it doesn't break old behavior.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
 Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
 drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h              |  5 ++++
 4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 70ec992514d0..b8c779d64968 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -119,3 +119,13 @@ Description:
 		efficiency can be calculated using compr_data_size and this
 		statistic.
 		Unit: bytes
+
+What:		/sys/block/zram<id>/mem_limit
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_limit file is read/write and specifies the amount
+		of memory to be able to consume memory to store store
+		compressed data. The limit could be changed in run time
+		and "0" is default which means disable the limit.
+		Unit: bytes
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 0595c3f56ccf..82c6a41116db 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.
 
-5) Activate:
+5) Set memory limit: Optional
+	Set memory limit by writing the value to sysfs node 'mem_limit'.
+	The value can be either in bytes or you can use mem suffixes.
+	In addition, you could change the value in runtime.
+	Examples:
+	    # limit /dev/zram0 with 50MB memory
+	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	    # Using mem suffixes
+	    echo 256K > /sys/block/zram0/mem_limit
+	    echo 512M > /sys/block/zram0/mem_limit
+	    echo 1G > /sys/block/zram0/mem_limit
+
+	    # To disable memory limit
+	    echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate:
 	mkswap /dev/zram0
 	swapon /dev/zram0
 
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-6) Stats:
+7) Stats:
 	Per-device statistics are exported as various nodes under
 	/sys/block/zram<id>/
 		disksize
@@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
 		compr_data_size
 		mem_used_total
 
-7) Deactivate:
+8) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-8) Reset:
+9) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0b8b30a7128..370c355eb127 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static ssize_t mem_limit_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->limit_pages;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_limit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	u64 limit;
+	struct zram *zram = dev_to_zram(dev);
+
+	limit = memparse(buf, NULL);
+	down_write(&zram->init_lock);
+	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
+
+	if (zram->limit_pages &&
+		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+		zs_free(meta->mem_pool, handle);
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	struct zram_meta *meta;
 
 	down_write(&zram->init_lock);
+
+	zram->limit_pages = 0;
+
 	if (!init_done(zram)) {
 		up_write(&zram->init_lock);
 		return;
@@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
+		mem_limit_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
+	&dev_attr_mem_limit.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e0f725c87cc6..b7aa9c21553f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -112,6 +112,11 @@ struct zram {
 	u64 disksize;	/* bytes */
 	int max_comp_streams;
 	struct zram_stats stats;
+	/*
+	 * the number of pages zram can consume for storing compressed data
+	 */
+	unsigned long limit_pages;
+
 	char compressor[10];
 };
 #endif
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-22  0:42   ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Since zram has no control feature to limit memory usage,
it makes hard to manage system memrory.

This patch adds new knob "mem_limit" via sysfs to set up the
a limit so that zram could fail allocation once it reaches
the limit.

In addition, user could change the limit in runtime so that
he could manage the memory more dynamically.

Default is no limit so it doesn't break old behavior.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
 Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
 drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h              |  5 ++++
 4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 70ec992514d0..b8c779d64968 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -119,3 +119,13 @@ Description:
 		efficiency can be calculated using compr_data_size and this
 		statistic.
 		Unit: bytes
+
+What:		/sys/block/zram<id>/mem_limit
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_limit file is read/write and specifies the amount
+		of memory to be able to consume memory to store store
+		compressed data. The limit could be changed in run time
+		and "0" is default which means disable the limit.
+		Unit: bytes
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 0595c3f56ccf..82c6a41116db 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.
 
-5) Activate:
+5) Set memory limit: Optional
+	Set memory limit by writing the value to sysfs node 'mem_limit'.
+	The value can be either in bytes or you can use mem suffixes.
+	In addition, you could change the value in runtime.
+	Examples:
+	    # limit /dev/zram0 with 50MB memory
+	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	    # Using mem suffixes
+	    echo 256K > /sys/block/zram0/mem_limit
+	    echo 512M > /sys/block/zram0/mem_limit
+	    echo 1G > /sys/block/zram0/mem_limit
+
+	    # To disable memory limit
+	    echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate:
 	mkswap /dev/zram0
 	swapon /dev/zram0
 
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-6) Stats:
+7) Stats:
 	Per-device statistics are exported as various nodes under
 	/sys/block/zram<id>/
 		disksize
@@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
 		compr_data_size
 		mem_used_total
 
-7) Deactivate:
+8) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-8) Reset:
+9) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0b8b30a7128..370c355eb127 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static ssize_t mem_limit_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->limit_pages;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_limit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	u64 limit;
+	struct zram *zram = dev_to_zram(dev);
+
+	limit = memparse(buf, NULL);
+	down_write(&zram->init_lock);
+	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
+
+	if (zram->limit_pages &&
+		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+		zs_free(meta->mem_pool, handle);
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	struct zram_meta *meta;
 
 	down_write(&zram->init_lock);
+
+	zram->limit_pages = 0;
+
 	if (!init_done(zram)) {
 		up_write(&zram->init_lock);
 		return;
@@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
+		mem_limit_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
+	&dev_attr_mem_limit.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e0f725c87cc6..b7aa9c21553f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -112,6 +112,11 @@ struct zram {
 	u64 disksize;	/* bytes */
 	int max_comp_streams;
 	struct zram_stats stats;
+	/*
+	 * the number of pages zram can consume for storing compressed data
+	 */
+	unsigned long limit_pages;
+
 	char compressor[10];
 };
 #endif
-- 
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 4/4] zram: report maximum used memory
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Normally, zram user could get maximum memory usage zram consumed
via polling mem_used_total with sysfs in userspace.

But it has a critical problem because user can miss peak memory
usage during update inverval of polling. For avoiding that,
user should poll it with shorter interval(ie, 0.0000000001s)
with mlocking to avoid page fault delay when memory pressure
is heavy. It would be troublesome.

This patch adds new knob "mem_used_max" so user could see
the maximum memory usage easily via reading the knob and reset
it via "echo 0 > /sys/block/zram0/mem_used_max".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 +++++
 Documentation/blockdev/zram.txt            |  1 +
 drivers/block/zram/zram_drv.c              | 60 +++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |  1 +
 4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index b8c779d64968..7b8fca6a9b77 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -120,6 +120,16 @@ Description:
 		statistic.
 		Unit: bytes
 
+What:		/sys/block/zram<id>/mem_used_max
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_used_max file is read/write and specifies the amount
+		of maximum memory zram have consumed to store compressed data.
+		For resetting the value, you should write "0". Otherwise,
+		you could see -EINVAL.
+		Unit: bytes
+
 What:		/sys/block/zram<id>/mem_limit
 Date:		August 2014
 Contact:	Minchan Kim <minchan@kernel.org>
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 82c6a41116db..7fcf9c6592ec 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -111,6 +111,7 @@ size of the disk when not in use so a huge zram is wasteful.
 		orig_data_size
 		compr_data_size
 		mem_used_total
+		mem_used_max
 
 8) Deactivate:
 	swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c355eb127..1a2b3e320ea5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -149,6 +149,41 @@ static ssize_t mem_limit_store(struct device *dev,
 	return len;
 }
 
+static ssize_t mem_used_max_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val = 0;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		val = atomic_long_read(&zram->stats.max_used_pages);
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_used_max_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int err;
+	unsigned long val;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta = zram->meta;
+
+	err = kstrtoul(buf, 10, &val);
+	if (err || val != 0)
+		return -EINVAL;
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		atomic_long_set(&zram->stats.max_used_pages,
+				zs_get_total_pages(meta->mem_pool));
+	up_read(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -461,6 +496,21 @@ out_cleanup:
 	return ret;
 }
 
+static inline void update_used_max(struct zram *zram,
+					const unsigned long pages)
+{
+	int old_max, cur_max;
+
+	old_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		cur_max = old_max;
+		if (pages > cur_max)
+			old_max = atomic_long_cmpxchg(
+				&zram->stats.max_used_pages, cur_max, pages);
+	} while (old_max != cur_max);
+}
+
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			   int offset)
 {
@@ -472,6 +522,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	struct zram_meta *meta = zram->meta;
 	struct zcomp_strm *zstrm;
 	bool locked = false;
+	unsigned long alloced_pages;
 
 	page = bvec->bv_page;
 	if (is_partial_io(bvec)) {
@@ -541,13 +592,15 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		goto out;
 	}
 
-	if (zram->limit_pages &&
-		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+	alloced_pages = zs_get_total_pages(meta->mem_pool);
+	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
 		zs_free(meta->mem_pool, handle);
 		ret = -ENOMEM;
 		goto out;
 	}
 
+	update_used_max(zram, alloced_pages);
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -897,6 +950,8 @@ static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
 static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
 		mem_limit_store);
+static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
+		mem_used_max_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -926,6 +981,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
 	&dev_attr_mem_limit.attr,
+	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b7aa9c21553f..c6ee271317f5 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -90,6 +90,7 @@ struct zram_stats {
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
 	atomic64_t zero_pages;		/* no. of zero filled pages */
 	atomic64_t pages_stored;	/* no. of pages currently stored */
+	atomic_long_t max_used_pages;	/* no. of maximum pages stored */
 };
 
 struct zram_meta {
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v4 4/4] zram: report maximum used memory
@ 2014-08-22  0:42   ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Normally, zram user could get maximum memory usage zram consumed
via polling mem_used_total with sysfs in userspace.

But it has a critical problem because user can miss peak memory
usage during update inverval of polling. For avoiding that,
user should poll it with shorter interval(ie, 0.0000000001s)
with mlocking to avoid page fault delay when memory pressure
is heavy. It would be troublesome.

This patch adds new knob "mem_used_max" so user could see
the maximum memory usage easily via reading the knob and reset
it via "echo 0 > /sys/block/zram0/mem_used_max".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 +++++
 Documentation/blockdev/zram.txt            |  1 +
 drivers/block/zram/zram_drv.c              | 60 +++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |  1 +
 4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index b8c779d64968..7b8fca6a9b77 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -120,6 +120,16 @@ Description:
 		statistic.
 		Unit: bytes
 
+What:		/sys/block/zram<id>/mem_used_max
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_used_max file is read/write and specifies the amount
+		of maximum memory zram have consumed to store compressed data.
+		For resetting the value, you should write "0". Otherwise,
+		you could see -EINVAL.
+		Unit: bytes
+
 What:		/sys/block/zram<id>/mem_limit
 Date:		August 2014
 Contact:	Minchan Kim <minchan@kernel.org>
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 82c6a41116db..7fcf9c6592ec 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -111,6 +111,7 @@ size of the disk when not in use so a huge zram is wasteful.
 		orig_data_size
 		compr_data_size
 		mem_used_total
+		mem_used_max
 
 8) Deactivate:
 	swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 370c355eb127..1a2b3e320ea5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -149,6 +149,41 @@ static ssize_t mem_limit_store(struct device *dev,
 	return len;
 }
 
+static ssize_t mem_used_max_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val = 0;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		val = atomic_long_read(&zram->stats.max_used_pages);
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_used_max_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int err;
+	unsigned long val;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta = zram->meta;
+
+	err = kstrtoul(buf, 10, &val);
+	if (err || val != 0)
+		return -EINVAL;
+
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		atomic_long_set(&zram->stats.max_used_pages,
+				zs_get_total_pages(meta->mem_pool));
+	up_read(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -461,6 +496,21 @@ out_cleanup:
 	return ret;
 }
 
+static inline void update_used_max(struct zram *zram,
+					const unsigned long pages)
+{
+	int old_max, cur_max;
+
+	old_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		cur_max = old_max;
+		if (pages > cur_max)
+			old_max = atomic_long_cmpxchg(
+				&zram->stats.max_used_pages, cur_max, pages);
+	} while (old_max != cur_max);
+}
+
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			   int offset)
 {
@@ -472,6 +522,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 	struct zram_meta *meta = zram->meta;
 	struct zcomp_strm *zstrm;
 	bool locked = false;
+	unsigned long alloced_pages;
 
 	page = bvec->bv_page;
 	if (is_partial_io(bvec)) {
@@ -541,13 +592,15 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		goto out;
 	}
 
-	if (zram->limit_pages &&
-		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+	alloced_pages = zs_get_total_pages(meta->mem_pool);
+	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
 		zs_free(meta->mem_pool, handle);
 		ret = -ENOMEM;
 		goto out;
 	}
 
+	update_used_max(zram, alloced_pages);
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -897,6 +950,8 @@ static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
 static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
 		mem_limit_store);
+static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
+		mem_used_max_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -926,6 +981,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
 	&dev_attr_mem_limit.attr,
+	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index b7aa9c21553f..c6ee271317f5 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -90,6 +90,7 @@ struct zram_stats {
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
 	atomic64_t zero_pages;		/* no. of zero filled pages */
 	atomic64_t pages_stored;	/* no. of pages currently stored */
+	atomic_long_t max_used_pages;	/* no. of maximum pages stored */
 };
 
 struct zram_meta {
-- 
2.0.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22  0:42   ` Minchan Kim
@ 2014-08-22 10:55     ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-22 10:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Since zram has no control feature to limit memory usage,
> it makes hard to manage system memrory.
>
> This patch adds new knob "mem_limit" via sysfs to set up the
> a limit so that zram could fail allocation once it reaches
> the limit.
>
> In addition, user could change the limit in runtime so that
> he could manage the memory more dynamically.
>
- Default is no limit so it doesn't break old behavior.
+ Initial state is no limit so it doesn't break old behavior.

I understand your previous post now.

I was saying that setting to either a null value or garbage
 (which is interpreted as zero by memparse(buf, NULL);)
removes the limit.

I think this is "surprise" behaviour and rather the null case should
return  -EINVAL
The test below should be "good enough" though not catching all garbage.

>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>  drivers/block/zram/zram_drv.h              |  5 ++++
>  4 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> index 70ec992514d0..b8c779d64968 100644
> --- a/Documentation/ABI/testing/sysfs-block-zram
> +++ b/Documentation/ABI/testing/sysfs-block-zram
> @@ -119,3 +119,13 @@ Description:
>                 efficiency can be calculated using compr_data_size and this
>                 statistic.
>                 Unit: bytes
> +
> +What:          /sys/block/zram<id>/mem_limit
> +Date:          August 2014
> +Contact:       Minchan Kim <minchan@kernel.org>
> +Description:
> +               The mem_limit file is read/write and specifies the amount
> +               of memory to be able to consume memory to store store
> +               compressed data. The limit could be changed in run time
> -               and "0" is default which means disable the limit.
> +               and "0" means disable the limit. No limit is the initial state.

there should be no default in the API.

> +               Unit: bytes
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 0595c3f56ccf..82c6a41116db 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>  size of the disk when not in use so a huge zram is wasteful.
>
> -5) Activate:
> +5) Set memory limit: Optional
> +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> +       The value can be either in bytes or you can use mem suffixes.
> +       In addition, you could change the value in runtime.
> +       Examples:
> +           # limit /dev/zram0 with 50MB memory
> +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> +
> +           # Using mem suffixes
> +           echo 256K > /sys/block/zram0/mem_limit
> +           echo 512M > /sys/block/zram0/mem_limit
> +           echo 1G > /sys/block/zram0/mem_limit
> +
> +           # To disable memory limit
> +           echo 0 > /sys/block/zram0/mem_limit
> +
> +6) Activate:
>         mkswap /dev/zram0
>         swapon /dev/zram0
>
>         mkfs.ext4 /dev/zram1
>         mount /dev/zram1 /tmp
>
> -6) Stats:
> +7) Stats:
>         Per-device statistics are exported as various nodes under
>         /sys/block/zram<id>/
>                 disksize
> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>                 compr_data_size
>                 mem_used_total
>
> -7) Deactivate:
> +8) Deactivate:
>         swapoff /dev/zram0
>         umount /dev/zram1
>
> -8) Reset:
> +9) Reset:
>         Write any positive value to 'reset' sysfs node
>         echo 1 > /sys/block/zram0/reset
>         echo 1 > /sys/block/zram1/reset
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f0b8b30a7128..370c355eb127 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>  }
>
> +static ssize_t mem_limit_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       u64 val;
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       down_read(&zram->init_lock);
> +       val = zram->limit_pages;
> +       up_read(&zram->init_lock);
> +
> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> +}
> +
> +static ssize_t mem_limit_store(struct device *dev,
> +               struct device_attribute *attr, const char *buf, size_t len)
> +{
> +       u64 limit;
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       limit = memparse(buf, NULL);

            if (limit = 0 && buf != "0")
                  return  -EINVAL

> +       down_write(&zram->init_lock);
> +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> +       up_write(&zram->init_lock);
> +
> +       return len;
> +}
> +
>  static ssize_t max_comp_streams_store(struct device *dev,
>                 struct device_attribute *attr, const char *buf, size_t len)
>  {
> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>                 ret = -ENOMEM;
>                 goto out;
>         }
> +
> +       if (zram->limit_pages &&
> +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +               zs_free(meta->mem_pool, handle);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
>         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>
>         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>         struct zram_meta *meta;
>
>         down_write(&zram->init_lock);
> +
> +       zram->limit_pages = 0;
> +
>         if (!init_done(zram)) {
>                 up_write(&zram->init_lock);
>                 return;
> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> +               mem_limit_store);
>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>                 max_comp_streams_show, max_comp_streams_store);
>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>         &dev_attr_orig_data_size.attr,
>         &dev_attr_compr_data_size.attr,
>         &dev_attr_mem_used_total.attr,
> +       &dev_attr_mem_limit.attr,
>         &dev_attr_max_comp_streams.attr,
>         &dev_attr_comp_algorithm.attr,
>         NULL,
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index e0f725c87cc6..b7aa9c21553f 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -112,6 +112,11 @@ struct zram {
>         u64 disksize;   /* bytes */
>         int max_comp_streams;
>         struct zram_stats stats;
> +       /*
> +        * the number of pages zram can consume for storing compressed data
> +        */
> +       unsigned long limit_pages;
> +
>         char compressor[10];
>  };
>  #endif
> --
> 2.0.0
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-22 10:55     ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-22 10:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Since zram has no control feature to limit memory usage,
> it makes hard to manage system memrory.
>
> This patch adds new knob "mem_limit" via sysfs to set up the
> a limit so that zram could fail allocation once it reaches
> the limit.
>
> In addition, user could change the limit in runtime so that
> he could manage the memory more dynamically.
>
- Default is no limit so it doesn't break old behavior.
+ Initial state is no limit so it doesn't break old behavior.

I understand your previous post now.

I was saying that setting to either a null value or garbage
 (which is interpreted as zero by memparse(buf, NULL);)
removes the limit.

I think this is "surprise" behaviour and rather the null case should
return  -EINVAL
The test below should be "good enough" though not catching all garbage.

>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>  drivers/block/zram/zram_drv.h              |  5 ++++
>  4 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> index 70ec992514d0..b8c779d64968 100644
> --- a/Documentation/ABI/testing/sysfs-block-zram
> +++ b/Documentation/ABI/testing/sysfs-block-zram
> @@ -119,3 +119,13 @@ Description:
>                 efficiency can be calculated using compr_data_size and this
>                 statistic.
>                 Unit: bytes
> +
> +What:          /sys/block/zram<id>/mem_limit
> +Date:          August 2014
> +Contact:       Minchan Kim <minchan@kernel.org>
> +Description:
> +               The mem_limit file is read/write and specifies the amount
> +               of memory to be able to consume memory to store store
> +               compressed data. The limit could be changed in run time
> -               and "0" is default which means disable the limit.
> +               and "0" means disable the limit. No limit is the initial state.

there should be no default in the API.

> +               Unit: bytes
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 0595c3f56ccf..82c6a41116db 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>  size of the disk when not in use so a huge zram is wasteful.
>
> -5) Activate:
> +5) Set memory limit: Optional
> +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> +       The value can be either in bytes or you can use mem suffixes.
> +       In addition, you could change the value in runtime.
> +       Examples:
> +           # limit /dev/zram0 with 50MB memory
> +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> +
> +           # Using mem suffixes
> +           echo 256K > /sys/block/zram0/mem_limit
> +           echo 512M > /sys/block/zram0/mem_limit
> +           echo 1G > /sys/block/zram0/mem_limit
> +
> +           # To disable memory limit
> +           echo 0 > /sys/block/zram0/mem_limit
> +
> +6) Activate:
>         mkswap /dev/zram0
>         swapon /dev/zram0
>
>         mkfs.ext4 /dev/zram1
>         mount /dev/zram1 /tmp
>
> -6) Stats:
> +7) Stats:
>         Per-device statistics are exported as various nodes under
>         /sys/block/zram<id>/
>                 disksize
> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>                 compr_data_size
>                 mem_used_total
>
> -7) Deactivate:
> +8) Deactivate:
>         swapoff /dev/zram0
>         umount /dev/zram1
>
> -8) Reset:
> +9) Reset:
>         Write any positive value to 'reset' sysfs node
>         echo 1 > /sys/block/zram0/reset
>         echo 1 > /sys/block/zram1/reset
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f0b8b30a7128..370c355eb127 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>  }
>
> +static ssize_t mem_limit_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       u64 val;
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       down_read(&zram->init_lock);
> +       val = zram->limit_pages;
> +       up_read(&zram->init_lock);
> +
> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> +}
> +
> +static ssize_t mem_limit_store(struct device *dev,
> +               struct device_attribute *attr, const char *buf, size_t len)
> +{
> +       u64 limit;
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       limit = memparse(buf, NULL);

            if (limit = 0 && buf != "0")
                  return  -EINVAL

> +       down_write(&zram->init_lock);
> +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> +       up_write(&zram->init_lock);
> +
> +       return len;
> +}
> +
>  static ssize_t max_comp_streams_store(struct device *dev,
>                 struct device_attribute *attr, const char *buf, size_t len)
>  {
> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>                 ret = -ENOMEM;
>                 goto out;
>         }
> +
> +       if (zram->limit_pages &&
> +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +               zs_free(meta->mem_pool, handle);
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +
>         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>
>         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>         struct zram_meta *meta;
>
>         down_write(&zram->init_lock);
> +
> +       zram->limit_pages = 0;
> +
>         if (!init_done(zram)) {
>                 up_write(&zram->init_lock);
>                 return;
> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> +               mem_limit_store);
>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>                 max_comp_streams_show, max_comp_streams_store);
>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>         &dev_attr_orig_data_size.attr,
>         &dev_attr_compr_data_size.attr,
>         &dev_attr_mem_used_total.attr,
> +       &dev_attr_mem_limit.attr,
>         &dev_attr_max_comp_streams.attr,
>         &dev_attr_comp_algorithm.attr,
>         NULL,
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index e0f725c87cc6..b7aa9c21553f 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -112,6 +112,11 @@ struct zram {
>         u64 disksize;   /* bytes */
>         int max_comp_streams;
>         struct zram_stats stats;
> +       /*
> +        * the number of pages zram can consume for storing compressed data
> +        */
> +       unsigned long limit_pages;
> +
>         char compressor[10];
>  };
>  #endif
> --
> 2.0.0
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22 10:55     ` David Horner
@ 2014-08-22 18:47       ` Dan Streetman
  -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-22 18:47 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Fri, Aug 22, 2014 at 6:55 AM, David Horner <ds2horner@gmail.com> wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> Since zram has no control feature to limit memory usage,
>> it makes hard to manage system memrory.
>>
>> This patch adds new knob "mem_limit" via sysfs to set up the
>> a limit so that zram could fail allocation once it reaches
>> the limit.
>>
>> In addition, user could change the limit in runtime so that
>> he could manage the memory more dynamically.
>>
> - Default is no limit so it doesn't break old behavior.
> + Initial state is no limit so it doesn't break old behavior.
>
> I understand your previous post now.

Yes by "default" I meant the initial value.

>
> I was saying that setting to either a null value or garbage
>  (which is interpreted as zero by memparse(buf, NULL);)
> removes the limit.
>
> I think this is "surprise" behaviour and rather the null case should
> return  -EINVAL
> The test below should be "good enough" though not catching all garbage.

I'm not sure of the specifics of memparse, but if it returns 0 for
non-numeric strings (which i assume it does, since there's no method
for reporting errors) I agree that should return -EINVAL instead of
clearing the mem_limit.

>
>>
>> Signed-off-by: Minchan Kim <minchan@kernel.org>
>> ---
>>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>  drivers/block/zram/zram_drv.h              |  5 ++++
>>  4 files changed, 76 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> index 70ec992514d0..b8c779d64968 100644
>> --- a/Documentation/ABI/testing/sysfs-block-zram
>> +++ b/Documentation/ABI/testing/sysfs-block-zram
>> @@ -119,3 +119,13 @@ Description:
>>                 efficiency can be calculated using compr_data_size and this
>>                 statistic.
>>                 Unit: bytes
>> +
>> +What:          /sys/block/zram<id>/mem_limit
>> +Date:          August 2014
>> +Contact:       Minchan Kim <minchan@kernel.org>
>> +Description:
>> +               The mem_limit file is read/write and specifies the amount
>> +               of memory to be able to consume memory to store store
>> +               compressed data. The limit could be changed in run time
>> -               and "0" is default which means disable the limit.
>> +               and "0" means disable the limit. No limit is the initial state.
>
> there should be no default in the API.
>
>> +               Unit: bytes
>> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> index 0595c3f56ccf..82c6a41116db 100644
>> --- a/Documentation/blockdev/zram.txt
>> +++ b/Documentation/blockdev/zram.txt
>> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>  size of the disk when not in use so a huge zram is wasteful.
>>
>> -5) Activate:
>> +5) Set memory limit: Optional
>> +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> +       The value can be either in bytes or you can use mem suffixes.
>> +       In addition, you could change the value in runtime.
>> +       Examples:
>> +           # limit /dev/zram0 with 50MB memory
>> +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> +
>> +           # Using mem suffixes
>> +           echo 256K > /sys/block/zram0/mem_limit
>> +           echo 512M > /sys/block/zram0/mem_limit
>> +           echo 1G > /sys/block/zram0/mem_limit
>> +
>> +           # To disable memory limit
>> +           echo 0 > /sys/block/zram0/mem_limit
>> +
>> +6) Activate:
>>         mkswap /dev/zram0
>>         swapon /dev/zram0
>>
>>         mkfs.ext4 /dev/zram1
>>         mount /dev/zram1 /tmp
>>
>> -6) Stats:
>> +7) Stats:
>>         Per-device statistics are exported as various nodes under
>>         /sys/block/zram<id>/
>>                 disksize
>> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>                 compr_data_size
>>                 mem_used_total
>>
>> -7) Deactivate:
>> +8) Deactivate:
>>         swapoff /dev/zram0
>>         umount /dev/zram1
>>
>> -8) Reset:
>> +9) Reset:
>>         Write any positive value to 'reset' sysfs node
>>         echo 1 > /sys/block/zram0/reset
>>         echo 1 > /sys/block/zram1/reset
>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> index f0b8b30a7128..370c355eb127 100644
>> --- a/drivers/block/zram/zram_drv.c
>> +++ b/drivers/block/zram/zram_drv.c
>> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>  }
>>
>> +static ssize_t mem_limit_show(struct device *dev,
>> +               struct device_attribute *attr, char *buf)
>> +{
>> +       u64 val;
>> +       struct zram *zram = dev_to_zram(dev);
>> +
>> +       down_read(&zram->init_lock);
>> +       val = zram->limit_pages;
>> +       up_read(&zram->init_lock);
>> +
>> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> +}
>> +
>> +static ssize_t mem_limit_store(struct device *dev,
>> +               struct device_attribute *attr, const char *buf, size_t len)
>> +{
>> +       u64 limit;
>> +       struct zram *zram = dev_to_zram(dev);
>> +
>> +       limit = memparse(buf, NULL);
>
>             if (limit = 0 && buf != "0")
>                   return  -EINVAL
>
>> +       down_write(&zram->init_lock);
>> +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> +       up_write(&zram->init_lock);
>> +
>> +       return len;
>> +}
>> +
>>  static ssize_t max_comp_streams_store(struct device *dev,
>>                 struct device_attribute *attr, const char *buf, size_t len)
>>  {
>> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>                 ret = -ENOMEM;
>>                 goto out;
>>         }
>> +
>> +       if (zram->limit_pages &&
>> +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> +               zs_free(meta->mem_pool, handle);
>> +               ret = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>>         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>
>>         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>         struct zram_meta *meta;
>>
>>         down_write(&zram->init_lock);
>> +
>> +       zram->limit_pages = 0;
>> +
>>         if (!init_done(zram)) {
>>                 up_write(&zram->init_lock);
>>                 return;
>> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> +               mem_limit_store);
>>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>                 max_comp_streams_show, max_comp_streams_store);
>>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>         &dev_attr_orig_data_size.attr,
>>         &dev_attr_compr_data_size.attr,
>>         &dev_attr_mem_used_total.attr,
>> +       &dev_attr_mem_limit.attr,
>>         &dev_attr_max_comp_streams.attr,
>>         &dev_attr_comp_algorithm.attr,
>>         NULL,
>> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> index e0f725c87cc6..b7aa9c21553f 100644
>> --- a/drivers/block/zram/zram_drv.h
>> +++ b/drivers/block/zram/zram_drv.h
>> @@ -112,6 +112,11 @@ struct zram {
>>         u64 disksize;   /* bytes */
>>         int max_comp_streams;
>>         struct zram_stats stats;
>> +       /*
>> +        * the number of pages zram can consume for storing compressed data
>> +        */
>> +       unsigned long limit_pages;
>> +
>>         char compressor[10];
>>  };
>>  #endif
>> --
>> 2.0.0
>>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-22 18:47       ` Dan Streetman
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-22 18:47 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Fri, Aug 22, 2014 at 6:55 AM, David Horner <ds2horner@gmail.com> wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> Since zram has no control feature to limit memory usage,
>> it makes hard to manage system memrory.
>>
>> This patch adds new knob "mem_limit" via sysfs to set up the
>> a limit so that zram could fail allocation once it reaches
>> the limit.
>>
>> In addition, user could change the limit in runtime so that
>> he could manage the memory more dynamically.
>>
> - Default is no limit so it doesn't break old behavior.
> + Initial state is no limit so it doesn't break old behavior.
>
> I understand your previous post now.

Yes by "default" I meant the initial value.

>
> I was saying that setting to either a null value or garbage
>  (which is interpreted as zero by memparse(buf, NULL);)
> removes the limit.
>
> I think this is "surprise" behaviour and rather the null case should
> return  -EINVAL
> The test below should be "good enough" though not catching all garbage.

I'm not sure of the specifics of memparse, but if it returns 0 for
non-numeric strings (which i assume it does, since there's no method
for reporting errors) I agree that should return -EINVAL instead of
clearing the mem_limit.

>
>>
>> Signed-off-by: Minchan Kim <minchan@kernel.org>
>> ---
>>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>  drivers/block/zram/zram_drv.h              |  5 ++++
>>  4 files changed, 76 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> index 70ec992514d0..b8c779d64968 100644
>> --- a/Documentation/ABI/testing/sysfs-block-zram
>> +++ b/Documentation/ABI/testing/sysfs-block-zram
>> @@ -119,3 +119,13 @@ Description:
>>                 efficiency can be calculated using compr_data_size and this
>>                 statistic.
>>                 Unit: bytes
>> +
>> +What:          /sys/block/zram<id>/mem_limit
>> +Date:          August 2014
>> +Contact:       Minchan Kim <minchan@kernel.org>
>> +Description:
>> +               The mem_limit file is read/write and specifies the amount
>> +               of memory to be able to consume memory to store store
>> +               compressed data. The limit could be changed in run time
>> -               and "0" is default which means disable the limit.
>> +               and "0" means disable the limit. No limit is the initial state.
>
> there should be no default in the API.
>
>> +               Unit: bytes
>> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> index 0595c3f56ccf..82c6a41116db 100644
>> --- a/Documentation/blockdev/zram.txt
>> +++ b/Documentation/blockdev/zram.txt
>> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>  size of the disk when not in use so a huge zram is wasteful.
>>
>> -5) Activate:
>> +5) Set memory limit: Optional
>> +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> +       The value can be either in bytes or you can use mem suffixes.
>> +       In addition, you could change the value in runtime.
>> +       Examples:
>> +           # limit /dev/zram0 with 50MB memory
>> +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> +
>> +           # Using mem suffixes
>> +           echo 256K > /sys/block/zram0/mem_limit
>> +           echo 512M > /sys/block/zram0/mem_limit
>> +           echo 1G > /sys/block/zram0/mem_limit
>> +
>> +           # To disable memory limit
>> +           echo 0 > /sys/block/zram0/mem_limit
>> +
>> +6) Activate:
>>         mkswap /dev/zram0
>>         swapon /dev/zram0
>>
>>         mkfs.ext4 /dev/zram1
>>         mount /dev/zram1 /tmp
>>
>> -6) Stats:
>> +7) Stats:
>>         Per-device statistics are exported as various nodes under
>>         /sys/block/zram<id>/
>>                 disksize
>> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>                 compr_data_size
>>                 mem_used_total
>>
>> -7) Deactivate:
>> +8) Deactivate:
>>         swapoff /dev/zram0
>>         umount /dev/zram1
>>
>> -8) Reset:
>> +9) Reset:
>>         Write any positive value to 'reset' sysfs node
>>         echo 1 > /sys/block/zram0/reset
>>         echo 1 > /sys/block/zram1/reset
>> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> index f0b8b30a7128..370c355eb127 100644
>> --- a/drivers/block/zram/zram_drv.c
>> +++ b/drivers/block/zram/zram_drv.c
>> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>  }
>>
>> +static ssize_t mem_limit_show(struct device *dev,
>> +               struct device_attribute *attr, char *buf)
>> +{
>> +       u64 val;
>> +       struct zram *zram = dev_to_zram(dev);
>> +
>> +       down_read(&zram->init_lock);
>> +       val = zram->limit_pages;
>> +       up_read(&zram->init_lock);
>> +
>> +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> +}
>> +
>> +static ssize_t mem_limit_store(struct device *dev,
>> +               struct device_attribute *attr, const char *buf, size_t len)
>> +{
>> +       u64 limit;
>> +       struct zram *zram = dev_to_zram(dev);
>> +
>> +       limit = memparse(buf, NULL);
>
>             if (limit = 0 && buf != "0")
>                   return  -EINVAL
>
>> +       down_write(&zram->init_lock);
>> +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> +       up_write(&zram->init_lock);
>> +
>> +       return len;
>> +}
>> +
>>  static ssize_t max_comp_streams_store(struct device *dev,
>>                 struct device_attribute *attr, const char *buf, size_t len)
>>  {
>> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>                 ret = -ENOMEM;
>>                 goto out;
>>         }
>> +
>> +       if (zram->limit_pages &&
>> +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> +               zs_free(meta->mem_pool, handle);
>> +               ret = -ENOMEM;
>> +               goto out;
>> +       }
>> +
>>         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>
>>         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>         struct zram_meta *meta;
>>
>>         down_write(&zram->init_lock);
>> +
>> +       zram->limit_pages = 0;
>> +
>>         if (!init_done(zram)) {
>>                 up_write(&zram->init_lock);
>>                 return;
>> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> +               mem_limit_store);
>>  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>                 max_comp_streams_show, max_comp_streams_store);
>>  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>         &dev_attr_orig_data_size.attr,
>>         &dev_attr_compr_data_size.attr,
>>         &dev_attr_mem_used_total.attr,
>> +       &dev_attr_mem_limit.attr,
>>         &dev_attr_max_comp_streams.attr,
>>         &dev_attr_comp_algorithm.attr,
>>         NULL,
>> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> index e0f725c87cc6..b7aa9c21553f 100644
>> --- a/drivers/block/zram/zram_drv.h
>> +++ b/drivers/block/zram/zram_drv.h
>> @@ -112,6 +112,11 @@ struct zram {
>>         u64 disksize;   /* bytes */
>>         int max_comp_streams;
>>         struct zram_stats stats;
>> +       /*
>> +        * the number of pages zram can consume for storing compressed data
>> +        */
>> +       unsigned long limit_pages;
>> +
>>         char compressor[10];
>>  };
>>  #endif
>> --
>> 2.0.0
>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 0/4] zram memory control enhance
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22 19:15   ` Dan Streetman
  -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-22 19:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, David Horner

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Currently, zram has no feature to limit memory so theoretically
> zram can deplete system memory.
> Users have asked for a limit several times as even without exhaustion
> zram makes it hard to control memory usage of the platform.
> This patchset adds the feature.
>
> Patch 1 makes zs_get_total_size_bytes faster because it would be
> used frequently in later patches for the new feature.
>
> Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> to page so that zsmalloc doesn't need unnecessary operation(ie,
> << PAGE_SHIFT).
>
> Patch 3 adds new feature. I added the feature into zram layer,
> not zsmalloc because limiation is zram's requirement, not zsmalloc
> so any other user using zsmalloc(ie, zpool) shouldn't affected
> by unnecessary branch of zsmalloc. In future, if every users
> of zsmalloc want the feature, then, we could move the feature
> from client side to zsmalloc easily but vice versa would be
> painful.
>
> Patch 4 adds news facility to report maximum memory usage of zram
> so that this avoids user polling frequently via /sys/block/zram0/
> mem_used_total and ensures transient max are not missed.

FWIW, with the minor update to checking the memparse in patch 3 David
mentioned, feel free to add to all the patches:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> * From v3
>  * get_zs_total_size_byte function name change - Dan
>  * clarifiction of the document - Dan
>  * atomic account instead of introducing new lock in zsmalloc - David
>  * remove unnecessary atomic instruction in updating max - David
>
> * From v2
>  * introduce helper funcntion to update max_used_pages
>    for readability - David
>  * avoid unncessary zs_get_total_size call in updating loop
>    for max_used_pages - David
>
> * From v1
>  * rebased on next-20140815
>  * fix up race problem - David, Dan
>  * reset mem_used_max as current total_bytes, rather than 0 - David
>  * resetting works with only "0" write for extensiblilty - David, Dan
>
> Minchan Kim (4):
>   zsmalloc: move pages_allocated to zs_pool
>   zsmalloc: change return value unit of  zs_get_total_size_bytes
>   zram: zram memory size limitation
>   zram: report maximum used memory
>
>  Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
>  Documentation/blockdev/zram.txt            |  25 +++++--
>  drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
>  drivers/block/zram/zram_drv.h              |   6 ++
>  include/linux/zsmalloc.h                   |   2 +-
>  mm/zsmalloc.c                              |  30 ++++-----
>  6 files changed, 158 insertions(+), 26 deletions(-)
>
> --
> 2.0.0
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 0/4] zram memory control enhance
@ 2014-08-22 19:15   ` Dan Streetman
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-22 19:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, David Horner

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Currently, zram has no feature to limit memory so theoretically
> zram can deplete system memory.
> Users have asked for a limit several times as even without exhaustion
> zram makes it hard to control memory usage of the platform.
> This patchset adds the feature.
>
> Patch 1 makes zs_get_total_size_bytes faster because it would be
> used frequently in later patches for the new feature.
>
> Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> to page so that zsmalloc doesn't need unnecessary operation(ie,
> << PAGE_SHIFT).
>
> Patch 3 adds new feature. I added the feature into zram layer,
> not zsmalloc because limiation is zram's requirement, not zsmalloc
> so any other user using zsmalloc(ie, zpool) shouldn't affected
> by unnecessary branch of zsmalloc. In future, if every users
> of zsmalloc want the feature, then, we could move the feature
> from client side to zsmalloc easily but vice versa would be
> painful.
>
> Patch 4 adds news facility to report maximum memory usage of zram
> so that this avoids user polling frequently via /sys/block/zram0/
> mem_used_total and ensures transient max are not missed.

FWIW, with the minor update to checking the memparse in patch 3 David
mentioned, feel free to add to all the patches:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> * From v3
>  * get_zs_total_size_byte function name change - Dan
>  * clarifiction of the document - Dan
>  * atomic account instead of introducing new lock in zsmalloc - David
>  * remove unnecessary atomic instruction in updating max - David
>
> * From v2
>  * introduce helper funcntion to update max_used_pages
>    for readability - David
>  * avoid unncessary zs_get_total_size call in updating loop
>    for max_used_pages - David
>
> * From v1
>  * rebased on next-20140815
>  * fix up race problem - David, Dan
>  * reset mem_used_max as current total_bytes, rather than 0 - David
>  * resetting works with only "0" write for extensiblilty - David, Dan
>
> Minchan Kim (4):
>   zsmalloc: move pages_allocated to zs_pool
>   zsmalloc: change return value unit of  zs_get_total_size_bytes
>   zram: zram memory size limitation
>   zram: report maximum used memory
>
>  Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
>  Documentation/blockdev/zram.txt            |  25 +++++--
>  drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
>  drivers/block/zram/zram_drv.h              |   6 ++
>  include/linux/zsmalloc.h                   |   2 +-
>  mm/zsmalloc.c                              |  30 ++++-----
>  6 files changed, 158 insertions(+), 26 deletions(-)
>
> --
> 2.0.0
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22 10:55     ` David Horner
@ 2014-08-24 23:56       ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-24 23:56 UTC (permalink / raw)
  To: David Horner
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

Hello David,

On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Since zram has no control feature to limit memory usage,
> > it makes hard to manage system memrory.
> >
> > This patch adds new knob "mem_limit" via sysfs to set up the
> > a limit so that zram could fail allocation once it reaches
> > the limit.
> >
> > In addition, user could change the limit in runtime so that
> > he could manage the memory more dynamically.
> >
> - Default is no limit so it doesn't break old behavior.
> + Initial state is no limit so it doesn't break old behavior.
> 
> I understand your previous post now.
> 
> I was saying that setting to either a null value or garbage
>  (which is interpreted as zero by memparse(buf, NULL);)
> removes the limit.
> 
> I think this is "surprise" behaviour and rather the null case should
> return  -EINVAL
> The test below should be "good enough" though not catching all garbage.

Thanks for suggesting but as I said, it should be fixed in memparse itself,
not caller if it is really problem so I don't want to touch it in this
patchset. It's not critical for adding the feature.

> 
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> > index 70ec992514d0..b8c779d64968 100644
> > --- a/Documentation/ABI/testing/sysfs-block-zram
> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> > @@ -119,3 +119,13 @@ Description:
> >                 efficiency can be calculated using compr_data_size and this
> >                 statistic.
> >                 Unit: bytes
> > +
> > +What:          /sys/block/zram<id>/mem_limit
> > +Date:          August 2014
> > +Contact:       Minchan Kim <minchan@kernel.org>
> > +Description:
> > +               The mem_limit file is read/write and specifies the amount
> > +               of memory to be able to consume memory to store store
> > +               compressed data. The limit could be changed in run time
> > -               and "0" is default which means disable the limit.
> > +               and "0" means disable the limit. No limit is the initial state.
> 
> there should be no default in the API.

Thanks.

> 
> > +               Unit: bytes
> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> > index 0595c3f56ccf..82c6a41116db 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >  size of the disk when not in use so a huge zram is wasteful.
> >
> > -5) Activate:
> > +5) Set memory limit: Optional
> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> > +       The value can be either in bytes or you can use mem suffixes.
> > +       In addition, you could change the value in runtime.
> > +       Examples:
> > +           # limit /dev/zram0 with 50MB memory
> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> > +
> > +           # Using mem suffixes
> > +           echo 256K > /sys/block/zram0/mem_limit
> > +           echo 512M > /sys/block/zram0/mem_limit
> > +           echo 1G > /sys/block/zram0/mem_limit
> > +
> > +           # To disable memory limit
> > +           echo 0 > /sys/block/zram0/mem_limit
> > +
> > +6) Activate:
> >         mkswap /dev/zram0
> >         swapon /dev/zram0
> >
> >         mkfs.ext4 /dev/zram1
> >         mount /dev/zram1 /tmp
> >
> > -6) Stats:
> > +7) Stats:
> >         Per-device statistics are exported as various nodes under
> >         /sys/block/zram<id>/
> >                 disksize
> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >                 compr_data_size
> >                 mem_used_total
> >
> > -7) Deactivate:
> > +8) Deactivate:
> >         swapoff /dev/zram0
> >         umount /dev/zram1
> >
> > -8) Reset:
> > +9) Reset:
> >         Write any positive value to 'reset' sysfs node
> >         echo 1 > /sys/block/zram0/reset
> >         echo 1 > /sys/block/zram1/reset
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index f0b8b30a7128..370c355eb127 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >  }
> >
> > +static ssize_t mem_limit_show(struct device *dev,
> > +               struct device_attribute *attr, char *buf)
> > +{
> > +       u64 val;
> > +       struct zram *zram = dev_to_zram(dev);
> > +
> > +       down_read(&zram->init_lock);
> > +       val = zram->limit_pages;
> > +       up_read(&zram->init_lock);
> > +
> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> > +}
> > +
> > +static ssize_t mem_limit_store(struct device *dev,
> > +               struct device_attribute *attr, const char *buf, size_t len)
> > +{
> > +       u64 limit;
> > +       struct zram *zram = dev_to_zram(dev);
> > +
> > +       limit = memparse(buf, NULL);
> 
>             if (limit = 0 && buf != "0")
>                   return  -EINVAL
> 
> > +       down_write(&zram->init_lock);
> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> > +       up_write(&zram->init_lock);
> > +
> > +       return len;
> > +}
> > +
> >  static ssize_t max_comp_streams_store(struct device *dev,
> >                 struct device_attribute *attr, const char *buf, size_t len)
> >  {
> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >                 ret = -ENOMEM;
> >                 goto out;
> >         }
> > +
> > +       if (zram->limit_pages &&
> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > +               zs_free(meta->mem_pool, handle);
> > +               ret = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >
> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >         struct zram_meta *meta;
> >
> >         down_write(&zram->init_lock);
> > +
> > +       zram->limit_pages = 0;
> > +
> >         if (!init_done(zram)) {
> >                 up_write(&zram->init_lock);
> >                 return;
> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> > +               mem_limit_store);
> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >                 max_comp_streams_show, max_comp_streams_store);
> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >         &dev_attr_orig_data_size.attr,
> >         &dev_attr_compr_data_size.attr,
> >         &dev_attr_mem_used_total.attr,
> > +       &dev_attr_mem_limit.attr,
> >         &dev_attr_max_comp_streams.attr,
> >         &dev_attr_comp_algorithm.attr,
> >         NULL,
> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> > index e0f725c87cc6..b7aa9c21553f 100644
> > --- a/drivers/block/zram/zram_drv.h
> > +++ b/drivers/block/zram/zram_drv.h
> > @@ -112,6 +112,11 @@ struct zram {
> >         u64 disksize;   /* bytes */
> >         int max_comp_streams;
> >         struct zram_stats stats;
> > +       /*
> > +        * the number of pages zram can consume for storing compressed data
> > +        */
> > +       unsigned long limit_pages;
> > +
> >         char compressor[10];
> >  };
> >  #endif
> > --
> > 2.0.0
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-24 23:56       ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-24 23:56 UTC (permalink / raw)
  To: David Horner
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

Hello David,

On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Since zram has no control feature to limit memory usage,
> > it makes hard to manage system memrory.
> >
> > This patch adds new knob "mem_limit" via sysfs to set up the
> > a limit so that zram could fail allocation once it reaches
> > the limit.
> >
> > In addition, user could change the limit in runtime so that
> > he could manage the memory more dynamically.
> >
> - Default is no limit so it doesn't break old behavior.
> + Initial state is no limit so it doesn't break old behavior.
> 
> I understand your previous post now.
> 
> I was saying that setting to either a null value or garbage
>  (which is interpreted as zero by memparse(buf, NULL);)
> removes the limit.
> 
> I think this is "surprise" behaviour and rather the null case should
> return  -EINVAL
> The test below should be "good enough" though not catching all garbage.

Thanks for suggesting but as I said, it should be fixed in memparse itself,
not caller if it is really problem so I don't want to touch it in this
patchset. It's not critical for adding the feature.

> 
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> > index 70ec992514d0..b8c779d64968 100644
> > --- a/Documentation/ABI/testing/sysfs-block-zram
> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> > @@ -119,3 +119,13 @@ Description:
> >                 efficiency can be calculated using compr_data_size and this
> >                 statistic.
> >                 Unit: bytes
> > +
> > +What:          /sys/block/zram<id>/mem_limit
> > +Date:          August 2014
> > +Contact:       Minchan Kim <minchan@kernel.org>
> > +Description:
> > +               The mem_limit file is read/write and specifies the amount
> > +               of memory to be able to consume memory to store store
> > +               compressed data. The limit could be changed in run time
> > -               and "0" is default which means disable the limit.
> > +               and "0" means disable the limit. No limit is the initial state.
> 
> there should be no default in the API.

Thanks.

> 
> > +               Unit: bytes
> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> > index 0595c3f56ccf..82c6a41116db 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >  size of the disk when not in use so a huge zram is wasteful.
> >
> > -5) Activate:
> > +5) Set memory limit: Optional
> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> > +       The value can be either in bytes or you can use mem suffixes.
> > +       In addition, you could change the value in runtime.
> > +       Examples:
> > +           # limit /dev/zram0 with 50MB memory
> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> > +
> > +           # Using mem suffixes
> > +           echo 256K > /sys/block/zram0/mem_limit
> > +           echo 512M > /sys/block/zram0/mem_limit
> > +           echo 1G > /sys/block/zram0/mem_limit
> > +
> > +           # To disable memory limit
> > +           echo 0 > /sys/block/zram0/mem_limit
> > +
> > +6) Activate:
> >         mkswap /dev/zram0
> >         swapon /dev/zram0
> >
> >         mkfs.ext4 /dev/zram1
> >         mount /dev/zram1 /tmp
> >
> > -6) Stats:
> > +7) Stats:
> >         Per-device statistics are exported as various nodes under
> >         /sys/block/zram<id>/
> >                 disksize
> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >                 compr_data_size
> >                 mem_used_total
> >
> > -7) Deactivate:
> > +8) Deactivate:
> >         swapoff /dev/zram0
> >         umount /dev/zram1
> >
> > -8) Reset:
> > +9) Reset:
> >         Write any positive value to 'reset' sysfs node
> >         echo 1 > /sys/block/zram0/reset
> >         echo 1 > /sys/block/zram1/reset
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index f0b8b30a7128..370c355eb127 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >  }
> >
> > +static ssize_t mem_limit_show(struct device *dev,
> > +               struct device_attribute *attr, char *buf)
> > +{
> > +       u64 val;
> > +       struct zram *zram = dev_to_zram(dev);
> > +
> > +       down_read(&zram->init_lock);
> > +       val = zram->limit_pages;
> > +       up_read(&zram->init_lock);
> > +
> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> > +}
> > +
> > +static ssize_t mem_limit_store(struct device *dev,
> > +               struct device_attribute *attr, const char *buf, size_t len)
> > +{
> > +       u64 limit;
> > +       struct zram *zram = dev_to_zram(dev);
> > +
> > +       limit = memparse(buf, NULL);
> 
>             if (limit = 0 && buf != "0")
>                   return  -EINVAL
> 
> > +       down_write(&zram->init_lock);
> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> > +       up_write(&zram->init_lock);
> > +
> > +       return len;
> > +}
> > +
> >  static ssize_t max_comp_streams_store(struct device *dev,
> >                 struct device_attribute *attr, const char *buf, size_t len)
> >  {
> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >                 ret = -ENOMEM;
> >                 goto out;
> >         }
> > +
> > +       if (zram->limit_pages &&
> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > +               zs_free(meta->mem_pool, handle);
> > +               ret = -ENOMEM;
> > +               goto out;
> > +       }
> > +
> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >
> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >         struct zram_meta *meta;
> >
> >         down_write(&zram->init_lock);
> > +
> > +       zram->limit_pages = 0;
> > +
> >         if (!init_done(zram)) {
> >                 up_write(&zram->init_lock);
> >                 return;
> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> > +               mem_limit_store);
> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >                 max_comp_streams_show, max_comp_streams_store);
> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >         &dev_attr_orig_data_size.attr,
> >         &dev_attr_compr_data_size.attr,
> >         &dev_attr_mem_used_total.attr,
> > +       &dev_attr_mem_limit.attr,
> >         &dev_attr_max_comp_streams.attr,
> >         &dev_attr_comp_algorithm.attr,
> >         NULL,
> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> > index e0f725c87cc6..b7aa9c21553f 100644
> > --- a/drivers/block/zram/zram_drv.h
> > +++ b/drivers/block/zram/zram_drv.h
> > @@ -112,6 +112,11 @@ struct zram {
> >         u64 disksize;   /* bytes */
> >         int max_comp_streams;
> >         struct zram_stats stats;
> > +       /*
> > +        * the number of pages zram can consume for storing compressed data
> > +        */
> > +       unsigned long limit_pages;
> > +
> >         char compressor[10];
> >  };
> >  #endif
> > --
> > 2.0.0
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 0/4] zram memory control enhance
  2014-08-22 19:15   ` Dan Streetman
@ 2014-08-24 23:58     ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-24 23:58 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, David Horner

Hello Dan,

On Fri, Aug 22, 2014 at 03:15:36PM -0400, Dan Streetman wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Currently, zram has no feature to limit memory so theoretically
> > zram can deplete system memory.
> > Users have asked for a limit several times as even without exhaustion
> > zram makes it hard to control memory usage of the platform.
> > This patchset adds the feature.
> >
> > Patch 1 makes zs_get_total_size_bytes faster because it would be
> > used frequently in later patches for the new feature.
> >
> > Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> > to page so that zsmalloc doesn't need unnecessary operation(ie,
> > << PAGE_SHIFT).
> >
> > Patch 3 adds new feature. I added the feature into zram layer,
> > not zsmalloc because limiation is zram's requirement, not zsmalloc
> > so any other user using zsmalloc(ie, zpool) shouldn't affected
> > by unnecessary branch of zsmalloc. In future, if every users
> > of zsmalloc want the feature, then, we could move the feature
> > from client side to zsmalloc easily but vice versa would be
> > painful.
> >
> > Patch 4 adds news facility to report maximum memory usage of zram
> > so that this avoids user polling frequently via /sys/block/zram0/
> > mem_used_total and ensures transient max are not missed.
> 
> FWIW, with the minor update to checking the memparse in patch 3 David
> mentioned, feel free to add to all the patches:

I replied David's reply, it's not critical for the goal
of this patchset. And if we should fix, it should be memparse and handle
all of cases, not just only null case.
So I will take your Reviewed-by except 3 patch. :)

> 
> Reviewed-by: Dan Streetman <ddstreet@ieee.org>

Thanks!

> 
> >
> > * From v3
> >  * get_zs_total_size_byte function name change - Dan
> >  * clarifiction of the document - Dan
> >  * atomic account instead of introducing new lock in zsmalloc - David
> >  * remove unnecessary atomic instruction in updating max - David
> >
> > * From v2
> >  * introduce helper funcntion to update max_used_pages
> >    for readability - David
> >  * avoid unncessary zs_get_total_size call in updating loop
> >    for max_used_pages - David
> >
> > * From v1
> >  * rebased on next-20140815
> >  * fix up race problem - David, Dan
> >  * reset mem_used_max as current total_bytes, rather than 0 - David
> >  * resetting works with only "0" write for extensiblilty - David, Dan
> >
> > Minchan Kim (4):
> >   zsmalloc: move pages_allocated to zs_pool
> >   zsmalloc: change return value unit of  zs_get_total_size_bytes
> >   zram: zram memory size limitation
> >   zram: report maximum used memory
> >
> >  Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
> >  Documentation/blockdev/zram.txt            |  25 +++++--
> >  drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
> >  drivers/block/zram/zram_drv.h              |   6 ++
> >  include/linux/zsmalloc.h                   |   2 +-
> >  mm/zsmalloc.c                              |  30 ++++-----
> >  6 files changed, 158 insertions(+), 26 deletions(-)
> >
> > --
> > 2.0.0
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 0/4] zram memory control enhance
@ 2014-08-24 23:58     ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-24 23:58 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, David Horner

Hello Dan,

On Fri, Aug 22, 2014 at 03:15:36PM -0400, Dan Streetman wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Currently, zram has no feature to limit memory so theoretically
> > zram can deplete system memory.
> > Users have asked for a limit several times as even without exhaustion
> > zram makes it hard to control memory usage of the platform.
> > This patchset adds the feature.
> >
> > Patch 1 makes zs_get_total_size_bytes faster because it would be
> > used frequently in later patches for the new feature.
> >
> > Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> > to page so that zsmalloc doesn't need unnecessary operation(ie,
> > << PAGE_SHIFT).
> >
> > Patch 3 adds new feature. I added the feature into zram layer,
> > not zsmalloc because limiation is zram's requirement, not zsmalloc
> > so any other user using zsmalloc(ie, zpool) shouldn't affected
> > by unnecessary branch of zsmalloc. In future, if every users
> > of zsmalloc want the feature, then, we could move the feature
> > from client side to zsmalloc easily but vice versa would be
> > painful.
> >
> > Patch 4 adds news facility to report maximum memory usage of zram
> > so that this avoids user polling frequently via /sys/block/zram0/
> > mem_used_total and ensures transient max are not missed.
> 
> FWIW, with the minor update to checking the memparse in patch 3 David
> mentioned, feel free to add to all the patches:

I replied David's reply, it's not critical for the goal
of this patchset. And if we should fix, it should be memparse and handle
all of cases, not just only null case.
So I will take your Reviewed-by except 3 patch. :)

> 
> Reviewed-by: Dan Streetman <ddstreet@ieee.org>

Thanks!

> 
> >
> > * From v3
> >  * get_zs_total_size_byte function name change - Dan
> >  * clarifiction of the document - Dan
> >  * atomic account instead of introducing new lock in zsmalloc - David
> >  * remove unnecessary atomic instruction in updating max - David
> >
> > * From v2
> >  * introduce helper funcntion to update max_used_pages
> >    for readability - David
> >  * avoid unncessary zs_get_total_size call in updating loop
> >    for max_used_pages - David
> >
> > * From v1
> >  * rebased on next-20140815
> >  * fix up race problem - David, Dan
> >  * reset mem_used_max as current total_bytes, rather than 0 - David
> >  * resetting works with only "0" write for extensiblilty - David, Dan
> >
> > Minchan Kim (4):
> >   zsmalloc: move pages_allocated to zs_pool
> >   zsmalloc: change return value unit of  zs_get_total_size_bytes
> >   zram: zram memory size limitation
> >   zram: report maximum used memory
> >
> >  Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
> >  Documentation/blockdev/zram.txt            |  25 +++++--
> >  drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
> >  drivers/block/zram/zram_drv.h              |   6 ++
> >  include/linux/zsmalloc.h                   |   2 +-
> >  mm/zsmalloc.c                              |  30 ++++-----
> >  6 files changed, 158 insertions(+), 26 deletions(-)
> >
> > --
> > 2.0.0
> >
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-24 23:56       ` Minchan Kim
@ 2014-08-25  3:40         ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-25  3:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> Hello David,
>
> On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > Since zram has no control feature to limit memory usage,
>> > it makes hard to manage system memrory.
>> >
>> > This patch adds new knob "mem_limit" via sysfs to set up the
>> > a limit so that zram could fail allocation once it reaches
>> > the limit.
>> >
>> > In addition, user could change the limit in runtime so that
>> > he could manage the memory more dynamically.
>> >
>> - Default is no limit so it doesn't break old behavior.
>> + Initial state is no limit so it doesn't break old behavior.
>>
>> I understand your previous post now.
>>
>> I was saying that setting to either a null value or garbage
>>  (which is interpreted as zero by memparse(buf, NULL);)
>> removes the limit.
>>
>> I think this is "surprise" behaviour and rather the null case should
>> return  -EINVAL
>> The test below should be "good enough" though not catching all garbage.
>
> Thanks for suggesting but as I said, it should be fixed in memparse itself,
> not caller if it is really problem so I don't want to touch it in this
> patchset. It's not critical for adding the feature.
>

I've looked into the memparse function more since we talked.
I do believe a wrapper function around it for the typical use by sysfs would
be very valuable.
However, there is nothing wrong with memparse itself that needs to be fixed.

It does what it is documented to do very well (In My Uninformed Opinion).
It provides everything that a caller needs to manage the token that it
processes.
It thus handles strings like "7,,5,8,,9" with the implied zeros.

The fact that other callers don't check the return pointer value to
see if only a null
string was processed, is not its fault.
Nor that it may not be ideally suited to sysfs attributes; that other store
functions use it in a given manner does not means that is correct -
nor that it is
incorrect for that "knob". Some attributes could be just as valid with
null zeros.

And you are correct, to disambiguate the zero is not required for the
limit feature.
Your original patch which disallowed zero was full feature for mem_limit.
It is the requested non-crucial feature to allow zero to reestablish
the initial state
 that benefits from distinguishing an explicit zero from a "default zero'
 when garbage is written.

The final argument is that if we release this feature as is the undocumented
 functionality could be relied upon, and when later fixed: user space breaks.
They say getting API right is a difficult exercise. I suggest, if we
don't insisting on
 an explicit zero we have the API wrong.

I don't think you disagreed, just that the burden to get it correct
lay elsewhere.

If that is the case it doesn't really matter, we cannot release this
interface until
 it is corrected wherever it must be.

And my zero check was a poor hack.

I should have explicitly checked the returned pointer value.

I will send that proposed revision, and hopefully you will consider it
for inclusion.




>>
>> >
>> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> > ---
>> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> > index 70ec992514d0..b8c779d64968 100644
>> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> > @@ -119,3 +119,13 @@ Description:
>> >                 efficiency can be calculated using compr_data_size and this
>> >                 statistic.
>> >                 Unit: bytes
>> > +
>> > +What:          /sys/block/zram<id>/mem_limit
>> > +Date:          August 2014
>> > +Contact:       Minchan Kim <minchan@kernel.org>
>> > +Description:
>> > +               The mem_limit file is read/write and specifies the amount
>> > +               of memory to be able to consume memory to store store
>> > +               compressed data. The limit could be changed in run time
>> > -               and "0" is default which means disable the limit.
>> > +               and "0" means disable the limit. No limit is the initial state.
>>
>> there should be no default in the API.
>
> Thanks.
>
>>
>> > +               Unit: bytes
>> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> > index 0595c3f56ccf..82c6a41116db 100644
>> > --- a/Documentation/blockdev/zram.txt
>> > +++ b/Documentation/blockdev/zram.txt
>> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >  size of the disk when not in use so a huge zram is wasteful.
>> >
>> > -5) Activate:
>> > +5) Set memory limit: Optional
>> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> > +       The value can be either in bytes or you can use mem suffixes.
>> > +       In addition, you could change the value in runtime.
>> > +       Examples:
>> > +           # limit /dev/zram0 with 50MB memory
>> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> > +
>> > +           # Using mem suffixes
>> > +           echo 256K > /sys/block/zram0/mem_limit
>> > +           echo 512M > /sys/block/zram0/mem_limit
>> > +           echo 1G > /sys/block/zram0/mem_limit
>> > +
>> > +           # To disable memory limit
>> > +           echo 0 > /sys/block/zram0/mem_limit
>> > +
>> > +6) Activate:
>> >         mkswap /dev/zram0
>> >         swapon /dev/zram0
>> >
>> >         mkfs.ext4 /dev/zram1
>> >         mount /dev/zram1 /tmp
>> >
>> > -6) Stats:
>> > +7) Stats:
>> >         Per-device statistics are exported as various nodes under
>> >         /sys/block/zram<id>/
>> >                 disksize
>> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >                 compr_data_size
>> >                 mem_used_total
>> >
>> > -7) Deactivate:
>> > +8) Deactivate:
>> >         swapoff /dev/zram0
>> >         umount /dev/zram1
>> >
>> > -8) Reset:
>> > +9) Reset:
>> >         Write any positive value to 'reset' sysfs node
>> >         echo 1 > /sys/block/zram0/reset
>> >         echo 1 > /sys/block/zram1/reset
>> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> > index f0b8b30a7128..370c355eb127 100644
>> > --- a/drivers/block/zram/zram_drv.c
>> > +++ b/drivers/block/zram/zram_drv.c
>> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >  }
>> >
>> > +static ssize_t mem_limit_show(struct device *dev,
>> > +               struct device_attribute *attr, char *buf)
>> > +{
>> > +       u64 val;
>> > +       struct zram *zram = dev_to_zram(dev);
>> > +
>> > +       down_read(&zram->init_lock);
>> > +       val = zram->limit_pages;
>> > +       up_read(&zram->init_lock);
>> > +
>> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> > +}
>> > +
>> > +static ssize_t mem_limit_store(struct device *dev,
>> > +               struct device_attribute *attr, const char *buf, size_t len)
>> > +{
>> > +       u64 limit;
>> > +       struct zram *zram = dev_to_zram(dev);
>> > +
>> > +       limit = memparse(buf, NULL);
>>
>>             if (limit = 0 && buf != "0")
>>                   return  -EINVAL
>>
>> > +       down_write(&zram->init_lock);
>> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> > +       up_write(&zram->init_lock);
>> > +
>> > +       return len;
>> > +}
>> > +
>> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >  {
>> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >                 ret = -ENOMEM;
>> >                 goto out;
>> >         }
>> > +
>> > +       if (zram->limit_pages &&
>> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> > +               zs_free(meta->mem_pool, handle);
>> > +               ret = -ENOMEM;
>> > +               goto out;
>> > +       }
>> > +
>> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >
>> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >         struct zram_meta *meta;
>> >
>> >         down_write(&zram->init_lock);
>> > +
>> > +       zram->limit_pages = 0;
>> > +
>> >         if (!init_done(zram)) {
>> >                 up_write(&zram->init_lock);
>> >                 return;
>> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> > +               mem_limit_store);
>> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >                 max_comp_streams_show, max_comp_streams_store);
>> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >         &dev_attr_orig_data_size.attr,
>> >         &dev_attr_compr_data_size.attr,
>> >         &dev_attr_mem_used_total.attr,
>> > +       &dev_attr_mem_limit.attr,
>> >         &dev_attr_max_comp_streams.attr,
>> >         &dev_attr_comp_algorithm.attr,
>> >         NULL,
>> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> > index e0f725c87cc6..b7aa9c21553f 100644
>> > --- a/drivers/block/zram/zram_drv.h
>> > +++ b/drivers/block/zram/zram_drv.h
>> > @@ -112,6 +112,11 @@ struct zram {
>> >         u64 disksize;   /* bytes */
>> >         int max_comp_streams;
>> >         struct zram_stats stats;
>> > +       /*
>> > +        * the number of pages zram can consume for storing compressed data
>> > +        */
>> > +       unsigned long limit_pages;
>> > +
>> >         char compressor[10];
>> >  };
>> >  #endif
>> > --
>> > 2.0.0
>> >
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-25  3:40         ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-25  3:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> Hello David,
>
> On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > Since zram has no control feature to limit memory usage,
>> > it makes hard to manage system memrory.
>> >
>> > This patch adds new knob "mem_limit" via sysfs to set up the
>> > a limit so that zram could fail allocation once it reaches
>> > the limit.
>> >
>> > In addition, user could change the limit in runtime so that
>> > he could manage the memory more dynamically.
>> >
>> - Default is no limit so it doesn't break old behavior.
>> + Initial state is no limit so it doesn't break old behavior.
>>
>> I understand your previous post now.
>>
>> I was saying that setting to either a null value or garbage
>>  (which is interpreted as zero by memparse(buf, NULL);)
>> removes the limit.
>>
>> I think this is "surprise" behaviour and rather the null case should
>> return  -EINVAL
>> The test below should be "good enough" though not catching all garbage.
>
> Thanks for suggesting but as I said, it should be fixed in memparse itself,
> not caller if it is really problem so I don't want to touch it in this
> patchset. It's not critical for adding the feature.
>

I've looked into the memparse function more since we talked.
I do believe a wrapper function around it for the typical use by sysfs would
be very valuable.
However, there is nothing wrong with memparse itself that needs to be fixed.

It does what it is documented to do very well (In My Uninformed Opinion).
It provides everything that a caller needs to manage the token that it
processes.
It thus handles strings like "7,,5,8,,9" with the implied zeros.

The fact that other callers don't check the return pointer value to
see if only a null
string was processed, is not its fault.
Nor that it may not be ideally suited to sysfs attributes; that other store
functions use it in a given manner does not means that is correct -
nor that it is
incorrect for that "knob". Some attributes could be just as valid with
null zeros.

And you are correct, to disambiguate the zero is not required for the
limit feature.
Your original patch which disallowed zero was full feature for mem_limit.
It is the requested non-crucial feature to allow zero to reestablish
the initial state
 that benefits from distinguishing an explicit zero from a "default zero'
 when garbage is written.

The final argument is that if we release this feature as is the undocumented
 functionality could be relied upon, and when later fixed: user space breaks.
They say getting API right is a difficult exercise. I suggest, if we
don't insisting on
 an explicit zero we have the API wrong.

I don't think you disagreed, just that the burden to get it correct
lay elsewhere.

If that is the case it doesn't really matter, we cannot release this
interface until
 it is corrected wherever it must be.

And my zero check was a poor hack.

I should have explicitly checked the returned pointer value.

I will send that proposed revision, and hopefully you will consider it
for inclusion.




>>
>> >
>> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> > ---
>> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> > index 70ec992514d0..b8c779d64968 100644
>> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> > @@ -119,3 +119,13 @@ Description:
>> >                 efficiency can be calculated using compr_data_size and this
>> >                 statistic.
>> >                 Unit: bytes
>> > +
>> > +What:          /sys/block/zram<id>/mem_limit
>> > +Date:          August 2014
>> > +Contact:       Minchan Kim <minchan@kernel.org>
>> > +Description:
>> > +               The mem_limit file is read/write and specifies the amount
>> > +               of memory to be able to consume memory to store store
>> > +               compressed data. The limit could be changed in run time
>> > -               and "0" is default which means disable the limit.
>> > +               and "0" means disable the limit. No limit is the initial state.
>>
>> there should be no default in the API.
>
> Thanks.
>
>>
>> > +               Unit: bytes
>> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> > index 0595c3f56ccf..82c6a41116db 100644
>> > --- a/Documentation/blockdev/zram.txt
>> > +++ b/Documentation/blockdev/zram.txt
>> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >  size of the disk when not in use so a huge zram is wasteful.
>> >
>> > -5) Activate:
>> > +5) Set memory limit: Optional
>> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> > +       The value can be either in bytes or you can use mem suffixes.
>> > +       In addition, you could change the value in runtime.
>> > +       Examples:
>> > +           # limit /dev/zram0 with 50MB memory
>> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> > +
>> > +           # Using mem suffixes
>> > +           echo 256K > /sys/block/zram0/mem_limit
>> > +           echo 512M > /sys/block/zram0/mem_limit
>> > +           echo 1G > /sys/block/zram0/mem_limit
>> > +
>> > +           # To disable memory limit
>> > +           echo 0 > /sys/block/zram0/mem_limit
>> > +
>> > +6) Activate:
>> >         mkswap /dev/zram0
>> >         swapon /dev/zram0
>> >
>> >         mkfs.ext4 /dev/zram1
>> >         mount /dev/zram1 /tmp
>> >
>> > -6) Stats:
>> > +7) Stats:
>> >         Per-device statistics are exported as various nodes under
>> >         /sys/block/zram<id>/
>> >                 disksize
>> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >                 compr_data_size
>> >                 mem_used_total
>> >
>> > -7) Deactivate:
>> > +8) Deactivate:
>> >         swapoff /dev/zram0
>> >         umount /dev/zram1
>> >
>> > -8) Reset:
>> > +9) Reset:
>> >         Write any positive value to 'reset' sysfs node
>> >         echo 1 > /sys/block/zram0/reset
>> >         echo 1 > /sys/block/zram1/reset
>> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> > index f0b8b30a7128..370c355eb127 100644
>> > --- a/drivers/block/zram/zram_drv.c
>> > +++ b/drivers/block/zram/zram_drv.c
>> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >  }
>> >
>> > +static ssize_t mem_limit_show(struct device *dev,
>> > +               struct device_attribute *attr, char *buf)
>> > +{
>> > +       u64 val;
>> > +       struct zram *zram = dev_to_zram(dev);
>> > +
>> > +       down_read(&zram->init_lock);
>> > +       val = zram->limit_pages;
>> > +       up_read(&zram->init_lock);
>> > +
>> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> > +}
>> > +
>> > +static ssize_t mem_limit_store(struct device *dev,
>> > +               struct device_attribute *attr, const char *buf, size_t len)
>> > +{
>> > +       u64 limit;
>> > +       struct zram *zram = dev_to_zram(dev);
>> > +
>> > +       limit = memparse(buf, NULL);
>>
>>             if (limit = 0 && buf != "0")
>>                   return  -EINVAL
>>
>> > +       down_write(&zram->init_lock);
>> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> > +       up_write(&zram->init_lock);
>> > +
>> > +       return len;
>> > +}
>> > +
>> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >  {
>> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >                 ret = -ENOMEM;
>> >                 goto out;
>> >         }
>> > +
>> > +       if (zram->limit_pages &&
>> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> > +               zs_free(meta->mem_pool, handle);
>> > +               ret = -ENOMEM;
>> > +               goto out;
>> > +       }
>> > +
>> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >
>> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >         struct zram_meta *meta;
>> >
>> >         down_write(&zram->init_lock);
>> > +
>> > +       zram->limit_pages = 0;
>> > +
>> >         if (!init_done(zram)) {
>> >                 up_write(&zram->init_lock);
>> >                 return;
>> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> > +               mem_limit_store);
>> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >                 max_comp_streams_show, max_comp_streams_store);
>> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >         &dev_attr_orig_data_size.attr,
>> >         &dev_attr_compr_data_size.attr,
>> >         &dev_attr_mem_used_total.attr,
>> > +       &dev_attr_mem_limit.attr,
>> >         &dev_attr_max_comp_streams.attr,
>> >         &dev_attr_comp_algorithm.attr,
>> >         NULL,
>> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> > index e0f725c87cc6..b7aa9c21553f 100644
>> > --- a/drivers/block/zram/zram_drv.h
>> > +++ b/drivers/block/zram/zram_drv.h
>> > @@ -112,6 +112,11 @@ struct zram {
>> >         u64 disksize;   /* bytes */
>> >         int max_comp_streams;
>> >         struct zram_stats stats;
>> > +       /*
>> > +        * the number of pages zram can consume for storing compressed data
>> > +        */
>> > +       unsigned long limit_pages;
>> > +
>> >         char compressor[10];
>> >  };
>> >  #endif
>> > --
>> > 2.0.0
>> >
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25  3:40         ` David Horner
@ 2014-08-25  4:37           ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-25  4:37 UTC (permalink / raw)
  To: David Horner
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Hello David,
> >
> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> >> > Since zram has no control feature to limit memory usage,
> >> > it makes hard to manage system memrory.
> >> >
> >> > This patch adds new knob "mem_limit" via sysfs to set up the
> >> > a limit so that zram could fail allocation once it reaches
> >> > the limit.
> >> >
> >> > In addition, user could change the limit in runtime so that
> >> > he could manage the memory more dynamically.
> >> >
> >> - Default is no limit so it doesn't break old behavior.
> >> + Initial state is no limit so it doesn't break old behavior.
> >>
> >> I understand your previous post now.
> >>
> >> I was saying that setting to either a null value or garbage
> >>  (which is interpreted as zero by memparse(buf, NULL);)
> >> removes the limit.
> >>
> >> I think this is "surprise" behaviour and rather the null case should
> >> return  -EINVAL
> >> The test below should be "good enough" though not catching all garbage.
> >
> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
> > not caller if it is really problem so I don't want to touch it in this
> > patchset. It's not critical for adding the feature.
> >
> 
> I've looked into the memparse function more since we talked.
> I do believe a wrapper function around it for the typical use by sysfs would
> be very valuable.

Agree.

> However, there is nothing wrong with memparse itself that needs to be fixed.
> 
> It does what it is documented to do very well (In My Uninformed Opinion).
> It provides everything that a caller needs to manage the token that it
> processes.
> It thus handles strings like "7,,5,8,,9" with the implied zeros.

Maybe strict_memparse would be better to protect such things so you
could find several places to clean it up.

> 
> The fact that other callers don't check the return pointer value to
> see if only a null
> string was processed, is not its fault.
> Nor that it may not be ideally suited to sysfs attributes; that other store
> functions use it in a given manner does not means that is correct -
> nor that it is
> incorrect for that "knob". Some attributes could be just as valid with
> null zeros.
> 
> And you are correct, to disambiguate the zero is not required for the
> limit feature.
> Your original patch which disallowed zero was full feature for mem_limit.
> It is the requested non-crucial feature to allow zero to reestablish
> the initial state
>  that benefits from distinguishing an explicit zero from a "default zero'
>  when garbage is written.
> 
> The final argument is that if we release this feature as is the undocumented
>  functionality could be relied upon, and when later fixed: user space breaks.

I don't get it. Why does it break userspace?
The sysfs-block-zram says "0" means disable the limit.
If someone writes *garabge* but work as if disabling the limit,
it's not a right thing and he already broke although it worked
so it would be not a problem if we fix later.
(ie, we don't need to take care of broken userspace)
Am I missing your point?

> They say getting API right is a difficult exercise. I suggest, if we
> don't insisting on
>  an explicit zero we have the API wrong.
> 
> I don't think you disagreed, just that the burden to get it correct
> lay elsewhere.
> 
> If that is the case it doesn't really matter, we cannot release this
> interface until
>  it is corrected wherever it must be.
> 
> And my zero check was a poor hack.
> 
> I should have explicitly checked the returned pointer value.
> 
> I will send that proposed revision, and hopefully you will consider it
> for inclusion.
> 
> 
> 
> 
> >>
> >> >
> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> >> > ---
> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> >> > index 70ec992514d0..b8c779d64968 100644
> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> >> > @@ -119,3 +119,13 @@ Description:
> >> >                 efficiency can be calculated using compr_data_size and this
> >> >                 statistic.
> >> >                 Unit: bytes
> >> > +
> >> > +What:          /sys/block/zram<id>/mem_limit
> >> > +Date:          August 2014
> >> > +Contact:       Minchan Kim <minchan@kernel.org>
> >> > +Description:
> >> > +               The mem_limit file is read/write and specifies the amount
> >> > +               of memory to be able to consume memory to store store
> >> > +               compressed data. The limit could be changed in run time
> >> > -               and "0" is default which means disable the limit.
> >> > +               and "0" means disable the limit. No limit is the initial state.
> >>
> >> there should be no default in the API.
> >
> > Thanks.
> >
> >>
> >> > +               Unit: bytes
> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> >> > index 0595c3f56ccf..82c6a41116db 100644
> >> > --- a/Documentation/blockdev/zram.txt
> >> > +++ b/Documentation/blockdev/zram.txt
> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >> >  size of the disk when not in use so a huge zram is wasteful.
> >> >
> >> > -5) Activate:
> >> > +5) Set memory limit: Optional
> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> >> > +       The value can be either in bytes or you can use mem suffixes.
> >> > +       In addition, you could change the value in runtime.
> >> > +       Examples:
> >> > +           # limit /dev/zram0 with 50MB memory
> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> >> > +
> >> > +           # Using mem suffixes
> >> > +           echo 256K > /sys/block/zram0/mem_limit
> >> > +           echo 512M > /sys/block/zram0/mem_limit
> >> > +           echo 1G > /sys/block/zram0/mem_limit
> >> > +
> >> > +           # To disable memory limit
> >> > +           echo 0 > /sys/block/zram0/mem_limit
> >> > +
> >> > +6) Activate:
> >> >         mkswap /dev/zram0
> >> >         swapon /dev/zram0
> >> >
> >> >         mkfs.ext4 /dev/zram1
> >> >         mount /dev/zram1 /tmp
> >> >
> >> > -6) Stats:
> >> > +7) Stats:
> >> >         Per-device statistics are exported as various nodes under
> >> >         /sys/block/zram<id>/
> >> >                 disksize
> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >> >                 compr_data_size
> >> >                 mem_used_total
> >> >
> >> > -7) Deactivate:
> >> > +8) Deactivate:
> >> >         swapoff /dev/zram0
> >> >         umount /dev/zram1
> >> >
> >> > -8) Reset:
> >> > +9) Reset:
> >> >         Write any positive value to 'reset' sysfs node
> >> >         echo 1 > /sys/block/zram0/reset
> >> >         echo 1 > /sys/block/zram1/reset
> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> >> > index f0b8b30a7128..370c355eb127 100644
> >> > --- a/drivers/block/zram/zram_drv.c
> >> > +++ b/drivers/block/zram/zram_drv.c
> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >> >  }
> >> >
> >> > +static ssize_t mem_limit_show(struct device *dev,
> >> > +               struct device_attribute *attr, char *buf)
> >> > +{
> >> > +       u64 val;
> >> > +       struct zram *zram = dev_to_zram(dev);
> >> > +
> >> > +       down_read(&zram->init_lock);
> >> > +       val = zram->limit_pages;
> >> > +       up_read(&zram->init_lock);
> >> > +
> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> >> > +}
> >> > +
> >> > +static ssize_t mem_limit_store(struct device *dev,
> >> > +               struct device_attribute *attr, const char *buf, size_t len)
> >> > +{
> >> > +       u64 limit;
> >> > +       struct zram *zram = dev_to_zram(dev);
> >> > +
> >> > +       limit = memparse(buf, NULL);
> >>
> >>             if (limit = 0 && buf != "0")
> >>                   return  -EINVAL
> >>
> >> > +       down_write(&zram->init_lock);
> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> >> > +       up_write(&zram->init_lock);
> >> > +
> >> > +       return len;
> >> > +}
> >> > +
> >> >  static ssize_t max_comp_streams_store(struct device *dev,
> >> >                 struct device_attribute *attr, const char *buf, size_t len)
> >> >  {
> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >> >                 ret = -ENOMEM;
> >> >                 goto out;
> >> >         }
> >> > +
> >> > +       if (zram->limit_pages &&
> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >> > +               zs_free(meta->mem_pool, handle);
> >> > +               ret = -ENOMEM;
> >> > +               goto out;
> >> > +       }
> >> > +
> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >> >
> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >> >         struct zram_meta *meta;
> >> >
> >> >         down_write(&zram->init_lock);
> >> > +
> >> > +       zram->limit_pages = 0;
> >> > +
> >> >         if (!init_done(zram)) {
> >> >                 up_write(&zram->init_lock);
> >> >                 return;
> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> >> > +               mem_limit_store);
> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >> >                 max_comp_streams_show, max_comp_streams_store);
> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >> >         &dev_attr_orig_data_size.attr,
> >> >         &dev_attr_compr_data_size.attr,
> >> >         &dev_attr_mem_used_total.attr,
> >> > +       &dev_attr_mem_limit.attr,
> >> >         &dev_attr_max_comp_streams.attr,
> >> >         &dev_attr_comp_algorithm.attr,
> >> >         NULL,
> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> >> > index e0f725c87cc6..b7aa9c21553f 100644
> >> > --- a/drivers/block/zram/zram_drv.h
> >> > +++ b/drivers/block/zram/zram_drv.h
> >> > @@ -112,6 +112,11 @@ struct zram {
> >> >         u64 disksize;   /* bytes */
> >> >         int max_comp_streams;
> >> >         struct zram_stats stats;
> >> > +       /*
> >> > +        * the number of pages zram can consume for storing compressed data
> >> > +        */
> >> > +       unsigned long limit_pages;
> >> > +
> >> >         char compressor[10];
> >> >  };
> >> >  #endif
> >> > --
> >> > 2.0.0
> >> >
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-25  4:37           ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-25  4:37 UTC (permalink / raw)
  To: David Horner
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Hello David,
> >
> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> >> > Since zram has no control feature to limit memory usage,
> >> > it makes hard to manage system memrory.
> >> >
> >> > This patch adds new knob "mem_limit" via sysfs to set up the
> >> > a limit so that zram could fail allocation once it reaches
> >> > the limit.
> >> >
> >> > In addition, user could change the limit in runtime so that
> >> > he could manage the memory more dynamically.
> >> >
> >> - Default is no limit so it doesn't break old behavior.
> >> + Initial state is no limit so it doesn't break old behavior.
> >>
> >> I understand your previous post now.
> >>
> >> I was saying that setting to either a null value or garbage
> >>  (which is interpreted as zero by memparse(buf, NULL);)
> >> removes the limit.
> >>
> >> I think this is "surprise" behaviour and rather the null case should
> >> return  -EINVAL
> >> The test below should be "good enough" though not catching all garbage.
> >
> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
> > not caller if it is really problem so I don't want to touch it in this
> > patchset. It's not critical for adding the feature.
> >
> 
> I've looked into the memparse function more since we talked.
> I do believe a wrapper function around it for the typical use by sysfs would
> be very valuable.

Agree.

> However, there is nothing wrong with memparse itself that needs to be fixed.
> 
> It does what it is documented to do very well (In My Uninformed Opinion).
> It provides everything that a caller needs to manage the token that it
> processes.
> It thus handles strings like "7,,5,8,,9" with the implied zeros.

Maybe strict_memparse would be better to protect such things so you
could find several places to clean it up.

> 
> The fact that other callers don't check the return pointer value to
> see if only a null
> string was processed, is not its fault.
> Nor that it may not be ideally suited to sysfs attributes; that other store
> functions use it in a given manner does not means that is correct -
> nor that it is
> incorrect for that "knob". Some attributes could be just as valid with
> null zeros.
> 
> And you are correct, to disambiguate the zero is not required for the
> limit feature.
> Your original patch which disallowed zero was full feature for mem_limit.
> It is the requested non-crucial feature to allow zero to reestablish
> the initial state
>  that benefits from distinguishing an explicit zero from a "default zero'
>  when garbage is written.
> 
> The final argument is that if we release this feature as is the undocumented
>  functionality could be relied upon, and when later fixed: user space breaks.

I don't get it. Why does it break userspace?
The sysfs-block-zram says "0" means disable the limit.
If someone writes *garabge* but work as if disabling the limit,
it's not a right thing and he already broke although it worked
so it would be not a problem if we fix later.
(ie, we don't need to take care of broken userspace)
Am I missing your point?

> They say getting API right is a difficult exercise. I suggest, if we
> don't insisting on
>  an explicit zero we have the API wrong.
> 
> I don't think you disagreed, just that the burden to get it correct
> lay elsewhere.
> 
> If that is the case it doesn't really matter, we cannot release this
> interface until
>  it is corrected wherever it must be.
> 
> And my zero check was a poor hack.
> 
> I should have explicitly checked the returned pointer value.
> 
> I will send that proposed revision, and hopefully you will consider it
> for inclusion.
> 
> 
> 
> 
> >>
> >> >
> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> >> > ---
> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >> >
> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> >> > index 70ec992514d0..b8c779d64968 100644
> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> >> > @@ -119,3 +119,13 @@ Description:
> >> >                 efficiency can be calculated using compr_data_size and this
> >> >                 statistic.
> >> >                 Unit: bytes
> >> > +
> >> > +What:          /sys/block/zram<id>/mem_limit
> >> > +Date:          August 2014
> >> > +Contact:       Minchan Kim <minchan@kernel.org>
> >> > +Description:
> >> > +               The mem_limit file is read/write and specifies the amount
> >> > +               of memory to be able to consume memory to store store
> >> > +               compressed data. The limit could be changed in run time
> >> > -               and "0" is default which means disable the limit.
> >> > +               and "0" means disable the limit. No limit is the initial state.
> >>
> >> there should be no default in the API.
> >
> > Thanks.
> >
> >>
> >> > +               Unit: bytes
> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> >> > index 0595c3f56ccf..82c6a41116db 100644
> >> > --- a/Documentation/blockdev/zram.txt
> >> > +++ b/Documentation/blockdev/zram.txt
> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >> >  size of the disk when not in use so a huge zram is wasteful.
> >> >
> >> > -5) Activate:
> >> > +5) Set memory limit: Optional
> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> >> > +       The value can be either in bytes or you can use mem suffixes.
> >> > +       In addition, you could change the value in runtime.
> >> > +       Examples:
> >> > +           # limit /dev/zram0 with 50MB memory
> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> >> > +
> >> > +           # Using mem suffixes
> >> > +           echo 256K > /sys/block/zram0/mem_limit
> >> > +           echo 512M > /sys/block/zram0/mem_limit
> >> > +           echo 1G > /sys/block/zram0/mem_limit
> >> > +
> >> > +           # To disable memory limit
> >> > +           echo 0 > /sys/block/zram0/mem_limit
> >> > +
> >> > +6) Activate:
> >> >         mkswap /dev/zram0
> >> >         swapon /dev/zram0
> >> >
> >> >         mkfs.ext4 /dev/zram1
> >> >         mount /dev/zram1 /tmp
> >> >
> >> > -6) Stats:
> >> > +7) Stats:
> >> >         Per-device statistics are exported as various nodes under
> >> >         /sys/block/zram<id>/
> >> >                 disksize
> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >> >                 compr_data_size
> >> >                 mem_used_total
> >> >
> >> > -7) Deactivate:
> >> > +8) Deactivate:
> >> >         swapoff /dev/zram0
> >> >         umount /dev/zram1
> >> >
> >> > -8) Reset:
> >> > +9) Reset:
> >> >         Write any positive value to 'reset' sysfs node
> >> >         echo 1 > /sys/block/zram0/reset
> >> >         echo 1 > /sys/block/zram1/reset
> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> >> > index f0b8b30a7128..370c355eb127 100644
> >> > --- a/drivers/block/zram/zram_drv.c
> >> > +++ b/drivers/block/zram/zram_drv.c
> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >> >  }
> >> >
> >> > +static ssize_t mem_limit_show(struct device *dev,
> >> > +               struct device_attribute *attr, char *buf)
> >> > +{
> >> > +       u64 val;
> >> > +       struct zram *zram = dev_to_zram(dev);
> >> > +
> >> > +       down_read(&zram->init_lock);
> >> > +       val = zram->limit_pages;
> >> > +       up_read(&zram->init_lock);
> >> > +
> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> >> > +}
> >> > +
> >> > +static ssize_t mem_limit_store(struct device *dev,
> >> > +               struct device_attribute *attr, const char *buf, size_t len)
> >> > +{
> >> > +       u64 limit;
> >> > +       struct zram *zram = dev_to_zram(dev);
> >> > +
> >> > +       limit = memparse(buf, NULL);
> >>
> >>             if (limit = 0 && buf != "0")
> >>                   return  -EINVAL
> >>
> >> > +       down_write(&zram->init_lock);
> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> >> > +       up_write(&zram->init_lock);
> >> > +
> >> > +       return len;
> >> > +}
> >> > +
> >> >  static ssize_t max_comp_streams_store(struct device *dev,
> >> >                 struct device_attribute *attr, const char *buf, size_t len)
> >> >  {
> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >> >                 ret = -ENOMEM;
> >> >                 goto out;
> >> >         }
> >> > +
> >> > +       if (zram->limit_pages &&
> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >> > +               zs_free(meta->mem_pool, handle);
> >> > +               ret = -ENOMEM;
> >> > +               goto out;
> >> > +       }
> >> > +
> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >> >
> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >> >         struct zram_meta *meta;
> >> >
> >> >         down_write(&zram->init_lock);
> >> > +
> >> > +       zram->limit_pages = 0;
> >> > +
> >> >         if (!init_done(zram)) {
> >> >                 up_write(&zram->init_lock);
> >> >                 return;
> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> >> > +               mem_limit_store);
> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >> >                 max_comp_streams_show, max_comp_streams_store);
> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >> >         &dev_attr_orig_data_size.attr,
> >> >         &dev_attr_compr_data_size.attr,
> >> >         &dev_attr_mem_used_total.attr,
> >> > +       &dev_attr_mem_limit.attr,
> >> >         &dev_attr_max_comp_streams.attr,
> >> >         &dev_attr_comp_algorithm.attr,
> >> >         NULL,
> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> >> > index e0f725c87cc6..b7aa9c21553f 100644
> >> > --- a/drivers/block/zram/zram_drv.h
> >> > +++ b/drivers/block/zram/zram_drv.h
> >> > @@ -112,6 +112,11 @@ struct zram {
> >> >         u64 disksize;   /* bytes */
> >> >         int max_comp_streams;
> >> >         struct zram_stats stats;
> >> > +       /*
> >> > +        * the number of pages zram can consume for storing compressed data
> >> > +        */
> >> > +       unsigned long limit_pages;
> >> > +
> >> >         char compressor[10];
> >> >  };
> >> >  #endif
> >> > --
> >> > 2.0.0
> >> >
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >
> > --
> > Kind regards,
> > Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25  4:37           ` Minchan Kim
@ 2014-08-25  8:22             ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-25  8:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > Hello David,
>> >
>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >> > Since zram has no control feature to limit memory usage,
>> >> > it makes hard to manage system memrory.
>> >> >
>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >> > a limit so that zram could fail allocation once it reaches
>> >> > the limit.
>> >> >
>> >> > In addition, user could change the limit in runtime so that
>> >> > he could manage the memory more dynamically.
>> >> >
>> >> - Default is no limit so it doesn't break old behavior.
>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>
>> >> I understand your previous post now.
>> >>
>> >> I was saying that setting to either a null value or garbage
>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >> removes the limit.
>> >>
>> >> I think this is "surprise" behaviour and rather the null case should
>> >> return  -EINVAL
>> >> The test below should be "good enough" though not catching all garbage.
>> >
>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> > not caller if it is really problem so I don't want to touch it in this
>> > patchset. It's not critical for adding the feature.
>> >
>>
>> I've looked into the memparse function more since we talked.
>> I do believe a wrapper function around it for the typical use by sysfs would
>> be very valuable.
>
> Agree.
>
>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>
>> It does what it is documented to do very well (In My Uninformed Opinion).
>> It provides everything that a caller needs to manage the token that it
>> processes.
>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>
> Maybe strict_memparse would be better to protect such things so you
> could find several places to clean it up.
>
>>
>> The fact that other callers don't check the return pointer value to
>> see if only a null
>> string was processed, is not its fault.
>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> functions use it in a given manner does not means that is correct -
>> nor that it is
>> incorrect for that "knob". Some attributes could be just as valid with
>> null zeros.
>>
>> And you are correct, to disambiguate the zero is not required for the
>> limit feature.
>> Your original patch which disallowed zero was full feature for mem_limit.
>> It is the requested non-crucial feature to allow zero to reestablish
>> the initial state
>>  that benefits from distinguishing an explicit zero from a "default zero'
>>  when garbage is written.
>>
>> The final argument is that if we release this feature as is the undocumented
>>  functionality could be relied upon, and when later fixed: user space breaks.
>
> I don't get it. Why does it break userspace?
> The sysfs-block-zram says "0" means disable the limit.
> If someone writes *garabge* but work as if disabling the limit,
> it's not a right thing and he already broke although it worked
> so it would be not a problem if we fix later.
> (ie, we don't need to take care of broken userspace)
> Am I missing your point?
>

Perhaps you are missing my point, perhaps ignoring or dismissing.

Basically, if a facility works in a useful way, even if it was designed for
different usage, that becomes the "accepted" interface/usage.
The developer may not have intended that usage or may even considered
it wrong and a broken usage, but it is what it is and people become
 reliant on that behaviour.

Case in point is memparse itself.

The developer intentionally sets the return pointer because that is the
only value that can be validated for correct performance.
The return value allows -ve so the standard error message passing is not valid.
Unfortunately, C allows the user to pass a NULL value in the parameter.
The developer could consider that absurd and fundamentally broken.
But to the user it is a valid situation, because (perhaps) it can't be
bothered to handle error cases.

So, who is to blame.
You say memparse, that it is fundamentally broken,
  because it didn't check to see that it was used correctly.
 And I say  mem_limit_store is fundamentally broken,
  because it didn't check to see that it was used correctly.

The difference is that memparse cannot stop being abused
(C allows the NULL argument and extensive tricks are required to address that)
however, we can readily fix mem_limit_store and ensure
1) no regression when the interface IS fixed and
2) predictable behaviour when accidental or "fuzzy" input arrives.


>> They say getting API right is a difficult exercise. I suggest, if we
>> don't insisting on
>>  an explicit zero we have the API wrong.
>>
>> I don't think you disagreed, just that the burden to get it correct
>> lay elsewhere.
>>
>> If that is the case it doesn't really matter, we cannot release this
>> interface until
>>  it is corrected wherever it must be.
>>
>> And my zero check was a poor hack.
>>
>> I should have explicitly checked the returned pointer value.
>>
>> I will send that proposed revision, and hopefully you will consider it
>> for inclusion.
>>
>>
>>
>>
>> >>
>> >> >
>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >> > ---
>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >> > index 70ec992514d0..b8c779d64968 100644
>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >> > @@ -119,3 +119,13 @@ Description:
>> >> >                 efficiency can be calculated using compr_data_size and this
>> >> >                 statistic.
>> >> >                 Unit: bytes
>> >> > +
>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >> > +Date:          August 2014
>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >> > +Description:
>> >> > +               The mem_limit file is read/write and specifies the amount
>> >> > +               of memory to be able to consume memory to store store
>> >> > +               compressed data. The limit could be changed in run time
>> >> > -               and "0" is default which means disable the limit.
>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>
>> >> there should be no default in the API.
>> >
>> > Thanks.
>> >
>> >>
>> >> > +               Unit: bytes
>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >> > --- a/Documentation/blockdev/zram.txt
>> >> > +++ b/Documentation/blockdev/zram.txt
>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >> >
>> >> > -5) Activate:
>> >> > +5) Set memory limit: Optional
>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >> > +       In addition, you could change the value in runtime.
>> >> > +       Examples:
>> >> > +           # limit /dev/zram0 with 50MB memory
>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +           # Using mem suffixes
>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +           # To disable memory limit
>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +6) Activate:
>> >> >         mkswap /dev/zram0
>> >> >         swapon /dev/zram0
>> >> >
>> >> >         mkfs.ext4 /dev/zram1
>> >> >         mount /dev/zram1 /tmp
>> >> >
>> >> > -6) Stats:
>> >> > +7) Stats:
>> >> >         Per-device statistics are exported as various nodes under
>> >> >         /sys/block/zram<id>/
>> >> >                 disksize
>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >> >                 compr_data_size
>> >> >                 mem_used_total
>> >> >
>> >> > -7) Deactivate:
>> >> > +8) Deactivate:
>> >> >         swapoff /dev/zram0
>> >> >         umount /dev/zram1
>> >> >
>> >> > -8) Reset:
>> >> > +9) Reset:
>> >> >         Write any positive value to 'reset' sysfs node
>> >> >         echo 1 > /sys/block/zram0/reset
>> >> >         echo 1 > /sys/block/zram1/reset
>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >> > index f0b8b30a7128..370c355eb127 100644
>> >> > --- a/drivers/block/zram/zram_drv.c
>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >> >  }
>> >> >
>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >> > +               struct device_attribute *attr, char *buf)
>> >> > +{
>> >> > +       u64 val;
>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >> > +
>> >> > +       down_read(&zram->init_lock);
>> >> > +       val = zram->limit_pages;
>> >> > +       up_read(&zram->init_lock);
>> >> > +
>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >> > +}
>> >> > +
>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >> > +{
>> >> > +       u64 limit;
>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >> > +
>> >> > +       limit = memparse(buf, NULL);
>> >>
>> >>             if (limit = 0 && buf != "0")
>> >>                   return  -EINVAL
>> >>
>> >> > +       down_write(&zram->init_lock);
>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >> > +       up_write(&zram->init_lock);
>> >> > +
>> >> > +       return len;
>> >> > +}
>> >> > +
>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >> >  {
>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >> >                 ret = -ENOMEM;
>> >> >                 goto out;
>> >> >         }
>> >> > +
>> >> > +       if (zram->limit_pages &&
>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >> > +               zs_free(meta->mem_pool, handle);
>> >> > +               ret = -ENOMEM;
>> >> > +               goto out;
>> >> > +       }
>> >> > +
>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >> >
>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >> >         struct zram_meta *meta;
>> >> >
>> >> >         down_write(&zram->init_lock);
>> >> > +
>> >> > +       zram->limit_pages = 0;
>> >> > +
>> >> >         if (!init_done(zram)) {
>> >> >                 up_write(&zram->init_lock);
>> >> >                 return;
>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >> > +               mem_limit_store);
>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >> >         &dev_attr_orig_data_size.attr,
>> >> >         &dev_attr_compr_data_size.attr,
>> >> >         &dev_attr_mem_used_total.attr,
>> >> > +       &dev_attr_mem_limit.attr,
>> >> >         &dev_attr_max_comp_streams.attr,
>> >> >         &dev_attr_comp_algorithm.attr,
>> >> >         NULL,
>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >> > --- a/drivers/block/zram/zram_drv.h
>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >> >         u64 disksize;   /* bytes */
>> >> >         int max_comp_streams;
>> >> >         struct zram_stats stats;
>> >> > +       /*
>> >> > +        * the number of pages zram can consume for storing compressed data
>> >> > +        */
>> >> > +       unsigned long limit_pages;
>> >> > +
>> >> >         char compressor[10];
>> >> >  };
>> >> >  #endif
>> >> > --
>> >> > 2.0.0
>> >> >
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> > --
>> > Kind regards,
>> > Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-25  8:22             ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-25  8:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > Hello David,
>> >
>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >> > Since zram has no control feature to limit memory usage,
>> >> > it makes hard to manage system memrory.
>> >> >
>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >> > a limit so that zram could fail allocation once it reaches
>> >> > the limit.
>> >> >
>> >> > In addition, user could change the limit in runtime so that
>> >> > he could manage the memory more dynamically.
>> >> >
>> >> - Default is no limit so it doesn't break old behavior.
>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>
>> >> I understand your previous post now.
>> >>
>> >> I was saying that setting to either a null value or garbage
>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >> removes the limit.
>> >>
>> >> I think this is "surprise" behaviour and rather the null case should
>> >> return  -EINVAL
>> >> The test below should be "good enough" though not catching all garbage.
>> >
>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> > not caller if it is really problem so I don't want to touch it in this
>> > patchset. It's not critical for adding the feature.
>> >
>>
>> I've looked into the memparse function more since we talked.
>> I do believe a wrapper function around it for the typical use by sysfs would
>> be very valuable.
>
> Agree.
>
>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>
>> It does what it is documented to do very well (In My Uninformed Opinion).
>> It provides everything that a caller needs to manage the token that it
>> processes.
>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>
> Maybe strict_memparse would be better to protect such things so you
> could find several places to clean it up.
>
>>
>> The fact that other callers don't check the return pointer value to
>> see if only a null
>> string was processed, is not its fault.
>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> functions use it in a given manner does not means that is correct -
>> nor that it is
>> incorrect for that "knob". Some attributes could be just as valid with
>> null zeros.
>>
>> And you are correct, to disambiguate the zero is not required for the
>> limit feature.
>> Your original patch which disallowed zero was full feature for mem_limit.
>> It is the requested non-crucial feature to allow zero to reestablish
>> the initial state
>>  that benefits from distinguishing an explicit zero from a "default zero'
>>  when garbage is written.
>>
>> The final argument is that if we release this feature as is the undocumented
>>  functionality could be relied upon, and when later fixed: user space breaks.
>
> I don't get it. Why does it break userspace?
> The sysfs-block-zram says "0" means disable the limit.
> If someone writes *garabge* but work as if disabling the limit,
> it's not a right thing and he already broke although it worked
> so it would be not a problem if we fix later.
> (ie, we don't need to take care of broken userspace)
> Am I missing your point?
>

Perhaps you are missing my point, perhaps ignoring or dismissing.

Basically, if a facility works in a useful way, even if it was designed for
different usage, that becomes the "accepted" interface/usage.
The developer may not have intended that usage or may even considered
it wrong and a broken usage, but it is what it is and people become
 reliant on that behaviour.

Case in point is memparse itself.

The developer intentionally sets the return pointer because that is the
only value that can be validated for correct performance.
The return value allows -ve so the standard error message passing is not valid.
Unfortunately, C allows the user to pass a NULL value in the parameter.
The developer could consider that absurd and fundamentally broken.
But to the user it is a valid situation, because (perhaps) it can't be
bothered to handle error cases.

So, who is to blame.
You say memparse, that it is fundamentally broken,
  because it didn't check to see that it was used correctly.
 And I say  mem_limit_store is fundamentally broken,
  because it didn't check to see that it was used correctly.

The difference is that memparse cannot stop being abused
(C allows the NULL argument and extensive tricks are required to address that)
however, we can readily fix mem_limit_store and ensure
1) no regression when the interface IS fixed and
2) predictable behaviour when accidental or "fuzzy" input arrives.


>> They say getting API right is a difficult exercise. I suggest, if we
>> don't insisting on
>>  an explicit zero we have the API wrong.
>>
>> I don't think you disagreed, just that the burden to get it correct
>> lay elsewhere.
>>
>> If that is the case it doesn't really matter, we cannot release this
>> interface until
>>  it is corrected wherever it must be.
>>
>> And my zero check was a poor hack.
>>
>> I should have explicitly checked the returned pointer value.
>>
>> I will send that proposed revision, and hopefully you will consider it
>> for inclusion.
>>
>>
>>
>>
>> >>
>> >> >
>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >> > ---
>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >> > index 70ec992514d0..b8c779d64968 100644
>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >> > @@ -119,3 +119,13 @@ Description:
>> >> >                 efficiency can be calculated using compr_data_size and this
>> >> >                 statistic.
>> >> >                 Unit: bytes
>> >> > +
>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >> > +Date:          August 2014
>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >> > +Description:
>> >> > +               The mem_limit file is read/write and specifies the amount
>> >> > +               of memory to be able to consume memory to store store
>> >> > +               compressed data. The limit could be changed in run time
>> >> > -               and "0" is default which means disable the limit.
>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>
>> >> there should be no default in the API.
>> >
>> > Thanks.
>> >
>> >>
>> >> > +               Unit: bytes
>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >> > --- a/Documentation/blockdev/zram.txt
>> >> > +++ b/Documentation/blockdev/zram.txt
>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >> >
>> >> > -5) Activate:
>> >> > +5) Set memory limit: Optional
>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >> > +       In addition, you could change the value in runtime.
>> >> > +       Examples:
>> >> > +           # limit /dev/zram0 with 50MB memory
>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +           # Using mem suffixes
>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +           # To disable memory limit
>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >> > +
>> >> > +6) Activate:
>> >> >         mkswap /dev/zram0
>> >> >         swapon /dev/zram0
>> >> >
>> >> >         mkfs.ext4 /dev/zram1
>> >> >         mount /dev/zram1 /tmp
>> >> >
>> >> > -6) Stats:
>> >> > +7) Stats:
>> >> >         Per-device statistics are exported as various nodes under
>> >> >         /sys/block/zram<id>/
>> >> >                 disksize
>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >> >                 compr_data_size
>> >> >                 mem_used_total
>> >> >
>> >> > -7) Deactivate:
>> >> > +8) Deactivate:
>> >> >         swapoff /dev/zram0
>> >> >         umount /dev/zram1
>> >> >
>> >> > -8) Reset:
>> >> > +9) Reset:
>> >> >         Write any positive value to 'reset' sysfs node
>> >> >         echo 1 > /sys/block/zram0/reset
>> >> >         echo 1 > /sys/block/zram1/reset
>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >> > index f0b8b30a7128..370c355eb127 100644
>> >> > --- a/drivers/block/zram/zram_drv.c
>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >> >  }
>> >> >
>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >> > +               struct device_attribute *attr, char *buf)
>> >> > +{
>> >> > +       u64 val;
>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >> > +
>> >> > +       down_read(&zram->init_lock);
>> >> > +       val = zram->limit_pages;
>> >> > +       up_read(&zram->init_lock);
>> >> > +
>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >> > +}
>> >> > +
>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >> > +{
>> >> > +       u64 limit;
>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >> > +
>> >> > +       limit = memparse(buf, NULL);
>> >>
>> >>             if (limit = 0 && buf != "0")
>> >>                   return  -EINVAL
>> >>
>> >> > +       down_write(&zram->init_lock);
>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >> > +       up_write(&zram->init_lock);
>> >> > +
>> >> > +       return len;
>> >> > +}
>> >> > +
>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >> >  {
>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >> >                 ret = -ENOMEM;
>> >> >                 goto out;
>> >> >         }
>> >> > +
>> >> > +       if (zram->limit_pages &&
>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >> > +               zs_free(meta->mem_pool, handle);
>> >> > +               ret = -ENOMEM;
>> >> > +               goto out;
>> >> > +       }
>> >> > +
>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >> >
>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >> >         struct zram_meta *meta;
>> >> >
>> >> >         down_write(&zram->init_lock);
>> >> > +
>> >> > +       zram->limit_pages = 0;
>> >> > +
>> >> >         if (!init_done(zram)) {
>> >> >                 up_write(&zram->init_lock);
>> >> >                 return;
>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >> > +               mem_limit_store);
>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >> >         &dev_attr_orig_data_size.attr,
>> >> >         &dev_attr_compr_data_size.attr,
>> >> >         &dev_attr_mem_used_total.attr,
>> >> > +       &dev_attr_mem_limit.attr,
>> >> >         &dev_attr_max_comp_streams.attr,
>> >> >         &dev_attr_comp_algorithm.attr,
>> >> >         NULL,
>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >> > --- a/drivers/block/zram/zram_drv.h
>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >> >         u64 disksize;   /* bytes */
>> >> >         int max_comp_streams;
>> >> >         struct zram_stats stats;
>> >> > +       /*
>> >> > +        * the number of pages zram can consume for storing compressed data
>> >> > +        */
>> >> > +       unsigned long limit_pages;
>> >> > +
>> >> >         char compressor[10];
>> >> >  };
>> >> >  #endif
>> >> > --
>> >> > 2.0.0
>> >> >
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> > --
>> > Kind regards,
>> > Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25  4:37           ` Minchan Kim
@ 2014-08-25  8:25             ` Dongsheng Song
  -1 siblings, 0 replies; 44+ messages in thread
From: Dongsheng Song @ 2014-08-25  8:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

> +What:          /sys/block/zram<id>/mem_limit
> +Date:          August 2014
> +Contact:       Minchan Kim <minchan@kernel.org>
> +Description:
> +               The mem_limit file is read/write and specifies the amount
 > +               of memory to be able to consume memory to store store
> +               compressed data. The limit could be changed in run time
> +               and "0" means disable the limit. No limit is the initial state.

extra word 'store' ?
The mem_limit file is read/write and specifies the amount of memory to
be able to consume memory to store store compressed data.

maybe this better ?
The mem_limit file is read/write and specifies the amount of memory to
store compressed data.

--
Dongsheng

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-25  8:25             ` Dongsheng Song
  0 siblings, 0 replies; 44+ messages in thread
From: Dongsheng Song @ 2014-08-25  8:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

> +What:          /sys/block/zram<id>/mem_limit
> +Date:          August 2014
> +Contact:       Minchan Kim <minchan@kernel.org>
> +Description:
> +               The mem_limit file is read/write and specifies the amount
 > +               of memory to be able to consume memory to store store
> +               compressed data. The limit could be changed in run time
> +               and "0" means disable the limit. No limit is the initial state.

extra word 'store' ?
The mem_limit file is read/write and specifies the amount of memory to
be able to consume memory to store store compressed data.

maybe this better ?
The mem_limit file is read/write and specifies the amount of memory to
store compressed data.

--
Dongsheng

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25  8:22             ` David Horner
@ 2014-08-25 18:12               ` Dan Streetman
  -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-25 18:12 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> > Hello David,
>>> >
>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> >> > Since zram has no control feature to limit memory usage,
>>> >> > it makes hard to manage system memrory.
>>> >> >
>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>> >> > a limit so that zram could fail allocation once it reaches
>>> >> > the limit.
>>> >> >
>>> >> > In addition, user could change the limit in runtime so that
>>> >> > he could manage the memory more dynamically.
>>> >> >
>>> >> - Default is no limit so it doesn't break old behavior.
>>> >> + Initial state is no limit so it doesn't break old behavior.
>>> >>
>>> >> I understand your previous post now.
>>> >>
>>> >> I was saying that setting to either a null value or garbage
>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>> >> removes the limit.
>>> >>
>>> >> I think this is "surprise" behaviour and rather the null case should
>>> >> return  -EINVAL
>>> >> The test below should be "good enough" though not catching all garbage.
>>> >
>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>> > not caller if it is really problem so I don't want to touch it in this
>>> > patchset. It's not critical for adding the feature.
>>> >
>>>
>>> I've looked into the memparse function more since we talked.
>>> I do believe a wrapper function around it for the typical use by sysfs would
>>> be very valuable.
>>
>> Agree.
>>
>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>
>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>> It provides everything that a caller needs to manage the token that it
>>> processes.
>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>
>> Maybe strict_memparse would be better to protect such things so you
>> could find several places to clean it up.
>>
>>>
>>> The fact that other callers don't check the return pointer value to
>>> see if only a null
>>> string was processed, is not its fault.
>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>> functions use it in a given manner does not means that is correct -
>>> nor that it is
>>> incorrect for that "knob". Some attributes could be just as valid with
>>> null zeros.
>>>
>>> And you are correct, to disambiguate the zero is not required for the
>>> limit feature.
>>> Your original patch which disallowed zero was full feature for mem_limit.
>>> It is the requested non-crucial feature to allow zero to reestablish
>>> the initial state
>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>  when garbage is written.
>>>
>>> The final argument is that if we release this feature as is the undocumented
>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>
>> I don't get it. Why does it break userspace?
>> The sysfs-block-zram says "0" means disable the limit.
>> If someone writes *garabge* but work as if disabling the limit,
>> it's not a right thing and he already broke although it worked
>> so it would be not a problem if we fix later.
>> (ie, we don't need to take care of broken userspace)
>> Am I missing your point?
>>
>
> Perhaps you are missing my point, perhaps ignoring or dismissing.
>
> Basically, if a facility works in a useful way, even if it was designed for
> different usage, that becomes the "accepted" interface/usage.
> The developer may not have intended that usage or may even considered
> it wrong and a broken usage, but it is what it is and people become
>  reliant on that behaviour.
>
> Case in point is memparse itself.
>
> The developer intentionally sets the return pointer because that is the
> only value that can be validated for correct performance.
> The return value allows -ve so the standard error message passing is not valid.
> Unfortunately, C allows the user to pass a NULL value in the parameter.
> The developer could consider that absurd and fundamentally broken.
> But to the user it is a valid situation, because (perhaps) it can't be
> bothered to handle error cases.
>
> So, who is to blame.
> You say memparse, that it is fundamentally broken,
>   because it didn't check to see that it was used correctly.
>  And I say  mem_limit_store is fundamentally broken,
>   because it didn't check to see that it was used correctly.

I think we should look at what the rest of the kernel does as far as
checking memparse results.  It appears to be a mix of some code
checking memparse while others don't.  The most common way to check
appears to be to verify that memparse actually parsed at least 1
character, e.g.:
  oldp = p;
  mem_size = memparse(p, &p);
  if (p == oldp)
    return -EINVAL;

although other places where 0 isn't valid can simply check for that:
  mem_size = memparse(p, &p);
  /* don't remove all of memory when handling "mem={invalid}" param */
  if (mem_size == 0)
    return -EINVAL;

or even the other memparse use in zram_drv.c:
  disksize = memparse(buf, NULL);
  if (!disksize)
    return -EINVAL;


And there seem to be other places where (maybe?) there's no checking
at all.  However, it also seems like many cases of memparse usage are
looking for a non-zero value, and therefore they can either
immediately check for zero/invalid or (possibly) later code has checks
to avoid using any zero value.  In this case though, 0 is a valid
value.  So, while I agree that if a user passes an invalid (i.e.
non-numeric) value it's clearly user error, it might be closer to the
apparent (although unwritten AFAICT) memparse usage api to check the
result for validity; in our case a simple check if at least 1 char was
parsed is all that's needed, e.g.:

{
  u64 limit;
  char *tmp = buf;
  struct zram *zram = dev_to_zram(dev);

  limit = memparse(buf, &tmp);
  if (buf == tmp) /* no chars parsed, invalid input */
    return -EINVAL;
  down_write(&zram->init_lock);
...


Separate from this patch, it would also help if the lib/cmdline.c
memparse doc was at least updated to clarify when the result should be
checked for validity (e.g. always, or at least when the result is 0)
and how best to do that (e.g. if 0 is an invalid value, just check if
the result is 0; if 0 is a possible valid value, check if any chars
were parsed).


>
> The difference is that memparse cannot stop being abused
> (C allows the NULL argument and extensive tricks are required to address that)
> however, we can readily fix mem_limit_store and ensure
> 1) no regression when the interface IS fixed and
> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>
>
>>> They say getting API right is a difficult exercise. I suggest, if we
>>> don't insisting on
>>>  an explicit zero we have the API wrong.
>>>
>>> I don't think you disagreed, just that the burden to get it correct
>>> lay elsewhere.
>>>
>>> If that is the case it doesn't really matter, we cannot release this
>>> interface until
>>>  it is corrected wherever it must be.
>>>
>>> And my zero check was a poor hack.
>>>
>>> I should have explicitly checked the returned pointer value.
>>>
>>> I will send that proposed revision, and hopefully you will consider it
>>> for inclusion.
>>>
>>>
>>>
>>>
>>> >>
>>> >> >
>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>> >> > ---
>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>> >> >
>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>> >> > index 70ec992514d0..b8c779d64968 100644
>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>> >> > @@ -119,3 +119,13 @@ Description:
>>> >> >                 efficiency can be calculated using compr_data_size and this
>>> >> >                 statistic.
>>> >> >                 Unit: bytes
>>> >> > +
>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>> >> > +Date:          August 2014
>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>> >> > +Description:
>>> >> > +               The mem_limit file is read/write and specifies the amount
>>> >> > +               of memory to be able to consume memory to store store
>>> >> > +               compressed data. The limit could be changed in run time
>>> >> > -               and "0" is default which means disable the limit.
>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>> >>
>>> >> there should be no default in the API.
>>> >
>>> > Thanks.
>>> >
>>> >>
>>> >> > +               Unit: bytes
>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>> >> > --- a/Documentation/blockdev/zram.txt
>>> >> > +++ b/Documentation/blockdev/zram.txt
>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>> >> >
>>> >> > -5) Activate:
>>> >> > +5) Set memory limit: Optional
>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>> >> > +       In addition, you could change the value in runtime.
>>> >> > +       Examples:
>>> >> > +           # limit /dev/zram0 with 50MB memory
>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +           # Using mem suffixes
>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +           # To disable memory limit
>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +6) Activate:
>>> >> >         mkswap /dev/zram0
>>> >> >         swapon /dev/zram0
>>> >> >
>>> >> >         mkfs.ext4 /dev/zram1
>>> >> >         mount /dev/zram1 /tmp
>>> >> >
>>> >> > -6) Stats:
>>> >> > +7) Stats:
>>> >> >         Per-device statistics are exported as various nodes under
>>> >> >         /sys/block/zram<id>/
>>> >> >                 disksize
>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>> >> >                 compr_data_size
>>> >> >                 mem_used_total
>>> >> >
>>> >> > -7) Deactivate:
>>> >> > +8) Deactivate:
>>> >> >         swapoff /dev/zram0
>>> >> >         umount /dev/zram1
>>> >> >
>>> >> > -8) Reset:
>>> >> > +9) Reset:
>>> >> >         Write any positive value to 'reset' sysfs node
>>> >> >         echo 1 > /sys/block/zram0/reset
>>> >> >         echo 1 > /sys/block/zram1/reset
>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>> >> > index f0b8b30a7128..370c355eb127 100644
>>> >> > --- a/drivers/block/zram/zram_drv.c
>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>> >> >  }
>>> >> >
>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>> >> > +               struct device_attribute *attr, char *buf)
>>> >> > +{
>>> >> > +       u64 val;
>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>> >> > +
>>> >> > +       down_read(&zram->init_lock);
>>> >> > +       val = zram->limit_pages;
>>> >> > +       up_read(&zram->init_lock);
>>> >> > +
>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>> >> > +}
>>> >> > +
>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>> >> > +{
>>> >> > +       u64 limit;
>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>> >> > +
>>> >> > +       limit = memparse(buf, NULL);
>>> >>
>>> >>             if (limit = 0 && buf != "0")
>>> >>                   return  -EINVAL
>>> >>
>>> >> > +       down_write(&zram->init_lock);
>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>> >> > +       up_write(&zram->init_lock);
>>> >> > +
>>> >> > +       return len;
>>> >> > +}
>>> >> > +
>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>> >> >  {
>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>> >> >                 ret = -ENOMEM;
>>> >> >                 goto out;
>>> >> >         }
>>> >> > +
>>> >> > +       if (zram->limit_pages &&
>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>> >> > +               zs_free(meta->mem_pool, handle);
>>> >> > +               ret = -ENOMEM;
>>> >> > +               goto out;
>>> >> > +       }
>>> >> > +
>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>> >> >
>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>> >> >         struct zram_meta *meta;
>>> >> >
>>> >> >         down_write(&zram->init_lock);
>>> >> > +
>>> >> > +       zram->limit_pages = 0;
>>> >> > +
>>> >> >         if (!init_done(zram)) {
>>> >> >                 up_write(&zram->init_lock);
>>> >> >                 return;
>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>> >> > +               mem_limit_store);
>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>> >> >         &dev_attr_orig_data_size.attr,
>>> >> >         &dev_attr_compr_data_size.attr,
>>> >> >         &dev_attr_mem_used_total.attr,
>>> >> > +       &dev_attr_mem_limit.attr,
>>> >> >         &dev_attr_max_comp_streams.attr,
>>> >> >         &dev_attr_comp_algorithm.attr,
>>> >> >         NULL,
>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>> >> > --- a/drivers/block/zram/zram_drv.h
>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>> >> >         u64 disksize;   /* bytes */
>>> >> >         int max_comp_streams;
>>> >> >         struct zram_stats stats;
>>> >> > +       /*
>>> >> > +        * the number of pages zram can consume for storing compressed data
>>> >> > +        */
>>> >> > +       unsigned long limit_pages;
>>> >> > +
>>> >> >         char compressor[10];
>>> >> >  };
>>> >> >  #endif
>>> >> > --
>>> >> > 2.0.0
>>> >> >
>>> >>
>>> >> --
>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> >> see: http://www.linux-mm.org/ .
>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>> >
>>> > --
>>> > Kind regards,
>>> > Minchan Kim
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>> --
>> Kind regards,
>> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-25 18:12               ` Dan Streetman
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-25 18:12 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> > Hello David,
>>> >
>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>> >> > Since zram has no control feature to limit memory usage,
>>> >> > it makes hard to manage system memrory.
>>> >> >
>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>> >> > a limit so that zram could fail allocation once it reaches
>>> >> > the limit.
>>> >> >
>>> >> > In addition, user could change the limit in runtime so that
>>> >> > he could manage the memory more dynamically.
>>> >> >
>>> >> - Default is no limit so it doesn't break old behavior.
>>> >> + Initial state is no limit so it doesn't break old behavior.
>>> >>
>>> >> I understand your previous post now.
>>> >>
>>> >> I was saying that setting to either a null value or garbage
>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>> >> removes the limit.
>>> >>
>>> >> I think this is "surprise" behaviour and rather the null case should
>>> >> return  -EINVAL
>>> >> The test below should be "good enough" though not catching all garbage.
>>> >
>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>> > not caller if it is really problem so I don't want to touch it in this
>>> > patchset. It's not critical for adding the feature.
>>> >
>>>
>>> I've looked into the memparse function more since we talked.
>>> I do believe a wrapper function around it for the typical use by sysfs would
>>> be very valuable.
>>
>> Agree.
>>
>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>
>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>> It provides everything that a caller needs to manage the token that it
>>> processes.
>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>
>> Maybe strict_memparse would be better to protect such things so you
>> could find several places to clean it up.
>>
>>>
>>> The fact that other callers don't check the return pointer value to
>>> see if only a null
>>> string was processed, is not its fault.
>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>> functions use it in a given manner does not means that is correct -
>>> nor that it is
>>> incorrect for that "knob". Some attributes could be just as valid with
>>> null zeros.
>>>
>>> And you are correct, to disambiguate the zero is not required for the
>>> limit feature.
>>> Your original patch which disallowed zero was full feature for mem_limit.
>>> It is the requested non-crucial feature to allow zero to reestablish
>>> the initial state
>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>  when garbage is written.
>>>
>>> The final argument is that if we release this feature as is the undocumented
>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>
>> I don't get it. Why does it break userspace?
>> The sysfs-block-zram says "0" means disable the limit.
>> If someone writes *garabge* but work as if disabling the limit,
>> it's not a right thing and he already broke although it worked
>> so it would be not a problem if we fix later.
>> (ie, we don't need to take care of broken userspace)
>> Am I missing your point?
>>
>
> Perhaps you are missing my point, perhaps ignoring or dismissing.
>
> Basically, if a facility works in a useful way, even if it was designed for
> different usage, that becomes the "accepted" interface/usage.
> The developer may not have intended that usage or may even considered
> it wrong and a broken usage, but it is what it is and people become
>  reliant on that behaviour.
>
> Case in point is memparse itself.
>
> The developer intentionally sets the return pointer because that is the
> only value that can be validated for correct performance.
> The return value allows -ve so the standard error message passing is not valid.
> Unfortunately, C allows the user to pass a NULL value in the parameter.
> The developer could consider that absurd and fundamentally broken.
> But to the user it is a valid situation, because (perhaps) it can't be
> bothered to handle error cases.
>
> So, who is to blame.
> You say memparse, that it is fundamentally broken,
>   because it didn't check to see that it was used correctly.
>  And I say  mem_limit_store is fundamentally broken,
>   because it didn't check to see that it was used correctly.

I think we should look at what the rest of the kernel does as far as
checking memparse results.  It appears to be a mix of some code
checking memparse while others don't.  The most common way to check
appears to be to verify that memparse actually parsed at least 1
character, e.g.:
  oldp = p;
  mem_size = memparse(p, &p);
  if (p == oldp)
    return -EINVAL;

although other places where 0 isn't valid can simply check for that:
  mem_size = memparse(p, &p);
  /* don't remove all of memory when handling "mem={invalid}" param */
  if (mem_size == 0)
    return -EINVAL;

or even the other memparse use in zram_drv.c:
  disksize = memparse(buf, NULL);
  if (!disksize)
    return -EINVAL;


And there seem to be other places where (maybe?) there's no checking
at all.  However, it also seems like many cases of memparse usage are
looking for a non-zero value, and therefore they can either
immediately check for zero/invalid or (possibly) later code has checks
to avoid using any zero value.  In this case though, 0 is a valid
value.  So, while I agree that if a user passes an invalid (i.e.
non-numeric) value it's clearly user error, it might be closer to the
apparent (although unwritten AFAICT) memparse usage api to check the
result for validity; in our case a simple check if at least 1 char was
parsed is all that's needed, e.g.:

{
  u64 limit;
  char *tmp = buf;
  struct zram *zram = dev_to_zram(dev);

  limit = memparse(buf, &tmp);
  if (buf == tmp) /* no chars parsed, invalid input */
    return -EINVAL;
  down_write(&zram->init_lock);
...


Separate from this patch, it would also help if the lib/cmdline.c
memparse doc was at least updated to clarify when the result should be
checked for validity (e.g. always, or at least when the result is 0)
and how best to do that (e.g. if 0 is an invalid value, just check if
the result is 0; if 0 is a possible valid value, check if any chars
were parsed).


>
> The difference is that memparse cannot stop being abused
> (C allows the NULL argument and extensive tricks are required to address that)
> however, we can readily fix mem_limit_store and ensure
> 1) no regression when the interface IS fixed and
> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>
>
>>> They say getting API right is a difficult exercise. I suggest, if we
>>> don't insisting on
>>>  an explicit zero we have the API wrong.
>>>
>>> I don't think you disagreed, just that the burden to get it correct
>>> lay elsewhere.
>>>
>>> If that is the case it doesn't really matter, we cannot release this
>>> interface until
>>>  it is corrected wherever it must be.
>>>
>>> And my zero check was a poor hack.
>>>
>>> I should have explicitly checked the returned pointer value.
>>>
>>> I will send that proposed revision, and hopefully you will consider it
>>> for inclusion.
>>>
>>>
>>>
>>>
>>> >>
>>> >> >
>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>> >> > ---
>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>> >> >
>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>> >> > index 70ec992514d0..b8c779d64968 100644
>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>> >> > @@ -119,3 +119,13 @@ Description:
>>> >> >                 efficiency can be calculated using compr_data_size and this
>>> >> >                 statistic.
>>> >> >                 Unit: bytes
>>> >> > +
>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>> >> > +Date:          August 2014
>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>> >> > +Description:
>>> >> > +               The mem_limit file is read/write and specifies the amount
>>> >> > +               of memory to be able to consume memory to store store
>>> >> > +               compressed data. The limit could be changed in run time
>>> >> > -               and "0" is default which means disable the limit.
>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>> >>
>>> >> there should be no default in the API.
>>> >
>>> > Thanks.
>>> >
>>> >>
>>> >> > +               Unit: bytes
>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>> >> > --- a/Documentation/blockdev/zram.txt
>>> >> > +++ b/Documentation/blockdev/zram.txt
>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>> >> >
>>> >> > -5) Activate:
>>> >> > +5) Set memory limit: Optional
>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>> >> > +       In addition, you could change the value in runtime.
>>> >> > +       Examples:
>>> >> > +           # limit /dev/zram0 with 50MB memory
>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +           # Using mem suffixes
>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +           # To disable memory limit
>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>> >> > +
>>> >> > +6) Activate:
>>> >> >         mkswap /dev/zram0
>>> >> >         swapon /dev/zram0
>>> >> >
>>> >> >         mkfs.ext4 /dev/zram1
>>> >> >         mount /dev/zram1 /tmp
>>> >> >
>>> >> > -6) Stats:
>>> >> > +7) Stats:
>>> >> >         Per-device statistics are exported as various nodes under
>>> >> >         /sys/block/zram<id>/
>>> >> >                 disksize
>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>> >> >                 compr_data_size
>>> >> >                 mem_used_total
>>> >> >
>>> >> > -7) Deactivate:
>>> >> > +8) Deactivate:
>>> >> >         swapoff /dev/zram0
>>> >> >         umount /dev/zram1
>>> >> >
>>> >> > -8) Reset:
>>> >> > +9) Reset:
>>> >> >         Write any positive value to 'reset' sysfs node
>>> >> >         echo 1 > /sys/block/zram0/reset
>>> >> >         echo 1 > /sys/block/zram1/reset
>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>> >> > index f0b8b30a7128..370c355eb127 100644
>>> >> > --- a/drivers/block/zram/zram_drv.c
>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>> >> >  }
>>> >> >
>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>> >> > +               struct device_attribute *attr, char *buf)
>>> >> > +{
>>> >> > +       u64 val;
>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>> >> > +
>>> >> > +       down_read(&zram->init_lock);
>>> >> > +       val = zram->limit_pages;
>>> >> > +       up_read(&zram->init_lock);
>>> >> > +
>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>> >> > +}
>>> >> > +
>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>> >> > +{
>>> >> > +       u64 limit;
>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>> >> > +
>>> >> > +       limit = memparse(buf, NULL);
>>> >>
>>> >>             if (limit = 0 && buf != "0")
>>> >>                   return  -EINVAL
>>> >>
>>> >> > +       down_write(&zram->init_lock);
>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>> >> > +       up_write(&zram->init_lock);
>>> >> > +
>>> >> > +       return len;
>>> >> > +}
>>> >> > +
>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>> >> >  {
>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>> >> >                 ret = -ENOMEM;
>>> >> >                 goto out;
>>> >> >         }
>>> >> > +
>>> >> > +       if (zram->limit_pages &&
>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>> >> > +               zs_free(meta->mem_pool, handle);
>>> >> > +               ret = -ENOMEM;
>>> >> > +               goto out;
>>> >> > +       }
>>> >> > +
>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>> >> >
>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>> >> >         struct zram_meta *meta;
>>> >> >
>>> >> >         down_write(&zram->init_lock);
>>> >> > +
>>> >> > +       zram->limit_pages = 0;
>>> >> > +
>>> >> >         if (!init_done(zram)) {
>>> >> >                 up_write(&zram->init_lock);
>>> >> >                 return;
>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>> >> > +               mem_limit_store);
>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>> >> >         &dev_attr_orig_data_size.attr,
>>> >> >         &dev_attr_compr_data_size.attr,
>>> >> >         &dev_attr_mem_used_total.attr,
>>> >> > +       &dev_attr_mem_limit.attr,
>>> >> >         &dev_attr_max_comp_streams.attr,
>>> >> >         &dev_attr_comp_algorithm.attr,
>>> >> >         NULL,
>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>> >> > --- a/drivers/block/zram/zram_drv.h
>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>> >> >         u64 disksize;   /* bytes */
>>> >> >         int max_comp_streams;
>>> >> >         struct zram_stats stats;
>>> >> > +       /*
>>> >> > +        * the number of pages zram can consume for storing compressed data
>>> >> > +        */
>>> >> > +       unsigned long limit_pages;
>>> >> > +
>>> >> >         char compressor[10];
>>> >> >  };
>>> >> >  #endif
>>> >> > --
>>> >> > 2.0.0
>>> >> >
>>> >>
>>> >> --
>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> >> see: http://www.linux-mm.org/ .
>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>> >
>>> > --
>>> > Kind regards,
>>> > Minchan Kim
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>> --
>> Kind regards,
>> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25 18:12               ` Dan Streetman
@ 2014-08-26  1:54                 ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  1:54 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes hard to manage system memrory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return  -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null
>>>> string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not means that is correct -
>>>> nor that it is
>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>> null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state
>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>  when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the undocumented
>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garabge* but work as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even considered
>> it wrong and a broken usage, but it is what it is and people become
>>  reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>>  And I say  mem_limit_store is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results.  It appears to be a mix of some code
> checking memparse while others don't.  The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>   oldp = p;
>   mem_size = memparse(p, &p);
>   if (p == oldp)
>     return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>   mem_size = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
>   if (mem_size == 0)
>     return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>     return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all.  However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value.  In this case though, 0 is a valid
> value.  So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>   u64 limit;
>   char *tmp = buf;
>   struct zram *zram = dev_to_zram(dev);
>
>   limit = memparse(buf, &tmp);
>   if (buf == tmp) /* no chars parsed, invalid input */
>     return -EINVAL;
>   down_write(&zram->init_lock);


Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

I have much to learn.

> ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity (e.g. always, or at least when the result is 0)
> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>

I'd argue that the code is not the place for this usage recommendation.
But rather an expansion of the support doc for sysfs
on how to use such parsing/validation routines.

I agree with Minchan that these helper functions could be improved
for specific use by sysfs.
 And I will pursue this. (and maybe the documentation?)


>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>>  an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>>  it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >>             if (limit = 0 && buf != "0")
>>>> >>                   return  -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> >> see: http://www.linux-mm.org/ .
>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26  1:54                 ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  1:54 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes hard to manage system memrory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return  -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null
>>>> string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not means that is correct -
>>>> nor that it is
>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>> null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state
>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>  when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the undocumented
>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garabge* but work as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even considered
>> it wrong and a broken usage, but it is what it is and people become
>>  reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>>  And I say  mem_limit_store is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results.  It appears to be a mix of some code
> checking memparse while others don't.  The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>   oldp = p;
>   mem_size = memparse(p, &p);
>   if (p == oldp)
>     return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>   mem_size = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
>   if (mem_size == 0)
>     return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>     return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all.  However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value.  In this case though, 0 is a valid
> value.  So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>   u64 limit;
>   char *tmp = buf;
>   struct zram *zram = dev_to_zram(dev);
>
>   limit = memparse(buf, &tmp);
>   if (buf == tmp) /* no chars parsed, invalid input */
>     return -EINVAL;
>   down_write(&zram->init_lock);


Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

I have much to learn.

> ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity (e.g. always, or at least when the result is 0)
> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>

I'd argue that the code is not the place for this usage recommendation.
But rather an expansion of the support doc for sysfs
on how to use such parsing/validation routines.

I agree with Minchan that these helper functions could be improved
for specific use by sysfs.
 And I will pursue this. (and maybe the documentation?)


>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>>  an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>>  it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >>             if (limit = 0 && buf != "0")
>>>> >>                   return  -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> >> see: http://www.linux-mm.org/ .
>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25 18:12               ` Dan Streetman
@ 2014-08-26  4:28                 ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  4:28 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes hard to manage system memrory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return  -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null
>>>> string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not means that is correct -
>>>> nor that it is
>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>> null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state
>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>  when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the undocumented
>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garabge* but work as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even considered
>> it wrong and a broken usage, but it is what it is and people become
>>  reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>>  And I say  mem_limit_store is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results.  It appears to be a mix of some code
> checking memparse while others don't.  The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>   oldp = p;
>   mem_size = memparse(p, &p);
>   if (p == oldp)
>     return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>   mem_size = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
>   if (mem_size == 0)
>     return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>     return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all.  However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value.  In this case though, 0 is a valid
> value.  So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>   u64 limit;
>   char *tmp = buf;
>   struct zram *zram = dev_to_zram(dev);
>
>   limit = memparse(buf, &tmp);
>   if (buf == tmp) /* no chars parsed, invalid input */
>     return -EINVAL;
>   down_write(&zram->init_lock);
> ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity

FWIW:
I was pondering why I thought this was the wrong place.
On reflection the best explanation is that it is not validity -
     the program does what it does quite well.
      (although it does have flaws for use by sysfs
         1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
         2) it checks for a suffix in the null zero case
              (that means G,K,M are all valid memory size constants,
               and I think that should not be in the definition of
valid mem parms)
         3) it does nothing to enforce termination of the input.
            Both simple_strtoull and its successor  kstrtoull are not
buffer overrun safe.
            And so neither is memparse.
            It may be the sysfs buffer management does some magic here
               - but I have not seen it documented nor in code.)

Rather than _validity_ it is _applicability_ that needs explaining.
And that is not documented in the function that does its thing.
But rather in the code that uses it, and more specifically in the framework
established for its specific use - as in sysfs for numeric memory parameters.

> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>
>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>>  an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>>  it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >>             if (limit = 0 && buf != "0")
>>>> >>                   return  -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> >> see: http://www.linux-mm.org/ .
>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26  4:28                 ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  4:28 UTC (permalink / raw)
  To: Dan Streetman
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes hard to manage system memrory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return  -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null
>>>> string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not means that is correct -
>>>> nor that it is
>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>> null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state
>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>  when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the undocumented
>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garabge* but work as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even considered
>> it wrong and a broken usage, but it is what it is and people become
>>  reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>>  And I say  mem_limit_store is fundamentally broken,
>>   because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results.  It appears to be a mix of some code
> checking memparse while others don't.  The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>   oldp = p;
>   mem_size = memparse(p, &p);
>   if (p == oldp)
>     return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>   mem_size = memparse(p, &p);
>   /* don't remove all of memory when handling "mem={invalid}" param */
>   if (mem_size == 0)
>     return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>     return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all.  However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value.  In this case though, 0 is a valid
> value.  So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>   u64 limit;
>   char *tmp = buf;
>   struct zram *zram = dev_to_zram(dev);
>
>   limit = memparse(buf, &tmp);
>   if (buf == tmp) /* no chars parsed, invalid input */
>     return -EINVAL;
>   down_write(&zram->init_lock);
> ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity

FWIW:
I was pondering why I thought this was the wrong place.
On reflection the best explanation is that it is not validity -
     the program does what it does quite well.
      (although it does have flaws for use by sysfs
         1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
         2) it checks for a suffix in the null zero case
              (that means G,K,M are all valid memory size constants,
               and I think that should not be in the definition of
valid mem parms)
         3) it does nothing to enforce termination of the input.
            Both simple_strtoull and its successor  kstrtoull are not
buffer overrun safe.
            And so neither is memparse.
            It may be the sysfs buffer management does some magic here
               - but I have not seen it documented nor in code.)

Rather than _validity_ it is _applicability_ that needs explaining.
And that is not documented in the function that does its thing.
But rather in the code that uses it, and more specifically in the framework
established for its specific use - as in sysfs for numeric memory parameters.

> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>
>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>>  an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>>  it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >>             if (limit = 0 && buf != "0")
>>>> >>                   return  -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> >> see: http://www.linux-mm.org/ .
>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>>
>>>> --
>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>> see: http://www.linux-mm.org/ .
>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-26  1:54                 ` David Horner
@ 2014-08-26  4:39                   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-26  4:39 UTC (permalink / raw)
  To: David Horner
  Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

Hi Dan and David,

On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>> > Hello David,
> >>>> >
> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>> >> > Since zram has no control feature to limit memory usage,
> >>>> >> > it makes hard to manage system memrory.
> >>>> >> >
> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
> >>>> >> > a limit so that zram could fail allocation once it reaches
> >>>> >> > the limit.
> >>>> >> >
> >>>> >> > In addition, user could change the limit in runtime so that
> >>>> >> > he could manage the memory more dynamically.
> >>>> >> >
> >>>> >> - Default is no limit so it doesn't break old behavior.
> >>>> >> + Initial state is no limit so it doesn't break old behavior.
> >>>> >>
> >>>> >> I understand your previous post now.
> >>>> >>
> >>>> >> I was saying that setting to either a null value or garbage
> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
> >>>> >> removes the limit.
> >>>> >>
> >>>> >> I think this is "surprise" behaviour and rather the null case should
> >>>> >> return  -EINVAL
> >>>> >> The test below should be "good enough" though not catching all garbage.
> >>>> >
> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
> >>>> > not caller if it is really problem so I don't want to touch it in this
> >>>> > patchset. It's not critical for adding the feature.
> >>>> >
> >>>>
> >>>> I've looked into the memparse function more since we talked.
> >>>> I do believe a wrapper function around it for the typical use by sysfs would
> >>>> be very valuable.
> >>>
> >>> Agree.
> >>>
> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
> >>>>
> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
> >>>> It provides everything that a caller needs to manage the token that it
> >>>> processes.
> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
> >>>
> >>> Maybe strict_memparse would be better to protect such things so you
> >>> could find several places to clean it up.
> >>>
> >>>>
> >>>> The fact that other callers don't check the return pointer value to
> >>>> see if only a null
> >>>> string was processed, is not its fault.
> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
> >>>> functions use it in a given manner does not means that is correct -
> >>>> nor that it is
> >>>> incorrect for that "knob". Some attributes could be just as valid with
> >>>> null zeros.
> >>>>
> >>>> And you are correct, to disambiguate the zero is not required for the
> >>>> limit feature.
> >>>> Your original patch which disallowed zero was full feature for mem_limit.
> >>>> It is the requested non-crucial feature to allow zero to reestablish
> >>>> the initial state
> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
> >>>>  when garbage is written.
> >>>>
> >>>> The final argument is that if we release this feature as is the undocumented
> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
> >>>
> >>> I don't get it. Why does it break userspace?
> >>> The sysfs-block-zram says "0" means disable the limit.
> >>> If someone writes *garabge* but work as if disabling the limit,
> >>> it's not a right thing and he already broke although it worked
> >>> so it would be not a problem if we fix later.
> >>> (ie, we don't need to take care of broken userspace)
> >>> Am I missing your point?
> >>>
> >>
> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
> >>
> >> Basically, if a facility works in a useful way, even if it was designed for
> >> different usage, that becomes the "accepted" interface/usage.
> >> The developer may not have intended that usage or may even considered
> >> it wrong and a broken usage, but it is what it is and people become
> >>  reliant on that behaviour.
> >>
> >> Case in point is memparse itself.
> >>
> >> The developer intentionally sets the return pointer because that is the
> >> only value that can be validated for correct performance.
> >> The return value allows -ve so the standard error message passing is not valid.
> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
> >> The developer could consider that absurd and fundamentally broken.
> >> But to the user it is a valid situation, because (perhaps) it can't be
> >> bothered to handle error cases.
> >>
> >> So, who is to blame.
> >> You say memparse, that it is fundamentally broken,
> >>   because it didn't check to see that it was used correctly.
> >>  And I say  mem_limit_store is fundamentally broken,
> >>   because it didn't check to see that it was used correctly.
> >
> > I think we should look at what the rest of the kernel does as far as
> > checking memparse results.  It appears to be a mix of some code
> > checking memparse while others don't.  The most common way to check
> > appears to be to verify that memparse actually parsed at least 1
> > character, e.g.:
> >   oldp = p;
> >   mem_size = memparse(p, &p);
> >   if (p == oldp)
> >     return -EINVAL;
> >
> > although other places where 0 isn't valid can simply check for that:
> >   mem_size = memparse(p, &p);
> >   /* don't remove all of memory when handling "mem={invalid}" param */
> >   if (mem_size == 0)
> >     return -EINVAL;
> >
> > or even the other memparse use in zram_drv.c:
> >   disksize = memparse(buf, NULL);
> >   if (!disksize)
> >     return -EINVAL;
> >
> >
> > And there seem to be other places where (maybe?) there's no checking
> > at all.  However, it also seems like many cases of memparse usage are
> > looking for a non-zero value, and therefore they can either
> > immediately check for zero/invalid or (possibly) later code has checks
> > to avoid using any zero value.  In this case though, 0 is a valid
> > value.  So, while I agree that if a user passes an invalid (i.e.
> > non-numeric) value it's clearly user error, it might be closer to the
> > apparent (although unwritten AFAICT) memparse usage api to check the
> > result for validity; in our case a simple check if at least 1 char was
> > parsed is all that's needed, e.g.:
> >
> > {
> >   u64 limit;
> >   char *tmp = buf;
> >   struct zram *zram = dev_to_zram(dev);
> >
> >   limit = memparse(buf, &tmp);
> >   if (buf == tmp) /* no chars parsed, invalid input */
> >     return -EINVAL;
> >   down_write(&zram->init_lock);
> 
> 
> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

Thanks for suggestion, Dan.

David, Are you okay for this?

You pointed out several cases. One was NULL check.
Dan's patch will fix it but other example you pointed out was
"7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
returning EINVAL. Actually, it was not what we want.
Couldn't we check it if you guys really want to prevent wrong use from
userspace? If we don't need it, pz, give me a reason so I will convince
and proceed this patchset and do further works.

Thanks.

> 
> I have much to learn.
> 
> > ...
> >
> >
> > Separate from this patch, it would also help if the lib/cmdline.c
> > memparse doc was at least updated to clarify when the result should be
> > checked for validity (e.g. always, or at least when the result is 0)
> > and how best to do that (e.g. if 0 is an invalid value, just check if
> > the result is 0; if 0 is a possible valid value, check if any chars
> > were parsed).
> >
> >
> 
> I'd argue that the code is not the place for this usage recommendation.
> But rather an expansion of the support doc for sysfs
> on how to use such parsing/validation routines.
> 
> I agree with Minchan that these helper functions could be improved
> for specific use by sysfs.
>  And I will pursue this. (and maybe the documentation?)
> 
> 
> >>
> >> The difference is that memparse cannot stop being abused
> >> (C allows the NULL argument and extensive tricks are required to address that)
> >> however, we can readily fix mem_limit_store and ensure
> >> 1) no regression when the interface IS fixed and
> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
> >>
> >>
> >>>> They say getting API right is a difficult exercise. I suggest, if we
> >>>> don't insisting on
> >>>>  an explicit zero we have the API wrong.
> >>>>
> >>>> I don't think you disagreed, just that the burden to get it correct
> >>>> lay elsewhere.
> >>>>
> >>>> If that is the case it doesn't really matter, we cannot release this
> >>>> interface until
> >>>>  it is corrected wherever it must be.
> >>>>
> >>>> And my zero check was a poor hack.
> >>>>
> >>>> I should have explicitly checked the returned pointer value.
> >>>>
> >>>> I will send that proposed revision, and hopefully you will consider it
> >>>> for inclusion.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> >>
> >>>> >> >
> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> >>>> >> > ---
> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >>>> >> >
> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > index 70ec992514d0..b8c779d64968 100644
> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > @@ -119,3 +119,13 @@ Description:
> >>>> >> >                 efficiency can be calculated using compr_data_size and this
> >>>> >> >                 statistic.
> >>>> >> >                 Unit: bytes
> >>>> >> > +
> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
> >>>> >> > +Date:          August 2014
> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
> >>>> >> > +Description:
> >>>> >> > +               The mem_limit file is read/write and specifies the amount
> >>>> >> > +               of memory to be able to consume memory to store store
> >>>> >> > +               compressed data. The limit could be changed in run time
> >>>> >> > -               and "0" is default which means disable the limit.
> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
> >>>> >>
> >>>> >> there should be no default in the API.
> >>>> >
> >>>> > Thanks.
> >>>> >
> >>>> >>
> >>>> >> > +               Unit: bytes
> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
> >>>> >> > --- a/Documentation/blockdev/zram.txt
> >>>> >> > +++ b/Documentation/blockdev/zram.txt
> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
> >>>> >> >
> >>>> >> > -5) Activate:
> >>>> >> > +5) Set memory limit: Optional
> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
> >>>> >> > +       In addition, you could change the value in runtime.
> >>>> >> > +       Examples:
> >>>> >> > +           # limit /dev/zram0 with 50MB memory
> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +           # Using mem suffixes
> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +           # To disable memory limit
> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +6) Activate:
> >>>> >> >         mkswap /dev/zram0
> >>>> >> >         swapon /dev/zram0
> >>>> >> >
> >>>> >> >         mkfs.ext4 /dev/zram1
> >>>> >> >         mount /dev/zram1 /tmp
> >>>> >> >
> >>>> >> > -6) Stats:
> >>>> >> > +7) Stats:
> >>>> >> >         Per-device statistics are exported as various nodes under
> >>>> >> >         /sys/block/zram<id>/
> >>>> >> >                 disksize
> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >>>> >> >                 compr_data_size
> >>>> >> >                 mem_used_total
> >>>> >> >
> >>>> >> > -7) Deactivate:
> >>>> >> > +8) Deactivate:
> >>>> >> >         swapoff /dev/zram0
> >>>> >> >         umount /dev/zram1
> >>>> >> >
> >>>> >> > -8) Reset:
> >>>> >> > +9) Reset:
> >>>> >> >         Write any positive value to 'reset' sysfs node
> >>>> >> >         echo 1 > /sys/block/zram0/reset
> >>>> >> >         echo 1 > /sys/block/zram1/reset
> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> >>>> >> > index f0b8b30a7128..370c355eb127 100644
> >>>> >> > --- a/drivers/block/zram/zram_drv.c
> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >>>> >> >  }
> >>>> >> >
> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
> >>>> >> > +               struct device_attribute *attr, char *buf)
> >>>> >> > +{
> >>>> >> > +       u64 val;
> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
> >>>> >> > +
> >>>> >> > +       down_read(&zram->init_lock);
> >>>> >> > +       val = zram->limit_pages;
> >>>> >> > +       up_read(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> >>>> >> > +}
> >>>> >> > +
> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
> >>>> >> > +{
> >>>> >> > +       u64 limit;
> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
> >>>> >> > +
> >>>> >> > +       limit = memparse(buf, NULL);
> >>>> >>
> >>>> >>             if (limit = 0 && buf != "0")
> >>>> >>                   return  -EINVAL
> >>>> >>
> >>>> >> > +       down_write(&zram->init_lock);
> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> >>>> >> > +       up_write(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       return len;
> >>>> >> > +}
> >>>> >> > +
> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
> >>>> >> >  {
> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >>>> >> >                 ret = -ENOMEM;
> >>>> >> >                 goto out;
> >>>> >> >         }
> >>>> >> > +
> >>>> >> > +       if (zram->limit_pages &&
> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >>>> >> > +               zs_free(meta->mem_pool, handle);
> >>>> >> > +               ret = -ENOMEM;
> >>>> >> > +               goto out;
> >>>> >> > +       }
> >>>> >> > +
> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >>>> >> >
> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >>>> >> >         struct zram_meta *meta;
> >>>> >> >
> >>>> >> >         down_write(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       zram->limit_pages = 0;
> >>>> >> > +
> >>>> >> >         if (!init_done(zram)) {
> >>>> >> >                 up_write(&zram->init_lock);
> >>>> >> >                 return;
> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> >>>> >> > +               mem_limit_store);
> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >>>> >> >         &dev_attr_orig_data_size.attr,
> >>>> >> >         &dev_attr_compr_data_size.attr,
> >>>> >> >         &dev_attr_mem_used_total.attr,
> >>>> >> > +       &dev_attr_mem_limit.attr,
> >>>> >> >         &dev_attr_max_comp_streams.attr,
> >>>> >> >         &dev_attr_comp_algorithm.attr,
> >>>> >> >         NULL,
> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
> >>>> >> > --- a/drivers/block/zram/zram_drv.h
> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
> >>>> >> >         u64 disksize;   /* bytes */
> >>>> >> >         int max_comp_streams;
> >>>> >> >         struct zram_stats stats;
> >>>> >> > +       /*
> >>>> >> > +        * the number of pages zram can consume for storing compressed data
> >>>> >> > +        */
> >>>> >> > +       unsigned long limit_pages;
> >>>> >> > +
> >>>> >> >         char compressor[10];
> >>>> >> >  };
> >>>> >> >  #endif
> >>>> >> > --
> >>>> >> > 2.0.0
> >>>> >> >
> >>>> >>
> >>>> >> --
> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>> >> see: http://www.linux-mm.org/ .
> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>> >
> >>>> > --
> >>>> > Kind regards,
> >>>> > Minchan Kim
> >>>>
> >>>> --
> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>> see: http://www.linux-mm.org/ .
> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>
> >>> --
> >>> Kind regards,
> >>> Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26  4:39                   ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-26  4:39 UTC (permalink / raw)
  To: David Horner
  Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

Hi Dan and David,

On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>> > Hello David,
> >>>> >
> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> >>>> >> > Since zram has no control feature to limit memory usage,
> >>>> >> > it makes hard to manage system memrory.
> >>>> >> >
> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
> >>>> >> > a limit so that zram could fail allocation once it reaches
> >>>> >> > the limit.
> >>>> >> >
> >>>> >> > In addition, user could change the limit in runtime so that
> >>>> >> > he could manage the memory more dynamically.
> >>>> >> >
> >>>> >> - Default is no limit so it doesn't break old behavior.
> >>>> >> + Initial state is no limit so it doesn't break old behavior.
> >>>> >>
> >>>> >> I understand your previous post now.
> >>>> >>
> >>>> >> I was saying that setting to either a null value or garbage
> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
> >>>> >> removes the limit.
> >>>> >>
> >>>> >> I think this is "surprise" behaviour and rather the null case should
> >>>> >> return  -EINVAL
> >>>> >> The test below should be "good enough" though not catching all garbage.
> >>>> >
> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
> >>>> > not caller if it is really problem so I don't want to touch it in this
> >>>> > patchset. It's not critical for adding the feature.
> >>>> >
> >>>>
> >>>> I've looked into the memparse function more since we talked.
> >>>> I do believe a wrapper function around it for the typical use by sysfs would
> >>>> be very valuable.
> >>>
> >>> Agree.
> >>>
> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
> >>>>
> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
> >>>> It provides everything that a caller needs to manage the token that it
> >>>> processes.
> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
> >>>
> >>> Maybe strict_memparse would be better to protect such things so you
> >>> could find several places to clean it up.
> >>>
> >>>>
> >>>> The fact that other callers don't check the return pointer value to
> >>>> see if only a null
> >>>> string was processed, is not its fault.
> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
> >>>> functions use it in a given manner does not means that is correct -
> >>>> nor that it is
> >>>> incorrect for that "knob". Some attributes could be just as valid with
> >>>> null zeros.
> >>>>
> >>>> And you are correct, to disambiguate the zero is not required for the
> >>>> limit feature.
> >>>> Your original patch which disallowed zero was full feature for mem_limit.
> >>>> It is the requested non-crucial feature to allow zero to reestablish
> >>>> the initial state
> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
> >>>>  when garbage is written.
> >>>>
> >>>> The final argument is that if we release this feature as is the undocumented
> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
> >>>
> >>> I don't get it. Why does it break userspace?
> >>> The sysfs-block-zram says "0" means disable the limit.
> >>> If someone writes *garabge* but work as if disabling the limit,
> >>> it's not a right thing and he already broke although it worked
> >>> so it would be not a problem if we fix later.
> >>> (ie, we don't need to take care of broken userspace)
> >>> Am I missing your point?
> >>>
> >>
> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
> >>
> >> Basically, if a facility works in a useful way, even if it was designed for
> >> different usage, that becomes the "accepted" interface/usage.
> >> The developer may not have intended that usage or may even considered
> >> it wrong and a broken usage, but it is what it is and people become
> >>  reliant on that behaviour.
> >>
> >> Case in point is memparse itself.
> >>
> >> The developer intentionally sets the return pointer because that is the
> >> only value that can be validated for correct performance.
> >> The return value allows -ve so the standard error message passing is not valid.
> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
> >> The developer could consider that absurd and fundamentally broken.
> >> But to the user it is a valid situation, because (perhaps) it can't be
> >> bothered to handle error cases.
> >>
> >> So, who is to blame.
> >> You say memparse, that it is fundamentally broken,
> >>   because it didn't check to see that it was used correctly.
> >>  And I say  mem_limit_store is fundamentally broken,
> >>   because it didn't check to see that it was used correctly.
> >
> > I think we should look at what the rest of the kernel does as far as
> > checking memparse results.  It appears to be a mix of some code
> > checking memparse while others don't.  The most common way to check
> > appears to be to verify that memparse actually parsed at least 1
> > character, e.g.:
> >   oldp = p;
> >   mem_size = memparse(p, &p);
> >   if (p == oldp)
> >     return -EINVAL;
> >
> > although other places where 0 isn't valid can simply check for that:
> >   mem_size = memparse(p, &p);
> >   /* don't remove all of memory when handling "mem={invalid}" param */
> >   if (mem_size == 0)
> >     return -EINVAL;
> >
> > or even the other memparse use in zram_drv.c:
> >   disksize = memparse(buf, NULL);
> >   if (!disksize)
> >     return -EINVAL;
> >
> >
> > And there seem to be other places where (maybe?) there's no checking
> > at all.  However, it also seems like many cases of memparse usage are
> > looking for a non-zero value, and therefore they can either
> > immediately check for zero/invalid or (possibly) later code has checks
> > to avoid using any zero value.  In this case though, 0 is a valid
> > value.  So, while I agree that if a user passes an invalid (i.e.
> > non-numeric) value it's clearly user error, it might be closer to the
> > apparent (although unwritten AFAICT) memparse usage api to check the
> > result for validity; in our case a simple check if at least 1 char was
> > parsed is all that's needed, e.g.:
> >
> > {
> >   u64 limit;
> >   char *tmp = buf;
> >   struct zram *zram = dev_to_zram(dev);
> >
> >   limit = memparse(buf, &tmp);
> >   if (buf == tmp) /* no chars parsed, invalid input */
> >     return -EINVAL;
> >   down_write(&zram->init_lock);
> 
> 
> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

Thanks for suggestion, Dan.

David, Are you okay for this?

You pointed out several cases. One was NULL check.
Dan's patch will fix it but other example you pointed out was
"7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
returning EINVAL. Actually, it was not what we want.
Couldn't we check it if you guys really want to prevent wrong use from
userspace? If we don't need it, pz, give me a reason so I will convince
and proceed this patchset and do further works.

Thanks.

> 
> I have much to learn.
> 
> > ...
> >
> >
> > Separate from this patch, it would also help if the lib/cmdline.c
> > memparse doc was at least updated to clarify when the result should be
> > checked for validity (e.g. always, or at least when the result is 0)
> > and how best to do that (e.g. if 0 is an invalid value, just check if
> > the result is 0; if 0 is a possible valid value, check if any chars
> > were parsed).
> >
> >
> 
> I'd argue that the code is not the place for this usage recommendation.
> But rather an expansion of the support doc for sysfs
> on how to use such parsing/validation routines.
> 
> I agree with Minchan that these helper functions could be improved
> for specific use by sysfs.
>  And I will pursue this. (and maybe the documentation?)
> 
> 
> >>
> >> The difference is that memparse cannot stop being abused
> >> (C allows the NULL argument and extensive tricks are required to address that)
> >> however, we can readily fix mem_limit_store and ensure
> >> 1) no regression when the interface IS fixed and
> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
> >>
> >>
> >>>> They say getting API right is a difficult exercise. I suggest, if we
> >>>> don't insisting on
> >>>>  an explicit zero we have the API wrong.
> >>>>
> >>>> I don't think you disagreed, just that the burden to get it correct
> >>>> lay elsewhere.
> >>>>
> >>>> If that is the case it doesn't really matter, we cannot release this
> >>>> interface until
> >>>>  it is corrected wherever it must be.
> >>>>
> >>>> And my zero check was a poor hack.
> >>>>
> >>>> I should have explicitly checked the returned pointer value.
> >>>>
> >>>> I will send that proposed revision, and hopefully you will consider it
> >>>> for inclusion.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> >>
> >>>> >> >
> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> >>>> >> > ---
> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
> >>>> >> >
> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > index 70ec992514d0..b8c779d64968 100644
> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
> >>>> >> > @@ -119,3 +119,13 @@ Description:
> >>>> >> >                 efficiency can be calculated using compr_data_size and this
> >>>> >> >                 statistic.
> >>>> >> >                 Unit: bytes
> >>>> >> > +
> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
> >>>> >> > +Date:          August 2014
> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
> >>>> >> > +Description:
> >>>> >> > +               The mem_limit file is read/write and specifies the amount
> >>>> >> > +               of memory to be able to consume memory to store store
> >>>> >> > +               compressed data. The limit could be changed in run time
> >>>> >> > -               and "0" is default which means disable the limit.
> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
> >>>> >>
> >>>> >> there should be no default in the API.
> >>>> >
> >>>> > Thanks.
> >>>> >
> >>>> >>
> >>>> >> > +               Unit: bytes
> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
> >>>> >> > --- a/Documentation/blockdev/zram.txt
> >>>> >> > +++ b/Documentation/blockdev/zram.txt
> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
> >>>> >> >
> >>>> >> > -5) Activate:
> >>>> >> > +5) Set memory limit: Optional
> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
> >>>> >> > +       In addition, you could change the value in runtime.
> >>>> >> > +       Examples:
> >>>> >> > +           # limit /dev/zram0 with 50MB memory
> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +           # Using mem suffixes
> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +           # To disable memory limit
> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
> >>>> >> > +
> >>>> >> > +6) Activate:
> >>>> >> >         mkswap /dev/zram0
> >>>> >> >         swapon /dev/zram0
> >>>> >> >
> >>>> >> >         mkfs.ext4 /dev/zram1
> >>>> >> >         mount /dev/zram1 /tmp
> >>>> >> >
> >>>> >> > -6) Stats:
> >>>> >> > +7) Stats:
> >>>> >> >         Per-device statistics are exported as various nodes under
> >>>> >> >         /sys/block/zram<id>/
> >>>> >> >                 disksize
> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
> >>>> >> >                 compr_data_size
> >>>> >> >                 mem_used_total
> >>>> >> >
> >>>> >> > -7) Deactivate:
> >>>> >> > +8) Deactivate:
> >>>> >> >         swapoff /dev/zram0
> >>>> >> >         umount /dev/zram1
> >>>> >> >
> >>>> >> > -8) Reset:
> >>>> >> > +9) Reset:
> >>>> >> >         Write any positive value to 'reset' sysfs node
> >>>> >> >         echo 1 > /sys/block/zram0/reset
> >>>> >> >         echo 1 > /sys/block/zram1/reset
> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> >>>> >> > index f0b8b30a7128..370c355eb127 100644
> >>>> >> > --- a/drivers/block/zram/zram_drv.c
> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >>>> >> >  }
> >>>> >> >
> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
> >>>> >> > +               struct device_attribute *attr, char *buf)
> >>>> >> > +{
> >>>> >> > +       u64 val;
> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
> >>>> >> > +
> >>>> >> > +       down_read(&zram->init_lock);
> >>>> >> > +       val = zram->limit_pages;
> >>>> >> > +       up_read(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> >>>> >> > +}
> >>>> >> > +
> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
> >>>> >> > +{
> >>>> >> > +       u64 limit;
> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
> >>>> >> > +
> >>>> >> > +       limit = memparse(buf, NULL);
> >>>> >>
> >>>> >>             if (limit = 0 && buf != "0")
> >>>> >>                   return  -EINVAL
> >>>> >>
> >>>> >> > +       down_write(&zram->init_lock);
> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> >>>> >> > +       up_write(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       return len;
> >>>> >> > +}
> >>>> >> > +
> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
> >>>> >> >  {
> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> >>>> >> >                 ret = -ENOMEM;
> >>>> >> >                 goto out;
> >>>> >> >         }
> >>>> >> > +
> >>>> >> > +       if (zram->limit_pages &&
> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> >>>> >> > +               zs_free(meta->mem_pool, handle);
> >>>> >> > +               ret = -ENOMEM;
> >>>> >> > +               goto out;
> >>>> >> > +       }
> >>>> >> > +
> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> >>>> >> >
> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
> >>>> >> >         struct zram_meta *meta;
> >>>> >> >
> >>>> >> >         down_write(&zram->init_lock);
> >>>> >> > +
> >>>> >> > +       zram->limit_pages = 0;
> >>>> >> > +
> >>>> >> >         if (!init_done(zram)) {
> >>>> >> >                 up_write(&zram->init_lock);
> >>>> >> >                 return;
> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
> >>>> >> > +               mem_limit_store);
> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
> >>>> >> >         &dev_attr_orig_data_size.attr,
> >>>> >> >         &dev_attr_compr_data_size.attr,
> >>>> >> >         &dev_attr_mem_used_total.attr,
> >>>> >> > +       &dev_attr_mem_limit.attr,
> >>>> >> >         &dev_attr_max_comp_streams.attr,
> >>>> >> >         &dev_attr_comp_algorithm.attr,
> >>>> >> >         NULL,
> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
> >>>> >> > --- a/drivers/block/zram/zram_drv.h
> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
> >>>> >> >         u64 disksize;   /* bytes */
> >>>> >> >         int max_comp_streams;
> >>>> >> >         struct zram_stats stats;
> >>>> >> > +       /*
> >>>> >> > +        * the number of pages zram can consume for storing compressed data
> >>>> >> > +        */
> >>>> >> > +       unsigned long limit_pages;
> >>>> >> > +
> >>>> >> >         char compressor[10];
> >>>> >> >  };
> >>>> >> >  #endif
> >>>> >> > --
> >>>> >> > 2.0.0
> >>>> >> >
> >>>> >>
> >>>> >> --
> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>> >> see: http://www.linux-mm.org/ .
> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>> >
> >>>> > --
> >>>> > Kind regards,
> >>>> > Minchan Kim
> >>>>
> >>>> --
> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
> >>>> see: http://www.linux-mm.org/ .
> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> >>>
> >>> --
> >>> Kind regards,
> >>> Minchan Kim
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-25  8:25             ` Dongsheng Song
@ 2014-08-26  4:51               ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-26  4:51 UTC (permalink / raw)
  To: Dongsheng Song
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

Hello,

On Mon, Aug 25, 2014 at 04:25:31PM +0800, Dongsheng Song wrote:
> > +What:          /sys/block/zram<id>/mem_limit
> > +Date:          August 2014
> > +Contact:       Minchan Kim <minchan@kernel.org>
> > +Description:
> > +               The mem_limit file is read/write and specifies the amount
>  > +               of memory to be able to consume memory to store store
> > +               compressed data. The limit could be changed in run time
> > +               and "0" means disable the limit. No limit is the initial state.
> 
> extra word 'store' ?
> The mem_limit file is read/write and specifies the amount of memory to
> be able to consume memory to store store compressed data.
> 
> maybe this better ?
> The mem_limit file is read/write and specifies the amount of memory to
> store compressed data.

Will fix.
Thanks!

> 
> --
> Dongsheng
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26  4:51               ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-26  4:51 UTC (permalink / raw)
  To: Dongsheng Song
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

Hello,

On Mon, Aug 25, 2014 at 04:25:31PM +0800, Dongsheng Song wrote:
> > +What:          /sys/block/zram<id>/mem_limit
> > +Date:          August 2014
> > +Contact:       Minchan Kim <minchan@kernel.org>
> > +Description:
> > +               The mem_limit file is read/write and specifies the amount
>  > +               of memory to be able to consume memory to store store
> > +               compressed data. The limit could be changed in run time
> > +               and "0" means disable the limit. No limit is the initial state.
> 
> extra word 'store' ?
> The mem_limit file is read/write and specifies the amount of memory to
> be able to consume memory to store store compressed data.
> 
> maybe this better ?
> The mem_limit file is read/write and specifies the amount of memory to
> store compressed data.

Will fix.
Thanks!

> 
> --
> Dongsheng
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-26  4:39                   ` Minchan Kim
@ 2014-08-26  5:36                     ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  5:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Dan and David,
>
> On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
>> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> > Hello David,
>> >>>> >
>> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> >> > Since zram has no control feature to limit memory usage,
>> >>>> >> > it makes hard to manage system memrory.
>> >>>> >> >
>> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >>>> >> > a limit so that zram could fail allocation once it reaches
>> >>>> >> > the limit.
>> >>>> >> >
>> >>>> >> > In addition, user could change the limit in runtime so that
>> >>>> >> > he could manage the memory more dynamically.
>> >>>> >> >
>> >>>> >> - Default is no limit so it doesn't break old behavior.
>> >>>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>>> >>
>> >>>> >> I understand your previous post now.
>> >>>> >>
>> >>>> >> I was saying that setting to either a null value or garbage
>> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >>>> >> removes the limit.
>> >>>> >>
>> >>>> >> I think this is "surprise" behaviour and rather the null case should
>> >>>> >> return  -EINVAL
>> >>>> >> The test below should be "good enough" though not catching all garbage.
>> >>>> >
>> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> >>>> > not caller if it is really problem so I don't want to touch it in this
>> >>>> > patchset. It's not critical for adding the feature.
>> >>>> >
>> >>>>
>> >>>> I've looked into the memparse function more since we talked.
>> >>>> I do believe a wrapper function around it for the typical use by sysfs would
>> >>>> be very valuable.
>> >>>
>> >>> Agree.
>> >>>
>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>> >>>>
>> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
>> >>>> It provides everything that a caller needs to manage the token that it
>> >>>> processes.
>> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>> >>>
>> >>> Maybe strict_memparse would be better to protect such things so you
>> >>> could find several places to clean it up.
>> >>>
>> >>>>
>> >>>> The fact that other callers don't check the return pointer value to
>> >>>> see if only a null
>> >>>> string was processed, is not its fault.
>> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> >>>> functions use it in a given manner does not means that is correct -
>> >>>> nor that it is
>> >>>> incorrect for that "knob". Some attributes could be just as valid with
>> >>>> null zeros.
>> >>>>
>> >>>> And you are correct, to disambiguate the zero is not required for the
>> >>>> limit feature.
>> >>>> Your original patch which disallowed zero was full feature for mem_limit.
>> >>>> It is the requested non-crucial feature to allow zero to reestablish
>> >>>> the initial state
>> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
>> >>>>  when garbage is written.
>> >>>>
>> >>>> The final argument is that if we release this feature as is the undocumented
>> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
>> >>>
>> >>> I don't get it. Why does it break userspace?
>> >>> The sysfs-block-zram says "0" means disable the limit.
>> >>> If someone writes *garabge* but work as if disabling the limit,
>> >>> it's not a right thing and he already broke although it worked
>> >>> so it would be not a problem if we fix later.
>> >>> (ie, we don't need to take care of broken userspace)
>> >>> Am I missing your point?
>> >>>
>> >>
>> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
>> >>
>> >> Basically, if a facility works in a useful way, even if it was designed for
>> >> different usage, that becomes the "accepted" interface/usage.
>> >> The developer may not have intended that usage or may even considered
>> >> it wrong and a broken usage, but it is what it is and people become
>> >>  reliant on that behaviour.
>> >>
>> >> Case in point is memparse itself.
>> >>
>> >> The developer intentionally sets the return pointer because that is the
>> >> only value that can be validated for correct performance.
>> >> The return value allows -ve so the standard error message passing is not valid.
>> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> >> The developer could consider that absurd and fundamentally broken.
>> >> But to the user it is a valid situation, because (perhaps) it can't be
>> >> bothered to handle error cases.
>> >>
>> >> So, who is to blame.
>> >> You say memparse, that it is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >>  And I say  mem_limit_store is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >
>> > I think we should look at what the rest of the kernel does as far as
>> > checking memparse results.  It appears to be a mix of some code
>> > checking memparse while others don't.  The most common way to check
>> > appears to be to verify that memparse actually parsed at least 1
>> > character, e.g.:
>> >   oldp = p;
>> >   mem_size = memparse(p, &p);
>> >   if (p == oldp)
>> >     return -EINVAL;
>> >
>> > although other places where 0 isn't valid can simply check for that:
>> >   mem_size = memparse(p, &p);
>> >   /* don't remove all of memory when handling "mem={invalid}" param */
>> >   if (mem_size == 0)
>> >     return -EINVAL;
>> >
>> > or even the other memparse use in zram_drv.c:
>> >   disksize = memparse(buf, NULL);
>> >   if (!disksize)
>> >     return -EINVAL;
>> >
>> >
>> > And there seem to be other places where (maybe?) there's no checking
>> > at all.  However, it also seems like many cases of memparse usage are
>> > looking for a non-zero value, and therefore they can either
>> > immediately check for zero/invalid or (possibly) later code has checks
>> > to avoid using any zero value.  In this case though, 0 is a valid
>> > value.  So, while I agree that if a user passes an invalid (i.e.
>> > non-numeric) value it's clearly user error, it might be closer to the
>> > apparent (although unwritten AFAICT) memparse usage api to check the
>> > result for validity; in our case a simple check if at least 1 char was
>> > parsed is all that's needed, e.g.:
>> >
>> > {
>> >   u64 limit;
>> >   char *tmp = buf;
>> >   struct zram *zram = dev_to_zram(dev);
>> >
>> >   limit = memparse(buf, &tmp);
>> >   if (buf == tmp) /* no chars parsed, invalid input */
>> >     return -EINVAL;
>> >   down_write(&zram->init_lock);
>>
>>
>> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.
>
> Thanks for suggestion, Dan.
>
> David, Are you okay for this?
>
> You pointed out several cases. One was NULL check.
> Dan's patch will fix it but other example you pointed out was
> "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
> returning EINVAL. Actually, it was not what we want.
> Couldn't we check it if you guys really want to prevent wrong use from
> userspace? If we don't need it, pz, give me a reason so I will convince
> and proceed this patchset and do further works.
>
> Thanks.
>

I'm very happy about this patch.

As for your example, yes, the validation is somewhat slack.

We could insist that the parsed value exactly matches the supplied input length.
But the general case of trailing blanks, and as you pointed out, CR LF
or other valid end-of-line codes would also have to be taken into account.
A substantial coding for little value returned.

I agree that in this case the fix up should be elsewhere, in the sysfs
support layer.
Trailing white space and end-of-line indicators should be optionally
stripped before
the store routine gets them, and a known terminating value appended.
Then the checking and overrun avoidance can be reasonably implemented.

Until then, the code is good as far as I am concerned.
The API is sound and the exposure to overruns and false indications is
already quite low.

(more for me to research and hopefully have time to do some real coding).

Finally, if the user wanted to express a fractional unit allocation,
like .8G, that too would be
a nice enhancement that could be added later as I don't see that
breaking the API.

(comments on this? Dan?)


>>
>> I have much to learn.
>>
>> > ...
>> >
>> >
>> > Separate from this patch, it would also help if the lib/cmdline.c
>> > memparse doc was at least updated to clarify when the result should be
>> > checked for validity (e.g. always, or at least when the result is 0)
>> > and how best to do that (e.g. if 0 is an invalid value, just check if
>> > the result is 0; if 0 is a possible valid value, check if any chars
>> > were parsed).
>> >
>> >
>>
>> I'd argue that the code is not the place for this usage recommendation.
>> But rather an expansion of the support doc for sysfs
>> on how to use such parsing/validation routines.
>>
>> I agree with Minchan that these helper functions could be improved
>> for specific use by sysfs.
>>  And I will pursue this. (and maybe the documentation?)
>>
>>
>> >>
>> >> The difference is that memparse cannot stop being abused
>> >> (C allows the NULL argument and extensive tricks are required to address that)
>> >> however, we can readily fix mem_limit_store and ensure
>> >> 1) no regression when the interface IS fixed and
>> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>> >>
>> >>
>> >>>> They say getting API right is a difficult exercise. I suggest, if we
>> >>>> don't insisting on
>> >>>>  an explicit zero we have the API wrong.
>> >>>>
>> >>>> I don't think you disagreed, just that the burden to get it correct
>> >>>> lay elsewhere.
>> >>>>
>> >>>> If that is the case it doesn't really matter, we cannot release this
>> >>>> interface until
>> >>>>  it is corrected wherever it must be.
>> >>>>
>> >>>> And my zero check was a poor hack.
>> >>>>
>> >>>> I should have explicitly checked the returned pointer value.
>> >>>>
>> >>>> I will send that proposed revision, and hopefully you will consider it
>> >>>> for inclusion.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> >>
>> >>>> >> >
>> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >>>> >> > ---
>> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >>>> >> >
>> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > index 70ec992514d0..b8c779d64968 100644
>> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > @@ -119,3 +119,13 @@ Description:
>> >>>> >> >                 efficiency can be calculated using compr_data_size and this
>> >>>> >> >                 statistic.
>> >>>> >> >                 Unit: bytes
>> >>>> >> > +
>> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >>>> >> > +Date:          August 2014
>> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >>>> >> > +Description:
>> >>>> >> > +               The mem_limit file is read/write and specifies the amount
>> >>>> >> > +               of memory to be able to consume memory to store store
>> >>>> >> > +               compressed data. The limit could be changed in run time
>> >>>> >> > -               and "0" is default which means disable the limit.
>> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>>> >>
>> >>>> >> there should be no default in the API.
>> >>>> >
>> >>>> > Thanks.
>> >>>> >
>> >>>> >>
>> >>>> >> > +               Unit: bytes
>> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >>>> >> > --- a/Documentation/blockdev/zram.txt
>> >>>> >> > +++ b/Documentation/blockdev/zram.txt
>> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >
>> >>>> >> > -5) Activate:
>> >>>> >> > +5) Set memory limit: Optional
>> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >>>> >> > +       In addition, you could change the value in runtime.
>> >>>> >> > +       Examples:
>> >>>> >> > +           # limit /dev/zram0 with 50MB memory
>> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # Using mem suffixes
>> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # To disable memory limit
>> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +6) Activate:
>> >>>> >> >         mkswap /dev/zram0
>> >>>> >> >         swapon /dev/zram0
>> >>>> >> >
>> >>>> >> >         mkfs.ext4 /dev/zram1
>> >>>> >> >         mount /dev/zram1 /tmp
>> >>>> >> >
>> >>>> >> > -6) Stats:
>> >>>> >> > +7) Stats:
>> >>>> >> >         Per-device statistics are exported as various nodes under
>> >>>> >> >         /sys/block/zram<id>/
>> >>>> >> >                 disksize
>> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >                 compr_data_size
>> >>>> >> >                 mem_used_total
>> >>>> >> >
>> >>>> >> > -7) Deactivate:
>> >>>> >> > +8) Deactivate:
>> >>>> >> >         swapoff /dev/zram0
>> >>>> >> >         umount /dev/zram1
>> >>>> >> >
>> >>>> >> > -8) Reset:
>> >>>> >> > +9) Reset:
>> >>>> >> >         Write any positive value to 'reset' sysfs node
>> >>>> >> >         echo 1 > /sys/block/zram0/reset
>> >>>> >> >         echo 1 > /sys/block/zram1/reset
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >>>> >> > index f0b8b30a7128..370c355eb127 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.c
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >>>> >> >  }
>> >>>> >> >
>> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, char *buf)
>> >>>> >> > +{
>> >>>> >> > +       u64 val;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       down_read(&zram->init_lock);
>> >>>> >> > +       val = zram->limit_pages;
>> >>>> >> > +       up_read(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> > +{
>> >>>> >> > +       u64 limit;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       limit = memparse(buf, NULL);
>> >>>> >>
>> >>>> >>             if (limit = 0 && buf != "0")
>> >>>> >>                   return  -EINVAL
>> >>>> >>
>> >>>> >> > +       down_write(&zram->init_lock);
>> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >>>> >> > +       up_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return len;
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> >  {
>> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >>>> >> >                 ret = -ENOMEM;
>> >>>> >> >                 goto out;
>> >>>> >> >         }
>> >>>> >> > +
>> >>>> >> > +       if (zram->limit_pages &&
>> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >>>> >> > +               zs_free(meta->mem_pool, handle);
>> >>>> >> > +               ret = -ENOMEM;
>> >>>> >> > +               goto out;
>> >>>> >> > +       }
>> >>>> >> > +
>> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >>>> >> >
>> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >>>> >> >         struct zram_meta *meta;
>> >>>> >> >
>> >>>> >> >         down_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       zram->limit_pages = 0;
>> >>>> >> > +
>> >>>> >> >         if (!init_done(zram)) {
>> >>>> >> >                 up_write(&zram->init_lock);
>> >>>> >> >                 return;
>> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >>>> >> > +               mem_limit_store);
>> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >>>> >> >         &dev_attr_orig_data_size.attr,
>> >>>> >> >         &dev_attr_compr_data_size.attr,
>> >>>> >> >         &dev_attr_mem_used_total.attr,
>> >>>> >> > +       &dev_attr_mem_limit.attr,
>> >>>> >> >         &dev_attr_max_comp_streams.attr,
>> >>>> >> >         &dev_attr_comp_algorithm.attr,
>> >>>> >> >         NULL,
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.h
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >>>> >> >         u64 disksize;   /* bytes */
>> >>>> >> >         int max_comp_streams;
>> >>>> >> >         struct zram_stats stats;
>> >>>> >> > +       /*
>> >>>> >> > +        * the number of pages zram can consume for storing compressed data
>> >>>> >> > +        */
>> >>>> >> > +       unsigned long limit_pages;
>> >>>> >> > +
>> >>>> >> >         char compressor[10];
>> >>>> >> >  };
>> >>>> >> >  #endif
>> >>>> >> > --
>> >>>> >> > 2.0.0
>> >>>> >> >
>> >>>> >>
>> >>>> >> --
>> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> >> see: http://www.linux-mm.org/ .
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>>
>> >>>> --
>> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> see: http://www.linux-mm.org/ .
>> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26  5:36                     ` David Horner
  0 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26  5:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Dan and David,
>
> On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
>> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> > Hello David,
>> >>>> >
>> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> >> > Since zram has no control feature to limit memory usage,
>> >>>> >> > it makes hard to manage system memrory.
>> >>>> >> >
>> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >>>> >> > a limit so that zram could fail allocation once it reaches
>> >>>> >> > the limit.
>> >>>> >> >
>> >>>> >> > In addition, user could change the limit in runtime so that
>> >>>> >> > he could manage the memory more dynamically.
>> >>>> >> >
>> >>>> >> - Default is no limit so it doesn't break old behavior.
>> >>>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>>> >>
>> >>>> >> I understand your previous post now.
>> >>>> >>
>> >>>> >> I was saying that setting to either a null value or garbage
>> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >>>> >> removes the limit.
>> >>>> >>
>> >>>> >> I think this is "surprise" behaviour and rather the null case should
>> >>>> >> return  -EINVAL
>> >>>> >> The test below should be "good enough" though not catching all garbage.
>> >>>> >
>> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> >>>> > not caller if it is really problem so I don't want to touch it in this
>> >>>> > patchset. It's not critical for adding the feature.
>> >>>> >
>> >>>>
>> >>>> I've looked into the memparse function more since we talked.
>> >>>> I do believe a wrapper function around it for the typical use by sysfs would
>> >>>> be very valuable.
>> >>>
>> >>> Agree.
>> >>>
>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>> >>>>
>> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
>> >>>> It provides everything that a caller needs to manage the token that it
>> >>>> processes.
>> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>> >>>
>> >>> Maybe strict_memparse would be better to protect such things so you
>> >>> could find several places to clean it up.
>> >>>
>> >>>>
>> >>>> The fact that other callers don't check the return pointer value to
>> >>>> see if only a null
>> >>>> string was processed, is not its fault.
>> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> >>>> functions use it in a given manner does not means that is correct -
>> >>>> nor that it is
>> >>>> incorrect for that "knob". Some attributes could be just as valid with
>> >>>> null zeros.
>> >>>>
>> >>>> And you are correct, to disambiguate the zero is not required for the
>> >>>> limit feature.
>> >>>> Your original patch which disallowed zero was full feature for mem_limit.
>> >>>> It is the requested non-crucial feature to allow zero to reestablish
>> >>>> the initial state
>> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
>> >>>>  when garbage is written.
>> >>>>
>> >>>> The final argument is that if we release this feature as is the undocumented
>> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
>> >>>
>> >>> I don't get it. Why does it break userspace?
>> >>> The sysfs-block-zram says "0" means disable the limit.
>> >>> If someone writes *garabge* but work as if disabling the limit,
>> >>> it's not a right thing and he already broke although it worked
>> >>> so it would be not a problem if we fix later.
>> >>> (ie, we don't need to take care of broken userspace)
>> >>> Am I missing your point?
>> >>>
>> >>
>> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
>> >>
>> >> Basically, if a facility works in a useful way, even if it was designed for
>> >> different usage, that becomes the "accepted" interface/usage.
>> >> The developer may not have intended that usage or may even considered
>> >> it wrong and a broken usage, but it is what it is and people become
>> >>  reliant on that behaviour.
>> >>
>> >> Case in point is memparse itself.
>> >>
>> >> The developer intentionally sets the return pointer because that is the
>> >> only value that can be validated for correct performance.
>> >> The return value allows -ve so the standard error message passing is not valid.
>> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> >> The developer could consider that absurd and fundamentally broken.
>> >> But to the user it is a valid situation, because (perhaps) it can't be
>> >> bothered to handle error cases.
>> >>
>> >> So, who is to blame.
>> >> You say memparse, that it is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >>  And I say  mem_limit_store is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >
>> > I think we should look at what the rest of the kernel does as far as
>> > checking memparse results.  It appears to be a mix of some code
>> > checking memparse while others don't.  The most common way to check
>> > appears to be to verify that memparse actually parsed at least 1
>> > character, e.g.:
>> >   oldp = p;
>> >   mem_size = memparse(p, &p);
>> >   if (p == oldp)
>> >     return -EINVAL;
>> >
>> > although other places where 0 isn't valid can simply check for that:
>> >   mem_size = memparse(p, &p);
>> >   /* don't remove all of memory when handling "mem={invalid}" param */
>> >   if (mem_size == 0)
>> >     return -EINVAL;
>> >
>> > or even the other memparse use in zram_drv.c:
>> >   disksize = memparse(buf, NULL);
>> >   if (!disksize)
>> >     return -EINVAL;
>> >
>> >
>> > And there seem to be other places where (maybe?) there's no checking
>> > at all.  However, it also seems like many cases of memparse usage are
>> > looking for a non-zero value, and therefore they can either
>> > immediately check for zero/invalid or (possibly) later code has checks
>> > to avoid using any zero value.  In this case though, 0 is a valid
>> > value.  So, while I agree that if a user passes an invalid (i.e.
>> > non-numeric) value it's clearly user error, it might be closer to the
>> > apparent (although unwritten AFAICT) memparse usage api to check the
>> > result for validity; in our case a simple check if at least 1 char was
>> > parsed is all that's needed, e.g.:
>> >
>> > {
>> >   u64 limit;
>> >   char *tmp = buf;
>> >   struct zram *zram = dev_to_zram(dev);
>> >
>> >   limit = memparse(buf, &tmp);
>> >   if (buf == tmp) /* no chars parsed, invalid input */
>> >     return -EINVAL;
>> >   down_write(&zram->init_lock);
>>
>>
>> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.
>
> Thanks for suggestion, Dan.
>
> David, Are you okay for this?
>
> You pointed out several cases. One was NULL check.
> Dan's patch will fix it but other example you pointed out was
> "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
> returning EINVAL. Actually, it was not what we want.
> Couldn't we check it if you guys really want to prevent wrong use from
> userspace? If we don't need it, pz, give me a reason so I will convince
> and proceed this patchset and do further works.
>
> Thanks.
>

I'm very happy about this patch.

As for your example, yes, the validation is somewhat slack.

We could insist that the parsed value exactly matches the supplied input length.
But the general case of trailing blanks, and as you pointed out, CR LF
or other valid end-of-line codes would also have to be taken into account.
A substantial coding for little value returned.

I agree that in this case the fix up should be elsewhere, in the sysfs
support layer.
Trailing white space and end-of-line indicators should be optionally
stripped before
the store routine gets them, and a known terminating value appended.
Then the checking and overrun avoidance can be reasonably implemented.

Until then, the code is good as far as I am concerned.
The API is sound and the exposure to overruns and false indications is
already quite low.

(more for me to research and hopefully have time to do some real coding).

Finally, if the user wanted to express a fractional unit allocation,
like .8G, that too would be
a nice enhancement that could be added later as I don't see that
breaking the API.

(comments on this? Dan?)


>>
>> I have much to learn.
>>
>> > ...
>> >
>> >
>> > Separate from this patch, it would also help if the lib/cmdline.c
>> > memparse doc was at least updated to clarify when the result should be
>> > checked for validity (e.g. always, or at least when the result is 0)
>> > and how best to do that (e.g. if 0 is an invalid value, just check if
>> > the result is 0; if 0 is a possible valid value, check if any chars
>> > were parsed).
>> >
>> >
>>
>> I'd argue that the code is not the place for this usage recommendation.
>> But rather an expansion of the support doc for sysfs
>> on how to use such parsing/validation routines.
>>
>> I agree with Minchan that these helper functions could be improved
>> for specific use by sysfs.
>>  And I will pursue this. (and maybe the documentation?)
>>
>>
>> >>
>> >> The difference is that memparse cannot stop being abused
>> >> (C allows the NULL argument and extensive tricks are required to address that)
>> >> however, we can readily fix mem_limit_store and ensure
>> >> 1) no regression when the interface IS fixed and
>> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>> >>
>> >>
>> >>>> They say getting API right is a difficult exercise. I suggest, if we
>> >>>> don't insisting on
>> >>>>  an explicit zero we have the API wrong.
>> >>>>
>> >>>> I don't think you disagreed, just that the burden to get it correct
>> >>>> lay elsewhere.
>> >>>>
>> >>>> If that is the case it doesn't really matter, we cannot release this
>> >>>> interface until
>> >>>>  it is corrected wherever it must be.
>> >>>>
>> >>>> And my zero check was a poor hack.
>> >>>>
>> >>>> I should have explicitly checked the returned pointer value.
>> >>>>
>> >>>> I will send that proposed revision, and hopefully you will consider it
>> >>>> for inclusion.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> >>
>> >>>> >> >
>> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >>>> >> > ---
>> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >>>> >> >
>> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > index 70ec992514d0..b8c779d64968 100644
>> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > @@ -119,3 +119,13 @@ Description:
>> >>>> >> >                 efficiency can be calculated using compr_data_size and this
>> >>>> >> >                 statistic.
>> >>>> >> >                 Unit: bytes
>> >>>> >> > +
>> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >>>> >> > +Date:          August 2014
>> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >>>> >> > +Description:
>> >>>> >> > +               The mem_limit file is read/write and specifies the amount
>> >>>> >> > +               of memory to be able to consume memory to store store
>> >>>> >> > +               compressed data. The limit could be changed in run time
>> >>>> >> > -               and "0" is default which means disable the limit.
>> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>>> >>
>> >>>> >> there should be no default in the API.
>> >>>> >
>> >>>> > Thanks.
>> >>>> >
>> >>>> >>
>> >>>> >> > +               Unit: bytes
>> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >>>> >> > --- a/Documentation/blockdev/zram.txt
>> >>>> >> > +++ b/Documentation/blockdev/zram.txt
>> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >
>> >>>> >> > -5) Activate:
>> >>>> >> > +5) Set memory limit: Optional
>> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >>>> >> > +       In addition, you could change the value in runtime.
>> >>>> >> > +       Examples:
>> >>>> >> > +           # limit /dev/zram0 with 50MB memory
>> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # Using mem suffixes
>> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # To disable memory limit
>> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +6) Activate:
>> >>>> >> >         mkswap /dev/zram0
>> >>>> >> >         swapon /dev/zram0
>> >>>> >> >
>> >>>> >> >         mkfs.ext4 /dev/zram1
>> >>>> >> >         mount /dev/zram1 /tmp
>> >>>> >> >
>> >>>> >> > -6) Stats:
>> >>>> >> > +7) Stats:
>> >>>> >> >         Per-device statistics are exported as various nodes under
>> >>>> >> >         /sys/block/zram<id>/
>> >>>> >> >                 disksize
>> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >                 compr_data_size
>> >>>> >> >                 mem_used_total
>> >>>> >> >
>> >>>> >> > -7) Deactivate:
>> >>>> >> > +8) Deactivate:
>> >>>> >> >         swapoff /dev/zram0
>> >>>> >> >         umount /dev/zram1
>> >>>> >> >
>> >>>> >> > -8) Reset:
>> >>>> >> > +9) Reset:
>> >>>> >> >         Write any positive value to 'reset' sysfs node
>> >>>> >> >         echo 1 > /sys/block/zram0/reset
>> >>>> >> >         echo 1 > /sys/block/zram1/reset
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >>>> >> > index f0b8b30a7128..370c355eb127 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.c
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >>>> >> >  }
>> >>>> >> >
>> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, char *buf)
>> >>>> >> > +{
>> >>>> >> > +       u64 val;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       down_read(&zram->init_lock);
>> >>>> >> > +       val = zram->limit_pages;
>> >>>> >> > +       up_read(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> > +{
>> >>>> >> > +       u64 limit;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       limit = memparse(buf, NULL);
>> >>>> >>
>> >>>> >>             if (limit = 0 && buf != "0")
>> >>>> >>                   return  -EINVAL
>> >>>> >>
>> >>>> >> > +       down_write(&zram->init_lock);
>> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >>>> >> > +       up_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return len;
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> >  {
>> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >>>> >> >                 ret = -ENOMEM;
>> >>>> >> >                 goto out;
>> >>>> >> >         }
>> >>>> >> > +
>> >>>> >> > +       if (zram->limit_pages &&
>> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >>>> >> > +               zs_free(meta->mem_pool, handle);
>> >>>> >> > +               ret = -ENOMEM;
>> >>>> >> > +               goto out;
>> >>>> >> > +       }
>> >>>> >> > +
>> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >>>> >> >
>> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >>>> >> >         struct zram_meta *meta;
>> >>>> >> >
>> >>>> >> >         down_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       zram->limit_pages = 0;
>> >>>> >> > +
>> >>>> >> >         if (!init_done(zram)) {
>> >>>> >> >                 up_write(&zram->init_lock);
>> >>>> >> >                 return;
>> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >>>> >> > +               mem_limit_store);
>> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >>>> >> >         &dev_attr_orig_data_size.attr,
>> >>>> >> >         &dev_attr_compr_data_size.attr,
>> >>>> >> >         &dev_attr_mem_used_total.attr,
>> >>>> >> > +       &dev_attr_mem_limit.attr,
>> >>>> >> >         &dev_attr_max_comp_streams.attr,
>> >>>> >> >         &dev_attr_comp_algorithm.attr,
>> >>>> >> >         NULL,
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.h
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >>>> >> >         u64 disksize;   /* bytes */
>> >>>> >> >         int max_comp_streams;
>> >>>> >> >         struct zram_stats stats;
>> >>>> >> > +       /*
>> >>>> >> > +        * the number of pages zram can consume for storing compressed data
>> >>>> >> > +        */
>> >>>> >> > +       unsigned long limit_pages;
>> >>>> >> > +
>> >>>> >> >         char compressor[10];
>> >>>> >> >  };
>> >>>> >> >  #endif
>> >>>> >> > --
>> >>>> >> > 2.0.0
>> >>>> >> >
>> >>>> >>
>> >>>> >> --
>> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> >> see: http://www.linux-mm.org/ .
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>>
>> >>>> --
>> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> see: http://www.linux-mm.org/ .
>> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-26  4:39                   ` Minchan Kim
@ 2014-08-26 13:31                     ` Dan Streetman
  -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-26 13:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Dan and David,
>
> On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
>> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> > Hello David,
>> >>>> >
>> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> >> > Since zram has no control feature to limit memory usage,
>> >>>> >> > it makes hard to manage system memrory.
>> >>>> >> >
>> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >>>> >> > a limit so that zram could fail allocation once it reaches
>> >>>> >> > the limit.
>> >>>> >> >
>> >>>> >> > In addition, user could change the limit in runtime so that
>> >>>> >> > he could manage the memory more dynamically.
>> >>>> >> >
>> >>>> >> - Default is no limit so it doesn't break old behavior.
>> >>>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>>> >>
>> >>>> >> I understand your previous post now.
>> >>>> >>
>> >>>> >> I was saying that setting to either a null value or garbage
>> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >>>> >> removes the limit.
>> >>>> >>
>> >>>> >> I think this is "surprise" behaviour and rather the null case should
>> >>>> >> return  -EINVAL
>> >>>> >> The test below should be "good enough" though not catching all garbage.
>> >>>> >
>> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> >>>> > not caller if it is really problem so I don't want to touch it in this
>> >>>> > patchset. It's not critical for adding the feature.
>> >>>> >
>> >>>>
>> >>>> I've looked into the memparse function more since we talked.
>> >>>> I do believe a wrapper function around it for the typical use by sysfs would
>> >>>> be very valuable.
>> >>>
>> >>> Agree.
>> >>>
>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>> >>>>
>> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
>> >>>> It provides everything that a caller needs to manage the token that it
>> >>>> processes.
>> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>> >>>
>> >>> Maybe strict_memparse would be better to protect such things so you
>> >>> could find several places to clean it up.
>> >>>
>> >>>>
>> >>>> The fact that other callers don't check the return pointer value to
>> >>>> see if only a null
>> >>>> string was processed, is not its fault.
>> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> >>>> functions use it in a given manner does not means that is correct -
>> >>>> nor that it is
>> >>>> incorrect for that "knob". Some attributes could be just as valid with
>> >>>> null zeros.
>> >>>>
>> >>>> And you are correct, to disambiguate the zero is not required for the
>> >>>> limit feature.
>> >>>> Your original patch which disallowed zero was full feature for mem_limit.
>> >>>> It is the requested non-crucial feature to allow zero to reestablish
>> >>>> the initial state
>> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
>> >>>>  when garbage is written.
>> >>>>
>> >>>> The final argument is that if we release this feature as is the undocumented
>> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
>> >>>
>> >>> I don't get it. Why does it break userspace?
>> >>> The sysfs-block-zram says "0" means disable the limit.
>> >>> If someone writes *garabge* but work as if disabling the limit,
>> >>> it's not a right thing and he already broke although it worked
>> >>> so it would be not a problem if we fix later.
>> >>> (ie, we don't need to take care of broken userspace)
>> >>> Am I missing your point?
>> >>>
>> >>
>> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
>> >>
>> >> Basically, if a facility works in a useful way, even if it was designed for
>> >> different usage, that becomes the "accepted" interface/usage.
>> >> The developer may not have intended that usage or may even considered
>> >> it wrong and a broken usage, but it is what it is and people become
>> >>  reliant on that behaviour.
>> >>
>> >> Case in point is memparse itself.
>> >>
>> >> The developer intentionally sets the return pointer because that is the
>> >> only value that can be validated for correct performance.
>> >> The return value allows -ve so the standard error message passing is not valid.
>> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> >> The developer could consider that absurd and fundamentally broken.
>> >> But to the user it is a valid situation, because (perhaps) it can't be
>> >> bothered to handle error cases.
>> >>
>> >> So, who is to blame.
>> >> You say memparse, that it is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >>  And I say  mem_limit_store is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >
>> > I think we should look at what the rest of the kernel does as far as
>> > checking memparse results.  It appears to be a mix of some code
>> > checking memparse while others don't.  The most common way to check
>> > appears to be to verify that memparse actually parsed at least 1
>> > character, e.g.:
>> >   oldp = p;
>> >   mem_size = memparse(p, &p);
>> >   if (p == oldp)
>> >     return -EINVAL;
>> >
>> > although other places where 0 isn't valid can simply check for that:
>> >   mem_size = memparse(p, &p);
>> >   /* don't remove all of memory when handling "mem={invalid}" param */
>> >   if (mem_size == 0)
>> >     return -EINVAL;
>> >
>> > or even the other memparse use in zram_drv.c:
>> >   disksize = memparse(buf, NULL);
>> >   if (!disksize)
>> >     return -EINVAL;
>> >
>> >
>> > And there seem to be other places where (maybe?) there's no checking
>> > at all.  However, it also seems like many cases of memparse usage are
>> > looking for a non-zero value, and therefore they can either
>> > immediately check for zero/invalid or (possibly) later code has checks
>> > to avoid using any zero value.  In this case though, 0 is a valid
>> > value.  So, while I agree that if a user passes an invalid (i.e.
>> > non-numeric) value it's clearly user error, it might be closer to the
>> > apparent (although unwritten AFAICT) memparse usage api to check the
>> > result for validity; in our case a simple check if at least 1 char was
>> > parsed is all that's needed, e.g.:
>> >
>> > {
>> >   u64 limit;
>> >   char *tmp = buf;
>> >   struct zram *zram = dev_to_zram(dev);
>> >
>> >   limit = memparse(buf, &tmp);
>> >   if (buf == tmp) /* no chars parsed, invalid input */
>> >     return -EINVAL;
>> >   down_write(&zram->init_lock);
>>
>>
>> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.
>
> Thanks for suggestion, Dan.
>
> David, Are you okay for this?
>
> You pointed out several cases. One was NULL check.
> Dan's patch will fix it but other example you pointed out was
> "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
> returning EINVAL. Actually, it was not what we want.
> Couldn't we check it if you guys really want to prevent wrong use from
> userspace? If we don't need it, pz, give me a reason so I will convince
> and proceed this patchset and do further works.

As you show, the simple check to see if at least 1 char was parsed
won't catch all invalid strings, only those with no leading numerics,
like "help", "?", "", etc.  But that appears to be the common usage of
memparse, to only check for basic validity, not strictly checking that
the entire input string was fully parsed.

I think the rest of this patch is good, and this is a very minor issue
that only occurs with user error.  This could be left until later,
possibly along with a larger memparse update.  With or without this
minor adjustment to check the memparse result basic validity:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> Thanks.
>
>>
>> I have much to learn.
>>
>> > ...
>> >
>> >
>> > Separate from this patch, it would also help if the lib/cmdline.c
>> > memparse doc was at least updated to clarify when the result should be
>> > checked for validity (e.g. always, or at least when the result is 0)
>> > and how best to do that (e.g. if 0 is an invalid value, just check if
>> > the result is 0; if 0 is a possible valid value, check if any chars
>> > were parsed).
>> >
>> >
>>
>> I'd argue that the code is not the place for this usage recommendation.
>> But rather an expansion of the support doc for sysfs
>> on how to use such parsing/validation routines.
>>
>> I agree with Minchan that these helper functions could be improved
>> for specific use by sysfs.
>>  And I will pursue this. (and maybe the documentation?)
>>
>>
>> >>
>> >> The difference is that memparse cannot stop being abused
>> >> (C allows the NULL argument and extensive tricks are required to address that)
>> >> however, we can readily fix mem_limit_store and ensure
>> >> 1) no regression when the interface IS fixed and
>> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>> >>
>> >>
>> >>>> They say getting API right is a difficult exercise. I suggest, if we
>> >>>> don't insisting on
>> >>>>  an explicit zero we have the API wrong.
>> >>>>
>> >>>> I don't think you disagreed, just that the burden to get it correct
>> >>>> lay elsewhere.
>> >>>>
>> >>>> If that is the case it doesn't really matter, we cannot release this
>> >>>> interface until
>> >>>>  it is corrected wherever it must be.
>> >>>>
>> >>>> And my zero check was a poor hack.
>> >>>>
>> >>>> I should have explicitly checked the returned pointer value.
>> >>>>
>> >>>> I will send that proposed revision, and hopefully you will consider it
>> >>>> for inclusion.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> >>
>> >>>> >> >
>> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >>>> >> > ---
>> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >>>> >> >
>> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > index 70ec992514d0..b8c779d64968 100644
>> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > @@ -119,3 +119,13 @@ Description:
>> >>>> >> >                 efficiency can be calculated using compr_data_size and this
>> >>>> >> >                 statistic.
>> >>>> >> >                 Unit: bytes
>> >>>> >> > +
>> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >>>> >> > +Date:          August 2014
>> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >>>> >> > +Description:
>> >>>> >> > +               The mem_limit file is read/write and specifies the amount
>> >>>> >> > +               of memory to be able to consume memory to store store
>> >>>> >> > +               compressed data. The limit could be changed in run time
>> >>>> >> > -               and "0" is default which means disable the limit.
>> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>>> >>
>> >>>> >> there should be no default in the API.
>> >>>> >
>> >>>> > Thanks.
>> >>>> >
>> >>>> >>
>> >>>> >> > +               Unit: bytes
>> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >>>> >> > --- a/Documentation/blockdev/zram.txt
>> >>>> >> > +++ b/Documentation/blockdev/zram.txt
>> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >
>> >>>> >> > -5) Activate:
>> >>>> >> > +5) Set memory limit: Optional
>> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >>>> >> > +       In addition, you could change the value in runtime.
>> >>>> >> > +       Examples:
>> >>>> >> > +           # limit /dev/zram0 with 50MB memory
>> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # Using mem suffixes
>> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # To disable memory limit
>> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +6) Activate:
>> >>>> >> >         mkswap /dev/zram0
>> >>>> >> >         swapon /dev/zram0
>> >>>> >> >
>> >>>> >> >         mkfs.ext4 /dev/zram1
>> >>>> >> >         mount /dev/zram1 /tmp
>> >>>> >> >
>> >>>> >> > -6) Stats:
>> >>>> >> > +7) Stats:
>> >>>> >> >         Per-device statistics are exported as various nodes under
>> >>>> >> >         /sys/block/zram<id>/
>> >>>> >> >                 disksize
>> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >                 compr_data_size
>> >>>> >> >                 mem_used_total
>> >>>> >> >
>> >>>> >> > -7) Deactivate:
>> >>>> >> > +8) Deactivate:
>> >>>> >> >         swapoff /dev/zram0
>> >>>> >> >         umount /dev/zram1
>> >>>> >> >
>> >>>> >> > -8) Reset:
>> >>>> >> > +9) Reset:
>> >>>> >> >         Write any positive value to 'reset' sysfs node
>> >>>> >> >         echo 1 > /sys/block/zram0/reset
>> >>>> >> >         echo 1 > /sys/block/zram1/reset
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >>>> >> > index f0b8b30a7128..370c355eb127 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.c
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >>>> >> >  }
>> >>>> >> >
>> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, char *buf)
>> >>>> >> > +{
>> >>>> >> > +       u64 val;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       down_read(&zram->init_lock);
>> >>>> >> > +       val = zram->limit_pages;
>> >>>> >> > +       up_read(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> > +{
>> >>>> >> > +       u64 limit;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       limit = memparse(buf, NULL);
>> >>>> >>
>> >>>> >>             if (limit = 0 && buf != "0")
>> >>>> >>                   return  -EINVAL
>> >>>> >>
>> >>>> >> > +       down_write(&zram->init_lock);
>> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >>>> >> > +       up_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return len;
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> >  {
>> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >>>> >> >                 ret = -ENOMEM;
>> >>>> >> >                 goto out;
>> >>>> >> >         }
>> >>>> >> > +
>> >>>> >> > +       if (zram->limit_pages &&
>> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >>>> >> > +               zs_free(meta->mem_pool, handle);
>> >>>> >> > +               ret = -ENOMEM;
>> >>>> >> > +               goto out;
>> >>>> >> > +       }
>> >>>> >> > +
>> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >>>> >> >
>> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >>>> >> >         struct zram_meta *meta;
>> >>>> >> >
>> >>>> >> >         down_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       zram->limit_pages = 0;
>> >>>> >> > +
>> >>>> >> >         if (!init_done(zram)) {
>> >>>> >> >                 up_write(&zram->init_lock);
>> >>>> >> >                 return;
>> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >>>> >> > +               mem_limit_store);
>> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >>>> >> >         &dev_attr_orig_data_size.attr,
>> >>>> >> >         &dev_attr_compr_data_size.attr,
>> >>>> >> >         &dev_attr_mem_used_total.attr,
>> >>>> >> > +       &dev_attr_mem_limit.attr,
>> >>>> >> >         &dev_attr_max_comp_streams.attr,
>> >>>> >> >         &dev_attr_comp_algorithm.attr,
>> >>>> >> >         NULL,
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.h
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >>>> >> >         u64 disksize;   /* bytes */
>> >>>> >> >         int max_comp_streams;
>> >>>> >> >         struct zram_stats stats;
>> >>>> >> > +       /*
>> >>>> >> > +        * the number of pages zram can consume for storing compressed data
>> >>>> >> > +        */
>> >>>> >> > +       unsigned long limit_pages;
>> >>>> >> > +
>> >>>> >> >         char compressor[10];
>> >>>> >> >  };
>> >>>> >> >  #endif
>> >>>> >> > --
>> >>>> >> > 2.0.0
>> >>>> >> >
>> >>>> >>
>> >>>> >> --
>> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> >> see: http://www.linux-mm.org/ .
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>>
>> >>>> --
>> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> see: http://www.linux-mm.org/ .
>> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26 13:31                     ` Dan Streetman
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-26 13:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Dan and David,
>
> On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote:
>> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> > Hello David,
>> >>>> >
>> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>> >>>> >> > Since zram has no control feature to limit memory usage,
>> >>>> >> > it makes hard to manage system memrory.
>> >>>> >> >
>> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>> >>>> >> > a limit so that zram could fail allocation once it reaches
>> >>>> >> > the limit.
>> >>>> >> >
>> >>>> >> > In addition, user could change the limit in runtime so that
>> >>>> >> > he could manage the memory more dynamically.
>> >>>> >> >
>> >>>> >> - Default is no limit so it doesn't break old behavior.
>> >>>> >> + Initial state is no limit so it doesn't break old behavior.
>> >>>> >>
>> >>>> >> I understand your previous post now.
>> >>>> >>
>> >>>> >> I was saying that setting to either a null value or garbage
>> >>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>> >>>> >> removes the limit.
>> >>>> >>
>> >>>> >> I think this is "surprise" behaviour and rather the null case should
>> >>>> >> return  -EINVAL
>> >>>> >> The test below should be "good enough" though not catching all garbage.
>> >>>> >
>> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>> >>>> > not caller if it is really problem so I don't want to touch it in this
>> >>>> > patchset. It's not critical for adding the feature.
>> >>>> >
>> >>>>
>> >>>> I've looked into the memparse function more since we talked.
>> >>>> I do believe a wrapper function around it for the typical use by sysfs would
>> >>>> be very valuable.
>> >>>
>> >>> Agree.
>> >>>
>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>> >>>>
>> >>>> It does what it is documented to do very well (In My Uninformed Opinion).
>> >>>> It provides everything that a caller needs to manage the token that it
>> >>>> processes.
>> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>> >>>
>> >>> Maybe strict_memparse would be better to protect such things so you
>> >>> could find several places to clean it up.
>> >>>
>> >>>>
>> >>>> The fact that other callers don't check the return pointer value to
>> >>>> see if only a null
>> >>>> string was processed, is not its fault.
>> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>> >>>> functions use it in a given manner does not means that is correct -
>> >>>> nor that it is
>> >>>> incorrect for that "knob". Some attributes could be just as valid with
>> >>>> null zeros.
>> >>>>
>> >>>> And you are correct, to disambiguate the zero is not required for the
>> >>>> limit feature.
>> >>>> Your original patch which disallowed zero was full feature for mem_limit.
>> >>>> It is the requested non-crucial feature to allow zero to reestablish
>> >>>> the initial state
>> >>>>  that benefits from distinguishing an explicit zero from a "default zero'
>> >>>>  when garbage is written.
>> >>>>
>> >>>> The final argument is that if we release this feature as is the undocumented
>> >>>>  functionality could be relied upon, and when later fixed: user space breaks.
>> >>>
>> >>> I don't get it. Why does it break userspace?
>> >>> The sysfs-block-zram says "0" means disable the limit.
>> >>> If someone writes *garabge* but work as if disabling the limit,
>> >>> it's not a right thing and he already broke although it worked
>> >>> so it would be not a problem if we fix later.
>> >>> (ie, we don't need to take care of broken userspace)
>> >>> Am I missing your point?
>> >>>
>> >>
>> >> Perhaps you are missing my point, perhaps ignoring or dismissing.
>> >>
>> >> Basically, if a facility works in a useful way, even if it was designed for
>> >> different usage, that becomes the "accepted" interface/usage.
>> >> The developer may not have intended that usage or may even considered
>> >> it wrong and a broken usage, but it is what it is and people become
>> >>  reliant on that behaviour.
>> >>
>> >> Case in point is memparse itself.
>> >>
>> >> The developer intentionally sets the return pointer because that is the
>> >> only value that can be validated for correct performance.
>> >> The return value allows -ve so the standard error message passing is not valid.
>> >> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> >> The developer could consider that absurd and fundamentally broken.
>> >> But to the user it is a valid situation, because (perhaps) it can't be
>> >> bothered to handle error cases.
>> >>
>> >> So, who is to blame.
>> >> You say memparse, that it is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >>  And I say  mem_limit_store is fundamentally broken,
>> >>   because it didn't check to see that it was used correctly.
>> >
>> > I think we should look at what the rest of the kernel does as far as
>> > checking memparse results.  It appears to be a mix of some code
>> > checking memparse while others don't.  The most common way to check
>> > appears to be to verify that memparse actually parsed at least 1
>> > character, e.g.:
>> >   oldp = p;
>> >   mem_size = memparse(p, &p);
>> >   if (p == oldp)
>> >     return -EINVAL;
>> >
>> > although other places where 0 isn't valid can simply check for that:
>> >   mem_size = memparse(p, &p);
>> >   /* don't remove all of memory when handling "mem={invalid}" param */
>> >   if (mem_size == 0)
>> >     return -EINVAL;
>> >
>> > or even the other memparse use in zram_drv.c:
>> >   disksize = memparse(buf, NULL);
>> >   if (!disksize)
>> >     return -EINVAL;
>> >
>> >
>> > And there seem to be other places where (maybe?) there's no checking
>> > at all.  However, it also seems like many cases of memparse usage are
>> > looking for a non-zero value, and therefore they can either
>> > immediately check for zero/invalid or (possibly) later code has checks
>> > to avoid using any zero value.  In this case though, 0 is a valid
>> > value.  So, while I agree that if a user passes an invalid (i.e.
>> > non-numeric) value it's clearly user error, it might be closer to the
>> > apparent (although unwritten AFAICT) memparse usage api to check the
>> > result for validity; in our case a simple check if at least 1 char was
>> > parsed is all that's needed, e.g.:
>> >
>> > {
>> >   u64 limit;
>> >   char *tmp = buf;
>> >   struct zram *zram = dev_to_zram(dev);
>> >
>> >   limit = memparse(buf, &tmp);
>> >   if (buf == tmp) /* no chars parsed, invalid input */
>> >     return -EINVAL;
>> >   down_write(&zram->init_lock);
>>
>>
>> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.
>
> Thanks for suggestion, Dan.
>
> David, Are you okay for this?
>
> You pointed out several cases. One was NULL check.
> Dan's patch will fix it but other example you pointed out was
> "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
> returning EINVAL. Actually, it was not what we want.
> Couldn't we check it if you guys really want to prevent wrong use from
> userspace? If we don't need it, pz, give me a reason so I will convince
> and proceed this patchset and do further works.

As you show, the simple check to see if at least 1 char was parsed
won't catch all invalid strings, only those with no leading numerics,
like "help", "?", "", etc.  But that appears to be the common usage of
memparse, to only check for basic validity, not strictly checking that
the entire input string was fully parsed.

I think the rest of this patch is good, and this is a very minor issue
that only occurs with user error.  This could be left until later,
possibly along with a larger memparse update.  With or without this
minor adjustment to check the memparse result basic validity:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> Thanks.
>
>>
>> I have much to learn.
>>
>> > ...
>> >
>> >
>> > Separate from this patch, it would also help if the lib/cmdline.c
>> > memparse doc was at least updated to clarify when the result should be
>> > checked for validity (e.g. always, or at least when the result is 0)
>> > and how best to do that (e.g. if 0 is an invalid value, just check if
>> > the result is 0; if 0 is a possible valid value, check if any chars
>> > were parsed).
>> >
>> >
>>
>> I'd argue that the code is not the place for this usage recommendation.
>> But rather an expansion of the support doc for sysfs
>> on how to use such parsing/validation routines.
>>
>> I agree with Minchan that these helper functions could be improved
>> for specific use by sysfs.
>>  And I will pursue this. (and maybe the documentation?)
>>
>>
>> >>
>> >> The difference is that memparse cannot stop being abused
>> >> (C allows the NULL argument and extensive tricks are required to address that)
>> >> however, we can readily fix mem_limit_store and ensure
>> >> 1) no regression when the interface IS fixed and
>> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>> >>
>> >>
>> >>>> They say getting API right is a difficult exercise. I suggest, if we
>> >>>> don't insisting on
>> >>>>  an explicit zero we have the API wrong.
>> >>>>
>> >>>> I don't think you disagreed, just that the burden to get it correct
>> >>>> lay elsewhere.
>> >>>>
>> >>>> If that is the case it doesn't really matter, we cannot release this
>> >>>> interface until
>> >>>>  it is corrected wherever it must be.
>> >>>>
>> >>>> And my zero check was a poor hack.
>> >>>>
>> >>>> I should have explicitly checked the returned pointer value.
>> >>>>
>> >>>> I will send that proposed revision, and hopefully you will consider it
>> >>>> for inclusion.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> >>
>> >>>> >> >
>> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> >>>> >> > ---
>> >>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>> >>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>> >>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>> >>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>> >>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>> >>>> >> >
>> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > index 70ec992514d0..b8c779d64968 100644
>> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>> >>>> >> > @@ -119,3 +119,13 @@ Description:
>> >>>> >> >                 efficiency can be calculated using compr_data_size and this
>> >>>> >> >                 statistic.
>> >>>> >> >                 Unit: bytes
>> >>>> >> > +
>> >>>> >> > +What:          /sys/block/zram<id>/mem_limit
>> >>>> >> > +Date:          August 2014
>> >>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>> >>>> >> > +Description:
>> >>>> >> > +               The mem_limit file is read/write and specifies the amount
>> >>>> >> > +               of memory to be able to consume memory to store store
>> >>>> >> > +               compressed data. The limit could be changed in run time
>> >>>> >> > -               and "0" is default which means disable the limit.
>> >>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>> >>>> >>
>> >>>> >> there should be no default in the API.
>> >>>> >
>> >>>> > Thanks.
>> >>>> >
>> >>>> >>
>> >>>> >> > +               Unit: bytes
>> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>> >>>> >> > --- a/Documentation/blockdev/zram.txt
>> >>>> >> > +++ b/Documentation/blockdev/zram.txt
>> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>> >>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>> >>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >
>> >>>> >> > -5) Activate:
>> >>>> >> > +5) Set memory limit: Optional
>> >>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>> >>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>> >>>> >> > +       In addition, you could change the value in runtime.
>> >>>> >> > +       Examples:
>> >>>> >> > +           # limit /dev/zram0 with 50MB memory
>> >>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # Using mem suffixes
>> >>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>> >>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +           # To disable memory limit
>> >>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>> >>>> >> > +
>> >>>> >> > +6) Activate:
>> >>>> >> >         mkswap /dev/zram0
>> >>>> >> >         swapon /dev/zram0
>> >>>> >> >
>> >>>> >> >         mkfs.ext4 /dev/zram1
>> >>>> >> >         mount /dev/zram1 /tmp
>> >>>> >> >
>> >>>> >> > -6) Stats:
>> >>>> >> > +7) Stats:
>> >>>> >> >         Per-device statistics are exported as various nodes under
>> >>>> >> >         /sys/block/zram<id>/
>> >>>> >> >                 disksize
>> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>> >>>> >> >                 compr_data_size
>> >>>> >> >                 mem_used_total
>> >>>> >> >
>> >>>> >> > -7) Deactivate:
>> >>>> >> > +8) Deactivate:
>> >>>> >> >         swapoff /dev/zram0
>> >>>> >> >         umount /dev/zram1
>> >>>> >> >
>> >>>> >> > -8) Reset:
>> >>>> >> > +9) Reset:
>> >>>> >> >         Write any positive value to 'reset' sysfs node
>> >>>> >> >         echo 1 > /sys/block/zram0/reset
>> >>>> >> >         echo 1 > /sys/block/zram1/reset
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>> >>>> >> > index f0b8b30a7128..370c355eb127 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.c
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.c
>> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>> >>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>> >>>> >> >  }
>> >>>> >> >
>> >>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, char *buf)
>> >>>> >> > +{
>> >>>> >> > +       u64 val;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       down_read(&zram->init_lock);
>> >>>> >> > +       val = zram->limit_pages;
>> >>>> >> > +       up_read(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>> >>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> > +{
>> >>>> >> > +       u64 limit;
>> >>>> >> > +       struct zram *zram = dev_to_zram(dev);
>> >>>> >> > +
>> >>>> >> > +       limit = memparse(buf, NULL);
>> >>>> >>
>> >>>> >>             if (limit = 0 && buf != "0")
>> >>>> >>                   return  -EINVAL
>> >>>> >>
>> >>>> >> > +       down_write(&zram->init_lock);
>> >>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>> >>>> >> > +       up_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       return len;
>> >>>> >> > +}
>> >>>> >> > +
>> >>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>> >>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>> >>>> >> >  {
>> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>> >>>> >> >                 ret = -ENOMEM;
>> >>>> >> >                 goto out;
>> >>>> >> >         }
>> >>>> >> > +
>> >>>> >> > +       if (zram->limit_pages &&
>> >>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>> >>>> >> > +               zs_free(meta->mem_pool, handle);
>> >>>> >> > +               ret = -ENOMEM;
>> >>>> >> > +               goto out;
>> >>>> >> > +       }
>> >>>> >> > +
>> >>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>> >>>> >> >
>> >>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>> >>>> >> >         struct zram_meta *meta;
>> >>>> >> >
>> >>>> >> >         down_write(&zram->init_lock);
>> >>>> >> > +
>> >>>> >> > +       zram->limit_pages = 0;
>> >>>> >> > +
>> >>>> >> >         if (!init_done(zram)) {
>> >>>> >> >                 up_write(&zram->init_lock);
>> >>>> >> >                 return;
>> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>> >>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>> >>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>> >>>> >> > +               mem_limit_store);
>> >>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>> >>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>> >>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>> >>>> >> >         &dev_attr_orig_data_size.attr,
>> >>>> >> >         &dev_attr_compr_data_size.attr,
>> >>>> >> >         &dev_attr_mem_used_total.attr,
>> >>>> >> > +       &dev_attr_mem_limit.attr,
>> >>>> >> >         &dev_attr_max_comp_streams.attr,
>> >>>> >> >         &dev_attr_comp_algorithm.attr,
>> >>>> >> >         NULL,
>> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>> >>>> >> > --- a/drivers/block/zram/zram_drv.h
>> >>>> >> > +++ b/drivers/block/zram/zram_drv.h
>> >>>> >> > @@ -112,6 +112,11 @@ struct zram {
>> >>>> >> >         u64 disksize;   /* bytes */
>> >>>> >> >         int max_comp_streams;
>> >>>> >> >         struct zram_stats stats;
>> >>>> >> > +       /*
>> >>>> >> > +        * the number of pages zram can consume for storing compressed data
>> >>>> >> > +        */
>> >>>> >> > +       unsigned long limit_pages;
>> >>>> >> > +
>> >>>> >> >         char compressor[10];
>> >>>> >> >  };
>> >>>> >> >  #endif
>> >>>> >> > --
>> >>>> >> > 2.0.0
>> >>>> >> >
>> >>>> >>
>> >>>> >> --
>> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> >> see: http://www.linux-mm.org/ .
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>>
>> >>>> --
>> >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> >>>> see: http://www.linux-mm.org/ .
>> >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-26  4:28                 ` David Horner
@ 2014-08-26 13:40                   ` Dan Streetman
  -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-26 13:40 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:28 AM, David Horner <ds2horner@gmail.com> wrote:
> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> > Hello David,
>>>>> >
>>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> >> > Since zram has no control feature to limit memory usage,
>>>>> >> > it makes hard to manage system memrory.
>>>>> >> >
>>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>>> >> > a limit so that zram could fail allocation once it reaches
>>>>> >> > the limit.
>>>>> >> >
>>>>> >> > In addition, user could change the limit in runtime so that
>>>>> >> > he could manage the memory more dynamically.
>>>>> >> >
>>>>> >> - Default is no limit so it doesn't break old behavior.
>>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>>> >>
>>>>> >> I understand your previous post now.
>>>>> >>
>>>>> >> I was saying that setting to either a null value or garbage
>>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>>> >> removes the limit.
>>>>> >>
>>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>>> >> return  -EINVAL
>>>>> >> The test below should be "good enough" though not catching all garbage.
>>>>> >
>>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>>> > not caller if it is really problem so I don't want to touch it in this
>>>>> > patchset. It's not critical for adding the feature.
>>>>> >
>>>>>
>>>>> I've looked into the memparse function more since we talked.
>>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>>> be very valuable.
>>>>
>>>> Agree.
>>>>
>>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>>
>>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>>> It provides everything that a caller needs to manage the token that it
>>>>> processes.
>>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>>
>>>> Maybe strict_memparse would be better to protect such things so you
>>>> could find several places to clean it up.
>>>>
>>>>>
>>>>> The fact that other callers don't check the return pointer value to
>>>>> see if only a null
>>>>> string was processed, is not its fault.
>>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>>> functions use it in a given manner does not means that is correct -
>>>>> nor that it is
>>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>>> null zeros.
>>>>>
>>>>> And you are correct, to disambiguate the zero is not required for the
>>>>> limit feature.
>>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>>> the initial state
>>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>>  when garbage is written.
>>>>>
>>>>> The final argument is that if we release this feature as is the undocumented
>>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>>
>>>> I don't get it. Why does it break userspace?
>>>> The sysfs-block-zram says "0" means disable the limit.
>>>> If someone writes *garabge* but work as if disabling the limit,
>>>> it's not a right thing and he already broke although it worked
>>>> so it would be not a problem if we fix later.
>>>> (ie, we don't need to take care of broken userspace)
>>>> Am I missing your point?
>>>>
>>>
>>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>>
>>> Basically, if a facility works in a useful way, even if it was designed for
>>> different usage, that becomes the "accepted" interface/usage.
>>> The developer may not have intended that usage or may even considered
>>> it wrong and a broken usage, but it is what it is and people become
>>>  reliant on that behaviour.
>>>
>>> Case in point is memparse itself.
>>>
>>> The developer intentionally sets the return pointer because that is the
>>> only value that can be validated for correct performance.
>>> The return value allows -ve so the standard error message passing is not valid.
>>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>>> The developer could consider that absurd and fundamentally broken.
>>> But to the user it is a valid situation, because (perhaps) it can't be
>>> bothered to handle error cases.
>>>
>>> So, who is to blame.
>>> You say memparse, that it is fundamentally broken,
>>>   because it didn't check to see that it was used correctly.
>>>  And I say  mem_limit_store is fundamentally broken,
>>>   because it didn't check to see that it was used correctly.
>>
>> I think we should look at what the rest of the kernel does as far as
>> checking memparse results.  It appears to be a mix of some code
>> checking memparse while others don't.  The most common way to check
>> appears to be to verify that memparse actually parsed at least 1
>> character, e.g.:
>>   oldp = p;
>>   mem_size = memparse(p, &p);
>>   if (p == oldp)
>>     return -EINVAL;
>>
>> although other places where 0 isn't valid can simply check for that:
>>   mem_size = memparse(p, &p);
>>   /* don't remove all of memory when handling "mem={invalid}" param */
>>   if (mem_size == 0)
>>     return -EINVAL;
>>
>> or even the other memparse use in zram_drv.c:
>>   disksize = memparse(buf, NULL);
>>   if (!disksize)
>>     return -EINVAL;
>>
>>
>> And there seem to be other places where (maybe?) there's no checking
>> at all.  However, it also seems like many cases of memparse usage are
>> looking for a non-zero value, and therefore they can either
>> immediately check for zero/invalid or (possibly) later code has checks
>> to avoid using any zero value.  In this case though, 0 is a valid
>> value.  So, while I agree that if a user passes an invalid (i.e.
>> non-numeric) value it's clearly user error, it might be closer to the
>> apparent (although unwritten AFAICT) memparse usage api to check the
>> result for validity; in our case a simple check if at least 1 char was
>> parsed is all that's needed, e.g.:
>>
>> {
>>   u64 limit;
>>   char *tmp = buf;
>>   struct zram *zram = dev_to_zram(dev);
>>
>>   limit = memparse(buf, &tmp);
>>   if (buf == tmp) /* no chars parsed, invalid input */
>>     return -EINVAL;
>>   down_write(&zram->init_lock);
>> ...
>>
>>
>> Separate from this patch, it would also help if the lib/cmdline.c
>> memparse doc was at least updated to clarify when the result should be
>> checked for validity
>
> FWIW:
> I was pondering why I thought this was the wrong place.
> On reflection the best explanation is that it is not validity -
>      the program does what it does quite well.
>       (although it does have flaws for use by sysfs
>          1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
>          2) it checks for a suffix in the null zero case
>               (that means G,K,M are all valid memory size constants,
>                and I think that should not be in the definition of
> valid mem parms)
>          3) it does nothing to enforce termination of the input.
>             Both simple_strtoull and its successor  kstrtoull are not
> buffer overrun safe.
>             And so neither is memparse.
>             It may be the sysfs buffer management does some magic here
>                - but I have not seen it documented nor in code.)
>
> Rather than _validity_ it is _applicability_ that needs explaining.
> And that is not documented in the function that does its thing.
> But rather in the code that uses it, and more specifically in the framework
> established for its specific use - as in sysfs for numeric memory parameters.

Well, sysfs isn't the only user of memparse, over half of its usage is
from arch/, presumably for kernel boot param parsing.  So the doc on
its usage shouldn't only be for sysfs.

>
>> and how best to do that (e.g. if 0 is an invalid value, just check if
>> the result is 0; if 0 is a possible valid value, check if any chars
>> were parsed).
>>
>>
>>>
>>> The difference is that memparse cannot stop being abused
>>> (C allows the NULL argument and extensive tricks are required to address that)
>>> however, we can readily fix mem_limit_store and ensure
>>> 1) no regression when the interface IS fixed and
>>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>>
>>>
>>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>>> don't insisting on
>>>>>  an explicit zero we have the API wrong.
>>>>>
>>>>> I don't think you disagreed, just that the burden to get it correct
>>>>> lay elsewhere.
>>>>>
>>>>> If that is the case it doesn't really matter, we cannot release this
>>>>> interface until
>>>>>  it is corrected wherever it must be.
>>>>>
>>>>> And my zero check was a poor hack.
>>>>>
>>>>> I should have explicitly checked the returned pointer value.
>>>>>
>>>>> I will send that proposed revision, and hopefully you will consider it
>>>>> for inclusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >>
>>>>> >> >
>>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>>> >> > ---
>>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>>> >> >
>>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>>> >> >                 statistic.
>>>>> >> >                 Unit: bytes
>>>>> >> > +
>>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>>> >> > +Date:          August 2014
>>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>>> >> > +Description:
>>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>>> >> > +               of memory to be able to consume memory to store store
>>>>> >> > +               compressed data. The limit could be changed in run time
>>>>> >> > -               and "0" is default which means disable the limit.
>>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>>> >>
>>>>> >> there should be no default in the API.
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> >>
>>>>> >> > +               Unit: bytes
>>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>>> >> >
>>>>> >> > -5) Activate:
>>>>> >> > +5) Set memory limit: Optional
>>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>>> >> > +       In addition, you could change the value in runtime.
>>>>> >> > +       Examples:
>>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +           # Using mem suffixes
>>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +           # To disable memory limit
>>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +6) Activate:
>>>>> >> >         mkswap /dev/zram0
>>>>> >> >         swapon /dev/zram0
>>>>> >> >
>>>>> >> >         mkfs.ext4 /dev/zram1
>>>>> >> >         mount /dev/zram1 /tmp
>>>>> >> >
>>>>> >> > -6) Stats:
>>>>> >> > +7) Stats:
>>>>> >> >         Per-device statistics are exported as various nodes under
>>>>> >> >         /sys/block/zram<id>/
>>>>> >> >                 disksize
>>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>>> >> >                 compr_data_size
>>>>> >> >                 mem_used_total
>>>>> >> >
>>>>> >> > -7) Deactivate:
>>>>> >> > +8) Deactivate:
>>>>> >> >         swapoff /dev/zram0
>>>>> >> >         umount /dev/zram1
>>>>> >> >
>>>>> >> > -8) Reset:
>>>>> >> > +9) Reset:
>>>>> >> >         Write any positive value to 'reset' sysfs node
>>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>>> >> >  }
>>>>> >> >
>>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>>> >> > +               struct device_attribute *attr, char *buf)
>>>>> >> > +{
>>>>> >> > +       u64 val;
>>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > +       down_read(&zram->init_lock);
>>>>> >> > +       val = zram->limit_pages;
>>>>> >> > +       up_read(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>>> >> > +}
>>>>> >> > +
>>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> > +{
>>>>> >> > +       u64 limit;
>>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > +       limit = memparse(buf, NULL);
>>>>> >>
>>>>> >>             if (limit = 0 && buf != "0")
>>>>> >>                   return  -EINVAL
>>>>> >>
>>>>> >> > +       down_write(&zram->init_lock);
>>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>>> >> > +       up_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       return len;
>>>>> >> > +}
>>>>> >> > +
>>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> >  {
>>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>> >> >                 ret = -ENOMEM;
>>>>> >> >                 goto out;
>>>>> >> >         }
>>>>> >> > +
>>>>> >> > +       if (zram->limit_pages &&
>>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>>> >> > +               ret = -ENOMEM;
>>>>> >> > +               goto out;
>>>>> >> > +       }
>>>>> >> > +
>>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>> >> >
>>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>>> >> >         struct zram_meta *meta;
>>>>> >> >
>>>>> >> >         down_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       zram->limit_pages = 0;
>>>>> >> > +
>>>>> >> >         if (!init_done(zram)) {
>>>>> >> >                 up_write(&zram->init_lock);
>>>>> >> >                 return;
>>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>>> >> > +               mem_limit_store);
>>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>>> >> >         &dev_attr_orig_data_size.attr,
>>>>> >> >         &dev_attr_compr_data_size.attr,
>>>>> >> >         &dev_attr_mem_used_total.attr,
>>>>> >> > +       &dev_attr_mem_limit.attr,
>>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>>> >> >         NULL,
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>>> >> >         u64 disksize;   /* bytes */
>>>>> >> >         int max_comp_streams;
>>>>> >> >         struct zram_stats stats;
>>>>> >> > +       /*
>>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>>> >> > +        */
>>>>> >> > +       unsigned long limit_pages;
>>>>> >> > +
>>>>> >> >         char compressor[10];
>>>>> >> >  };
>>>>> >> >  #endif
>>>>> >> > --
>>>>> >> > 2.0.0
>>>>> >> >
>>>>> >>
>>>>> >> --
>>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> >> see: http://www.linux-mm.org/ .
>>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>> >
>>>>> > --
>>>>> > Kind regards,
>>>>> > Minchan Kim
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>> --
>>>> Kind regards,
>>>> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v4 3/4] zram: zram memory size limitation
@ 2014-08-26 13:40                   ` Dan Streetman
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-26 13:40 UTC (permalink / raw)
  To: David Horner
  Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel,
	Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park,
	Luigi Semenzato, Nitin Gupta, Seth Jennings

On Tue, Aug 26, 2014 at 12:28 AM, David Horner <ds2horner@gmail.com> wrote:
> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> > Hello David,
>>>>> >
>>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>>> >> > Since zram has no control feature to limit memory usage,
>>>>> >> > it makes hard to manage system memrory.
>>>>> >> >
>>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>>> >> > a limit so that zram could fail allocation once it reaches
>>>>> >> > the limit.
>>>>> >> >
>>>>> >> > In addition, user could change the limit in runtime so that
>>>>> >> > he could manage the memory more dynamically.
>>>>> >> >
>>>>> >> - Default is no limit so it doesn't break old behavior.
>>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>>> >>
>>>>> >> I understand your previous post now.
>>>>> >>
>>>>> >> I was saying that setting to either a null value or garbage
>>>>> >>  (which is interpreted as zero by memparse(buf, NULL);)
>>>>> >> removes the limit.
>>>>> >>
>>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>>> >> return  -EINVAL
>>>>> >> The test below should be "good enough" though not catching all garbage.
>>>>> >
>>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>>> > not caller if it is really problem so I don't want to touch it in this
>>>>> > patchset. It's not critical for adding the feature.
>>>>> >
>>>>>
>>>>> I've looked into the memparse function more since we talked.
>>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>>> be very valuable.
>>>>
>>>> Agree.
>>>>
>>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>>
>>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>>> It provides everything that a caller needs to manage the token that it
>>>>> processes.
>>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>>
>>>> Maybe strict_memparse would be better to protect such things so you
>>>> could find several places to clean it up.
>>>>
>>>>>
>>>>> The fact that other callers don't check the return pointer value to
>>>>> see if only a null
>>>>> string was processed, is not its fault.
>>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>>> functions use it in a given manner does not means that is correct -
>>>>> nor that it is
>>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>>> null zeros.
>>>>>
>>>>> And you are correct, to disambiguate the zero is not required for the
>>>>> limit feature.
>>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>>> the initial state
>>>>>  that benefits from distinguishing an explicit zero from a "default zero'
>>>>>  when garbage is written.
>>>>>
>>>>> The final argument is that if we release this feature as is the undocumented
>>>>>  functionality could be relied upon, and when later fixed: user space breaks.
>>>>
>>>> I don't get it. Why does it break userspace?
>>>> The sysfs-block-zram says "0" means disable the limit.
>>>> If someone writes *garabge* but work as if disabling the limit,
>>>> it's not a right thing and he already broke although it worked
>>>> so it would be not a problem if we fix later.
>>>> (ie, we don't need to take care of broken userspace)
>>>> Am I missing your point?
>>>>
>>>
>>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>>
>>> Basically, if a facility works in a useful way, even if it was designed for
>>> different usage, that becomes the "accepted" interface/usage.
>>> The developer may not have intended that usage or may even considered
>>> it wrong and a broken usage, but it is what it is and people become
>>>  reliant on that behaviour.
>>>
>>> Case in point is memparse itself.
>>>
>>> The developer intentionally sets the return pointer because that is the
>>> only value that can be validated for correct performance.
>>> The return value allows -ve so the standard error message passing is not valid.
>>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>>> The developer could consider that absurd and fundamentally broken.
>>> But to the user it is a valid situation, because (perhaps) it can't be
>>> bothered to handle error cases.
>>>
>>> So, who is to blame.
>>> You say memparse, that it is fundamentally broken,
>>>   because it didn't check to see that it was used correctly.
>>>  And I say  mem_limit_store is fundamentally broken,
>>>   because it didn't check to see that it was used correctly.
>>
>> I think we should look at what the rest of the kernel does as far as
>> checking memparse results.  It appears to be a mix of some code
>> checking memparse while others don't.  The most common way to check
>> appears to be to verify that memparse actually parsed at least 1
>> character, e.g.:
>>   oldp = p;
>>   mem_size = memparse(p, &p);
>>   if (p == oldp)
>>     return -EINVAL;
>>
>> although other places where 0 isn't valid can simply check for that:
>>   mem_size = memparse(p, &p);
>>   /* don't remove all of memory when handling "mem={invalid}" param */
>>   if (mem_size == 0)
>>     return -EINVAL;
>>
>> or even the other memparse use in zram_drv.c:
>>   disksize = memparse(buf, NULL);
>>   if (!disksize)
>>     return -EINVAL;
>>
>>
>> And there seem to be other places where (maybe?) there's no checking
>> at all.  However, it also seems like many cases of memparse usage are
>> looking for a non-zero value, and therefore they can either
>> immediately check for zero/invalid or (possibly) later code has checks
>> to avoid using any zero value.  In this case though, 0 is a valid
>> value.  So, while I agree that if a user passes an invalid (i.e.
>> non-numeric) value it's clearly user error, it might be closer to the
>> apparent (although unwritten AFAICT) memparse usage api to check the
>> result for validity; in our case a simple check if at least 1 char was
>> parsed is all that's needed, e.g.:
>>
>> {
>>   u64 limit;
>>   char *tmp = buf;
>>   struct zram *zram = dev_to_zram(dev);
>>
>>   limit = memparse(buf, &tmp);
>>   if (buf == tmp) /* no chars parsed, invalid input */
>>     return -EINVAL;
>>   down_write(&zram->init_lock);
>> ...
>>
>>
>> Separate from this patch, it would also help if the lib/cmdline.c
>> memparse doc was at least updated to clarify when the result should be
>> checked for validity
>
> FWIW:
> I was pondering why I thought this was the wrong place.
> On reflection the best explanation is that it is not validity -
>      the program does what it does quite well.
>       (although it does have flaws for use by sysfs
>          1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
>          2) it checks for a suffix in the null zero case
>               (that means G,K,M are all valid memory size constants,
>                and I think that should not be in the definition of
> valid mem parms)
>          3) it does nothing to enforce termination of the input.
>             Both simple_strtoull and its successor  kstrtoull are not
> buffer overrun safe.
>             And so neither is memparse.
>             It may be the sysfs buffer management does some magic here
>                - but I have not seen it documented nor in code.)
>
> Rather than _validity_ it is _applicability_ that needs explaining.
> And that is not documented in the function that does its thing.
> But rather in the code that uses it, and more specifically in the framework
> established for its specific use - as in sysfs for numeric memory parameters.

Well, sysfs isn't the only user of memparse, over half of its usage is
from arch/, presumably for kernel boot param parsing.  So the doc on
its usage shouldn't only be for sysfs.

>
>> and how best to do that (e.g. if 0 is an invalid value, just check if
>> the result is 0; if 0 is a possible valid value, check if any chars
>> were parsed).
>>
>>
>>>
>>> The difference is that memparse cannot stop being abused
>>> (C allows the NULL argument and extensive tricks are required to address that)
>>> however, we can readily fix mem_limit_store and ensure
>>> 1) no regression when the interface IS fixed and
>>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>>
>>>
>>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>>> don't insisting on
>>>>>  an explicit zero we have the API wrong.
>>>>>
>>>>> I don't think you disagreed, just that the burden to get it correct
>>>>> lay elsewhere.
>>>>>
>>>>> If that is the case it doesn't really matter, we cannot release this
>>>>> interface until
>>>>>  it is corrected wherever it must be.
>>>>>
>>>>> And my zero check was a poor hack.
>>>>>
>>>>> I should have explicitly checked the returned pointer value.
>>>>>
>>>>> I will send that proposed revision, and hopefully you will consider it
>>>>> for inclusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >>
>>>>> >> >
>>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>>> >> > ---
>>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>>> >> >
>>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>>> >> >                 statistic.
>>>>> >> >                 Unit: bytes
>>>>> >> > +
>>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>>> >> > +Date:          August 2014
>>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>>> >> > +Description:
>>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>>> >> > +               of memory to be able to consume memory to store store
>>>>> >> > +               compressed data. The limit could be changed in run time
>>>>> >> > -               and "0" is default which means disable the limit.
>>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>>> >>
>>>>> >> there should be no default in the API.
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> >>
>>>>> >> > +               Unit: bytes
>>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>>> >> >
>>>>> >> > -5) Activate:
>>>>> >> > +5) Set memory limit: Optional
>>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>>> >> > +       In addition, you could change the value in runtime.
>>>>> >> > +       Examples:
>>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +           # Using mem suffixes
>>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +           # To disable memory limit
>>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +6) Activate:
>>>>> >> >         mkswap /dev/zram0
>>>>> >> >         swapon /dev/zram0
>>>>> >> >
>>>>> >> >         mkfs.ext4 /dev/zram1
>>>>> >> >         mount /dev/zram1 /tmp
>>>>> >> >
>>>>> >> > -6) Stats:
>>>>> >> > +7) Stats:
>>>>> >> >         Per-device statistics are exported as various nodes under
>>>>> >> >         /sys/block/zram<id>/
>>>>> >> >                 disksize
>>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>>> >> >                 compr_data_size
>>>>> >> >                 mem_used_total
>>>>> >> >
>>>>> >> > -7) Deactivate:
>>>>> >> > +8) Deactivate:
>>>>> >> >         swapoff /dev/zram0
>>>>> >> >         umount /dev/zram1
>>>>> >> >
>>>>> >> > -8) Reset:
>>>>> >> > +9) Reset:
>>>>> >> >         Write any positive value to 'reset' sysfs node
>>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>>> >> >  }
>>>>> >> >
>>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>>> >> > +               struct device_attribute *attr, char *buf)
>>>>> >> > +{
>>>>> >> > +       u64 val;
>>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > +       down_read(&zram->init_lock);
>>>>> >> > +       val = zram->limit_pages;
>>>>> >> > +       up_read(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>>> >> > +}
>>>>> >> > +
>>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> > +{
>>>>> >> > +       u64 limit;
>>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > +       limit = memparse(buf, NULL);
>>>>> >>
>>>>> >>             if (limit = 0 && buf != "0")
>>>>> >>                   return  -EINVAL
>>>>> >>
>>>>> >> > +       down_write(&zram->init_lock);
>>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>>> >> > +       up_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       return len;
>>>>> >> > +}
>>>>> >> > +
>>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> >  {
>>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>> >> >                 ret = -ENOMEM;
>>>>> >> >                 goto out;
>>>>> >> >         }
>>>>> >> > +
>>>>> >> > +       if (zram->limit_pages &&
>>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>>> >> > +               ret = -ENOMEM;
>>>>> >> > +               goto out;
>>>>> >> > +       }
>>>>> >> > +
>>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>> >> >
>>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>>> >> >         struct zram_meta *meta;
>>>>> >> >
>>>>> >> >         down_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > +       zram->limit_pages = 0;
>>>>> >> > +
>>>>> >> >         if (!init_done(zram)) {
>>>>> >> >                 up_write(&zram->init_lock);
>>>>> >> >                 return;
>>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>>> >> > +               mem_limit_store);
>>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>>> >> >         &dev_attr_orig_data_size.attr,
>>>>> >> >         &dev_attr_compr_data_size.attr,
>>>>> >> >         &dev_attr_mem_used_total.attr,
>>>>> >> > +       &dev_attr_mem_limit.attr,
>>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>>> >> >         NULL,
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>>> >> >         u64 disksize;   /* bytes */
>>>>> >> >         int max_comp_streams;
>>>>> >> >         struct zram_stats stats;
>>>>> >> > +       /*
>>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>>> >> > +        */
>>>>> >> > +       unsigned long limit_pages;
>>>>> >> > +
>>>>> >> >         char compressor[10];
>>>>> >> >  };
>>>>> >> >  #endif
>>>>> >> > --
>>>>> >> > 2.0.0
>>>>> >> >
>>>>> >>
>>>>> >> --
>>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> >> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> >> see: http://www.linux-mm.org/ .
>>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>> >
>>>>> > --
>>>>> > Kind regards,
>>>>> > Minchan Kim
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>>>
>>>> --
>>>> Kind regards,
>>>> Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2014-08-26 13:40 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-22  0:42 [PATCH v4 0/4] zram memory control enhance Minchan Kim
2014-08-22  0:42 ` Minchan Kim
2014-08-22  0:42 ` [PATCH v4 1/4] zsmalloc: move pages_allocated to zs_pool Minchan Kim
2014-08-22  0:42   ` Minchan Kim
2014-08-22  0:42 ` [PATCH v4 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes Minchan Kim
2014-08-22  0:42   ` Minchan Kim
2014-08-22  0:42 ` [PATCH v4 3/4] zram: zram memory size limitation Minchan Kim
2014-08-22  0:42   ` Minchan Kim
2014-08-22 10:55   ` David Horner
2014-08-22 10:55     ` David Horner
2014-08-22 18:47     ` Dan Streetman
2014-08-22 18:47       ` Dan Streetman
2014-08-24 23:56     ` Minchan Kim
2014-08-24 23:56       ` Minchan Kim
2014-08-25  3:40       ` David Horner
2014-08-25  3:40         ` David Horner
2014-08-25  4:37         ` Minchan Kim
2014-08-25  4:37           ` Minchan Kim
2014-08-25  8:22           ` David Horner
2014-08-25  8:22             ` David Horner
2014-08-25 18:12             ` Dan Streetman
2014-08-25 18:12               ` Dan Streetman
2014-08-26  1:54               ` David Horner
2014-08-26  1:54                 ` David Horner
2014-08-26  4:39                 ` Minchan Kim
2014-08-26  4:39                   ` Minchan Kim
2014-08-26  5:36                   ` David Horner
2014-08-26  5:36                     ` David Horner
2014-08-26 13:31                   ` Dan Streetman
2014-08-26 13:31                     ` Dan Streetman
2014-08-26  4:28               ` David Horner
2014-08-26  4:28                 ` David Horner
2014-08-26 13:40                 ` Dan Streetman
2014-08-26 13:40                   ` Dan Streetman
2014-08-25  8:25           ` Dongsheng Song
2014-08-25  8:25             ` Dongsheng Song
2014-08-26  4:51             ` Minchan Kim
2014-08-26  4:51               ` Minchan Kim
2014-08-22  0:42 ` [PATCH v4 4/4] zram: report maximum used memory Minchan Kim
2014-08-22  0:42   ` Minchan Kim
2014-08-22 19:15 ` [PATCH v4 0/4] zram memory control enhance Dan Streetman
2014-08-22 19:15   ` Dan Streetman
2014-08-24 23:58   ` Minchan Kim
2014-08-24 23:58     ` Minchan Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.