* [PATCH v4 0/4] zram memory control enhance
@ 2014-08-22  0:42 ` Minchan Kim
  0 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Currently, zram has no feature to limit its memory usage, so in theory
zram can deplete system memory. Users have asked for a limit several
times because, even without exhaustion, zram makes it hard to control
the memory usage of the platform. This patchset adds the feature.

Patch 1 makes zs_get_total_size_bytes faster because it will be called
frequently by later patches for the new feature.

Patch 2 changes the return unit of zs_get_total_size_bytes from bytes
to pages so that zsmalloc doesn't need an unnecessary operation
(i.e., << PAGE_SHIFT).

Patch 3 adds the new feature. I added it to the zram layer, not to
zsmalloc, because the limitation is zram's requirement, not zsmalloc's,
so other users of zsmalloc (e.g., zpool) shouldn't be affected by an
unnecessary branch in zsmalloc. In the future, if every user of
zsmalloc wants the feature, we could easily move it from the client
side into zsmalloc, but the reverse would be painful.

Patch 4 adds a new facility to report the maximum memory usage of zram,
so that users can avoid polling /sys/block/zram0/mem_used_total
frequently and transient maxima are not missed.

* From v3
 * get_zs_total_size_byte function name change - Dan
 * clarification of the document - Dan
 * atomic accounting instead of introducing a new lock in zsmalloc - David
 * remove unnecessary atomic instruction in updating max - David

* From v2
 * introduce helper function to update max_used_pages
   for readability - David
 * avoid unnecessary zs_get_total_size call in the updating loop
   for max_used_pages - David

* From v1
 * rebased on next-20140815
 * fix up race problem - David, Dan
 * reset mem_used_max to current total_bytes, rather than 0 - David
 * resetting works only with a "0" write, for extensibility - David, Dan

Minchan Kim (4):
  zsmalloc: move pages_allocated to zs_pool
  zsmalloc: change return value unit of zs_get_total_size_bytes
  zram: zram memory size limitation
  zram: report maximum used memory

 Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
 Documentation/blockdev/zram.txt            |  25 +++++--
 drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
 drivers/block/zram/zram_drv.h              |   6 ++
 include/linux/zsmalloc.h                   |   2 +-
 mm/zsmalloc.c                              |  30 ++++-----
 6 files changed, 158 insertions(+), 26 deletions(-)

-- 
2.0.0

^ permalink raw reply	[flat|nested] 44+ messages in thread
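
Patch 4 itself is not part of this section, so the following is only a rough user-space sketch of the idea the changelog describes: a helper that raises a recorded maximum without taking a lock. It uses C11 atomics rather than kernel primitives, and the struct and function names (usage_stats, update_max_used) are hypothetical, not taken from the series.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for per-device statistics; not the kernel struct. */
struct usage_stats {
	atomic_ulong max_used_pages;
};

/*
 * Raise the recorded maximum to `pages` if it is larger than the
 * current value. The caller passes in a page count it has already
 * read, so the loop never has to re-read the pool total.
 */
static void update_max_used(struct usage_stats *stats, unsigned long pages)
{
	unsigned long old = atomic_load(&stats->max_used_pages);

	/* On failure, compare-exchange reloads `old` with the latest value. */
	while (pages > old &&
	       !atomic_compare_exchange_weak(&stats->max_used_pages,
					     &old, pages))
		;
}
```

The loop exits as soon as another writer has stored a value at least as large, so a smaller sample never overwrites a larger one.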
* [PATCH v4 1/4] zsmalloc: move pages_allocated to zs_pool
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

pages_allocated has been counted in the size_class structure, so when a
user of zsmalloc wants to see total_size_bytes, it has to gather the
count from every size_class and report the sum.

That is not bad if the user doesn't read the value often, but if the
user starts to read it frequently, it is not a good deal from a
performance point of view.

This patch moves the count from size_class to zs_pool, which also
reduces the memory footprint (from [255 * 8byte] to
[sizeof(atomic_long_t)]).

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 94f38fac5e81..2a4acf400846 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -199,9 +199,6 @@ struct size_class {
 
 	spinlock_t lock;
 
-	/* stats */
-	u64 pages_allocated;
-
 	struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
 };
 
@@ -220,6 +217,7 @@ struct zs_pool {
 	struct size_class size_class[ZS_SIZE_CLASSES];
 
 	gfp_t flags;	/* allocation flags used when growing pool */
+	atomic_long_t pages_allocated;
 };
 
 /*
@@ -1028,8 +1026,9 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
 			return 0;
 
 		set_zspage_mapping(first_page, class->index, ZS_EMPTY);
+		atomic_long_add(class->pages_per_zspage,
+					&pool->pages_allocated);
 		spin_lock(&class->lock);
-		class->pages_allocated += class->pages_per_zspage;
 	}
 
 	obj = (unsigned long)first_page->freelist;
@@ -1082,14 +1081,13 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
 
 	first_page->inuse--;
 	fullness = fix_fullness_group(pool, first_page);
-
-	if (fullness == ZS_EMPTY)
-		class->pages_allocated -= class->pages_per_zspage;
-
 	spin_unlock(&class->lock);
 
-	if (fullness == ZS_EMPTY)
+	if (fullness == ZS_EMPTY) {
+		atomic_long_sub(class->pages_per_zspage,
+				&pool->pages_allocated);
 		free_zspage(first_page);
+	}
 }
 EXPORT_SYMBOL_GPL(zs_free);
 
@@ -1185,12 +1183,7 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
 
 u64 zs_get_total_size_bytes(struct zs_pool *pool)
 {
-	int i;
-	u64 npages = 0;
-
-	for (i = 0; i < ZS_SIZE_CLASSES; i++)
-		npages += pool->size_class[i].pages_allocated;
-
+	u64 npages = atomic_long_read(&pool->pages_allocated);
 	return npages << PAGE_SHIFT;
 }
 EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread
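
To illustrate the trade-off in patch 1, here is a minimal user-space sketch that contrasts summing a per-class counter with reading one pool-wide atomic counter. It assumes C11 atomics in place of the kernel's atomic_long_t, and all type and function names (pool_before, pool_after, account_zspage, and so on) are made up for the example; NR_CLASSES follows the "255" figure in the commit message.

```c
#include <stdatomic.h>

#define NR_CLASSES 255	/* figure taken from the commit message */

/* Before: one counter per size class, summed on every query. */
struct class_before {
	unsigned long long pages_allocated;
};

struct pool_before {
	struct class_before size_class[NR_CLASSES];
};

/* After: a single pool-wide atomic counter. */
struct pool_after {
	atomic_long pages_allocated;
};

/* O(NR_CLASSES) walk, as in the code the patch removes. */
static unsigned long long total_pages_before(const struct pool_before *pool)
{
	unsigned long long npages = 0;
	int i;

	for (i = 0; i < NR_CLASSES; i++)
		npages += pool->size_class[i].pages_allocated;
	return npages;
}

/* O(1) read of the shared counter. */
static long total_pages_after(struct pool_after *pool)
{
	return atomic_load(&pool->pages_allocated);
}

/* Called when a zspage is added to the pool. */
static void account_zspage(struct pool_after *pool, long pages_per_zspage)
{
	atomic_fetch_add(&pool->pages_allocated, pages_per_zspage);
}

/* Called when an empty zspage is released. */
static void unaccount_zspage(struct pool_after *pool, long pages_per_zspage)
{
	atomic_fetch_sub(&pool->pages_allocated, pages_per_zspage);
}
```

Because the counter is atomic, the updates no longer need to sit inside the per-class spinlock, which matches the way the patch moves them outside the locked region.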
* [PATCH v4 2/4] zsmalloc: change return value unit of zs_get_total_size_bytes
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

zs_get_total_size_bytes returns the amount of memory zsmalloc has
consumed in *bytes*, but zsmalloc operates in *pages* rather than
bytes. Let's change the API so that zsmalloc avoids the unnecessary
overhead of converting pages to bytes.

Since the return value is now in pages, "zs_get_total_pages" is a
better name than "zs_get_total_size_bytes".

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 4 ++--
 include/linux/zsmalloc.h      | 2 +-
 mm/zsmalloc.c                 | 9 ++++-----
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index d00831c3d731..f0b8b30a7128 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -103,10 +103,10 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 	down_read(&zram->init_lock);
 	if (init_done(zram))
-		val = zs_get_total_size_bytes(meta->mem_pool);
+		val = zs_get_total_pages(meta->mem_pool);
 	up_read(&zram->init_lock);
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
 static ssize_t max_comp_streams_show(struct device *dev,
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index e44d634e7fb7..05c214760977 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -46,6 +46,6 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 			enum zs_mapmode mm);
 void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool);
+unsigned long zs_get_total_pages(struct zs_pool *pool);
 
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2a4acf400846..c4a91578dc96 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -297,7 +297,7 @@ static void zs_zpool_unmap(void *pool, unsigned long handle)
 
 static u64 zs_zpool_total_size(void *pool)
 {
-	return zs_get_total_size_bytes(pool);
+	return zs_get_total_pages(pool) << PAGE_SHIFT;
 }
 
 static struct zpool_driver zs_zpool_driver = {
@@ -1181,12 +1181,11 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
-u64 zs_get_total_size_bytes(struct zs_pool *pool)
+unsigned long zs_get_total_pages(struct zs_pool *pool)
 {
-	u64 npages = atomic_long_read(&pool->pages_allocated);
-	return npages << PAGE_SHIFT;
+	return atomic_long_read(&pool->pages_allocated);
 }
-EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
+EXPORT_SYMBOL_GPL(zs_get_total_pages);
 
 module_init(zs_init);
 module_exit(zs_exit);
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread
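
With patch 2, callers that still need bytes do the page-to-byte conversion themselves. The sketch below shows one way to write that conversion so the value is widened to 64 bits before shifting, which matters wherever unsigned long is 32 bits wide. It is an illustration only: SKETCH_PAGE_SHIFT assumes 4 KiB pages, pages_to_bytes is a hypothetical helper, and the thread does not discuss whether the overflow case arises in practice.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Convert a page count to bytes. Casting to a 64-bit type before the
 * shift keeps the result from wrapping when unsigned long is 32 bits
 * and the pool is 4 GiB or larger.
 */
static uint64_t pages_to_bytes(unsigned long pages)
{
	return (uint64_t)pages << SKETCH_PAGE_SHIFT;
}
```

In the patch, mem_used_total_show() stores the page count in a u64 before shifting, so it already behaves like this helper.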
* [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22  0:42 ` Minchan Kim
@ 2014-08-22  0:42   ` Minchan Kim
  -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-22  0:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand,
	juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta,
	Seth Jennings, Dan Streetman, ds2horner, Minchan Kim

Since zram has no control feature to limit memory usage,
it is hard to manage system memory.

This patch adds a new knob "mem_limit" via sysfs to set up
a limit so that zram could fail allocation once it reaches
the limit.

In addition, the user could change the limit at runtime so that
they could manage the memory more dynamically.

Default is no limit so it doesn't break old behavior.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
 Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
 drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h              |  5 ++++
 4 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 70ec992514d0..b8c779d64968 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -119,3 +119,13 @@ Description:
 		efficiency can be calculated using compr_data_size and this
 		statistic.
 		Unit: bytes
+
+What:		/sys/block/zram<id>/mem_limit
+Date:		August 2014
+Contact:	Minchan Kim <minchan@kernel.org>
+Description:
+		The mem_limit file is read/write and specifies the amount
+		of memory to be able to consume memory to store
+		compressed data. The limit could be changed in run time
+		and "0" is default which means disable the limit.
+		Unit: bytes
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 0595c3f56ccf..82c6a41116db 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
 since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
 size of the disk when not in use so a huge zram is wasteful.
 
-5) Activate:
+5) Set memory limit: Optional
+	Set memory limit by writing the value to sysfs node 'mem_limit'.
+	The value can be either in bytes or you can use mem suffixes.
+	In addition, you could change the value in runtime.
+	Examples:
+	    # limit /dev/zram0 with 50MB memory
+	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
+
+	    # Using mem suffixes
+	    echo 256K > /sys/block/zram0/mem_limit
+	    echo 512M > /sys/block/zram0/mem_limit
+	    echo 1G > /sys/block/zram0/mem_limit
+
+	    # To disable memory limit
+	    echo 0 > /sys/block/zram0/mem_limit
+
+6) Activate:
 	mkswap /dev/zram0
 	swapon /dev/zram0
 
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-6) Stats:
+7) Stats:
 	Per-device statistics are exported as various nodes under
 	/sys/block/zram<id>/
 		disksize
@@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
 		compr_data_size
 		mem_used_total
 
-7) Deactivate:
+8) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-8) Reset:
+9) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f0b8b30a7128..370c355eb127 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
+static ssize_t mem_limit_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->limit_pages;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
+}
+
+static ssize_t mem_limit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	u64 limit;
+	struct zram *zram = dev_to_zram(dev);
+
+	limit = memparse(buf, NULL);
+	down_write(&zram->init_lock);
+	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
+	up_write(&zram->init_lock);
+
+	return len;
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 		ret = -ENOMEM;
 		goto out;
 	}
+
+	if (zram->limit_pages &&
+		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
+		zs_free(meta->mem_pool, handle);
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
 
 	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
@@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
 	struct zram_meta *meta;
 
 	down_write(&zram->init_lock);
+
+	zram->limit_pages = 0;
+
 	if (!init_done(zram)) {
 		up_write(&zram->init_lock);
 		return;
@@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
+		mem_limit_store);
 static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
 		max_comp_streams_show, max_comp_streams_store);
 static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
@@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
 	&dev_attr_mem_used_total.attr,
+	&dev_attr_mem_limit.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
 	NULL,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e0f725c87cc6..b7aa9c21553f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -112,6 +112,11 @@ struct zram {
 	u64 disksize;	/* bytes */
 	int max_comp_streams;
 	struct zram_stats stats;
+	/*
+	 * the number of pages zram can consume for storing compressed data
+	 */
+	unsigned long limit_pages;
+
 	char compressor[10];
 };
 #endif
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 44+ messages in thread
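
The arithmetic behind mem_limit can be sketched in user space as follows. The first helper mirrors PAGE_ALIGN(limit) >> PAGE_SHIFT, rounding a byte limit up to whole pages; the second mirrors the test added to zram_bvec_write(), where a limit of 0 means "no limit". This is an illustration under assumptions, not the driver code: SKETCH_PAGE_SHIFT assumes 4 KiB pages, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round a byte limit up to whole pages, like PAGE_ALIGN(x) >> PAGE_SHIFT. */
static unsigned long limit_bytes_to_pages(uint64_t bytes)
{
	return (unsigned long)((bytes + SKETCH_PAGE_SIZE - 1)
			       >> SKETCH_PAGE_SHIFT);
}

/* A limit of 0 disables the check; otherwise fail once usage exceeds it. */
static bool over_limit(unsigned long used_pages, unsigned long limit_pages)
{
	return limit_pages != 0 && used_pages > limit_pages;
}
```

With 4 KiB pages, the documentation's 50 MB example becomes a limit of 12800 pages. Note that in the patch the check runs after zs_malloc() has already succeeded, so a write that pushes the pool over the limit frees its new handle and returns -ENOMEM.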
* Re: [PATCH v4 3/4] zram: zram memory size limitation
  2014-08-22  0:42   ` Minchan Kim
@ 2014-08-22 10:55     ` David Horner
  -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-22 10:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
	Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
	Nitin Gupta, Seth Jennings, Dan Streetman

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Since zram has no control feature to limit memory usage,
> it is hard to manage system memory.
>
> This patch adds a new knob "mem_limit" via sysfs to set up
> a limit so that zram could fail allocation once it reaches
> the limit.
>
> In addition, the user could change the limit at runtime so that
> they could manage the memory more dynamically.
>
- Default is no limit so it doesn't break old behavior.
+ Initial state is no limit so it doesn't break old behavior.

I understand your previous post now.

I was saying that setting it to either a null value or garbage
(which memparse(buf, NULL) interprets as zero) removes the limit.

I think this is "surprise" behaviour; the null case should rather
return -EINVAL.
The test below should be "good enough", though it does not catch all
garbage.

>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>  drivers/block/zram/zram_drv.h              |  5 ++++
>  4 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
> index 70ec992514d0..b8c779d64968 100644
> --- a/Documentation/ABI/testing/sysfs-block-zram
> +++ b/Documentation/ABI/testing/sysfs-block-zram
> @@ -119,3 +119,13 @@ Description:
>  		efficiency can be calculated using compr_data_size and this
>  		statistic.
>  		Unit: bytes
> +
> +What:		/sys/block/zram<id>/mem_limit
> +Date:		August 2014
> +Contact:	Minchan Kim <minchan@kernel.org>
> +Description:
> +		The mem_limit file is read/write and specifies the amount
> +		of memory to be able to consume memory to store
> +		compressed data. The limit could be changed in run time
> -		and "0" is default which means disable the limit.
> +		and "0" means disable the limit. No limit is the initial state.

There should be no default in the API.

> +		Unit: bytes
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 0595c3f56ccf..82c6a41116db 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>  size of the disk when not in use so a huge zram is wasteful.
>
> -5) Activate:
> +5) Set memory limit: Optional
> +	Set memory limit by writing the value to sysfs node 'mem_limit'.
> +	The value can be either in bytes or you can use mem suffixes.
> +	In addition, you could change the value in runtime.
> +	Examples:
> +	    # limit /dev/zram0 with 50MB memory
> +	    echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> +
> +	    # Using mem suffixes
> +	    echo 256K > /sys/block/zram0/mem_limit
> +	    echo 512M > /sys/block/zram0/mem_limit
> +	    echo 1G > /sys/block/zram0/mem_limit
> +
> +	    # To disable memory limit
> +	    echo 0 > /sys/block/zram0/mem_limit
> +
> +6) Activate:
>  	mkswap /dev/zram0
>  	swapon /dev/zram0
>
>  	mkfs.ext4 /dev/zram1
>  	mount /dev/zram1 /tmp
>
> -6) Stats:
> +7) Stats:
>  	Per-device statistics are exported as various nodes under
>  	/sys/block/zram<id>/
>  		disksize
> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>  		compr_data_size
>  		mem_used_total
>
> -7) Deactivate:
> +8) Deactivate:
>  	swapoff /dev/zram0
>  	umount /dev/zram1
>
> -8) Reset:
> +9) Reset:
>  	Write any positive value to 'reset' sysfs node
>  	echo 1 > /sys/block/zram0/reset
>  	echo 1 > /sys/block/zram1/reset
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f0b8b30a7128..370c355eb127 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>  	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>  }
>
> +static ssize_t mem_limit_show(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +	u64 val;
> +	struct zram *zram = dev_to_zram(dev);
> +
> +	down_read(&zram->init_lock);
> +	val = zram->limit_pages;
> +	up_read(&zram->init_lock);
> +
> +	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
> +}
> +
> +static ssize_t mem_limit_store(struct device *dev,
> +		struct device_attribute *attr, const char *buf, size_t len)
> +{
> +	u64 limit;
> +	struct zram *zram = dev_to_zram(dev);
> +
> +	limit = memparse(buf, NULL);

	if (limit == 0 && !sysfs_streq(buf, "0"))
		return -EINVAL;

> +	down_write(&zram->init_lock);
> +	zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
> +	up_write(&zram->init_lock);
> +
> +	return len;
> +}
> +
>  static ssize_t max_comp_streams_store(struct device *dev,
>  		struct device_attribute *attr, const char *buf, size_t len)
>  {
> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>  		ret = -ENOMEM;
>  		goto out;
>  	}
> +
> +	if (zram->limit_pages &&
> +		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> +		zs_free(meta->mem_pool, handle);
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
>  	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>
>  	if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>  	struct zram_meta
*meta; > > down_write(&zram->init_lock); > + > + zram->limit_pages = 0; > + > if (!init_done(zram)) { > up_write(&zram->init_lock); > return; > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, > + mem_limit_store); > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, > max_comp_streams_show, max_comp_streams_store); > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { > &dev_attr_orig_data_size.attr, > &dev_attr_compr_data_size.attr, > &dev_attr_mem_used_total.attr, > + &dev_attr_mem_limit.attr, > &dev_attr_max_comp_streams.attr, > &dev_attr_comp_algorithm.attr, > NULL, > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h > index e0f725c87cc6..b7aa9c21553f 100644 > --- a/drivers/block/zram/zram_drv.h > +++ b/drivers/block/zram/zram_drv.h > @@ -112,6 +112,11 @@ struct zram { > u64 disksize; /* bytes */ > int max_comp_streams; > struct zram_stats stats; > + /* > + * the number of pages zram can consume for storing compressed data > + */ > + unsigned long limit_pages; > + > char compressor[10]; > }; > #endif > -- > 2.0.0 > ^ permalink raw reply [flat|nested] 44+ messages in thread
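The one-line check suggested in this reply reads as shorthand rather than literal C: `=` would assign instead of compare, and `buf != "0"` would compare pointers rather than characters. One way the same idea could be spelled out is sketched below as userspace C. The helper names are hypothetical, -1 stands in for -EINVAL, and the buffer is assumed to be NUL-terminated and possibly ending in the newline that echo appends.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when the written text is literally "0"
 * (optionally followed by the newline that echo appends), 0 otherwise.
 */
static int sketch_zero_is_explicit(const char *buf)
{
    return strcmp(buf, "0") == 0 || strcmp(buf, "0\n") == 0;
}

/*
 * Returns 0 if the write should be accepted, -1 (standing in for -EINVAL)
 * if the parsed value is 0 but the user did not actually write "0".
 */
static int sketch_check_limit_write(uint64_t parsed, const char *buf)
{
    if (parsed == 0 && !sketch_zero_is_explicit(buf))
        return -1;
    return 0;
}
```

Like the original suggestion, this only catches input that parses to zero; trailing garbage after a valid number (for example "12abc") is not rejected here.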
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-22 10:55 ` David Horner @ 2014-08-22 18:47 ` Dan Streetman -1 siblings, 0 replies; 44+ messages in thread From: Dan Streetman @ 2014-08-22 18:47 UTC (permalink / raw) To: David Horner Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Fri, Aug 22, 2014 at 6:55 AM, David Horner <ds2horner@gmail.com> wrote: > On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >> Since zram has no control feature to limit memory usage, >> it makes hard to manage system memrory. >> >> This patch adds new knob "mem_limit" via sysfs to set up the >> a limit so that zram could fail allocation once it reaches >> the limit. >> >> In addition, user could change the limit in runtime so that >> he could manage the memory more dynamically. >> > - Default is no limit so it doesn't break old behavior. > + Initial state is no limit so it doesn't break old behavior. > > I understand your previous post now. Yes, by "default" I meant the initial value. > > I was saying that setting to either a null value or garbage > (which is interpreted as zero by memparse(buf, NULL);) > removes the limit. > > I think this is "surprise" behaviour and rather the null case should > return -EINVAL > The test below should be "good enough" though not catching all garbage. I'm not sure of the specifics of memparse, but if it returns 0 for non-numeric strings (which I assume it does, since there's no method for reporting errors), I agree that it should return -EINVAL instead of clearing the mem_limit. 
> >> >> Signed-off-by: Minchan Kim <minchan@kernel.org> >> --- >> Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >> Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >> drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >> drivers/block/zram/zram_drv.h | 5 ++++ >> 4 files changed, 76 insertions(+), 4 deletions(-) >> >> diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >> index 70ec992514d0..b8c779d64968 100644 >> --- a/Documentation/ABI/testing/sysfs-block-zram >> +++ b/Documentation/ABI/testing/sysfs-block-zram >> @@ -119,3 +119,13 @@ Description: >> efficiency can be calculated using compr_data_size and this >> statistic. >> Unit: bytes >> + >> +What: /sys/block/zram<id>/mem_limit >> +Date: August 2014 >> +Contact: Minchan Kim <minchan@kernel.org> >> +Description: >> + The mem_limit file is read/write and specifies the amount >> + of memory to be able to consume memory to store store >> + compressed data. The limit could be changed in run time >> - and "0" is default which means disable the limit. >> + and "0" means disable the limit. No limit is the initial state. > > there should be no default in the API. > >> + Unit: bytes >> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >> index 0595c3f56ccf..82c6a41116db 100644 >> --- a/Documentation/blockdev/zram.txt >> +++ b/Documentation/blockdev/zram.txt >> @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >> since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >> size of the disk when not in use so a huge zram is wasteful. >> >> -5) Activate: >> +5) Set memory limit: Optional >> + Set memory limit by writing the value to sysfs node 'mem_limit'. >> + The value can be either in bytes or you can use mem suffixes. >> + In addition, you could change the value in runtime. 
>> + Examples: >> + # limit /dev/zram0 with 50MB memory >> + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >> + >> + # Using mem suffixes >> + echo 256K > /sys/block/zram0/mem_limit >> + echo 512M > /sys/block/zram0/mem_limit >> + echo 1G > /sys/block/zram0/mem_limit >> + >> + # To disable memory limit >> + echo 0 > /sys/block/zram0/mem_limit >> + >> +6) Activate: >> mkswap /dev/zram0 >> swapon /dev/zram0 >> >> mkfs.ext4 /dev/zram1 >> mount /dev/zram1 /tmp >> >> -6) Stats: >> +7) Stats: >> Per-device statistics are exported as various nodes under >> /sys/block/zram<id>/ >> disksize >> @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. >> compr_data_size >> mem_used_total >> >> -7) Deactivate: >> +8) Deactivate: >> swapoff /dev/zram0 >> umount /dev/zram1 >> >> -8) Reset: >> +9) Reset: >> Write any positive value to 'reset' sysfs node >> echo 1 > /sys/block/zram0/reset >> echo 1 > /sys/block/zram1/reset >> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >> index f0b8b30a7128..370c355eb127 100644 >> --- a/drivers/block/zram/zram_drv.c >> +++ b/drivers/block/zram/zram_drv.c >> @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >> return scnprintf(buf, PAGE_SIZE, "%d\n", val); >> } >> >> +static ssize_t mem_limit_show(struct device *dev, >> + struct device_attribute *attr, char *buf) >> +{ >> + u64 val; >> + struct zram *zram = dev_to_zram(dev); >> + >> + down_read(&zram->init_lock); >> + val = zram->limit_pages; >> + up_read(&zram->init_lock); >> + >> + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >> +} >> + >> +static ssize_t mem_limit_store(struct device *dev, >> + struct device_attribute *attr, const char *buf, size_t len) >> +{ >> + u64 limit; >> + struct zram *zram = dev_to_zram(dev); >> + >> + limit = memparse(buf, NULL); > > if (limit = 0 && buf != "0") > return -EINVAL > >> + down_write(&zram->init_lock); >> + zram->limit_pages = PAGE_ALIGN(limit) >> 
PAGE_SHIFT; >> + up_write(&zram->init_lock); >> + >> + return len; >> +} >> + >> static ssize_t max_comp_streams_store(struct device *dev, >> struct device_attribute *attr, const char *buf, size_t len) >> { >> @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >> ret = -ENOMEM; >> goto out; >> } >> + >> + if (zram->limit_pages && >> + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >> + zs_free(meta->mem_pool, handle); >> + ret = -ENOMEM; >> + goto out; >> + } >> + >> cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >> >> if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >> @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >> struct zram_meta *meta; >> >> down_write(&zram->init_lock); >> + >> + zram->limit_pages = 0; >> + >> if (!init_done(zram)) { >> up_write(&zram->init_lock); >> return; >> @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >> static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >> static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >> static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >> +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >> + mem_limit_store); >> static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >> max_comp_streams_show, max_comp_streams_store); >> static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >> @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >> &dev_attr_orig_data_size.attr, >> &dev_attr_compr_data_size.attr, >> &dev_attr_mem_used_total.attr, >> + &dev_attr_mem_limit.attr, >> &dev_attr_max_comp_streams.attr, >> &dev_attr_comp_algorithm.attr, >> NULL, >> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >> index e0f725c87cc6..b7aa9c21553f 100644 >> --- a/drivers/block/zram/zram_drv.h >> +++ b/drivers/block/zram/zram_drv.h >> @@ -112,6 +112,11 @@ struct zram { >> u64 disksize; /* 
bytes */ >> int max_comp_streams; >> struct zram_stats stats; >> + /* >> + * the number of pages zram can consume for storing compressed data >> + */ >> + unsigned long limit_pages; >> + >> char compressor[10]; >> }; >> #endif >> -- >> 2.0.0 >> ^ permalink raw reply [flat|nested] 44+ messages in thread
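On the question of how memparse reports errors: its second argument, passed as NULL in the patch, receives a pointer to the first character it did not parse, so a caller can inspect what was left over. The sketch below is a userspace stand-in, not the kernel function: sketch_memparse and sketch_parse_limit are hypothetical names, strtoull replaces the kernel's number parser, and only the K, M and G suffixes are handled.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Userspace stand-in for memparse(): parse a number, then scale it by an
 * optional K/M/G suffix. *retptr (if given) is set to the first character
 * that was not consumed.
 */
static unsigned long long sketch_memparse(const char *ptr, char **retptr)
{
    char *end;
    unsigned long long ret = strtoull(ptr, &end, 0);

    switch (*end) {
    case 'G': case 'g':
        ret <<= 10;
        /* fall through */
    case 'M': case 'm':
        ret <<= 10;
        /* fall through */
    case 'K': case 'k':
        ret <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (retptr)
        *retptr = end;
    return ret;
}

/* Returns 0 and stores the value on success, -1 (for -EINVAL) otherwise. */
static int sketch_parse_limit(const char *buf, unsigned long long *out)
{
    char *end;
    unsigned long long val;

    /* Reject empty or non-numeric input before it can be read as 0. */
    if (!isdigit((unsigned char)buf[0]))
        return -1;

    val = sketch_memparse(buf, &end);

    /* Allow only the trailing newline that echo appends. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -1;

    *out = val;
    return 0;
}
```

Called with a NULL second argument, the stand-in returns 0 for both "garbage" and "0", which is the ambiguity raised in the thread. With the leftover pointer checked, "garbage" and an empty write are rejected, a literal "0" followed by a newline clears the limit explicitly, and "12abc" fails because of the leftover characters.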
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-22 10:55 ` David Horner @ 2014-08-24 23:56 ` Minchan Kim -1 siblings, 0 replies; 44+ messages in thread From: Minchan Kim @ 2014-08-24 23:56 UTC (permalink / raw) To: David Horner Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman Hello David, On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: > On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: > > Since zram has no control feature to limit memory usage, > > it makes hard to manage system memrory. > > > > This patch adds new knob "mem_limit" via sysfs to set up the > > a limit so that zram could fail allocation once it reaches > > the limit. > > > > In addition, user could change the limit in runtime so that > > he could manage the memory more dynamically. > > > - Default is no limit so it doesn't break old behavior. > + Initial state is no limit so it doesn't break old behavior. > > I understand your previous post now. > > I was saying that setting to either a null value or garbage > (which is interpreted as zero by memparse(buf, NULL);) > removes the limit. > > I think this is "surprise" behaviour and rather the null case should > return -EINVAL > The test below should be "good enough" though not catching all garbage. Thanks for the suggestion, but as I said, if it is really a problem it should be fixed in memparse itself, not in the caller, so I don't want to touch it in this patchset. It's not critical for adding the feature. 
> > > > > Signed-off-by: Minchan Kim <minchan@kernel.org> > > --- > > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ > > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- > > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ > > drivers/block/zram/zram_drv.h | 5 ++++ > > 4 files changed, 76 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram > > index 70ec992514d0..b8c779d64968 100644 > > --- a/Documentation/ABI/testing/sysfs-block-zram > > +++ b/Documentation/ABI/testing/sysfs-block-zram > > @@ -119,3 +119,13 @@ Description: > > efficiency can be calculated using compr_data_size and this > > statistic. > > Unit: bytes > > + > > +What: /sys/block/zram<id>/mem_limit > > +Date: August 2014 > > +Contact: Minchan Kim <minchan@kernel.org> > > +Description: > > + The mem_limit file is read/write and specifies the amount > > + of memory to be able to consume memory to store store > > + compressed data. The limit could be changed in run time > > - and "0" is default which means disable the limit. > > + and "0" means disable the limit. No limit is the initial state. > > there should be no default in the API. Thanks. > > > + Unit: bytes > > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt > > index 0595c3f56ccf..82c6a41116db 100644 > > --- a/Documentation/blockdev/zram.txt > > +++ b/Documentation/blockdev/zram.txt > > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory > > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the > > size of the disk when not in use so a huge zram is wasteful. > > > > -5) Activate: > > +5) Set memory limit: Optional > > + Set memory limit by writing the value to sysfs node 'mem_limit'. > > + The value can be either in bytes or you can use mem suffixes. > > + In addition, you could change the value in runtime. 
> > + Examples: > > + # limit /dev/zram0 with 50MB memory > > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit > > + > > + # Using mem suffixes > > + echo 256K > /sys/block/zram0/mem_limit > > + echo 512M > /sys/block/zram0/mem_limit > > + echo 1G > /sys/block/zram0/mem_limit > > + > > + # To disable memory limit > > + echo 0 > /sys/block/zram0/mem_limit > > + > > +6) Activate: > > mkswap /dev/zram0 > > swapon /dev/zram0 > > > > mkfs.ext4 /dev/zram1 > > mount /dev/zram1 /tmp > > > > -6) Stats: > > +7) Stats: > > Per-device statistics are exported as various nodes under > > /sys/block/zram<id>/ > > disksize > > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. > > compr_data_size > > mem_used_total > > > > -7) Deactivate: > > +8) Deactivate: > > swapoff /dev/zram0 > > umount /dev/zram1 > > > > -8) Reset: > > +9) Reset: > > Write any positive value to 'reset' sysfs node > > echo 1 > /sys/block/zram0/reset > > echo 1 > /sys/block/zram1/reset > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > index f0b8b30a7128..370c355eb127 100644 > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, > > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > > } > > > > +static ssize_t mem_limit_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + u64 val; > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_read(&zram->init_lock); > > + val = zram->limit_pages; > > + up_read(&zram->init_lock); > > + > > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); > > +} > > + > > +static ssize_t mem_limit_store(struct device *dev, > > + struct device_attribute *attr, const char *buf, size_t len) > > +{ > > + u64 limit; > > + struct zram *zram = dev_to_zram(dev); > > + > > + limit = memparse(buf, NULL); > > if (limit = 0 && buf != "0") > return -EINVAL > > > + 
down_write(&zram->init_lock); > > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; > > + up_write(&zram->init_lock); > > + > > + return len; > > +} > > + > > static ssize_t max_comp_streams_store(struct device *dev, > > struct device_attribute *attr, const char *buf, size_t len) > > { > > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, > > ret = -ENOMEM; > > goto out; > > } > > + > > + if (zram->limit_pages && > > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { > > + zs_free(meta->mem_pool, handle); > > + ret = -ENOMEM; > > + goto out; > > + } > > + > > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); > > > > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { > > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) > > struct zram_meta *meta; > > > > down_write(&zram->init_lock); > > + > > + zram->limit_pages = 0; > > + > > if (!init_done(zram)) { > > up_write(&zram->init_lock); > > return; > > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); > > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, > > + mem_limit_store); > > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, > > max_comp_streams_show, max_comp_streams_store); > > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, > > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { > > &dev_attr_orig_data_size.attr, > > &dev_attr_compr_data_size.attr, > > &dev_attr_mem_used_total.attr, > > + &dev_attr_mem_limit.attr, > > &dev_attr_max_comp_streams.attr, > > &dev_attr_comp_algorithm.attr, > > NULL, > > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h > > index e0f725c87cc6..b7aa9c21553f 100644 > > 
--- a/drivers/block/zram/zram_drv.h > > +++ b/drivers/block/zram/zram_drv.h > > @@ -112,6 +112,11 @@ struct zram { > > u64 disksize; /* bytes */ > > int max_comp_streams; > > struct zram_stats stats; > > + /* > > + * the number of pages zram can consume for storing compressed data > > + */ > > + unsigned long limit_pages; > > + > > char compressor[10]; > > }; > > #endif > > -- > > 2.0.0 > > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-24 23:56 ` Minchan Kim @ 2014-08-25 3:40 ` David Horner -1 siblings, 0 replies; 44+ messages in thread From: David Horner @ 2014-08-25 3:40 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: > Hello David, > > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >> > Since zram has no control feature to limit memory usage, >> > it makes hard to manage system memrory. >> > >> > This patch adds new knob "mem_limit" via sysfs to set up the >> > a limit so that zram could fail allocation once it reaches >> > the limit. >> > >> > In addition, user could change the limit in runtime so that >> > he could manage the memory more dynamically. >> > >> - Default is no limit so it doesn't break old behavior. >> + Initial state is no limit so it doesn't break old behavior. >> >> I understand your previous post now. >> >> I was saying that setting to either a null value or garbage >> (which is interpreted as zero by memparse(buf, NULL);) >> removes the limit. >> >> I think this is "surprise" behaviour and rather the null case should >> return -EINVAL >> The test below should be "good enough" though not catching all garbage. > > Thanks for suggesting but as I said, it should be fixed in memparse itself, > not caller if it is really problem so I don't want to touch it in this > patchset. It's not critical for adding the feature. > I've looked into the memparse function more since we talked. I do believe a wrapper function around it for the typical use by sysfs would be very valuable. However, there is nothing wrong with memparse itself that needs to be fixed. 
It does what it is documented to do very well (In My Uninformed Opinion). It provides everything that a caller needs to manage the token that it processes. It thus handles strings like "7,,5,8,,9" with the implied zeros.

The fact that other callers don't check the returned pointer value to see if only a null string was processed is not its fault. Nor is the fact that it may not be ideally suited to sysfs attributes; that other store functions use it in a given manner does not mean that use is correct, nor that it is incorrect for that "knob". Some attributes could be just as valid with null zeros.

And you are correct, disambiguating the zero is not required for the limit feature. Your original patch, which disallowed zero, was fully featured for mem_limit. It is the requested non-crucial feature, allowing zero to re-establish the initial state, that benefits from distinguishing an explicit zero from a "default zero" when garbage is written.

The final argument is that if we release this feature as is, the undocumented functionality could be relied upon, and when it is later fixed, user space breaks.

They say getting an API right is a difficult exercise. I suggest that if we don't insist on an explicit zero, we have the API wrong. I don't think you disagreed, just that the burden to get it correct lay elsewhere. If that is the case it doesn't really matter; we cannot release this interface until it is corrected wherever it must be.

And my zero check was a poor hack. I should have explicitly checked the returned pointer value. I will send that proposed revision, and hopefully you will consider it for inclusion.
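To make the "check the returned pointer" idea concrete, here is a minimal userspace sketch, not the proposed kernel revision. Everything named sketch_* is hypothetical: sketch_memparse() is a simplified stand-in built on strtoull() that only mimics the (value, end pointer) shape of memparse(), and the 4 KiB page size is an assumption made for the arithmetic. The point is only that a store function can tell an explicit "0" from input where nothing was parsed.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed page size for this sketch only (4 KiB). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical stand-in for memparse(): parse a number with an optional
 * K/M/G suffix and report where parsing stopped via *retptr.
 * If no digit is found, *retptr is left equal to ptr and 0 is returned.
 */
static unsigned long long sketch_memparse(const char *ptr, char **retptr)
{
	char *end = (char *)ptr;
	unsigned long long val = 0;

	if (isdigit((unsigned char)*ptr)) {
		val = strtoull(ptr, &end, 0);
		switch (*end) {
		case 'G': case 'g':
			val <<= 10;
			/* fall through */
		case 'M': case 'm':
			val <<= 10;
			/* fall through */
		case 'K': case 'k':
			val <<= 10;
			end++;
			break;
		default:
			break;
		}
	}
	if (retptr)
		*retptr = end;
	return val;
}

/*
 * Hypothetical store-side check: reject input where nothing was parsed
 * or where anything other than a single trailing newline follows the
 * number. On success, *pages holds the limit rounded up to whole pages;
 * an explicit "0" yields 0 pages, i.e. "no limit".
 */
static int sketch_parse_limit(const char *buf, unsigned long long *pages)
{
	char *end;
	unsigned long long limit = sketch_memparse(buf, &end);

	if (end == buf)		/* empty string or leading garbage */
		return -EINVAL;
	if (*end == '\n')	/* echo appends a newline */
		end++;
	if (*end != '\0')	/* trailing garbage */
		return -EINVAL;

	*pages = (limit + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

With this shape, writing "0" still removes the limit, while an empty write or "garbage" returns -EINVAL instead of silently removing it.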
>> >> > >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >> > --- >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >> > drivers/block/zram/zram_drv.h | 5 ++++ >> > 4 files changed, 76 insertions(+), 4 deletions(-) >> > >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >> > index 70ec992514d0..b8c779d64968 100644 >> > --- a/Documentation/ABI/testing/sysfs-block-zram >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >> > @@ -119,3 +119,13 @@ Description: >> > efficiency can be calculated using compr_data_size and this >> > statistic. >> > Unit: bytes >> > + >> > +What: /sys/block/zram<id>/mem_limit >> > +Date: August 2014 >> > +Contact: Minchan Kim <minchan@kernel.org> >> > +Description: >> > + The mem_limit file is read/write and specifies the amount >> > + of memory to be able to consume memory to store store >> > + compressed data. The limit could be changed in run time >> > - and "0" is default which means disable the limit. >> > + and "0" means disable the limit. No limit is the initial state. >> >> there should be no default in the API. > > Thanks. > >> >> > + Unit: bytes >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >> > index 0595c3f56ccf..82c6a41116db 100644 >> > --- a/Documentation/blockdev/zram.txt >> > +++ b/Documentation/blockdev/zram.txt >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >> > size of the disk when not in use so a huge zram is wasteful. >> > >> > -5) Activate: >> > +5) Set memory limit: Optional >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >> > + The value can be either in bytes or you can use mem suffixes. 
>> > + In addition, you could change the value in runtime. >> > + Examples: >> > + # limit /dev/zram0 with 50MB memory >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >> > + >> > + # Using mem suffixes >> > + echo 256K > /sys/block/zram0/mem_limit >> > + echo 512M > /sys/block/zram0/mem_limit >> > + echo 1G > /sys/block/zram0/mem_limit >> > + >> > + # To disable memory limit >> > + echo 0 > /sys/block/zram0/mem_limit >> > + >> > +6) Activate: >> > mkswap /dev/zram0 >> > swapon /dev/zram0 >> > >> > mkfs.ext4 /dev/zram1 >> > mount /dev/zram1 /tmp >> > >> > -6) Stats: >> > +7) Stats: >> > Per-device statistics are exported as various nodes under >> > /sys/block/zram<id>/ >> > disksize >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. >> > compr_data_size >> > mem_used_total >> > >> > -7) Deactivate: >> > +8) Deactivate: >> > swapoff /dev/zram0 >> > umount /dev/zram1 >> > >> > -8) Reset: >> > +9) Reset: >> > Write any positive value to 'reset' sysfs node >> > echo 1 > /sys/block/zram0/reset >> > echo 1 > /sys/block/zram1/reset >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >> > index f0b8b30a7128..370c355eb127 100644 >> > --- a/drivers/block/zram/zram_drv.c >> > +++ b/drivers/block/zram/zram_drv.c >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >> > } >> > >> > +static ssize_t mem_limit_show(struct device *dev, >> > + struct device_attribute *attr, char *buf) >> > +{ >> > + u64 val; >> > + struct zram *zram = dev_to_zram(dev); >> > + >> > + down_read(&zram->init_lock); >> > + val = zram->limit_pages; >> > + up_read(&zram->init_lock); >> > + >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >> > +} >> > + >> > +static ssize_t mem_limit_store(struct device *dev, >> > + struct device_attribute *attr, const char *buf, size_t len) >> > +{ >> > + u64 limit; >> > + struct zram *zram = 
dev_to_zram(dev); >> > + >> > + limit = memparse(buf, NULL); >> >> if (limit = 0 && buf != "0") >> return -EINVAL >> >> > + down_write(&zram->init_lock); >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >> > + up_write(&zram->init_lock); >> > + >> > + return len; >> > +} >> > + >> > static ssize_t max_comp_streams_store(struct device *dev, >> > struct device_attribute *attr, const char *buf, size_t len) >> > { >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >> > ret = -ENOMEM; >> > goto out; >> > } >> > + >> > + if (zram->limit_pages && >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >> > + zs_free(meta->mem_pool, handle); >> > + ret = -ENOMEM; >> > + goto out; >> > + } >> > + >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >> > >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >> > struct zram_meta *meta; >> > >> > down_write(&zram->init_lock); >> > + >> > + zram->limit_pages = 0; >> > + >> > if (!init_done(zram)) { >> > up_write(&zram->init_lock); >> > return; >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >> > + mem_limit_store); >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >> > max_comp_streams_show, max_comp_streams_store); >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >> > &dev_attr_orig_data_size.attr, >> > &dev_attr_compr_data_size.attr, >> > &dev_attr_mem_used_total.attr, >> > + &dev_attr_mem_limit.attr, >> > &dev_attr_max_comp_streams.attr, 
>> > &dev_attr_comp_algorithm.attr, >> > NULL, >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >> > index e0f725c87cc6..b7aa9c21553f 100644 >> > --- a/drivers/block/zram/zram_drv.h >> > +++ b/drivers/block/zram/zram_drv.h >> > @@ -112,6 +112,11 @@ struct zram { >> > u64 disksize; /* bytes */ >> > int max_comp_streams; >> > struct zram_stats stats; >> > + /* >> > + * the number of pages zram can consume for storing compressed data >> > + */ >> > + unsigned long limit_pages; >> > + >> > char compressor[10]; >> > }; >> > #endif >> > -- >> > 2.0.0 >> > > > -- > Kind regards, > Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 3:40 ` David Horner @ 2014-08-25 4:37 ` Minchan Kim -1 siblings, 0 replies; 44+ messages in thread From: Minchan Kim @ 2014-08-25 4:37 UTC (permalink / raw) To: David Horner Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: > On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: > > Hello David, > > > > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: > >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: > >> > Since zram has no control feature to limit memory usage, > >> > it makes hard to manage system memrory. > >> > > >> > This patch adds new knob "mem_limit" via sysfs to set up the > >> > a limit so that zram could fail allocation once it reaches > >> > the limit. > >> > > >> > In addition, user could change the limit in runtime so that > >> > he could manage the memory more dynamically. > >> > > >> - Default is no limit so it doesn't break old behavior. > >> + Initial state is no limit so it doesn't break old behavior. > >> > >> I understand your previous post now. > >> > >> I was saying that setting to either a null value or garbage > >> (which is interpreted as zero by memparse(buf, NULL);) > >> removes the limit. > >> > >> I think this is "surprise" behaviour and rather the null case should > >> return -EINVAL > >> The test below should be "good enough" though not catching all garbage. > > > > Thanks for suggesting but as I said, it should be fixed in memparse itself, > > not caller if it is really problem so I don't want to touch it in this > > patchset. It's not critical for adding the feature. > > > > I've looked into the memparse function more since we talked. > I do believe a wrapper function around it for the typical use by sysfs would > be very valuable. 
Agree.

> However, there is nothing wrong with memparse itself that needs to be fixed.
>
> It does what it is documented to do very well (In My Uninformed Opinion).
> It provides everything that a caller needs to manage the token that it
> processes.
> It thus handles strings like "7,,5,8,,9" with the implied zeros.

Maybe a strict_memparse would be a better way to guard against such input;
you could then find several callers to clean up.

>
> The fact that other callers don't check the return pointer value to
> see if only a null
> string was processed, is not its fault.
> Nor that it may not be ideally suited to sysfs attributes; that other store
> functions use it in a given manner does not means that is correct -
> nor that it is
> incorrect for that "knob". Some attributes could be just as valid with
> null zeros.
>
> And you are correct, to disambiguate the zero is not required for the
> limit feature.
> Your original patch which disallowed zero was full feature for mem_limit.
> It is the requested non-crucial feature to allow zero to reestablish
> the initial state
> that benefits from distinguishing an explicit zero from a "default zero'
> when garbage is written.
>
> The final argument is that if we release this feature as is the undocumented
> functionality could be relied upon, and when later fixed: user space breaks.

I don't get it. Why would it break userspace?
The sysfs-block-zram document says "0" means disable the limit.
If someone writes *garbage* and it happens to work as if disabling the limit,
that is not correct usage; it was already broken even though it worked,
so it would not be a problem if we fix it later
(i.e., we don't need to take care of broken userspace).
Am I missing your point?

> They say getting API right is a difficult exercise. I suggest, if we
> don't insisting on
> an explicit zero we have the API wrong.
>
> I don't think you disagreed, just that the burden to get it correct
> lay elsewhere.
> > If that is the case it doesn't really matter, we cannot release this > interface until > it is corrected wherever it must be. > > And my zero check was a poor hack. > > I should have explicitly checked the returned pointer value. > > I will send that proposed revision, and hopefully you will consider it > for inclusion. > > > > > >> > >> > > >> > Signed-off-by: Minchan Kim <minchan@kernel.org> > >> > --- > >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ > >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- > >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ > >> > drivers/block/zram/zram_drv.h | 5 ++++ > >> > 4 files changed, 76 insertions(+), 4 deletions(-) > >> > > >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram > >> > index 70ec992514d0..b8c779d64968 100644 > >> > --- a/Documentation/ABI/testing/sysfs-block-zram > >> > +++ b/Documentation/ABI/testing/sysfs-block-zram > >> > @@ -119,3 +119,13 @@ Description: > >> > efficiency can be calculated using compr_data_size and this > >> > statistic. > >> > Unit: bytes > >> > + > >> > +What: /sys/block/zram<id>/mem_limit > >> > +Date: August 2014 > >> > +Contact: Minchan Kim <minchan@kernel.org> > >> > +Description: > >> > + The mem_limit file is read/write and specifies the amount > >> > + of memory to be able to consume memory to store store > >> > + compressed data. The limit could be changed in run time > >> > - and "0" is default which means disable the limit. > >> > + and "0" means disable the limit. No limit is the initial state. > >> > >> there should be no default in the API. > > > > Thanks. 
> > > >> > >> > + Unit: bytes > >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt > >> > index 0595c3f56ccf..82c6a41116db 100644 > >> > --- a/Documentation/blockdev/zram.txt > >> > +++ b/Documentation/blockdev/zram.txt > >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory > >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the > >> > size of the disk when not in use so a huge zram is wasteful. > >> > > >> > -5) Activate: > >> > +5) Set memory limit: Optional > >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. > >> > + The value can be either in bytes or you can use mem suffixes. > >> > + In addition, you could change the value in runtime. > >> > + Examples: > >> > + # limit /dev/zram0 with 50MB memory > >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit > >> > + > >> > + # Using mem suffixes > >> > + echo 256K > /sys/block/zram0/mem_limit > >> > + echo 512M > /sys/block/zram0/mem_limit > >> > + echo 1G > /sys/block/zram0/mem_limit > >> > + > >> > + # To disable memory limit > >> > + echo 0 > /sys/block/zram0/mem_limit > >> > + > >> > +6) Activate: > >> > mkswap /dev/zram0 > >> > swapon /dev/zram0 > >> > > >> > mkfs.ext4 /dev/zram1 > >> > mount /dev/zram1 /tmp > >> > > >> > -6) Stats: > >> > +7) Stats: > >> > Per-device statistics are exported as various nodes under > >> > /sys/block/zram<id>/ > >> > disksize > >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
> >> > compr_data_size > >> > mem_used_total > >> > > >> > -7) Deactivate: > >> > +8) Deactivate: > >> > swapoff /dev/zram0 > >> > umount /dev/zram1 > >> > > >> > -8) Reset: > >> > +9) Reset: > >> > Write any positive value to 'reset' sysfs node > >> > echo 1 > /sys/block/zram0/reset > >> > echo 1 > /sys/block/zram1/reset > >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > >> > index f0b8b30a7128..370c355eb127 100644 > >> > --- a/drivers/block/zram/zram_drv.c > >> > +++ b/drivers/block/zram/zram_drv.c > >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, > >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > >> > } > >> > > >> > +static ssize_t mem_limit_show(struct device *dev, > >> > + struct device_attribute *attr, char *buf) > >> > +{ > >> > + u64 val; > >> > + struct zram *zram = dev_to_zram(dev); > >> > + > >> > + down_read(&zram->init_lock); > >> > + val = zram->limit_pages; > >> > + up_read(&zram->init_lock); > >> > + > >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); > >> > +} > >> > + > >> > +static ssize_t mem_limit_store(struct device *dev, > >> > + struct device_attribute *attr, const char *buf, size_t len) > >> > +{ > >> > + u64 limit; > >> > + struct zram *zram = dev_to_zram(dev); > >> > + > >> > + limit = memparse(buf, NULL); > >> > >> if (limit = 0 && buf != "0") > >> return -EINVAL > >> > >> > + down_write(&zram->init_lock); > >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; > >> > + up_write(&zram->init_lock); > >> > + > >> > + return len; > >> > +} > >> > + > >> > static ssize_t max_comp_streams_store(struct device *dev, > >> > struct device_attribute *attr, const char *buf, size_t len) > >> > { > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, > >> > ret = -ENOMEM; > >> > goto out; > >> > } > >> > + > >> > + if (zram->limit_pages && > >> > + zs_get_total_pages(meta->mem_pool) > 
zram->limit_pages) { > >> > + zs_free(meta->mem_pool, handle); > >> > + ret = -ENOMEM; > >> > + goto out; > >> > + } > >> > + > >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); > >> > > >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { > >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) > >> > struct zram_meta *meta; > >> > > >> > down_write(&zram->init_lock); > >> > + > >> > + zram->limit_pages = 0; > >> > + > >> > if (!init_done(zram)) { > >> > up_write(&zram->init_lock); > >> > return; > >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); > >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, > >> > + mem_limit_store); > >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, > >> > max_comp_streams_show, max_comp_streams_store); > >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, > >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { > >> > &dev_attr_orig_data_size.attr, > >> > &dev_attr_compr_data_size.attr, > >> > &dev_attr_mem_used_total.attr, > >> > + &dev_attr_mem_limit.attr, > >> > &dev_attr_max_comp_streams.attr, > >> > &dev_attr_comp_algorithm.attr, > >> > NULL, > >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h > >> > index e0f725c87cc6..b7aa9c21553f 100644 > >> > --- a/drivers/block/zram/zram_drv.h > >> > +++ b/drivers/block/zram/zram_drv.h > >> > @@ -112,6 +112,11 @@ struct zram { > >> > u64 disksize; /* bytes */ > >> > int max_comp_streams; > >> > struct zram_stats stats; > >> > + /* > >> > + * the number of pages zram can consume for storing compressed data > >> > + */ > >> > + unsigned long limit_pages; > >> > + > >> > char compressor[10]; > >> > }; > >> 
> #endif > >> > -- > >> > 2.0.0 > >> > > >> > > > > -- > > Kind regards, > > Minchan Kim > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 4:37 ` Minchan Kim @ 2014-08-25 8:22 ` David Horner -1 siblings, 0 replies; 44+ messages in thread From: David Horner @ 2014-08-25 8:22 UTC (permalink / raw) To: Minchan Kim Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: > On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >> > Hello David, >> > >> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >> >> > Since zram has no control feature to limit memory usage, >> >> > it makes hard to manage system memrory. >> >> > >> >> > This patch adds new knob "mem_limit" via sysfs to set up the >> >> > a limit so that zram could fail allocation once it reaches >> >> > the limit. >> >> > >> >> > In addition, user could change the limit in runtime so that >> >> > he could manage the memory more dynamically. >> >> > >> >> - Default is no limit so it doesn't break old behavior. >> >> + Initial state is no limit so it doesn't break old behavior. >> >> >> >> I understand your previous post now. >> >> >> >> I was saying that setting to either a null value or garbage >> >> (which is interpreted as zero by memparse(buf, NULL);) >> >> removes the limit. >> >> >> >> I think this is "surprise" behaviour and rather the null case should >> >> return -EINVAL >> >> The test below should be "good enough" though not catching all garbage. >> > >> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >> > not caller if it is really problem so I don't want to touch it in this >> > patchset. It's not critical for adding the feature. 
>> > >> >> I've looked into the memparse function more since we talked. >> I do believe a wrapper function around it for the typical use by sysfs would >> be very valuable. > > Agree. > >> However, there is nothing wrong with memparse itself that needs to be fixed. >> >> It does what it is documented to do very well (In My Uninformed Opinion). >> It provides everything that a caller needs to manage the token that it >> processes. >> It thus handles strings like "7,,5,8,,9" with the implied zeros. > > Maybe strict_memparse would be better to protect such things so you > could find several places to clean it up. > >> >> The fact that other callers don't check the return pointer value to >> see if only a null >> string was processed, is not its fault. >> Nor that it may not be ideally suited to sysfs attributes; that other store >> functions use it in a given manner does not means that is correct - >> nor that it is >> incorrect for that "knob". Some attributes could be just as valid with >> null zeros. >> >> And you are correct, to disambiguate the zero is not required for the >> limit feature. >> Your original patch which disallowed zero was full feature for mem_limit. >> It is the requested non-crucial feature to allow zero to reestablish >> the initial state >> that benefits from distinguishing an explicit zero from a "default zero' >> when garbage is written. >> >> The final argument is that if we release this feature as is the undocumented >> functionality could be relied upon, and when later fixed: user space breaks. > > I don't get it. Why does it break userspace? > The sysfs-block-zram says "0" means disable the limit. > If someone writes *garabge* but work as if disabling the limit, > it's not a right thing and he already broke although it worked > so it would be not a problem if we fix later. > (ie, we don't need to take care of broken userspace) > Am I missing your point? > Perhaps you are missing my point, perhaps ignoring or dismissing. 
Basically, if a facility works in a useful way, even if it was designed
for a different usage, that becomes the "accepted" interface/usage.
The developer may not have intended that usage, or may even have
considered it wrong and broken, but it is what it is and people become
reliant on that behaviour.

A case in point is memparse itself.
The developer intentionally sets the return pointer because that is the
only value that can be validated for correct performance.
The return value allows negative values, so the standard error-code
passing is not valid.
Unfortunately, C allows the caller to pass NULL for that parameter.
The developer could consider that absurd and fundamentally broken.
But to the caller it is a valid situation, because (perhaps) it can't be
bothered to handle error cases.

So, who is to blame?

You say memparse, that it is fundamentally broken, because it
didn't check to see that it was used correctly.
And I say mem_limit_store is fundamentally broken, because it
didn't check to see that it was used correctly.

The difference is that memparse cannot stop being abused (C allows the
NULL argument, and extensive tricks are required to address that).
However, we can readily fix mem_limit_store and ensure
1) no regression when the interface IS fixed and
2) predictable behaviour when accidental or "fuzzy" input arrives.

>> They say getting API right is a difficult exercise. I suggest, if we
>> don't insisting on
>> an explicit zero we have the API wrong.
>>
>> I don't think you disagreed, just that the burden to get it correct
>> lay elsewhere.
>>
>> If that is the case it doesn't really matter, we cannot release this
>> interface until
>> it is corrected wherever it must be.
>>
>> And my zero check was a poor hack.
>>
>> I should have explicitly checked the returned pointer value.
>>
>> I will send that proposed revision, and hopefully you will consider it
>> for inclusion.
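
For illustration only, here is a minimal userspace sketch of what
"checking the returned pointer" could look like. It is not the proposed
revision. Assumptions: memparse_sketch() is a simplified stand-in for the
kernel's memparse() (K/M/G suffixes only, built on strtoull), and
strict_memparse_sketch() is a hypothetical name, not an existing kernel
helper. Requiring a leading digit is an extra choice made here so that a
bare suffix such as "g" is not accepted as an explicit zero.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Simplified userspace stand-in for memparse(): parse a number with an
 * optional K/M/G suffix and report where parsing stopped via *retptr.
 */
static unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *endptr;
	unsigned long long ret = strtoull(ptr, &endptr, 0);

	switch (*endptr) {
	case 'G': case 'g':
		ret <<= 10;
		/* fall through */
	case 'M': case 'm':
		ret <<= 10;
		/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		endptr++;
		break;
	default:
		break;
	}

	if (retptr)
		*retptr = endptr;
	return ret;
}

/*
 * Hypothetical strict wrapper: accept the input only if it starts with a
 * digit and the whole string (bar one trailing newline, as written by
 * "echo") was consumed. Returns 0 on success, -EINVAL otherwise.
 */
static int strict_memparse_sketch(const char *buf, unsigned long long *val)
{
	char *end;
	unsigned long long tmp;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;

	tmp = memparse_sketch(buf, &end);

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*val = tmp;
	return 0;
}
```

With a check of this kind, a store function such as mem_limit_store could
return the error before taking init_lock, so that only an explicit "0"
removes the limit while an empty or garbage write leaves it unchanged.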
>> >> >> >> >> >> >> >> > >> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >> >> > --- >> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >> >> > drivers/block/zram/zram_drv.h | 5 ++++ >> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >> >> > >> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >> >> > index 70ec992514d0..b8c779d64968 100644 >> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >> >> > @@ -119,3 +119,13 @@ Description: >> >> > efficiency can be calculated using compr_data_size and this >> >> > statistic. >> >> > Unit: bytes >> >> > + >> >> > +What: /sys/block/zram<id>/mem_limit >> >> > +Date: August 2014 >> >> > +Contact: Minchan Kim <minchan@kernel.org> >> >> > +Description: >> >> > + The mem_limit file is read/write and specifies the amount >> >> > + of memory to be able to consume memory to store store >> >> > + compressed data. The limit could be changed in run time >> >> > - and "0" is default which means disable the limit. >> >> > + and "0" means disable the limit. No limit is the initial state. >> >> >> >> there should be no default in the API. >> > >> > Thanks. >> > >> >> >> >> > + Unit: bytes >> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >> >> > index 0595c3f56ccf..82c6a41116db 100644 >> >> > --- a/Documentation/blockdev/zram.txt >> >> > +++ b/Documentation/blockdev/zram.txt >> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >> >> > size of the disk when not in use so a huge zram is wasteful. 
>> >> > >> >> > -5) Activate: >> >> > +5) Set memory limit: Optional >> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >> >> > + The value can be either in bytes or you can use mem suffixes. >> >> > + In addition, you could change the value in runtime. >> >> > + Examples: >> >> > + # limit /dev/zram0 with 50MB memory >> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >> >> > + >> >> > + # Using mem suffixes >> >> > + echo 256K > /sys/block/zram0/mem_limit >> >> > + echo 512M > /sys/block/zram0/mem_limit >> >> > + echo 1G > /sys/block/zram0/mem_limit >> >> > + >> >> > + # To disable memory limit >> >> > + echo 0 > /sys/block/zram0/mem_limit >> >> > + >> >> > +6) Activate: >> >> > mkswap /dev/zram0 >> >> > swapon /dev/zram0 >> >> > >> >> > mkfs.ext4 /dev/zram1 >> >> > mount /dev/zram1 /tmp >> >> > >> >> > -6) Stats: >> >> > +7) Stats: >> >> > Per-device statistics are exported as various nodes under >> >> > /sys/block/zram<id>/ >> >> > disksize >> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>> >> > compr_data_size >> >> > mem_used_total >> >> > >> >> > -7) Deactivate: >> >> > +8) Deactivate: >> >> > swapoff /dev/zram0 >> >> > umount /dev/zram1 >> >> > >> >> > -8) Reset: >> >> > +9) Reset: >> >> > Write any positive value to 'reset' sysfs node >> >> > echo 1 > /sys/block/zram0/reset >> >> > echo 1 > /sys/block/zram1/reset >> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >> >> > index f0b8b30a7128..370c355eb127 100644 >> >> > --- a/drivers/block/zram/zram_drv.c >> >> > +++ b/drivers/block/zram/zram_drv.c >> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >> >> > } >> >> > >> >> > +static ssize_t mem_limit_show(struct device *dev, >> >> > + struct device_attribute *attr, char *buf) >> >> > +{ >> >> > + u64 val; >> >> > + struct zram *zram = dev_to_zram(dev); >> >> > + >> >> > + down_read(&zram->init_lock); >> >> > + val = zram->limit_pages; >> >> > + up_read(&zram->init_lock); >> >> > + >> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >> >> > +} >> >> > + >> >> > +static ssize_t mem_limit_store(struct device *dev, >> >> > + struct device_attribute *attr, const char *buf, size_t len) >> >> > +{ >> >> > + u64 limit; >> >> > + struct zram *zram = dev_to_zram(dev); >> >> > + >> >> > + limit = memparse(buf, NULL); >> >> >> >> if (limit = 0 && buf != "0") >> >> return -EINVAL >> >> >> >> > + down_write(&zram->init_lock); >> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >> >> > + up_write(&zram->init_lock); >> >> > + >> >> > + return len; >> >> > +} >> >> > + >> >> > static ssize_t max_comp_streams_store(struct device *dev, >> >> > struct device_attribute *attr, const char *buf, size_t len) >> >> > { >> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >> >> > ret = -ENOMEM; >> >> > goto out; >> >> > } >> >> > + >> >> > + if (zram->limit_pages && 
>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >> >> > + zs_free(meta->mem_pool, handle); >> >> > + ret = -ENOMEM; >> >> > + goto out; >> >> > + } >> >> > + >> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >> >> > >> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >> >> > struct zram_meta *meta; >> >> > >> >> > down_write(&zram->init_lock); >> >> > + >> >> > + zram->limit_pages = 0; >> >> > + >> >> > if (!init_done(zram)) { >> >> > up_write(&zram->init_lock); >> >> > return; >> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >> >> > + mem_limit_store); >> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >> >> > max_comp_streams_show, max_comp_streams_store); >> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >> >> > &dev_attr_orig_data_size.attr, >> >> > &dev_attr_compr_data_size.attr, >> >> > &dev_attr_mem_used_total.attr, >> >> > + &dev_attr_mem_limit.attr, >> >> > &dev_attr_max_comp_streams.attr, >> >> > &dev_attr_comp_algorithm.attr, >> >> > NULL, >> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >> >> > index e0f725c87cc6..b7aa9c21553f 100644 >> >> > --- a/drivers/block/zram/zram_drv.h >> >> > +++ b/drivers/block/zram/zram_drv.h >> >> > @@ -112,6 +112,11 @@ struct zram { >> >> > u64 disksize; /* bytes */ >> >> > int max_comp_streams; >> >> > struct zram_stats stats; >> >> > + /* >> >> > + * the number of pages zram can consume for storing compressed data >> >> > 
+        */
>> >> > +       unsigned long limit_pages;
>> >> > +
>> >> >         char compressor[10];
>> >> >  };
>> >> >  #endif
>> >> > --
>> >> > 2.0.0
>> >> >
>> >>
>> >> --
>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> >> the body to majordomo@kvack.org. For more info on Linux MM,
>> >> see: http://www.linux-mm.org/ .
>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >
>> > --
>> > Kind regards,
>> > Minchan Kim
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org. For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> --
> Kind regards,
> Minchan Kim
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 8:22 ` David Horner @ 2014-08-25 18:12 ` Dan Streetman -1 siblings, 0 replies; 44+ messages in thread From: Dan Streetman @ 2014-08-25 18:12 UTC (permalink / raw) To: David Horner Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: > On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >>> > Hello David, >>> > >>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >>> >> > Since zram has no control feature to limit memory usage, >>> >> > it makes hard to manage system memrory. >>> >> > >>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>> >> > a limit so that zram could fail allocation once it reaches >>> >> > the limit. >>> >> > >>> >> > In addition, user could change the limit in runtime so that >>> >> > he could manage the memory more dynamically. >>> >> > >>> >> - Default is no limit so it doesn't break old behavior. >>> >> + Initial state is no limit so it doesn't break old behavior. >>> >> >>> >> I understand your previous post now. >>> >> >>> >> I was saying that setting to either a null value or garbage >>> >> (which is interpreted as zero by memparse(buf, NULL);) >>> >> removes the limit. >>> >> >>> >> I think this is "surprise" behaviour and rather the null case should >>> >> return -EINVAL >>> >> The test below should be "good enough" though not catching all garbage. 
>>> > >>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>> > not caller if it is really problem so I don't want to touch it in this >>> > patchset. It's not critical for adding the feature. >>> > >>> >>> I've looked into the memparse function more since we talked. >>> I do believe a wrapper function around it for the typical use by sysfs would >>> be very valuable. >> >> Agree. >> >>> However, there is nothing wrong with memparse itself that needs to be fixed. >>> >>> It does what it is documented to do very well (In My Uninformed Opinion). >>> It provides everything that a caller needs to manage the token that it >>> processes. >>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >> >> Maybe strict_memparse would be better to protect such things so you >> could find several places to clean it up. >> >>> >>> The fact that other callers don't check the return pointer value to >>> see if only a null >>> string was processed, is not its fault. >>> Nor that it may not be ideally suited to sysfs attributes; that other store >>> functions use it in a given manner does not means that is correct - >>> nor that it is >>> incorrect for that "knob". Some attributes could be just as valid with >>> null zeros. >>> >>> And you are correct, to disambiguate the zero is not required for the >>> limit feature. >>> Your original patch which disallowed zero was full feature for mem_limit. >>> It is the requested non-crucial feature to allow zero to reestablish >>> the initial state >>> that benefits from distinguishing an explicit zero from a "default zero' >>> when garbage is written. >>> >>> The final argument is that if we release this feature as is the undocumented >>> functionality could be relied upon, and when later fixed: user space breaks. >> >> I don't get it. Why does it break userspace? >> The sysfs-block-zram says "0" means disable the limit. 
>> If someone writes *garbage* but work as if disabling the limit,
>> it's not a right thing and he already broke although it worked
>> so it would be not a problem if we fix later.
>> (ie, we don't need to take care of broken userspace)
>> Am I missing your point?
>>
>
> Perhaps you are missing my point, perhaps ignoring or dismissing.
>
> Basically, if a facility works in a useful way, even if it was designed for
> different usage, that becomes the "accepted" interface/usage.
> The developer may not have intended that usage or may even considered
> it wrong and a broken usage, but it is what it is and people become
> reliant on that behaviour.
>
> Case in point is memparse itself.
>
> The developer intentionally sets the return pointer because that is the
> only value that can be validated for correct performance.
> The return value allows -ve so the standard error message passing is not valid.
> Unfortunately, C allows the user to pass a NULL value in the parameter.
> The developer could consider that absurd and fundamentally broken.
> But to the user it is a valid situation, because (perhaps) it can't be
> bothered to handle error cases.
>
> So, who is to blame.
> You say memparse, that it is fundamentally broken,
> because it didn't check to see that it was used correctly.
> And I say mem_limit_store is fundamentally broken,
> because it didn't check to see that it was used correctly.

I think we should look at what the rest of the kernel does as far as
checking memparse results. It appears to be a mix of some code
checking memparse while others don't.

The most common way to check appears to be to verify that memparse
actually parsed at least 1 character, e.g.:

        oldp = p;
        mem_size = memparse(p, &p);
        if (p == oldp)
                return -EINVAL;

although other places where 0 isn't valid can simply check for that:

        mem_size = memparse(p, &p);
        /* don't remove all of memory when handling "mem={invalid}" param */
        if (mem_size == 0)
                return -EINVAL;

or even the other memparse use in zram_drv.c:

        disksize = memparse(buf, NULL);
        if (!disksize)
                return -EINVAL;

And there seem to be other places where (maybe?) there's no checking at all.

However, it also seems like many cases of memparse usage are looking
for a non-zero value, and therefore they can either immediately check
for zero/invalid or (possibly) later code has checks to avoid using
any zero value. In this case though, 0 is a valid value.

So, while I agree that if a user passes an invalid (i.e. non-numeric)
value it's clearly user error, it might be closer to the apparent
(although unwritten AFAICT) memparse usage api to check the result for
validity; in our case a simple check if at least 1 char was parsed is
all that's needed, e.g.:

{
        u64 limit;
        char *tmp = buf;
        struct zram *zram = dev_to_zram(dev);

        limit = memparse(buf, &tmp);
        if (buf == tmp) /* no chars parsed, invalid input */
                return -EINVAL;
        down_write(&zram->init_lock);
...

Separate from this patch, it would also help if the lib/cmdline.c
memparse doc was at least updated to clarify when the result should be
checked for validity (e.g. always, or at least when the result is 0)
and how best to do that (e.g. if 0 is an invalid value, just check if
the result is 0; if 0 is a possible valid value, check if any chars
were parsed).
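
For anyone who wants to try the "at least 1 char parsed" check outside the kernel, here is a rough user-space sketch. It is only an illustration under stated assumptions: memparse_sketch() is a hypothetical stand-in that approximates memparse with strtoull() and handles only the K/M/G suffixes, and parse_limit() is a made-up helper name, not anything in zram_drv.c.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical user-space approximation of the kernel's memparse().
 * Assumption: only the K/M/G suffixes are handled, and strtoull()
 * stands in for the kernel's number parser (unlike the kernel helper,
 * strtoull() also skips leading whitespace and accepts a sign).
 */
static unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *endptr;
	unsigned long long ret = strtoull(ptr, &endptr, 0);

	switch (*endptr) {
	case 'G': case 'g':
		ret <<= 10;
		/* fall through */
	case 'M': case 'm':
		ret <<= 10;
		/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		endptr++;
		break;
	default:
		break;
	}

	if (retptr)
		*retptr = endptr;
	return ret;
}

/*
 * Hypothetical helper: returns 0 and stores the parsed value in *limit,
 * or -EINVAL when no characters at all were consumed. An explicit "0"
 * is therefore accepted, while an empty string is rejected.
 *
 * Limitation of this sketch: input that begins with a suffix letter
 * (for example a bare "K") still advances the end pointer, so it is
 * accepted as 0. A stricter caller would need an additional check.
 */
static int parse_limit(const char *buf, unsigned long long *limit)
{
	char *end;
	unsigned long long val = memparse_sketch(buf, &end);

	if (end == buf) /* no chars parsed, invalid input */
		return -EINVAL;

	*limit = val;
	return 0;
}
```

With this sketch, "0" and "0\n" both parse to a limit of zero, "512M" parses to 512 MiB in bytes, and an empty write or a string such as "xyz" returns -EINVAL instead of silently clearing the limit.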
> > The difference is that memparse cannot stop being abused > (C allows the NULL argument and extensive tricks are required to address that) > however, we can readily fix mem_limit_store and ensure > 1) no regression when the interface IS fixed and > 2) predictable behaviour when accidental or "fuzzy" input arrives. > > >>> They say getting API right is a difficult exercise. I suggest, if we >>> don't insisting on >>> an explicit zero we have the API wrong. >>> >>> I don't think you disagreed, just that the burden to get it correct >>> lay elsewhere. >>> >>> If that is the case it doesn't really matter, we cannot release this >>> interface until >>> it is corrected wherever it must be. >>> >>> And my zero check was a poor hack. >>> >>> I should have explicitly checked the returned pointer value. >>> >>> I will send that proposed revision, and hopefully you will consider it >>> for inclusion. >>> >>> >>> >>> >>> >> >>> >> > >>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >>> >> > --- >>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >>> >> > >>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >>> >> > index 70ec992514d0..b8c779d64968 100644 >>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >>> >> > @@ -119,3 +119,13 @@ Description: >>> >> > efficiency can be calculated using compr_data_size and this >>> >> > statistic. 
>>> >> > Unit: bytes >>> >> > + >>> >> > +What: /sys/block/zram<id>/mem_limit >>> >> > +Date: August 2014 >>> >> > +Contact: Minchan Kim <minchan@kernel.org> >>> >> > +Description: >>> >> > + The mem_limit file is read/write and specifies the amount >>> >> > + of memory to be able to consume memory to store store >>> >> > + compressed data. The limit could be changed in run time >>> >> > - and "0" is default which means disable the limit. >>> >> > + and "0" means disable the limit. No limit is the initial state. >>> >> >>> >> there should be no default in the API. >>> > >>> > Thanks. >>> > >>> >> >>> >> > + Unit: bytes >>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>> >> > --- a/Documentation/blockdev/zram.txt >>> >> > +++ b/Documentation/blockdev/zram.txt >>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >>> >> > size of the disk when not in use so a huge zram is wasteful. >>> >> > >>> >> > -5) Activate: >>> >> > +5) Set memory limit: Optional >>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>> >> > + The value can be either in bytes or you can use mem suffixes. >>> >> > + In addition, you could change the value in runtime. 
>>> >> > + Examples: >>> >> > + # limit /dev/zram0 with 50MB memory >>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>> >> > + >>> >> > + # Using mem suffixes >>> >> > + echo 256K > /sys/block/zram0/mem_limit >>> >> > + echo 512M > /sys/block/zram0/mem_limit >>> >> > + echo 1G > /sys/block/zram0/mem_limit >>> >> > + >>> >> > + # To disable memory limit >>> >> > + echo 0 > /sys/block/zram0/mem_limit >>> >> > + >>> >> > +6) Activate: >>> >> > mkswap /dev/zram0 >>> >> > swapon /dev/zram0 >>> >> > >>> >> > mkfs.ext4 /dev/zram1 >>> >> > mount /dev/zram1 /tmp >>> >> > >>> >> > -6) Stats: >>> >> > +7) Stats: >>> >> > Per-device statistics are exported as various nodes under >>> >> > /sys/block/zram<id>/ >>> >> > disksize >>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. >>> >> > compr_data_size >>> >> > mem_used_total >>> >> > >>> >> > -7) Deactivate: >>> >> > +8) Deactivate: >>> >> > swapoff /dev/zram0 >>> >> > umount /dev/zram1 >>> >> > >>> >> > -8) Reset: >>> >> > +9) Reset: >>> >> > Write any positive value to 'reset' sysfs node >>> >> > echo 1 > /sys/block/zram0/reset >>> >> > echo 1 > /sys/block/zram1/reset >>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>> >> > index f0b8b30a7128..370c355eb127 100644 >>> >> > --- a/drivers/block/zram/zram_drv.c >>> >> > +++ b/drivers/block/zram/zram_drv.c >>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>> >> > } >>> >> > >>> >> > +static ssize_t mem_limit_show(struct device *dev, >>> >> > + struct device_attribute *attr, char *buf) >>> >> > +{ >>> >> > + u64 val; >>> >> > + struct zram *zram = dev_to_zram(dev); >>> >> > + >>> >> > + down_read(&zram->init_lock); >>> >> > + val = zram->limit_pages; >>> >> > + up_read(&zram->init_lock); >>> >> > + >>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>> >> > +} >>> >> > + >>> >> 
> +static ssize_t mem_limit_store(struct device *dev, >>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>> >> > +{ >>> >> > + u64 limit; >>> >> > + struct zram *zram = dev_to_zram(dev); >>> >> > + >>> >> > + limit = memparse(buf, NULL); >>> >> >>> >> if (limit = 0 && buf != "0") >>> >> return -EINVAL >>> >> >>> >> > + down_write(&zram->init_lock); >>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>> >> > + up_write(&zram->init_lock); >>> >> > + >>> >> > + return len; >>> >> > +} >>> >> > + >>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>> >> > struct device_attribute *attr, const char *buf, size_t len) >>> >> > { >>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >>> >> > ret = -ENOMEM; >>> >> > goto out; >>> >> > } >>> >> > + >>> >> > + if (zram->limit_pages && >>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>> >> > + zs_free(meta->mem_pool, handle); >>> >> > + ret = -ENOMEM; >>> >> > + goto out; >>> >> > + } >>> >> > + >>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>> >> > >>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>> >> > struct zram_meta *meta; >>> >> > >>> >> > down_write(&zram->init_lock); >>> >> > + >>> >> > + zram->limit_pages = 0; >>> >> > + >>> >> > if (!init_done(zram)) { >>> >> > up_write(&zram->init_lock); >>> >> > return; >>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>> >> > + mem_limit_store); >>> >> > static DEVICE_ATTR(max_comp_streams, 
S_IRUGO | S_IWUSR, >>> >> > max_comp_streams_show, max_comp_streams_store); >>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>> >> > &dev_attr_orig_data_size.attr, >>> >> > &dev_attr_compr_data_size.attr, >>> >> > &dev_attr_mem_used_total.attr, >>> >> > + &dev_attr_mem_limit.attr, >>> >> > &dev_attr_max_comp_streams.attr, >>> >> > &dev_attr_comp_algorithm.attr, >>> >> > NULL, >>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>> >> > --- a/drivers/block/zram/zram_drv.h >>> >> > +++ b/drivers/block/zram/zram_drv.h >>> >> > @@ -112,6 +112,11 @@ struct zram { >>> >> > u64 disksize; /* bytes */ >>> >> > int max_comp_streams; >>> >> > struct zram_stats stats; >>> >> > + /* >>> >> > + * the number of pages zram can consume for storing compressed data >>> >> > + */ >>> >> > + unsigned long limit_pages; >>> >> > + >>> >> > char compressor[10]; >>> >> > }; >>> >> > #endif >>> >> > -- >>> >> > 2.0.0 >>> >> > >>> >> >>> >> -- >>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>> >> see: http://www.linux-mm.org/ . >>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>> > >>> > -- >>> > Kind regards, >>> > Minchan Kim >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> >> -- >> Kind regards, >> Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
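Note on the byte/page conversion in the quoted patch: mem_limit_store() rounds the written byte count up to whole pages (PAGE_ALIGN(limit) >> PAGE_SHIFT), mem_limit_show() reports limit_pages << PAGE_SHIFT, and zram_bvec_write() fails with -ENOMEM only when a non-zero limit is exceeded. The following is a standalone userspace sketch of that arithmetic, not kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12), and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages. In the kernel these come from the arch headers. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Mirrors "zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT" in the patch. */
static uint64_t bytes_to_limit_pages(uint64_t limit_bytes)
{
	return PAGE_ALIGN(limit_bytes) >> PAGE_SHIFT;
}

/* Mirrors the value mem_limit_show() prints: "val << PAGE_SHIFT". */
static uint64_t limit_pages_to_bytes(uint64_t limit_pages)
{
	return limit_pages << PAGE_SHIFT;
}

/*
 * Mirrors the check added to zram_bvec_write(): a limit of 0 means
 * "no limit", otherwise the write fails once usage exceeds the limit.
 */
static int over_limit(uint64_t used_pages, uint64_t limit_pages)
{
	return limit_pages && used_pages > limit_pages;
}
```

Under these assumptions, writing 50MB stores 12800 pages, and writing a single byte stores one page, which then reads back as 4096. The value read from mem_limit can therefore differ from the value written.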
* Re: [PATCH v4 3/4] zram: zram memory size limitation @ 2014-08-25 18:12 ` Dan Streetman 0 siblings, 0 replies; 44+ messages in thread From: Dan Streetman @ 2014-08-25 18:12 UTC (permalink / raw) To: David Horner Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: > On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >>> > Hello David, >>> > >>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >>> >> > Since zram has no control feature to limit memory usage, >>> >> > it makes hard to manage system memrory. >>> >> > >>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>> >> > a limit so that zram could fail allocation once it reaches >>> >> > the limit. >>> >> > >>> >> > In addition, user could change the limit in runtime so that >>> >> > he could manage the memory more dynamically. >>> >> > >>> >> - Default is no limit so it doesn't break old behavior. >>> >> + Initial state is no limit so it doesn't break old behavior. >>> >> >>> >> I understand your previous post now. >>> >> >>> >> I was saying that setting to either a null value or garbage >>> >> (which is interpreted as zero by memparse(buf, NULL);) >>> >> removes the limit. >>> >> >>> >> I think this is "surprise" behaviour and rather the null case should >>> >> return -EINVAL >>> >> The test below should be "good enough" though not catching all garbage. >>> > >>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>> > not caller if it is really problem so I don't want to touch it in this >>> > patchset. 
It's not critical for adding the feature. >>> > >>> >>> I've looked into the memparse function more since we talked. >>> I do believe a wrapper function around it for the typical use by sysfs would >>> be very valuable. >> >> Agree. >> >>> However, there is nothing wrong with memparse itself that needs to be fixed. >>> >>> It does what it is documented to do very well (In My Uninformed Opinion). >>> It provides everything that a caller needs to manage the token that it >>> processes. >>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >> >> Maybe strict_memparse would be better to protect such things so you >> could find several places to clean it up. >> >>> >>> The fact that other callers don't check the return pointer value to >>> see if only a null >>> string was processed, is not its fault. >>> Nor that it may not be ideally suited to sysfs attributes; that other store >>> functions use it in a given manner does not means that is correct - >>> nor that it is >>> incorrect for that "knob". Some attributes could be just as valid with >>> null zeros. >>> >>> And you are correct, to disambiguate the zero is not required for the >>> limit feature. >>> Your original patch which disallowed zero was full feature for mem_limit. >>> It is the requested non-crucial feature to allow zero to reestablish >>> the initial state >>> that benefits from distinguishing an explicit zero from a "default zero' >>> when garbage is written. >>> >>> The final argument is that if we release this feature as is the undocumented >>> functionality could be relied upon, and when later fixed: user space breaks. >> >> I don't get it. Why does it break userspace? >> The sysfs-block-zram says "0" means disable the limit. >> If someone writes *garabge* but work as if disabling the limit, >> it's not a right thing and he already broke although it worked >> so it would be not a problem if we fix later. 
>> (ie, we don't need to take care of broken userspace)
>> Am I missing your point?
>>
>
> Perhaps you are missing my point, perhaps ignoring or dismissing.
>
> Basically, if a facility works in a useful way, even if it was designed for
> different usage, that becomes the "accepted" interface/usage.
> The developer may not have intended that usage or may even considered
> it wrong and a broken usage, but it is what it is and people become
> reliant on that behaviour.
>
> Case in point is memparse itself.
>
> The developer intentionally sets the return pointer because that is the
> only value that can be validated for correct performance.
> The return value allows -ve so the standard error message passing is not valid.
> Unfortunately, C allows the user to pass a NULL value in the parameter.
> The developer could consider that absurd and fundamentally broken.
> But to the user it is a valid situation, because (perhaps) it can't be
> bothered to handle error cases.
>
> So, who is to blame.
> You say memparse, that it is fundamentally broken,
> because it didn't check to see that it was used correctly.
> And I say mem_limit_store is fundamentally broken,
> because it didn't check to see that it was used correctly.

I think we should look at what the rest of the kernel does as far as
checking memparse results. It appears to be a mix of some code
checking memparse while others don't. The most common way to check
appears to be to verify that memparse actually parsed at least 1
character, e.g.:
        oldp = p;
        mem_size = memparse(p, &p);
        if (p == oldp)
                return -EINVAL;

although other places where 0 isn't valid can simply check for that:
        mem_size = memparse(p, &p);
        /* don't remove all of memory when handling "mem={invalid}" param */
        if (mem_size == 0)
                return -EINVAL;

or even the other memparse use in zram_drv.c:
        disksize = memparse(buf, NULL);
        if (!disksize)
                return -EINVAL;


And there seem to be other places where (maybe?) there's no checking
at all. However, it also seems like many cases of memparse usage are
looking for a non-zero value, and therefore they can either
immediately check for zero/invalid or (possibly) later code has checks
to avoid using any zero value. In this case though, 0 is a valid
value. So, while I agree that if a user passes an invalid (i.e.
non-numeric) value it's clearly user error, it might be closer to the
apparent (although unwritten AFAICT) memparse usage api to check the
result for validity; in our case a simple check if at least 1 char was
parsed is all that's needed, e.g.:

{
        u64 limit;
        char *tmp = buf;
        struct zram *zram = dev_to_zram(dev);

        limit = memparse(buf, &tmp);
        if (buf == tmp) /* no chars parsed, invalid input */
                return -EINVAL;
        down_write(&zram->init_lock);
...


Separate from this patch, it would also help if the lib/cmdline.c
memparse doc was at least updated to clarify when the result should be
checked for validity (e.g. always, or at least when the result is 0)
and how best to do that (e.g. if 0 is an invalid value, just check if
the result is 0; if 0 is a possible valid value, check if any chars
were parsed).

>
> The difference is that memparse cannot stop being abused
> (C allows the NULL argument and extensive tricks are required to address that)
> however, we can readily fix mem_limit_store and ensure
> 1) no regression when the interface IS fixed and
> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>
>
>>> They say getting API right is a difficult exercise. I suggest, if we
>>> don't insisting on
>>> an explicit zero we have the API wrong.
>>>
>>> I don't think you disagreed, just that the burden to get it correct
>>> lay elsewhere.
>>>
>>> If that is the case it doesn't really matter, we cannot release this
>>> interface until
>>> it is corrected wherever it must be.
>>>
>>> And my zero check was a poor hack.
>>>
>>> I should have explicitly checked the returned pointer value.
>>> >>> I will send that proposed revision, and hopefully you will consider it >>> for inclusion. >>> >>> >>> >>> >>> >> >>> >> > >>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >>> >> > --- >>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >>> >> > >>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >>> >> > index 70ec992514d0..b8c779d64968 100644 >>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >>> >> > @@ -119,3 +119,13 @@ Description: >>> >> > efficiency can be calculated using compr_data_size and this >>> >> > statistic. >>> >> > Unit: bytes >>> >> > + >>> >> > +What: /sys/block/zram<id>/mem_limit >>> >> > +Date: August 2014 >>> >> > +Contact: Minchan Kim <minchan@kernel.org> >>> >> > +Description: >>> >> > + The mem_limit file is read/write and specifies the amount >>> >> > + of memory to be able to consume memory to store store >>> >> > + compressed data. The limit could be changed in run time >>> >> > - and "0" is default which means disable the limit. >>> >> > + and "0" means disable the limit. No limit is the initial state. >>> >> >>> >> there should be no default in the API. >>> > >>> > Thanks. >>> > >>> >> >>> >> > + Unit: bytes >>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>> >> > --- a/Documentation/blockdev/zram.txt >>> >> > +++ b/Documentation/blockdev/zram.txt >>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>> >> > since we expect a 2:1 compression ratio. 
Note that zram uses about 0.1% of the >>> >> > size of the disk when not in use so a huge zram is wasteful. >>> >> > >>> >> > -5) Activate: >>> >> > +5) Set memory limit: Optional >>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>> >> > + The value can be either in bytes or you can use mem suffixes. >>> >> > + In addition, you could change the value in runtime. >>> >> > + Examples: >>> >> > + # limit /dev/zram0 with 50MB memory >>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>> >> > + >>> >> > + # Using mem suffixes >>> >> > + echo 256K > /sys/block/zram0/mem_limit >>> >> > + echo 512M > /sys/block/zram0/mem_limit >>> >> > + echo 1G > /sys/block/zram0/mem_limit >>> >> > + >>> >> > + # To disable memory limit >>> >> > + echo 0 > /sys/block/zram0/mem_limit >>> >> > + >>> >> > +6) Activate: >>> >> > mkswap /dev/zram0 >>> >> > swapon /dev/zram0 >>> >> > >>> >> > mkfs.ext4 /dev/zram1 >>> >> > mount /dev/zram1 /tmp >>> >> > >>> >> > -6) Stats: >>> >> > +7) Stats: >>> >> > Per-device statistics are exported as various nodes under >>> >> > /sys/block/zram<id>/ >>> >> > disksize >>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>>> >> > compr_data_size >>> >> > mem_used_total >>> >> > >>> >> > -7) Deactivate: >>> >> > +8) Deactivate: >>> >> > swapoff /dev/zram0 >>> >> > umount /dev/zram1 >>> >> > >>> >> > -8) Reset: >>> >> > +9) Reset: >>> >> > Write any positive value to 'reset' sysfs node >>> >> > echo 1 > /sys/block/zram0/reset >>> >> > echo 1 > /sys/block/zram1/reset >>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>> >> > index f0b8b30a7128..370c355eb127 100644 >>> >> > --- a/drivers/block/zram/zram_drv.c >>> >> > +++ b/drivers/block/zram/zram_drv.c >>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>> >> > } >>> >> > >>> >> > +static ssize_t mem_limit_show(struct device *dev, >>> >> > + struct device_attribute *attr, char *buf) >>> >> > +{ >>> >> > + u64 val; >>> >> > + struct zram *zram = dev_to_zram(dev); >>> >> > + >>> >> > + down_read(&zram->init_lock); >>> >> > + val = zram->limit_pages; >>> >> > + up_read(&zram->init_lock); >>> >> > + >>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>> >> > +} >>> >> > + >>> >> > +static ssize_t mem_limit_store(struct device *dev, >>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>> >> > +{ >>> >> > + u64 limit; >>> >> > + struct zram *zram = dev_to_zram(dev); >>> >> > + >>> >> > + limit = memparse(buf, NULL); >>> >> >>> >> if (limit = 0 && buf != "0") >>> >> return -EINVAL >>> >> >>> >> > + down_write(&zram->init_lock); >>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>> >> > + up_write(&zram->init_lock); >>> >> > + >>> >> > + return len; >>> >> > +} >>> >> > + >>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>> >> > struct device_attribute *attr, const char *buf, size_t len) >>> >> > { >>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >>> >> > ret = -ENOMEM; >>> >> > goto 
out; >>> >> > } >>> >> > + >>> >> > + if (zram->limit_pages && >>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>> >> > + zs_free(meta->mem_pool, handle); >>> >> > + ret = -ENOMEM; >>> >> > + goto out; >>> >> > + } >>> >> > + >>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>> >> > >>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>> >> > struct zram_meta *meta; >>> >> > >>> >> > down_write(&zram->init_lock); >>> >> > + >>> >> > + zram->limit_pages = 0; >>> >> > + >>> >> > if (!init_done(zram)) { >>> >> > up_write(&zram->init_lock); >>> >> > return; >>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>> >> > + mem_limit_store); >>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>> >> > max_comp_streams_show, max_comp_streams_store); >>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>> >> > &dev_attr_orig_data_size.attr, >>> >> > &dev_attr_compr_data_size.attr, >>> >> > &dev_attr_mem_used_total.attr, >>> >> > + &dev_attr_mem_limit.attr, >>> >> > &dev_attr_max_comp_streams.attr, >>> >> > &dev_attr_comp_algorithm.attr, >>> >> > NULL, >>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>> >> > --- a/drivers/block/zram/zram_drv.h >>> >> > +++ b/drivers/block/zram/zram_drv.h >>> >> > @@ -112,6 +112,11 @@ struct zram { >>> >> > u64 disksize; /* bytes */ >>> >> > int max_comp_streams; >>> >> > struct 
zram_stats stats; >>> >> > + /* >>> >> > + * the number of pages zram can consume for storing compressed data >>> >> > + */ >>> >> > + unsigned long limit_pages; >>> >> > + >>> >> > char compressor[10]; >>> >> > }; >>> >> > #endif >>> >> > -- >>> >> > 2.0.0 >>> >> > >>> >> >>> >> -- >>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>> >> see: http://www.linux-mm.org/ . >>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>> > >>> > -- >>> > Kind regards, >>> > Minchan Kim >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >> >> -- >> Kind regards, >> Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 18:12 ` Dan Streetman @ 2014-08-26 1:54 ` David Horner -1 siblings, 0 replies; 44+ messages in thread From: David Horner @ 2014-08-26 1:54 UTC (permalink / raw) To: Dan Streetman Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >>>> > Hello David, >>>> > >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >>>> >> > Since zram has no control feature to limit memory usage, >>>> >> > it makes hard to manage system memrory. >>>> >> > >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>>> >> > a limit so that zram could fail allocation once it reaches >>>> >> > the limit. >>>> >> > >>>> >> > In addition, user could change the limit in runtime so that >>>> >> > he could manage the memory more dynamically. >>>> >> > >>>> >> - Default is no limit so it doesn't break old behavior. >>>> >> + Initial state is no limit so it doesn't break old behavior. >>>> >> >>>> >> I understand your previous post now. >>>> >> >>>> >> I was saying that setting to either a null value or garbage >>>> >> (which is interpreted as zero by memparse(buf, NULL);) >>>> >> removes the limit. >>>> >> >>>> >> I think this is "surprise" behaviour and rather the null case should >>>> >> return -EINVAL >>>> >> The test below should be "good enough" though not catching all garbage. 
>>>> > >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>>> > not caller if it is really problem so I don't want to touch it in this >>>> > patchset. It's not critical for adding the feature. >>>> > >>>> >>>> I've looked into the memparse function more since we talked. >>>> I do believe a wrapper function around it for the typical use by sysfs would >>>> be very valuable. >>> >>> Agree. >>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed. >>>> >>>> It does what it is documented to do very well (In My Uninformed Opinion). >>>> It provides everything that a caller needs to manage the token that it >>>> processes. >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >>> >>> Maybe strict_memparse would be better to protect such things so you >>> could find several places to clean it up. >>> >>>> >>>> The fact that other callers don't check the return pointer value to >>>> see if only a null >>>> string was processed, is not its fault. >>>> Nor that it may not be ideally suited to sysfs attributes; that other store >>>> functions use it in a given manner does not means that is correct - >>>> nor that it is >>>> incorrect for that "knob". Some attributes could be just as valid with >>>> null zeros. >>>> >>>> And you are correct, to disambiguate the zero is not required for the >>>> limit feature. >>>> Your original patch which disallowed zero was full feature for mem_limit. >>>> It is the requested non-crucial feature to allow zero to reestablish >>>> the initial state >>>> that benefits from distinguishing an explicit zero from a "default zero' >>>> when garbage is written. >>>> >>>> The final argument is that if we release this feature as is the undocumented >>>> functionality could be relied upon, and when later fixed: user space breaks. >>> >>> I don't get it. Why does it break userspace? >>> The sysfs-block-zram says "0" means disable the limit. 
>>> If someone writes *garabge* but work as if disabling the limit, >>> it's not a right thing and he already broke although it worked >>> so it would be not a problem if we fix later. >>> (ie, we don't need to take care of broken userspace) >>> Am I missing your point? >>> >> >> Perhaps you are missing my point, perhaps ignoring or dismissing. >> >> Basically, if a facility works in a useful way, even if it was designed for >> different usage, that becomes the "accepted" interface/usage. >> The developer may not have intended that usage or may even considered >> it wrong and a broken usage, but it is what it is and people become >> reliant on that behaviour. >> >> Case in point is memparse itself. >> >> The developer intentionally sets the return pointer because that is the >> only value that can be validated for correct performance. >> The return value allows -ve so the standard error message passing is not valid. >> Unfortunately, C allows the user to pass a NULL value in the parameter. >> The developer could consider that absurd and fundamentally broken. >> But to the user it is a valid situation, because (perhaps) it can't be >> bothered to handle error cases. >> >> So, who is to blame. >> You say memparse, that it is fundamentally broken, >> because it didn't check to see that it was used correctly. >> And I say mem_limit_store is fundamentally broken, >> because it didn't check to see that it was used correctly. > > I think we should look at what the rest of the kernel does as far as > checking memparse results. It appears to be a mix of some code > checking memparse while others don't. 
The most common way to check > appears to be to verify that memparse actually parsed at least 1 > character, e.g.: > oldp = p; > mem_size = memparse(p, &p); > if (p == oldp) > return -EINVAL; > > although other places where 0 isn't valid can simply check for that: > mem_size = memparse(p, &p); > /* don't remove all of memory when handling "mem={invalid}" param */ > if (mem_size == 0) > return -EINVAL; > > or even the other memparse use in zram_drv.c: > disksize = memparse(buf, NULL); > if (!disksize) > return -EINVAL; > > > And there seem to be other places where (maybe?) there's no checking > at all. However, it also seems like many cases of memparse usage are > looking for a non-zero value, and therefore they can either > immediately check for zero/invalid or (possibly) later code has checks > to avoid using any zero value. In this case though, 0 is a valid > value. So, while I agree that if a user passes an invalid (i.e. > non-numeric) value it's clearly user error, it might be closer to the > apparent (although unwritten AFAICT) memparse usage api to check the > result for validity; in our case a simple check if at least 1 char was > parsed is all that's needed, e.g.: > > { > u64 limit; > char *tmp = buf; > struct zram *zram = dev_to_zram(dev); > > limit = memparse(buf, &tmp); > if (buf == tmp) /* no chars parsed, invalid input */ > return -EINVAL; > down_write(&zram->init_lock); Thank you Dan, for this clear, unoffensive and I believe compelling analysis. I have much to learn. > ... > > > Separate from this patch, it would also help if the lib/cmdline.c > memparse doc was at least updated to clarify when the result should be > checked for validity (e.g. always, or at least when the result is 0) > and how best to do that (e.g. if 0 is an invalid value, just check if > the result is 0; if 0 is a possible valid value, check if any chars > were parsed). > > I'd argue that the code is not the place for this usage recommendation. 
But rather an expansion of the support doc for sysfs on how to use such parsing/validation routines. I agree with Minchan that these helper functions could be improved for specific use by sysfs. And I will pursue this. (and maybe the documentation?) >> >> The difference is that memparse cannot stop being abused >> (C allows the NULL argument and extensive tricks are required to address that) >> however, we can readily fix mem_limit_store and ensure >> 1) no regression when the interface IS fixed and >> 2) predictable behaviour when accidental or "fuzzy" input arrives. >> >> >>>> They say getting API right is a difficult exercise. I suggest, if we >>>> don't insisting on >>>> an explicit zero we have the API wrong. >>>> >>>> I don't think you disagreed, just that the burden to get it correct >>>> lay elsewhere. >>>> >>>> If that is the case it doesn't really matter, we cannot release this >>>> interface until >>>> it is corrected wherever it must be. >>>> >>>> And my zero check was a poor hack. >>>> >>>> I should have explicitly checked the returned pointer value. >>>> >>>> I will send that proposed revision, and hopefully you will consider it >>>> for inclusion. 
>>>> >>>> >>>> >>>> >>>> >> >>>> >> > >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >>>> >> > --- >>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >>>> >> > >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >>>> >> > index 70ec992514d0..b8c779d64968 100644 >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >>>> >> > @@ -119,3 +119,13 @@ Description: >>>> >> > efficiency can be calculated using compr_data_size and this >>>> >> > statistic. >>>> >> > Unit: bytes >>>> >> > + >>>> >> > +What: /sys/block/zram<id>/mem_limit >>>> >> > +Date: August 2014 >>>> >> > +Contact: Minchan Kim <minchan@kernel.org> >>>> >> > +Description: >>>> >> > + The mem_limit file is read/write and specifies the amount >>>> >> > + of memory to be able to consume memory to store store >>>> >> > + compressed data. The limit could be changed in run time >>>> >> > - and "0" is default which means disable the limit. >>>> >> > + and "0" means disable the limit. No limit is the initial state. >>>> >> >>>> >> there should be no default in the API. >>>> > >>>> > Thanks. >>>> > >>>> >> >>>> >> > + Unit: bytes >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>>> >> > --- a/Documentation/blockdev/zram.txt >>>> >> > +++ b/Documentation/blockdev/zram.txt >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >>>> >> > size of the disk when not in use so a huge zram is wasteful. 
>>>> >> > >>>> >> > -5) Activate: >>>> >> > +5) Set memory limit: Optional >>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>>> >> > + The value can be either in bytes or you can use mem suffixes. >>>> >> > + In addition, you could change the value in runtime. >>>> >> > + Examples: >>>> >> > + # limit /dev/zram0 with 50MB memory >>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>>> >> > + >>>> >> > + # Using mem suffixes >>>> >> > + echo 256K > /sys/block/zram0/mem_limit >>>> >> > + echo 512M > /sys/block/zram0/mem_limit >>>> >> > + echo 1G > /sys/block/zram0/mem_limit >>>> >> > + >>>> >> > + # To disable memory limit >>>> >> > + echo 0 > /sys/block/zram0/mem_limit >>>> >> > + >>>> >> > +6) Activate: >>>> >> > mkswap /dev/zram0 >>>> >> > swapon /dev/zram0 >>>> >> > >>>> >> > mkfs.ext4 /dev/zram1 >>>> >> > mount /dev/zram1 /tmp >>>> >> > >>>> >> > -6) Stats: >>>> >> > +7) Stats: >>>> >> > Per-device statistics are exported as various nodes under >>>> >> > /sys/block/zram<id>/ >>>> >> > disksize >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>>>> >> > compr_data_size >>>> >> > mem_used_total >>>> >> > >>>> >> > -7) Deactivate: >>>> >> > +8) Deactivate: >>>> >> > swapoff /dev/zram0 >>>> >> > umount /dev/zram1 >>>> >> > >>>> >> > -8) Reset: >>>> >> > +9) Reset: >>>> >> > Write any positive value to 'reset' sysfs node >>>> >> > echo 1 > /sys/block/zram0/reset >>>> >> > echo 1 > /sys/block/zram1/reset >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>>> >> > index f0b8b30a7128..370c355eb127 100644 >>>> >> > --- a/drivers/block/zram/zram_drv.c >>>> >> > +++ b/drivers/block/zram/zram_drv.c >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>>> >> > } >>>> >> > >>>> >> > +static ssize_t mem_limit_show(struct device *dev, >>>> >> > + struct device_attribute *attr, char *buf) >>>> >> > +{ >>>> >> > + u64 val; >>>> >> > + struct zram *zram = dev_to_zram(dev); >>>> >> > + >>>> >> > + down_read(&zram->init_lock); >>>> >> > + val = zram->limit_pages; >>>> >> > + up_read(&zram->init_lock); >>>> >> > + >>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>>> >> > +} >>>> >> > + >>>> >> > +static ssize_t mem_limit_store(struct device *dev, >>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>>> >> > +{ >>>> >> > + u64 limit; >>>> >> > + struct zram *zram = dev_to_zram(dev); >>>> >> > + >>>> >> > + limit = memparse(buf, NULL); >>>> >> >>>> >> if (limit = 0 && buf != "0") >>>> >> return -EINVAL >>>> >> >>>> >> > + down_write(&zram->init_lock); >>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>>> >> > + up_write(&zram->init_lock); >>>> >> > + >>>> >> > + return len; >>>> >> > +} >>>> >> > + >>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>>> >> > struct device_attribute *attr, const char *buf, size_t len) >>>> >> > { >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec 
*bvec, u32 index, >>>> >> > ret = -ENOMEM; >>>> >> > goto out; >>>> >> > } >>>> >> > + >>>> >> > + if (zram->limit_pages && >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>>> >> > + zs_free(meta->mem_pool, handle); >>>> >> > + ret = -ENOMEM; >>>> >> > + goto out; >>>> >> > + } >>>> >> > + >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>>> >> > >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>>> >> > struct zram_meta *meta; >>>> >> > >>>> >> > down_write(&zram->init_lock); >>>> >> > + >>>> >> > + zram->limit_pages = 0; >>>> >> > + >>>> >> > if (!init_done(zram)) { >>>> >> > up_write(&zram->init_lock); >>>> >> > return; >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>>> >> > + mem_limit_store); >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>>> >> > max_comp_streams_show, max_comp_streams_store); >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>>> >> > &dev_attr_orig_data_size.attr, >>>> >> > &dev_attr_compr_data_size.attr, >>>> >> > &dev_attr_mem_used_total.attr, >>>> >> > + &dev_attr_mem_limit.attr, >>>> >> > &dev_attr_max_comp_streams.attr, >>>> >> > &dev_attr_comp_algorithm.attr, >>>> >> > NULL, >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>>> >> > --- a/drivers/block/zram/zram_drv.h >>>> >> > +++ b/drivers/block/zram/zram_drv.h >>>> >> > @@ -112,6 +112,11 @@ 
struct zram { >>>> >> > u64 disksize; /* bytes */ >>>> >> > int max_comp_streams; >>>> >> > struct zram_stats stats; >>>> >> > + /* >>>> >> > + * the number of pages zram can consume for storing compressed data >>>> >> > + */ >>>> >> > + unsigned long limit_pages; >>>> >> > + >>>> >> > char compressor[10]; >>>> >> > }; >>>> >> > #endif >>>> >> > -- >>>> >> > 2.0.0 >>>> >> > >>>> >> >>>> >> -- >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>>> >> see: http://www.linux-mm.org/ . >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>> > >>>> > -- >>>> > Kind regards, >>>> > Minchan Kim >>>> >>>> -- >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>> see: http://www.linux-mm.org/ . >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>> >>> -- >>> Kind regards, >>> Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation @ 2014-08-26 1:54 ` David Horner 0 siblings, 0 replies; 44+ messages in thread From: David Horner @ 2014-08-26 1:54 UTC (permalink / raw) To: Dan Streetman Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >>>> > Hello David, >>>> > >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >>>> >> > Since zram has no control feature to limit memory usage, >>>> >> > it makes hard to manage system memrory. >>>> >> > >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>>> >> > a limit so that zram could fail allocation once it reaches >>>> >> > the limit. >>>> >> > >>>> >> > In addition, user could change the limit in runtime so that >>>> >> > he could manage the memory more dynamically. >>>> >> > >>>> >> - Default is no limit so it doesn't break old behavior. >>>> >> + Initial state is no limit so it doesn't break old behavior. >>>> >> >>>> >> I understand your previous post now. >>>> >> >>>> >> I was saying that setting to either a null value or garbage >>>> >> (which is interpreted as zero by memparse(buf, NULL);) >>>> >> removes the limit. >>>> >> >>>> >> I think this is "surprise" behaviour and rather the null case should >>>> >> return -EINVAL >>>> >> The test below should be "good enough" though not catching all garbage. 
>>>> > >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>>> > not caller if it is really problem so I don't want to touch it in this >>>> > patchset. It's not critical for adding the feature. >>>> > >>>> >>>> I've looked into the memparse function more since we talked. >>>> I do believe a wrapper function around it for the typical use by sysfs would >>>> be very valuable. >>> >>> Agree. >>> >>>> However, there is nothing wrong with memparse itself that needs to be fixed. >>>> >>>> It does what it is documented to do very well (In My Uninformed Opinion). >>>> It provides everything that a caller needs to manage the token that it >>>> processes. >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >>> >>> Maybe strict_memparse would be better to protect such things so you >>> could find several places to clean it up. >>> >>>> >>>> The fact that other callers don't check the return pointer value to >>>> see if only a null >>>> string was processed, is not its fault. >>>> Nor that it may not be ideally suited to sysfs attributes; that other store >>>> functions use it in a given manner does not means that is correct - >>>> nor that it is >>>> incorrect for that "knob". Some attributes could be just as valid with >>>> null zeros. >>>> >>>> And you are correct, to disambiguate the zero is not required for the >>>> limit feature. >>>> Your original patch which disallowed zero was full feature for mem_limit. >>>> It is the requested non-crucial feature to allow zero to reestablish >>>> the initial state >>>> that benefits from distinguishing an explicit zero from a "default zero' >>>> when garbage is written. >>>> >>>> The final argument is that if we release this feature as is the undocumented >>>> functionality could be relied upon, and when later fixed: user space breaks. >>> >>> I don't get it. Why does it break userspace? >>> The sysfs-block-zram says "0" means disable the limit. 
>>> If someone writes *garabge* but work as if disabling the limit, >>> it's not a right thing and he already broke although it worked >>> so it would be not a problem if we fix later. >>> (ie, we don't need to take care of broken userspace) >>> Am I missing your point? >>> >> >> Perhaps you are missing my point, perhaps ignoring or dismissing. >> >> Basically, if a facility works in a useful way, even if it was designed for >> different usage, that becomes the "accepted" interface/usage. >> The developer may not have intended that usage or may even considered >> it wrong and a broken usage, but it is what it is and people become >> reliant on that behaviour. >> >> Case in point is memparse itself. >> >> The developer intentionally sets the return pointer because that is the >> only value that can be validated for correct performance. >> The return value allows -ve so the standard error message passing is not valid. >> Unfortunately, C allows the user to pass a NULL value in the parameter. >> The developer could consider that absurd and fundamentally broken. >> But to the user it is a valid situation, because (perhaps) it can't be >> bothered to handle error cases. >> >> So, who is to blame. >> You say memparse, that it is fundamentally broken, >> because it didn't check to see that it was used correctly. >> And I say mem_limit_store is fundamentally broken, >> because it didn't check to see that it was used correctly. > > I think we should look at what the rest of the kernel does as far as > checking memparse results. It appears to be a mix of some code > checking memparse while others don't. 
The most common way to check > appears to be to verify that memparse actually parsed at least 1 > character, e.g.: > oldp = p; > mem_size = memparse(p, &p); > if (p == oldp) > return -EINVAL; > > although other places where 0 isn't valid can simply check for that: > mem_size = memparse(p, &p); > /* don't remove all of memory when handling "mem={invalid}" param */ > if (mem_size == 0) > return -EINVAL; > > or even the other memparse use in zram_drv.c: > disksize = memparse(buf, NULL); > if (!disksize) > return -EINVAL; > > > And there seem to be other places where (maybe?) there's no checking > at all. However, it also seems like many cases of memparse usage are > looking for a non-zero value, and therefore they can either > immediately check for zero/invalid or (possibly) later code has checks > to avoid using any zero value. In this case though, 0 is a valid > value. So, while I agree that if a user passes an invalid (i.e. > non-numeric) value it's clearly user error, it might be closer to the > apparent (although unwritten AFAICT) memparse usage api to check the > result for validity; in our case a simple check if at least 1 char was > parsed is all that's needed, e.g.: > > { > u64 limit; > char *tmp = buf; > struct zram *zram = dev_to_zram(dev); > > limit = memparse(buf, &tmp); > if (buf == tmp) /* no chars parsed, invalid input */ > return -EINVAL; > down_write(&zram->init_lock); Thank you Dan, for this clear, unoffensive and I believe compelling analysis. I have much to learn. > ... > > > Separate from this patch, it would also help if the lib/cmdline.c > memparse doc was at least updated to clarify when the result should be > checked for validity (e.g. always, or at least when the result is 0) > and how best to do that (e.g. if 0 is an invalid value, just check if > the result is 0; if 0 is a possible valid value, check if any chars > were parsed). > > I'd argue that the code is not the place for this usage recommendation. 
Rather, it belongs in an expanded sysfs support document
on how to use such parsing/validation routines.

I agree with Minchan that these helper functions could be improved
for specific use by sysfs.
And I will pursue this (and maybe the documentation too).

>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insisting on
>>>> an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until
>>>> it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
^ permalink raw reply [flat|nested] 44+ messages in thread
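For illustration, the "at least one character parsed" check that Dan proposes can be exercised outside the kernel. What follows is a minimal userspace sketch, not the kernel code: memparse_sketch() is a hypothetical stand-in for memparse() built on strtoull(), and parse_limit() is a hypothetical helper that mirrors the test suggested for mem_limit_store(). One assumption to flag: the stand-in only honours a K/M/G suffix after at least one digit has been consumed; whether the real memparse() behaves the same way should be checked against lib/cmdline.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace stand-in for the kernel's memparse(): parse a number with an
 * optional K/M/G suffix and report where parsing stopped.
 * ASSUMPTION: simplified for illustration; a suffix is only honoured
 * when at least one digit was consumed.
 */
static unsigned long long memparse_sketch(const char *s, char **retptr)
{
	char *end;
	unsigned long long val;

	assert(s != NULL);
	val = strtoull(s, &end, 0);

	if (end != s) {
		switch (*end) {
		case 'G': case 'g':
			val <<= 10;
			/* fall through */
		case 'M': case 'm':
			val <<= 10;
			/* fall through */
		case 'K': case 'k':
			val <<= 10;
			end++;
			break;
		default:
			break;
		}
	}

	if (retptr)
		*retptr = end;
	return val;
}

/*
 * Hypothetical helper mirroring the proposed check: returns 0 and stores
 * the parsed value in *limit, or -EINVAL if no characters were parsed.
 */
static int parse_limit(const char *buf, unsigned long long *limit)
{
	char *end;
	unsigned long long val = memparse_sketch(buf, &end);

	if (end == buf)		/* no chars parsed, invalid input */
		return -EINVAL;

	*limit = val;
	return 0;
}
```

With this check an empty or non-numeric write is rejected, while an explicit "0" is still accepted and can be used to remove the limit. Characters that follow the parsed number are not inspected by this check.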
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-26 1:54 ` David Horner @ 2014-08-26 4:39 ` Minchan Kim -1 siblings, 0 replies; 44+ messages in thread From: Minchan Kim @ 2014-08-26 4:39 UTC (permalink / raw) To: David Horner Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings Hi Dan and David, On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote: > On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: > > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: > >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: > >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: > >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: > >>>> > Hello David, > >>>> > > >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: > >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: > >>>> >> > Since zram has no control feature to limit memory usage, > >>>> >> > it makes hard to manage system memrory. > >>>> >> > > >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the > >>>> >> > a limit so that zram could fail allocation once it reaches > >>>> >> > the limit. > >>>> >> > > >>>> >> > In addition, user could change the limit in runtime so that > >>>> >> > he could manage the memory more dynamically. > >>>> >> > > >>>> >> - Default is no limit so it doesn't break old behavior. > >>>> >> + Initial state is no limit so it doesn't break old behavior. > >>>> >> > >>>> >> I understand your previous post now. > >>>> >> > >>>> >> I was saying that setting to either a null value or garbage > >>>> >> (which is interpreted as zero by memparse(buf, NULL);) > >>>> >> removes the limit. 
> >>>> >> > >>>> >> I think this is "surprise" behaviour and rather the null case should > >>>> >> return -EINVAL > >>>> >> The test below should be "good enough" though not catching all garbage. > >>>> > > >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, > >>>> > not caller if it is really problem so I don't want to touch it in this > >>>> > patchset. It's not critical for adding the feature. > >>>> > > >>>> > >>>> I've looked into the memparse function more since we talked. > >>>> I do believe a wrapper function around it for the typical use by sysfs would > >>>> be very valuable. > >>> > >>> Agree. > >>> > >>>> However, there is nothing wrong with memparse itself that needs to be fixed. > >>>> > >>>> It does what it is documented to do very well (In My Uninformed Opinion). > >>>> It provides everything that a caller needs to manage the token that it > >>>> processes. > >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. > >>> > >>> Maybe strict_memparse would be better to protect such things so you > >>> could find several places to clean it up. > >>> > >>>> > >>>> The fact that other callers don't check the return pointer value to > >>>> see if only a null > >>>> string was processed, is not its fault. > >>>> Nor that it may not be ideally suited to sysfs attributes; that other store > >>>> functions use it in a given manner does not means that is correct - > >>>> nor that it is > >>>> incorrect for that "knob". Some attributes could be just as valid with > >>>> null zeros. > >>>> > >>>> And you are correct, to disambiguate the zero is not required for the > >>>> limit feature. > >>>> Your original patch which disallowed zero was full feature for mem_limit. > >>>> It is the requested non-crucial feature to allow zero to reestablish > >>>> the initial state > >>>> that benefits from distinguishing an explicit zero from a "default zero' > >>>> when garbage is written. 
> >>>> > >>>> The final argument is that if we release this feature as is the undocumented > >>>> functionality could be relied upon, and when later fixed: user space breaks. > >>> > >>> I don't get it. Why does it break userspace? > >>> The sysfs-block-zram says "0" means disable the limit. > >>> If someone writes *garabge* but work as if disabling the limit, > >>> it's not a right thing and he already broke although it worked > >>> so it would be not a problem if we fix later. > >>> (ie, we don't need to take care of broken userspace) > >>> Am I missing your point? > >>> > >> > >> Perhaps you are missing my point, perhaps ignoring or dismissing. > >> > >> Basically, if a facility works in a useful way, even if it was designed for > >> different usage, that becomes the "accepted" interface/usage. > >> The developer may not have intended that usage or may even considered > >> it wrong and a broken usage, but it is what it is and people become > >> reliant on that behaviour. > >> > >> Case in point is memparse itself. > >> > >> The developer intentionally sets the return pointer because that is the > >> only value that can be validated for correct performance. > >> The return value allows -ve so the standard error message passing is not valid. > >> Unfortunately, C allows the user to pass a NULL value in the parameter. > >> The developer could consider that absurd and fundamentally broken. > >> But to the user it is a valid situation, because (perhaps) it can't be > >> bothered to handle error cases. > >> > >> So, who is to blame. > >> You say memparse, that it is fundamentally broken, > >> because it didn't check to see that it was used correctly. > >> And I say mem_limit_store is fundamentally broken, > >> because it didn't check to see that it was used correctly. > > > > I think we should look at what the rest of the kernel does as far as > > checking memparse results. It appears to be a mix of some code > > checking memparse while others don't. 
The most common way to check
> > appears to be to verify that memparse actually parsed at least 1
> > character, e.g.:
> >   oldp = p;
> >   mem_size = memparse(p, &p);
> >   if (p == oldp)
> >     return -EINVAL;
> >
> > although other places where 0 isn't valid can simply check for that:
> >   mem_size = memparse(p, &p);
> >   /* don't remove all of memory when handling "mem={invalid}" param */
> >   if (mem_size == 0)
> >     return -EINVAL;
> >
> > or even the other memparse use in zram_drv.c:
> >   disksize = memparse(buf, NULL);
> >   if (!disksize)
> >     return -EINVAL;
> >
> >
> > And there seem to be other places where (maybe?) there's no checking
> > at all. However, it also seems like many cases of memparse usage are
> > looking for a non-zero value, and therefore they can either
> > immediately check for zero/invalid or (possibly) later code has checks
> > to avoid using any zero value. In this case though, 0 is a valid
> > value. So, while I agree that if a user passes an invalid (i.e.
> > non-numeric) value it's clearly user error, it might be closer to the
> > apparent (although unwritten AFAICT) memparse usage api to check the
> > result for validity; in our case a simple check if at least 1 char was
> > parsed is all that's needed, e.g.:
> >
> > {
> >   u64 limit;
> >   char *tmp = buf;
> >   struct zram *zram = dev_to_zram(dev);
> >
> >   limit = memparse(buf, &tmp);
> >   if (buf == tmp) /* no chars parsed, invalid input */
> >     return -EINVAL;
> >   down_write(&zram->init_lock);
> >
> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

Thanks for the suggestion, Dan.

David, are you okay with this?
You pointed out several cases. One was the NULL check, and Dan's patch
will fix it. But the other example you pointed out was "7,,5,8,,9".
Slightly modifying your example, "0..1" can reset the limit without
returning EINVAL. That is not what we want.
Couldn't we check for that as well, if you guys really want to prevent
wrong use from userspace?
If we don't need it, please give me a reason so that I can be convinced,
proceed with this patchset, and do further work.

Thanks.

>
> I have much to learn.
>
> > ...
> >
> >
> > Separate from this patch, it would also help if the lib/cmdline.c
> > memparse doc was at least updated to clarify when the result should be
> > checked for validity (e.g. always, or at least when the result is 0)
> > and how best to do that (e.g. if 0 is an invalid value, just check if
> > the result is 0; if 0 is a possible valid value, check if any chars
> > were parsed).
> >
> >
>
> I'd argue that the code is not the place for this usage recommendation.
> But rather an expansion of the support doc for sysfs
> on how to use such parsing/validation routines.
>
> I agree with Minchan that these helper functions could be improved
> for specific use by sysfs.
> And I will pursue this. (and maybe the documentation?)
>
>
> >>
> >> The difference is that memparse cannot stop being abused
> >> (C allows the NULL argument and extensive tricks are required to address that)
> >> however, we can readily fix mem_limit_store and ensure
> >> 1) no regression when the interface IS fixed and
> >> 2) predictable behaviour when accidental or "fuzzy" input arrives.
> >>
> >>
> >>>> They say getting API right is a difficult exercise. I suggest, if we
> >>>> don't insisting on
> >>>> an explicit zero we have the API wrong.
> >>>>
> >>>> I don't think you disagreed, just that the burden to get it correct
> >>>> lay elsewhere.
> >>>>
> >>>> If that is the case it doesn't really matter, we cannot release this
> >>>> interface until
> >>>> it is corrected wherever it must be.
> >>>>
> >>>> And my zero check was a poor hack.
> >>>>
> >>>> I should have explicitly checked the returned pointer value.
> >>>>
> >>>> I will send that proposed revision, and hopefully you will consider it
> >>>> for inclusion.
> >>>> > >>>> > >>>> > >>>> > >>>> >> > >>>> >> > > >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> > >>>> >> > --- > >>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ > >>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- > >>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ > >>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ > >>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) > >>>> >> > > >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > index 70ec992514d0..b8c779d64968 100644 > >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > @@ -119,3 +119,13 @@ Description: > >>>> >> > efficiency can be calculated using compr_data_size and this > >>>> >> > statistic. > >>>> >> > Unit: bytes > >>>> >> > + > >>>> >> > +What: /sys/block/zram<id>/mem_limit > >>>> >> > +Date: August 2014 > >>>> >> > +Contact: Minchan Kim <minchan@kernel.org> > >>>> >> > +Description: > >>>> >> > + The mem_limit file is read/write and specifies the amount > >>>> >> > + of memory to be able to consume memory to store store > >>>> >> > + compressed data. The limit could be changed in run time > >>>> >> > - and "0" is default which means disable the limit. > >>>> >> > + and "0" means disable the limit. No limit is the initial state. > >>>> >> > >>>> >> there should be no default in the API. > >>>> > > >>>> > Thanks. > >>>> > > >>>> >> > >>>> >> > + Unit: bytes > >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt > >>>> >> > index 0595c3f56ccf..82c6a41116db 100644 > >>>> >> > --- a/Documentation/blockdev/zram.txt > >>>> >> > +++ b/Documentation/blockdev/zram.txt > >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory > >>>> >> > since we expect a 2:1 compression ratio. 
Note that zram uses about 0.1% of the > >>>> >> > size of the disk when not in use so a huge zram is wasteful. > >>>> >> > > >>>> >> > -5) Activate: > >>>> >> > +5) Set memory limit: Optional > >>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. > >>>> >> > + The value can be either in bytes or you can use mem suffixes. > >>>> >> > + In addition, you could change the value in runtime. > >>>> >> > + Examples: > >>>> >> > + # limit /dev/zram0 with 50MB memory > >>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > + # Using mem suffixes > >>>> >> > + echo 256K > /sys/block/zram0/mem_limit > >>>> >> > + echo 512M > /sys/block/zram0/mem_limit > >>>> >> > + echo 1G > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > + # To disable memory limit > >>>> >> > + echo 0 > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > +6) Activate: > >>>> >> > mkswap /dev/zram0 > >>>> >> > swapon /dev/zram0 > >>>> >> > > >>>> >> > mkfs.ext4 /dev/zram1 > >>>> >> > mount /dev/zram1 /tmp > >>>> >> > > >>>> >> > -6) Stats: > >>>> >> > +7) Stats: > >>>> >> > Per-device statistics are exported as various nodes under > >>>> >> > /sys/block/zram<id>/ > >>>> >> > disksize > >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
> >>>> >> > compr_data_size > >>>> >> > mem_used_total > >>>> >> > > >>>> >> > -7) Deactivate: > >>>> >> > +8) Deactivate: > >>>> >> > swapoff /dev/zram0 > >>>> >> > umount /dev/zram1 > >>>> >> > > >>>> >> > -8) Reset: > >>>> >> > +9) Reset: > >>>> >> > Write any positive value to 'reset' sysfs node > >>>> >> > echo 1 > /sys/block/zram0/reset > >>>> >> > echo 1 > /sys/block/zram1/reset > >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > >>>> >> > index f0b8b30a7128..370c355eb127 100644 > >>>> >> > --- a/drivers/block/zram/zram_drv.c > >>>> >> > +++ b/drivers/block/zram/zram_drv.c > >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, > >>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > >>>> >> > } > >>>> >> > > >>>> >> > +static ssize_t mem_limit_show(struct device *dev, > >>>> >> > + struct device_attribute *attr, char *buf) > >>>> >> > +{ > >>>> >> > + u64 val; > >>>> >> > + struct zram *zram = dev_to_zram(dev); > >>>> >> > + > >>>> >> > + down_read(&zram->init_lock); > >>>> >> > + val = zram->limit_pages; > >>>> >> > + up_read(&zram->init_lock); > >>>> >> > + > >>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); > >>>> >> > +} > >>>> >> > + > >>>> >> > +static ssize_t mem_limit_store(struct device *dev, > >>>> >> > + struct device_attribute *attr, const char *buf, size_t len) > >>>> >> > +{ > >>>> >> > + u64 limit; > >>>> >> > + struct zram *zram = dev_to_zram(dev); > >>>> >> > + > >>>> >> > + limit = memparse(buf, NULL); > >>>> >> > >>>> >> if (limit = 0 && buf != "0") > >>>> >> return -EINVAL > >>>> >> > >>>> >> > + down_write(&zram->init_lock); > >>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; > >>>> >> > + up_write(&zram->init_lock); > >>>> >> > + > >>>> >> > + return len; > >>>> >> > +} > >>>> >> > + > >>>> >> > static ssize_t max_comp_streams_store(struct device *dev, > >>>> >> > struct device_attribute *attr, const char *buf, size_t len) 
> >>>> >> > { > >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, > >>>> >> > ret = -ENOMEM; > >>>> >> > goto out; > >>>> >> > } > >>>> >> > + > >>>> >> > + if (zram->limit_pages && > >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { > >>>> >> > + zs_free(meta->mem_pool, handle); > >>>> >> > + ret = -ENOMEM; > >>>> >> > + goto out; > >>>> >> > + } > >>>> >> > + > >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); > >>>> >> > > >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { > >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) > >>>> >> > struct zram_meta *meta; > >>>> >> > > >>>> >> > down_write(&zram->init_lock); > >>>> >> > + > >>>> >> > + zram->limit_pages = 0; > >>>> >> > + > >>>> >> > if (!init_done(zram)) { > >>>> >> > up_write(&zram->init_lock); > >>>> >> > return; > >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); > >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, > >>>> >> > + mem_limit_store); > >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, > >>>> >> > max_comp_streams_show, max_comp_streams_store); > >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, > >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { > >>>> >> > &dev_attr_orig_data_size.attr, > >>>> >> > &dev_attr_compr_data_size.attr, > >>>> >> > &dev_attr_mem_used_total.attr, > >>>> >> > + &dev_attr_mem_limit.attr, > >>>> >> > &dev_attr_max_comp_streams.attr, > >>>> >> > &dev_attr_comp_algorithm.attr, > >>>> >> > NULL, > >>>> >> > diff --git a/drivers/block/zram/zram_drv.h 
b/drivers/block/zram/zram_drv.h > >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 > >>>> >> > --- a/drivers/block/zram/zram_drv.h > >>>> >> > +++ b/drivers/block/zram/zram_drv.h > >>>> >> > @@ -112,6 +112,11 @@ struct zram { > >>>> >> > u64 disksize; /* bytes */ > >>>> >> > int max_comp_streams; > >>>> >> > struct zram_stats stats; > >>>> >> > + /* > >>>> >> > + * the number of pages zram can consume for storing compressed data > >>>> >> > + */ > >>>> >> > + unsigned long limit_pages; > >>>> >> > + > >>>> >> > char compressor[10]; > >>>> >> > }; > >>>> >> > #endif > >>>> >> > -- > >>>> >> > 2.0.0 > >>>> >> > > >>>> >> > >>>> >> -- > >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in > >>>> >> the body to majordomo@kvack.org. For more info on Linux MM, > >>>> >> see: http://www.linux-mm.org/ . > >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > >>>> > > >>>> > -- > >>>> > Kind regards, > >>>> > Minchan Kim > >>>> > >>> > >>> -- > >>> Kind regards, > >>> Minchan Kim > > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
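For readers following the quoted patch: mem_limit_store() keeps the limit in pages (PAGE_ALIGN(limit) >> PAGE_SHIFT), mem_limit_show() reports it in bytes (val << PAGE_SHIFT), and zram_bvec_write() fails with -ENOMEM once the pool holds more pages than the limit. The userspace sketch below only illustrates that arithmetic. It assumes 4 KiB pages, and the sketch_* names are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round a byte count up to whole pages, like PAGE_ALIGN(limit) >> PAGE_SHIFT. */
static inline uint64_t sketch_bytes_to_limit_pages(uint64_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Convert the stored page count back to bytes, like val << PAGE_SHIFT. */
static inline uint64_t sketch_limit_pages_to_bytes(uint64_t pages)
{
	return pages << SKETCH_PAGE_SHIFT;
}

/* Mirror of the write-path test: a limit of 0 pages means "no limit". */
static inline int sketch_over_limit(uint64_t used_pages, uint64_t limit_pages)
{
	return limit_pages != 0 && used_pages > limit_pages;
}
```

Under that page-size assumption, writing 1 to mem_limit would read back as 4096, and the 50MB example from zram.txt corresponds to 12800 pages. Note that the comparison is "greater than", so a pool sitting exactly at the limit is still allowed.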
* Re: [PATCH v4 3/4] zram: zram memory size limitation @ 2014-08-26 4:39 ` Minchan Kim 0 siblings, 0 replies; 44+ messages in thread From: Minchan Kim @ 2014-08-26 4:39 UTC (permalink / raw) To: David Horner Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings Hi Dan and David, On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote: > On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: > > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: > >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: > >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: > >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: > >>>> > Hello David, > >>>> > > >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: > >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: > >>>> >> > Since zram has no control feature to limit memory usage, > >>>> >> > it makes hard to manage system memrory. > >>>> >> > > >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the > >>>> >> > a limit so that zram could fail allocation once it reaches > >>>> >> > the limit. > >>>> >> > > >>>> >> > In addition, user could change the limit in runtime so that > >>>> >> > he could manage the memory more dynamically. > >>>> >> > > >>>> >> - Default is no limit so it doesn't break old behavior. > >>>> >> + Initial state is no limit so it doesn't break old behavior. > >>>> >> > >>>> >> I understand your previous post now. > >>>> >> > >>>> >> I was saying that setting to either a null value or garbage > >>>> >> (which is interpreted as zero by memparse(buf, NULL);) > >>>> >> removes the limit. 
> >>>> >> > >>>> >> I think this is "surprise" behaviour and rather the null case should > >>>> >> return -EINVAL > >>>> >> The test below should be "good enough" though not catching all garbage. > >>>> > > >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, > >>>> > not caller if it is really problem so I don't want to touch it in this > >>>> > patchset. It's not critical for adding the feature. > >>>> > > >>>> > >>>> I've looked into the memparse function more since we talked. > >>>> I do believe a wrapper function around it for the typical use by sysfs would > >>>> be very valuable. > >>> > >>> Agree. > >>> > >>>> However, there is nothing wrong with memparse itself that needs to be fixed. > >>>> > >>>> It does what it is documented to do very well (In My Uninformed Opinion). > >>>> It provides everything that a caller needs to manage the token that it > >>>> processes. > >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. > >>> > >>> Maybe strict_memparse would be better to protect such things so you > >>> could find several places to clean it up. > >>> > >>>> > >>>> The fact that other callers don't check the return pointer value to > >>>> see if only a null > >>>> string was processed, is not its fault. > >>>> Nor that it may not be ideally suited to sysfs attributes; that other store > >>>> functions use it in a given manner does not means that is correct - > >>>> nor that it is > >>>> incorrect for that "knob". Some attributes could be just as valid with > >>>> null zeros. > >>>> > >>>> And you are correct, to disambiguate the zero is not required for the > >>>> limit feature. > >>>> Your original patch which disallowed zero was full feature for mem_limit. > >>>> It is the requested non-crucial feature to allow zero to reestablish > >>>> the initial state > >>>> that benefits from distinguishing an explicit zero from a "default zero' > >>>> when garbage is written. 
> >>>> > >>>> The final argument is that if we release this feature as is the undocumented > >>>> functionality could be relied upon, and when later fixed: user space breaks. > >>> > >>> I don't get it. Why does it break userspace? > >>> The sysfs-block-zram says "0" means disable the limit. > >>> If someone writes *garabge* but work as if disabling the limit, > >>> it's not a right thing and he already broke although it worked > >>> so it would be not a problem if we fix later. > >>> (ie, we don't need to take care of broken userspace) > >>> Am I missing your point? > >>> > >> > >> Perhaps you are missing my point, perhaps ignoring or dismissing. > >> > >> Basically, if a facility works in a useful way, even if it was designed for > >> different usage, that becomes the "accepted" interface/usage. > >> The developer may not have intended that usage or may even considered > >> it wrong and a broken usage, but it is what it is and people become > >> reliant on that behaviour. > >> > >> Case in point is memparse itself. > >> > >> The developer intentionally sets the return pointer because that is the > >> only value that can be validated for correct performance. > >> The return value allows -ve so the standard error message passing is not valid. > >> Unfortunately, C allows the user to pass a NULL value in the parameter. > >> The developer could consider that absurd and fundamentally broken. > >> But to the user it is a valid situation, because (perhaps) it can't be > >> bothered to handle error cases. > >> > >> So, who is to blame. > >> You say memparse, that it is fundamentally broken, > >> because it didn't check to see that it was used correctly. > >> And I say mem_limit_store is fundamentally broken, > >> because it didn't check to see that it was used correctly. > > > > I think we should look at what the rest of the kernel does as far as > > checking memparse results. It appears to be a mix of some code > > checking memparse while others don't. 
The most common way to check > > appears to be to verify that memparse actually parsed at least 1 > > character, e.g.: > > oldp = p; > > mem_size = memparse(p, &p); > > if (p == oldp) > > return -EINVAL; > > > > although other places where 0 isn't valid can simply check for that: > > mem_size = memparse(p, &p); > > /* don't remove all of memory when handling "mem={invalid}" param */ > > if (mem_size == 0) > > return -EINVAL; > > > > or even the other memparse use in zram_drv.c: > > disksize = memparse(buf, NULL); > > if (!disksize) > > return -EINVAL; > > > > > > And there seem to be other places where (maybe?) there's no checking > > at all. However, it also seems like many cases of memparse usage are > > looking for a non-zero value, and therefore they can either > > immediately check for zero/invalid or (possibly) later code has checks > > to avoid using any zero value. In this case though, 0 is a valid > > value. So, while I agree that if a user passes an invalid (i.e. > > non-numeric) value it's clearly user error, it might be closer to the > > apparent (although unwritten AFAICT) memparse usage api to check the > > result for validity; in our case a simple check if at least 1 char was > > parsed is all that's needed, e.g.: > > > > { > > u64 limit; > > char *tmp = buf; > > struct zram *zram = dev_to_zram(dev); > > > > limit = memparse(buf, &tmp); > > if (buf == tmp) /* no chars parsed, invalid input */ > > return -EINVAL; > > down_write(&zram->init_lock); > > > Thank you Dan, for this clear, unoffensive and I believe compelling analysis.

Thanks for the suggestion, Dan.

David, are you okay with this?

You pointed out several cases. One was the NULL check. Dan's patch will fix that, but the other example you pointed out was "7,,5,8,,9". Slightly modifying your example, "0..1" can reset the limit without returning -EINVAL, which is not what we want. Couldn't we check for that as well, if you guys really want to prevent wrong use from userspace?
If we don't need it, please give me a reason, so that I can be convinced, proceed with this patchset, and do further work.

Thanks.

> > I have much to learn. > > > ... > > > > > > Separate from this patch, it would also help if the lib/cmdline.c > > memparse doc was at least updated to clarify when the result should be > > checked for validity (e.g. always, or at least when the result is 0) > > and how best to do that (e.g. if 0 is an invalid value, just check if > > the result is 0; if 0 is a possible valid value, check if any chars > > were parsed). > > > > > > I'd argue that the code is not the place for this usage recommendation. > But rather an expansion of the support doc for sysfs > on how to use such parsing/validation routines. > > I agree with Minchan that these helper functions could be improved > for specific use by sysfs. > And I will pursue this. (and maybe the documentation?) > > > >> > >> The difference is that memparse cannot stop being abused > >> (C allows the NULL argument and extensive tricks are required to address that) > >> however, we can readily fix mem_limit_store and ensure > >> 1) no regression when the interface IS fixed and > >> 2) predictable behaviour when accidental or "fuzzy" input arrives. > >> > >> > >>>> They say getting API right is a difficult exercise. I suggest, if we > >>>> don't insisting on > >>>> an explicit zero we have the API wrong. > >>>> > >>>> I don't think you disagreed, just that the burden to get it correct > >>>> lay elsewhere. > >>>> > >>>> If that is the case it doesn't really matter, we cannot release this > >>>> interface until > >>>> it is corrected wherever it must be. > >>>> > >>>> And my zero check was a poor hack. > >>>> > >>>> I should have explicitly checked the returned pointer value. > >>>> > >>>> I will send that proposed revision, and hopefully you will consider it
> >>>> > >>>> > >>>> > >>>> > >>>> >> > >>>> >> > > >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> > >>>> >> > --- > >>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ > >>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- > >>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ > >>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ > >>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) > >>>> >> > > >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > index 70ec992514d0..b8c779d64968 100644 > >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram > >>>> >> > @@ -119,3 +119,13 @@ Description: > >>>> >> > efficiency can be calculated using compr_data_size and this > >>>> >> > statistic. > >>>> >> > Unit: bytes > >>>> >> > + > >>>> >> > +What: /sys/block/zram<id>/mem_limit > >>>> >> > +Date: August 2014 > >>>> >> > +Contact: Minchan Kim <minchan@kernel.org> > >>>> >> > +Description: > >>>> >> > + The mem_limit file is read/write and specifies the amount > >>>> >> > + of memory to be able to consume memory to store store > >>>> >> > + compressed data. The limit could be changed in run time > >>>> >> > - and "0" is default which means disable the limit. > >>>> >> > + and "0" means disable the limit. No limit is the initial state. > >>>> >> > >>>> >> there should be no default in the API. > >>>> > > >>>> > Thanks. > >>>> > > >>>> >> > >>>> >> > + Unit: bytes > >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt > >>>> >> > index 0595c3f56ccf..82c6a41116db 100644 > >>>> >> > --- a/Documentation/blockdev/zram.txt > >>>> >> > +++ b/Documentation/blockdev/zram.txt > >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory > >>>> >> > since we expect a 2:1 compression ratio. 
Note that zram uses about 0.1% of the > >>>> >> > size of the disk when not in use so a huge zram is wasteful. > >>>> >> > > >>>> >> > -5) Activate: > >>>> >> > +5) Set memory limit: Optional > >>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. > >>>> >> > + The value can be either in bytes or you can use mem suffixes. > >>>> >> > + In addition, you could change the value in runtime. > >>>> >> > + Examples: > >>>> >> > + # limit /dev/zram0 with 50MB memory > >>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > + # Using mem suffixes > >>>> >> > + echo 256K > /sys/block/zram0/mem_limit > >>>> >> > + echo 512M > /sys/block/zram0/mem_limit > >>>> >> > + echo 1G > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > + # To disable memory limit > >>>> >> > + echo 0 > /sys/block/zram0/mem_limit > >>>> >> > + > >>>> >> > +6) Activate: > >>>> >> > mkswap /dev/zram0 > >>>> >> > swapon /dev/zram0 > >>>> >> > > >>>> >> > mkfs.ext4 /dev/zram1 > >>>> >> > mount /dev/zram1 /tmp > >>>> >> > > >>>> >> > -6) Stats: > >>>> >> > +7) Stats: > >>>> >> > Per-device statistics are exported as various nodes under > >>>> >> > /sys/block/zram<id>/ > >>>> >> > disksize > >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
> >>>> >> > compr_data_size > >>>> >> > mem_used_total > >>>> >> > > >>>> >> > -7) Deactivate: > >>>> >> > +8) Deactivate: > >>>> >> > swapoff /dev/zram0 > >>>> >> > umount /dev/zram1 > >>>> >> > > >>>> >> > -8) Reset: > >>>> >> > +9) Reset: > >>>> >> > Write any positive value to 'reset' sysfs node > >>>> >> > echo 1 > /sys/block/zram0/reset > >>>> >> > echo 1 > /sys/block/zram1/reset > >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > >>>> >> > index f0b8b30a7128..370c355eb127 100644 > >>>> >> > --- a/drivers/block/zram/zram_drv.c > >>>> >> > +++ b/drivers/block/zram/zram_drv.c > >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, > >>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > >>>> >> > } > >>>> >> > > >>>> >> > +static ssize_t mem_limit_show(struct device *dev, > >>>> >> > + struct device_attribute *attr, char *buf) > >>>> >> > +{ > >>>> >> > + u64 val; > >>>> >> > + struct zram *zram = dev_to_zram(dev); > >>>> >> > + > >>>> >> > + down_read(&zram->init_lock); > >>>> >> > + val = zram->limit_pages; > >>>> >> > + up_read(&zram->init_lock); > >>>> >> > + > >>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); > >>>> >> > +} > >>>> >> > + > >>>> >> > +static ssize_t mem_limit_store(struct device *dev, > >>>> >> > + struct device_attribute *attr, const char *buf, size_t len) > >>>> >> > +{ > >>>> >> > + u64 limit; > >>>> >> > + struct zram *zram = dev_to_zram(dev); > >>>> >> > + > >>>> >> > + limit = memparse(buf, NULL); > >>>> >> > >>>> >> if (limit = 0 && buf != "0") > >>>> >> return -EINVAL > >>>> >> > >>>> >> > + down_write(&zram->init_lock); > >>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; > >>>> >> > + up_write(&zram->init_lock); > >>>> >> > + > >>>> >> > + return len; > >>>> >> > +} > >>>> >> > + > >>>> >> > static ssize_t max_comp_streams_store(struct device *dev, > >>>> >> > struct device_attribute *attr, const char *buf, size_t len) 
> >>>> >> > { > >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, > >>>> >> > ret = -ENOMEM; > >>>> >> > goto out; > >>>> >> > } > >>>> >> > + > >>>> >> > + if (zram->limit_pages && > >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { > >>>> >> > + zs_free(meta->mem_pool, handle); > >>>> >> > + ret = -ENOMEM; > >>>> >> > + goto out; > >>>> >> > + } > >>>> >> > + > >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); > >>>> >> > > >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { > >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) > >>>> >> > struct zram_meta *meta; > >>>> >> > > >>>> >> > down_write(&zram->init_lock); > >>>> >> > + > >>>> >> > + zram->limit_pages = 0; > >>>> >> > + > >>>> >> > if (!init_done(zram)) { > >>>> >> > up_write(&zram->init_lock); > >>>> >> > return; > >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); > >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, > >>>> >> > + mem_limit_store); > >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, > >>>> >> > max_comp_streams_show, max_comp_streams_store); > >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, > >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { > >>>> >> > &dev_attr_orig_data_size.attr, > >>>> >> > &dev_attr_compr_data_size.attr, > >>>> >> > &dev_attr_mem_used_total.attr, > >>>> >> > + &dev_attr_mem_limit.attr, > >>>> >> > &dev_attr_max_comp_streams.attr, > >>>> >> > &dev_attr_comp_algorithm.attr, > >>>> >> > NULL, > >>>> >> > diff --git a/drivers/block/zram/zram_drv.h 
b/drivers/block/zram/zram_drv.h > >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 > >>>> >> > --- a/drivers/block/zram/zram_drv.h > >>>> >> > +++ b/drivers/block/zram/zram_drv.h > >>>> >> > @@ -112,6 +112,11 @@ struct zram { > >>>> >> > u64 disksize; /* bytes */ > >>>> >> > int max_comp_streams; > >>>> >> > struct zram_stats stats; > >>>> >> > + /* > >>>> >> > + * the number of pages zram can consume for storing compressed data > >>>> >> > + */ > >>>> >> > + unsigned long limit_pages; > >>>> >> > + > >>>> >> > char compressor[10]; > >>>> >> > }; > >>>> >> > #endif > >>>> >> > -- > >>>> >> > 2.0.0 > >>>> >> > > >>>> >> > >>>> > > >>>> > -- > >>>> > Kind regards, > >>>> > Minchan Kim > >>>> > >>> > >>> -- > >>> Kind regards, > >>> Minchan Kim > > -- Kind regards, Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-26 4:39 ` Minchan Kim @ 2014-08-26 5:36 ` David Horner -1 siblings, 0 replies; 44+ messages in thread From: David Horner @ 2014-08-26 5:36 UTC (permalink / raw) To: Minchan Kim Cc: Dan Streetman, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote: > Hi Dan and David, > > On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote: >> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: >> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: >> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >> >>>> > Hello David, >> >>>> > >> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >> >>>> >> > Since zram has no control feature to limit memory usage, >> >>>> >> > it makes hard to manage system memrory. >> >>>> >> > >> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >> >>>> >> > a limit so that zram could fail allocation once it reaches >> >>>> >> > the limit. >> >>>> >> > >> >>>> >> > In addition, user could change the limit in runtime so that >> >>>> >> > he could manage the memory more dynamically. >> >>>> >> > >> >>>> >> - Default is no limit so it doesn't break old behavior. >> >>>> >> + Initial state is no limit so it doesn't break old behavior. >> >>>> >> >> >>>> >> I understand your previous post now. >> >>>> >> >> >>>> >> I was saying that setting to either a null value or garbage >> >>>> >> (which is interpreted as zero by memparse(buf, NULL);) >> >>>> >> removes the limit. 
>> >>>> >> >> >>>> >> I think this is "surprise" behaviour and rather the null case should >> >>>> >> return -EINVAL >> >>>> >> The test below should be "good enough" though not catching all garbage. >> >>>> > >> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >> >>>> > not caller if it is really problem so I don't want to touch it in this >> >>>> > patchset. It's not critical for adding the feature. >> >>>> > >> >>>> >> >>>> I've looked into the memparse function more since we talked. >> >>>> I do believe a wrapper function around it for the typical use by sysfs would >> >>>> be very valuable. >> >>> >> >>> Agree. >> >>> >> >>>> However, there is nothing wrong with memparse itself that needs to be fixed. >> >>>> >> >>>> It does what it is documented to do very well (In My Uninformed Opinion). >> >>>> It provides everything that a caller needs to manage the token that it >> >>>> processes. >> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >> >>> >> >>> Maybe strict_memparse would be better to protect such things so you >> >>> could find several places to clean it up. >> >>> >> >>>> >> >>>> The fact that other callers don't check the return pointer value to >> >>>> see if only a null >> >>>> string was processed, is not its fault. >> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store >> >>>> functions use it in a given manner does not means that is correct - >> >>>> nor that it is >> >>>> incorrect for that "knob". Some attributes could be just as valid with >> >>>> null zeros. >> >>>> >> >>>> And you are correct, to disambiguate the zero is not required for the >> >>>> limit feature. >> >>>> Your original patch which disallowed zero was full feature for mem_limit. >> >>>> It is the requested non-crucial feature to allow zero to reestablish >> >>>> the initial state >> >>>> that benefits from distinguishing an explicit zero from a "default zero' >> >>>> when garbage is written. 
>> >>>> >> >>>> The final argument is that if we release this feature as is the undocumented >> >>>> functionality could be relied upon, and when later fixed: user space breaks. >> >>> >> >>> I don't get it. Why does it break userspace? >> >>> The sysfs-block-zram says "0" means disable the limit. >> >>> If someone writes *garabge* but work as if disabling the limit, >> >>> it's not a right thing and he already broke although it worked >> >>> so it would be not a problem if we fix later. >> >>> (ie, we don't need to take care of broken userspace) >> >>> Am I missing your point? >> >>> >> >> >> >> Perhaps you are missing my point, perhaps ignoring or dismissing. >> >> >> >> Basically, if a facility works in a useful way, even if it was designed for >> >> different usage, that becomes the "accepted" interface/usage. >> >> The developer may not have intended that usage or may even considered >> >> it wrong and a broken usage, but it is what it is and people become >> >> reliant on that behaviour. >> >> >> >> Case in point is memparse itself. >> >> >> >> The developer intentionally sets the return pointer because that is the >> >> only value that can be validated for correct performance. >> >> The return value allows -ve so the standard error message passing is not valid. >> >> Unfortunately, C allows the user to pass a NULL value in the parameter. >> >> The developer could consider that absurd and fundamentally broken. >> >> But to the user it is a valid situation, because (perhaps) it can't be >> >> bothered to handle error cases. >> >> >> >> So, who is to blame. >> >> You say memparse, that it is fundamentally broken, >> >> because it didn't check to see that it was used correctly. >> >> And I say mem_limit_store is fundamentally broken, >> >> because it didn't check to see that it was used correctly. >> > >> > I think we should look at what the rest of the kernel does as far as >> > checking memparse results. 
It appears to be a mix of some code >> > checking memparse while others don't. The most common way to check >> > appears to be to verify that memparse actually parsed at least 1 >> > character, e.g.: >> > oldp = p; >> > mem_size = memparse(p, &p); >> > if (p == oldp) >> > return -EINVAL; >> > >> > although other places where 0 isn't valid can simply check for that: >> > mem_size = memparse(p, &p); >> > /* don't remove all of memory when handling "mem={invalid}" param */ >> > if (mem_size == 0) >> > return -EINVAL; >> > >> > or even the other memparse use in zram_drv.c: >> > disksize = memparse(buf, NULL); >> > if (!disksize) >> > return -EINVAL; >> > >> > >> > And there seem to be other places where (maybe?) there's no checking >> > at all. However, it also seems like many cases of memparse usage are >> > looking for a non-zero value, and therefore they can either >> > immediately check for zero/invalid or (possibly) later code has checks >> > to avoid using any zero value. In this case though, 0 is a valid >> > value. So, while I agree that if a user passes an invalid (i.e. >> > non-numeric) value it's clearly user error, it might be closer to the >> > apparent (although unwritten AFAICT) memparse usage api to check the >> > result for validity; in our case a simple check if at least 1 char was >> > parsed is all that's needed, e.g.: >> > >> > { >> > u64 limit; >> > char *tmp = buf; >> > struct zram *zram = dev_to_zram(dev); >> > >> > limit = memparse(buf, &tmp); >> > if (buf == tmp) /* no chars parsed, invalid input */ >> > return -EINVAL; >> > down_write(&zram->init_lock); >> >> >> Thank you Dan, for this clear, unoffensive and I believe compelling analysis. > > Thanks for suggestion, Dan. > > David, Are you okay for this? > > You pointed out several cases. One was NULL check. > Dan's patch will fix it but other example you pointed out was > "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without > returning EINVAL. 
Actually, it was not what we want. > Couldn't we check it if you guys really want to prevent wrong use from > userspace? If we don't need it, pz, give me a reason so I will convince > and proceed this patchset and do further works. > > Thanks. >

I'm very happy about this patch.

As for your example, yes, the validation is somewhat slack. We could insist that the parsed length exactly matches the supplied input length. But the general case of trailing blanks and, as you pointed out, CR LF or other valid end-of-line codes would also have to be taken into account: a substantial amount of code for little value returned.

I agree that in this case the fix-up should be elsewhere, in the sysfs support layer. Trailing white space and end-of-line indicators should be optionally stripped before the store routine gets them, and a known terminating value appended. Then the checking and overrun avoidance can be reasonably implemented.

Until then, the code is good as far as I am concerned. The API is sound and the exposure to overruns and false indications is already quite low. (More for me to research, and hopefully I will have time to do some real coding.)

Finally, if the user wanted to express a fractional unit allocation, like .8G, that too would be a nice enhancement that could be added later, as I don't see that breaking the API. (Comments on this? Dan?)

>> >> I have much to learn. >> >> > ... >> > >> > >> > Separate from this patch, it would also help if the lib/cmdline.c >> > memparse doc was at least updated to clarify when the result should be >> > checked for validity (e.g. always, or at least when the result is 0) >> > and how best to do that (e.g. if 0 is an invalid value, just check if >> > the result is 0; if 0 is a possible valid value, check if any chars >> > were parsed). >> > >> > >> >> I'd argue that the code is not the place for this usage recommendation. >> But rather an expansion of the support doc for sysfs >> on how to use such parsing/validation routines.
>> >> I agree with Minchan that these helper functions could be improved >> for specific use by sysfs. >> And I will pursue this. (and maybe the documentation?) >> >> >> >> >> >> The difference is that memparse cannot stop being abused >> >> (C allows the NULL argument and extensive tricks are required to address that) >> >> however, we can readily fix mem_limit_store and ensure >> >> 1) no regression when the interface IS fixed and >> >> 2) predictable behaviour when accidental or "fuzzy" input arrives. >> >> >> >> >> >>>> They say getting API right is a difficult exercise. I suggest, if we >> >>>> don't insisting on >> >>>> an explicit zero we have the API wrong. >> >>>> >> >>>> I don't think you disagreed, just that the burden to get it correct >> >>>> lay elsewhere. >> >>>> >> >>>> If that is the case it doesn't really matter, we cannot release this >> >>>> interface until >> >>>> it is corrected wherever it must be. >> >>>> >> >>>> And my zero check was a poor hack. >> >>>> >> >>>> I should have explicitly checked the returned pointer value. >> >>>> >> >>>> I will send that proposed revision, and hopefully you will consider it >> >>>> for inclusion. 
>> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >> >>>> >> > >> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >> >>>> >> > --- >> >>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >> >>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >> >>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >> >>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >> >>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >> >>>> >> > >> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > index 70ec992514d0..b8c779d64968 100644 >> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > @@ -119,3 +119,13 @@ Description: >> >>>> >> > efficiency can be calculated using compr_data_size and this >> >>>> >> > statistic. >> >>>> >> > Unit: bytes >> >>>> >> > + >> >>>> >> > +What: /sys/block/zram<id>/mem_limit >> >>>> >> > +Date: August 2014 >> >>>> >> > +Contact: Minchan Kim <minchan@kernel.org> >> >>>> >> > +Description: >> >>>> >> > + The mem_limit file is read/write and specifies the amount >> >>>> >> > + of memory to be able to consume memory to store store >> >>>> >> > + compressed data. The limit could be changed in run time >> >>>> >> > - and "0" is default which means disable the limit. >> >>>> >> > + and "0" means disable the limit. No limit is the initial state. >> >>>> >> >> >>>> >> there should be no default in the API. >> >>>> > >> >>>> > Thanks. 
>> >>>> > >> >>>> >> >> >>>> >> > + Unit: bytes >> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >> >>>> >> > --- a/Documentation/blockdev/zram.txt >> >>>> >> > +++ b/Documentation/blockdev/zram.txt >> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >> >>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >> >>>> >> > size of the disk when not in use so a huge zram is wasteful. >> >>>> >> > >> >>>> >> > -5) Activate: >> >>>> >> > +5) Set memory limit: Optional >> >>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >> >>>> >> > + The value can be either in bytes or you can use mem suffixes. >> >>>> >> > + In addition, you could change the value in runtime. >> >>>> >> > + Examples: >> >>>> >> > + # limit /dev/zram0 with 50MB memory >> >>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > + # Using mem suffixes >> >>>> >> > + echo 256K > /sys/block/zram0/mem_limit >> >>>> >> > + echo 512M > /sys/block/zram0/mem_limit >> >>>> >> > + echo 1G > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > + # To disable memory limit >> >>>> >> > + echo 0 > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > +6) Activate: >> >>>> >> > mkswap /dev/zram0 >> >>>> >> > swapon /dev/zram0 >> >>>> >> > >> >>>> >> > mkfs.ext4 /dev/zram1 >> >>>> >> > mount /dev/zram1 /tmp >> >>>> >> > >> >>>> >> > -6) Stats: >> >>>> >> > +7) Stats: >> >>>> >> > Per-device statistics are exported as various nodes under >> >>>> >> > /sys/block/zram<id>/ >> >>>> >> > disksize >> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>> >>>> >> > compr_data_size >> >>>> >> > mem_used_total >> >>>> >> > >> >>>> >> > -7) Deactivate: >> >>>> >> > +8) Deactivate: >> >>>> >> > swapoff /dev/zram0 >> >>>> >> > umount /dev/zram1 >> >>>> >> > >> >>>> >> > -8) Reset: >> >>>> >> > +9) Reset: >> >>>> >> > Write any positive value to 'reset' sysfs node >> >>>> >> > echo 1 > /sys/block/zram0/reset >> >>>> >> > echo 1 > /sys/block/zram1/reset >> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >> >>>> >> > index f0b8b30a7128..370c355eb127 100644 >> >>>> >> > --- a/drivers/block/zram/zram_drv.c >> >>>> >> > +++ b/drivers/block/zram/zram_drv.c >> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >> >>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >> >>>> >> > } >> >>>> >> > >> >>>> >> > +static ssize_t mem_limit_show(struct device *dev, >> >>>> >> > + struct device_attribute *attr, char *buf) >> >>>> >> > +{ >> >>>> >> > + u64 val; >> >>>> >> > + struct zram *zram = dev_to_zram(dev); >> >>>> >> > + >> >>>> >> > + down_read(&zram->init_lock); >> >>>> >> > + val = zram->limit_pages; >> >>>> >> > + up_read(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >> >>>> >> > +} >> >>>> >> > + >> >>>> >> > +static ssize_t mem_limit_store(struct device *dev, >> >>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >> >>>> >> > +{ >> >>>> >> > + u64 limit; >> >>>> >> > + struct zram *zram = dev_to_zram(dev); >> >>>> >> > + >> >>>> >> > + limit = memparse(buf, NULL); >> >>>> >> >> >>>> >> if (limit = 0 && buf != "0") >> >>>> >> return -EINVAL >> >>>> >> >> >>>> >> > + down_write(&zram->init_lock); >> >>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >> >>>> >> > + up_write(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + return len; >> >>>> >> > +} >> >>>> >> > + >> >>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >> >>>> >> > 
struct device_attribute *attr, const char *buf, size_t len) >> >>>> >> > { >> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >> >>>> >> > ret = -ENOMEM; >> >>>> >> > goto out; >> >>>> >> > } >> >>>> >> > + >> >>>> >> > + if (zram->limit_pages && >> >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >> >>>> >> > + zs_free(meta->mem_pool, handle); >> >>>> >> > + ret = -ENOMEM; >> >>>> >> > + goto out; >> >>>> >> > + } >> >>>> >> > + >> >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >> >>>> >> > >> >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >> >>>> >> > struct zram_meta *meta; >> >>>> >> > >> >>>> >> > down_write(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + zram->limit_pages = 0; >> >>>> >> > + >> >>>> >> > if (!init_done(zram)) { >> >>>> >> > up_write(&zram->init_lock); >> >>>> >> > return; >> >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >> >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >> >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >> >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >> >>>> >> > + mem_limit_store); >> >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >> >>>> >> > max_comp_streams_show, max_comp_streams_store); >> >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >> >>>> >> > &dev_attr_orig_data_size.attr, >> >>>> >> > &dev_attr_compr_data_size.attr, >> >>>> >> > &dev_attr_mem_used_total.attr, >> >>>> >> > + &dev_attr_mem_limit.attr, >> >>>> >> > &dev_attr_max_comp_streams.attr, >> >>>> >> > 
 &dev_attr_comp_algorithm.attr, >> >>>> >> > NULL, >> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >> >>>> >> > --- a/drivers/block/zram/zram_drv.h >> >>>> >> > +++ b/drivers/block/zram/zram_drv.h >> >>>> >> > @@ -112,6 +112,11 @@ struct zram { >> >>>> >> > u64 disksize; /* bytes */ >> >>>> >> > int max_comp_streams; >> >>>> >> > struct zram_stats stats; >> >>>> >> > + /* >> >>>> >> > + * the number of pages zram can consume for storing compressed data >> >>>> >> > + */ >> >>>> >> > + unsigned long limit_pages; >> >>>> >> > + >> >>>> >> > char compressor[10]; >> >>>> >> > }; >> >>>> >> > #endif >> >>>> >> > -- >> >>>> >> > 2.0.0 >> >>>> >> > >> >>>> > >> >>>> > -- >> >>>> > Kind regards, >> >>>> > Minchan Kim >> >>> >> >>> -- >> >>> Kind regards, >> >>> Minchan Kim > > -- > Kind regards, > Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
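
The validation discussed in the message above (reject a write when memparse consumed no characters, but tolerate the trailing newline that echo appends) can be sketched in userspace C. This is only an illustration under stated assumptions: parse_size() is a hypothetical stand-in for the kernel's memparse() built on strtoull(), and parse_limit() is a hypothetical strict wrapper that is not part of the patch. It goes one step further than the "at least 1 char parsed" check by also rejecting trailing garbage such as "0..1".

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace stand-in (assumption) for the kernel's memparse(): parse a
 * number with an optional K/M/G suffix and report where parsing stopped
 * through *retptr. It returns 0 when no number could be parsed.
 */
static unsigned long long parse_size(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long val = strtoull(ptr, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;		/* consume the suffix character */
		break;
	default:
		break;
	}

	if (retptr)
		*retptr = end;
	return val;
}

/*
 * Hypothetical strict wrapper: succeed only if the input starts with a
 * digit and nothing but trailing whitespace follows the parsed token.
 * An explicit "0" is accepted; "", "abc", "0..1" and "7,,5,8,,9" are
 * rejected with -EINVAL and *limit is left untouched.
 */
static int parse_limit(const char *buf, unsigned long long *limit)
{
	char *end;
	unsigned long long val;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;	/* empty input, sign, or garbage */

	val = parse_size(buf, &end);

	while (isspace((unsigned char)*end))
		end++;		/* skip trailing blanks, CR, LF */
	if (*end != '\0')
		return -EINVAL;	/* unparsed characters remain */

	*limit = val;
	return 0;
}
```

With this shape, an explicit "0" still disables the limit, while an empty or non-numeric write fails with -EINVAL instead of silently resetting it. Overflow of very large values is not handled in this sketch.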
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-26 4:39 ` Minchan Kim @ 2014-08-26 13:31 ` Dan Streetman -1 siblings, 0 replies; 44+ messages in thread From: Dan Streetman @ 2014-08-26 13:31 UTC (permalink / raw) To: Minchan Kim Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Tue, Aug 26, 2014 at 12:39 AM, Minchan Kim <minchan@kernel.org> wrote: > Hi Dan and David, > > On Mon, Aug 25, 2014 at 09:54:57PM -0400, David Horner wrote: >> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: >> > On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: >> >> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >> >>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >> >>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >> >>>> > Hello David, >> >>>> > >> >>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >> >>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >> >>>> >> > Since zram has no control feature to limit memory usage, >> >>>> >> > it makes hard to manage system memrory. >> >>>> >> > >> >>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >> >>>> >> > a limit so that zram could fail allocation once it reaches >> >>>> >> > the limit. >> >>>> >> > >> >>>> >> > In addition, user could change the limit in runtime so that >> >>>> >> > he could manage the memory more dynamically. >> >>>> >> > >> >>>> >> - Default is no limit so it doesn't break old behavior. >> >>>> >> + Initial state is no limit so it doesn't break old behavior. >> >>>> >> >> >>>> >> I understand your previous post now. >> >>>> >> >> >>>> >> I was saying that setting to either a null value or garbage >> >>>> >> (which is interpreted as zero by memparse(buf, NULL);) >> >>>> >> removes the limit. 
>> >>>> >> >> >>>> >> I think this is "surprise" behaviour and rather the null case should >> >>>> >> return -EINVAL >> >>>> >> The test below should be "good enough" though not catching all garbage. >> >>>> > >> >>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >> >>>> > not caller if it is really problem so I don't want to touch it in this >> >>>> > patchset. It's not critical for adding the feature. >> >>>> > >> >>>> >> >>>> I've looked into the memparse function more since we talked. >> >>>> I do believe a wrapper function around it for the typical use by sysfs would >> >>>> be very valuable. >> >>> >> >>> Agree. >> >>> >> >>>> However, there is nothing wrong with memparse itself that needs to be fixed. >> >>>> >> >>>> It does what it is documented to do very well (In My Uninformed Opinion). >> >>>> It provides everything that a caller needs to manage the token that it >> >>>> processes. >> >>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >> >>> >> >>> Maybe strict_memparse would be better to protect such things so you >> >>> could find several places to clean it up. >> >>> >> >>>> >> >>>> The fact that other callers don't check the return pointer value to >> >>>> see if only a null >> >>>> string was processed, is not its fault. >> >>>> Nor that it may not be ideally suited to sysfs attributes; that other store >> >>>> functions use it in a given manner does not means that is correct - >> >>>> nor that it is >> >>>> incorrect for that "knob". Some attributes could be just as valid with >> >>>> null zeros. >> >>>> >> >>>> And you are correct, to disambiguate the zero is not required for the >> >>>> limit feature. >> >>>> Your original patch which disallowed zero was full feature for mem_limit. >> >>>> It is the requested non-crucial feature to allow zero to reestablish >> >>>> the initial state >> >>>> that benefits from distinguishing an explicit zero from a "default zero' >> >>>> when garbage is written. 
>> >>>> >> >>>> The final argument is that if we release this feature as is the undocumented >> >>>> functionality could be relied upon, and when later fixed: user space breaks. >> >>> >> >>> I don't get it. Why does it break userspace? >> >>> The sysfs-block-zram says "0" means disable the limit. >> >>> If someone writes *garabge* but work as if disabling the limit, >> >>> it's not a right thing and he already broke although it worked >> >>> so it would be not a problem if we fix later. >> >>> (ie, we don't need to take care of broken userspace) >> >>> Am I missing your point? >> >>> >> >> >> >> Perhaps you are missing my point, perhaps ignoring or dismissing. >> >> >> >> Basically, if a facility works in a useful way, even if it was designed for >> >> different usage, that becomes the "accepted" interface/usage. >> >> The developer may not have intended that usage or may even considered >> >> it wrong and a broken usage, but it is what it is and people become >> >> reliant on that behaviour. >> >> >> >> Case in point is memparse itself. >> >> >> >> The developer intentionally sets the return pointer because that is the >> >> only value that can be validated for correct performance. >> >> The return value allows -ve so the standard error message passing is not valid. >> >> Unfortunately, C allows the user to pass a NULL value in the parameter. >> >> The developer could consider that absurd and fundamentally broken. >> >> But to the user it is a valid situation, because (perhaps) it can't be >> >> bothered to handle error cases. >> >> >> >> So, who is to blame. >> >> You say memparse, that it is fundamentally broken, >> >> because it didn't check to see that it was used correctly. >> >> And I say mem_limit_store is fundamentally broken, >> >> because it didn't check to see that it was used correctly. >> > >> > I think we should look at what the rest of the kernel does as far as >> > checking memparse results. 
>> > It appears to be a mix of some code
>> > checking memparse while others don't. The most common way to check
>> > appears to be to verify that memparse actually parsed at least 1
>> > character, e.g.:
>> >	oldp = p;
>> >	mem_size = memparse(p, &p);
>> >	if (p == oldp)
>> >		return -EINVAL;
>> >
>> > although other places where 0 isn't valid can simply check for that:
>> >	mem_size = memparse(p, &p);
>> >	/* don't remove all of memory when handling "mem={invalid}" param */
>> >	if (mem_size == 0)
>> >		return -EINVAL;
>> >
>> > or even the other memparse use in zram_drv.c:
>> >	disksize = memparse(buf, NULL);
>> >	if (!disksize)
>> >		return -EINVAL;
>> >
>> >
>> > And there seem to be other places where (maybe?) there's no checking
>> > at all. However, it also seems like many cases of memparse usage are
>> > looking for a non-zero value, and therefore they can either
>> > immediately check for zero/invalid or (possibly) later code has checks
>> > to avoid using any zero value. In this case though, 0 is a valid
>> > value. So, while I agree that if a user passes an invalid (i.e.
>> > non-numeric) value it's clearly user error, it might be closer to the
>> > apparent (although unwritten AFAICT) memparse usage api to check the
>> > result for validity; in our case a simple check if at least 1 char was
>> > parsed is all that's needed, e.g.:
>> >
>> > {
>> >	u64 limit;
>> >	char *tmp = buf;
>> >	struct zram *zram = dev_to_zram(dev);
>> >
>> >	limit = memparse(buf, &tmp);
>> >	if (buf == tmp) /* no chars parsed, invalid input */
>> >		return -EINVAL;
>> >	down_write(&zram->init_lock);
>>
>>
>> Thank you Dan, for this clear, unoffensive and I believe compelling analysis.
>
> Thanks for suggestion, Dan.
>
> David, Are you okay for this?
>
> You pointed out several cases. One was NULL check.
> Dan's patch will fix it but other example you pointed out was
> "7,,5,8,,9". Slightly modifying your example, "0..1" can reset without
> returning EINVAL.
> Actually, it was not what we want.
> Couldn't we check it if you guys really want to prevent wrong use from
> userspace? If we don't need it, please give me a reason so that I am
> convinced and can proceed with this patchset and do further work.

As you show, the simple check to see if at least 1 char was parsed
won't catch all invalid strings, only those with no leading numerics,
like "help", "?", "", etc. But that appears to be the common usage of
memparse: to check only for basic validity, not to strictly check that
the entire input string was fully parsed.

I think the rest of this patch is good, and this is a very minor issue
that only occurs with user error. This could be left until later,
possibly along with a larger memparse update.

With or without this minor adjustment to check the memparse result for
basic validity:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> Thanks.
>
>>
>> I have much to learn.
>>
>> > ...
>> >
>> >
>> > Separate from this patch, it would also help if the lib/cmdline.c
>> > memparse doc was at least updated to clarify when the result should be
>> > checked for validity (e.g. always, or at least when the result is 0)
>> > and how best to do that (e.g. if 0 is an invalid value, just check if
>> > the result is 0; if 0 is a possible valid value, check if any chars
>> > were parsed).
>> >
>> >
>>
>> I'd argue that the code is not the place for this usage recommendation.
>> But rather an expansion of the support doc for sysfs
>> on how to use such parsing/validation routines.
>>
>> I agree with Minchan that these helper functions could be improved
>> for specific use by sysfs.
>> And I will pursue this. (and maybe the documentation?)
>> >> >> >> >> >> The difference is that memparse cannot stop being abused >> >> (C allows the NULL argument and extensive tricks are required to address that) >> >> however, we can readily fix mem_limit_store and ensure >> >> 1) no regression when the interface IS fixed and >> >> 2) predictable behaviour when accidental or "fuzzy" input arrives. >> >> >> >> >> >>>> They say getting API right is a difficult exercise. I suggest, if we >> >>>> don't insisting on >> >>>> an explicit zero we have the API wrong. >> >>>> >> >>>> I don't think you disagreed, just that the burden to get it correct >> >>>> lay elsewhere. >> >>>> >> >>>> If that is the case it doesn't really matter, we cannot release this >> >>>> interface until >> >>>> it is corrected wherever it must be. >> >>>> >> >>>> And my zero check was a poor hack. >> >>>> >> >>>> I should have explicitly checked the returned pointer value. >> >>>> >> >>>> I will send that proposed revision, and hopefully you will consider it >> >>>> for inclusion. >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >> >>>> >> > >> >>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >> >>>> >> > --- >> >>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >> >>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >> >>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >> >>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >> >>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >> >>>> >> > >> >>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > index 70ec992514d0..b8c779d64968 100644 >> >>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >> >>>> >> > @@ -119,3 +119,13 @@ Description: >> >>>> >> > efficiency can be calculated using compr_data_size and this >> >>>> >> > statistic. 
>> >>>> >> > Unit: bytes >> >>>> >> > + >> >>>> >> > +What: /sys/block/zram<id>/mem_limit >> >>>> >> > +Date: August 2014 >> >>>> >> > +Contact: Minchan Kim <minchan@kernel.org> >> >>>> >> > +Description: >> >>>> >> > + The mem_limit file is read/write and specifies the amount >> >>>> >> > + of memory to be able to consume memory to store store >> >>>> >> > + compressed data. The limit could be changed in run time >> >>>> >> > - and "0" is default which means disable the limit. >> >>>> >> > + and "0" means disable the limit. No limit is the initial state. >> >>>> >> >> >>>> >> there should be no default in the API. >> >>>> > >> >>>> > Thanks. >> >>>> > >> >>>> >> >> >>>> >> > + Unit: bytes >> >>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >> >>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >> >>>> >> > --- a/Documentation/blockdev/zram.txt >> >>>> >> > +++ b/Documentation/blockdev/zram.txt >> >>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >> >>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >> >>>> >> > size of the disk when not in use so a huge zram is wasteful. >> >>>> >> > >> >>>> >> > -5) Activate: >> >>>> >> > +5) Set memory limit: Optional >> >>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >> >>>> >> > + The value can be either in bytes or you can use mem suffixes. >> >>>> >> > + In addition, you could change the value in runtime. 
>> >>>> >> > + Examples: >> >>>> >> > + # limit /dev/zram0 with 50MB memory >> >>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > + # Using mem suffixes >> >>>> >> > + echo 256K > /sys/block/zram0/mem_limit >> >>>> >> > + echo 512M > /sys/block/zram0/mem_limit >> >>>> >> > + echo 1G > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > + # To disable memory limit >> >>>> >> > + echo 0 > /sys/block/zram0/mem_limit >> >>>> >> > + >> >>>> >> > +6) Activate: >> >>>> >> > mkswap /dev/zram0 >> >>>> >> > swapon /dev/zram0 >> >>>> >> > >> >>>> >> > mkfs.ext4 /dev/zram1 >> >>>> >> > mount /dev/zram1 /tmp >> >>>> >> > >> >>>> >> > -6) Stats: >> >>>> >> > +7) Stats: >> >>>> >> > Per-device statistics are exported as various nodes under >> >>>> >> > /sys/block/zram<id>/ >> >>>> >> > disksize >> >>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. >> >>>> >> > compr_data_size >> >>>> >> > mem_used_total >> >>>> >> > >> >>>> >> > -7) Deactivate: >> >>>> >> > +8) Deactivate: >> >>>> >> > swapoff /dev/zram0 >> >>>> >> > umount /dev/zram1 >> >>>> >> > >> >>>> >> > -8) Reset: >> >>>> >> > +9) Reset: >> >>>> >> > Write any positive value to 'reset' sysfs node >> >>>> >> > echo 1 > /sys/block/zram0/reset >> >>>> >> > echo 1 > /sys/block/zram1/reset >> >>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >> >>>> >> > index f0b8b30a7128..370c355eb127 100644 >> >>>> >> > --- a/drivers/block/zram/zram_drv.c >> >>>> >> > +++ b/drivers/block/zram/zram_drv.c >> >>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >> >>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >> >>>> >> > } >> >>>> >> > >> >>>> >> > +static ssize_t mem_limit_show(struct device *dev, >> >>>> >> > + struct device_attribute *attr, char *buf) >> >>>> >> > +{ >> >>>> >> > + u64 val; >> >>>> >> > + struct zram *zram = dev_to_zram(dev); >> >>>> >> > + >> >>>> >> > + 
down_read(&zram->init_lock); >> >>>> >> > + val = zram->limit_pages; >> >>>> >> > + up_read(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >> >>>> >> > +} >> >>>> >> > + >> >>>> >> > +static ssize_t mem_limit_store(struct device *dev, >> >>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >> >>>> >> > +{ >> >>>> >> > + u64 limit; >> >>>> >> > + struct zram *zram = dev_to_zram(dev); >> >>>> >> > + >> >>>> >> > + limit = memparse(buf, NULL); >> >>>> >> >> >>>> >> if (limit = 0 && buf != "0") >> >>>> >> return -EINVAL >> >>>> >> >> >>>> >> > + down_write(&zram->init_lock); >> >>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >> >>>> >> > + up_write(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + return len; >> >>>> >> > +} >> >>>> >> > + >> >>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >> >>>> >> > struct device_attribute *attr, const char *buf, size_t len) >> >>>> >> > { >> >>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >> >>>> >> > ret = -ENOMEM; >> >>>> >> > goto out; >> >>>> >> > } >> >>>> >> > + >> >>>> >> > + if (zram->limit_pages && >> >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >> >>>> >> > + zs_free(meta->mem_pool, handle); >> >>>> >> > + ret = -ENOMEM; >> >>>> >> > + goto out; >> >>>> >> > + } >> >>>> >> > + >> >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >> >>>> >> > >> >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >> >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >> >>>> >> > struct zram_meta *meta; >> >>>> >> > >> >>>> >> > down_write(&zram->init_lock); >> >>>> >> > + >> >>>> >> > + zram->limit_pages = 0; >> >>>> >> > + >> >>>> >> > if (!init_done(zram)) { >> >>>> >> > up_write(&zram->init_lock); >> >>>> >> > return; >> >>>> >> > @@ -857,6 +895,8 @@ static 
DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >> >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >> >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >> >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >> >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >> >>>> >> > + mem_limit_store); >> >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >> >>>> >> > max_comp_streams_show, max_comp_streams_store); >> >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >> >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >> >>>> >> > &dev_attr_orig_data_size.attr, >> >>>> >> > &dev_attr_compr_data_size.attr, >> >>>> >> > &dev_attr_mem_used_total.attr, >> >>>> >> > + &dev_attr_mem_limit.attr, >> >>>> >> > &dev_attr_max_comp_streams.attr, >> >>>> >> > &dev_attr_comp_algorithm.attr, >> >>>> >> > NULL, >> >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >> >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >> >>>> >> > --- a/drivers/block/zram/zram_drv.h >> >>>> >> > +++ b/drivers/block/zram/zram_drv.h >> >>>> >> > @@ -112,6 +112,11 @@ struct zram { >> >>>> >> > u64 disksize; /* bytes */ >> >>>> >> > int max_comp_streams; >> >>>> >> > struct zram_stats stats; >> >>>> >> > + /* >> >>>> >> > + * the number of pages zram can consume for storing compressed data >> >>>> >> > + */ >> >>>> >> > + unsigned long limit_pages; >> >>>> >> > + >> >>>> >> > char compressor[10]; >> >>>> >> > }; >> >>>> >> > #endif >> >>>> >> > -- >> >>>> >> > 2.0.0 >> >>>> >> > >> >>>> >> >> >>>> >> -- >> >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> >>>> >> the body to majordomo@kvack.org. For more info on Linux MM, >> >>>> >> see: http://www.linux-mm.org/ . 
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 44+ messages in thread
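As an illustration of the "at least 1 char parsed" check discussed in this message, here is a minimal userspace sketch. It is an assumption-laden stand-in, not the kernel code: memparse_sketch() and parse_mem_limit() are hypothetical names, strtoull() replaces the kernel's number parsing (so leading whitespace and a sign are accepted here), only K/M/G suffixes are handled, and a suffix is consumed only after at least one digit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified stand-in for memparse(): parse a number with an optional
 * K/M/G suffix and report where parsing stopped through *retptr.
 */
static unsigned long long memparse_sketch(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long ret = strtoull(ptr, &end, 0);

	if (end != ptr) {	/* sketch choice: suffix only after digits */
		switch (*end) {
		case 'G': case 'g':
			ret <<= 10;
			/* fall through */
		case 'M': case 'm':
			ret <<= 10;
			/* fall through */
		case 'K': case 'k':
			ret <<= 10;
			end++;
			break;
		default:
			break;
		}
	}
	if (retptr)
		*retptr = end;
	return ret;
}

/*
 * Hypothetical helper following the check proposed for
 * mem_limit_store(): returns 0 and stores the parsed value, or
 * -EINVAL when not a single character was parsed.
 */
static int parse_mem_limit(const char *buf, unsigned long long *limit)
{
	char *tmp;
	unsigned long long val;

	assert(buf != NULL && limit != NULL);
	val = memparse_sketch(buf, &tmp);
	if (tmp == buf)	/* no chars parsed, invalid input */
		return -EINVAL;
	*limit = val;
	return 0;
}
```

Under these assumptions an explicit "0" is accepted and disables the limit, "help" and the empty string are rejected, and "0..1" is still accepted as 0 because only the leading "0" is examined. That is the remaining gap a stricter parser would have to close.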
>> >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>> >>>> >
>> >>>> > --
>> >>>> > Kind regards,
>> >>>> > Minchan Kim
>> >>>
>> >>> --
>> >>> Kind regards,
>> >>> Minchan Kim
>
> --
> Kind regards,
> Minchan Kim

^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 18:12 ` Dan Streetman @ 2014-08-26 4:28 ` David Horner -1 siblings, 0 replies; 44+ messages in thread
From: David Horner @ 2014-08-26 4:28 UTC (permalink / raw)
To: Dan Streetman
Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings

On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote:
> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote:
>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote:
>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> > Hello David,
>>>> >
>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
>>>> >> > Since zram has no control feature to limit memory usage,
>>>> >> > it makes it hard to manage system memory.
>>>> >> >
>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up
>>>> >> > a limit so that zram could fail allocation once it reaches
>>>> >> > the limit.
>>>> >> >
>>>> >> > In addition, user could change the limit in runtime so that
>>>> >> > he could manage the memory more dynamically.
>>>> >> >
>>>> >> - Default is no limit so it doesn't break old behavior.
>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>> >>
>>>> >> I understand your previous post now.
>>>> >>
>>>> >> I was saying that setting to either a null value or garbage
>>>> >> (which is interpreted as zero by memparse(buf, NULL);)
>>>> >> removes the limit.
>>>> >>
>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>> >> return -EINVAL
>>>> >> The test below should be "good enough" though not catching all garbage.
>>>> >
>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>> > not caller if it is really problem so I don't want to touch it in this
>>>> > patchset. It's not critical for adding the feature.
>>>> >
>>>>
>>>> I've looked into the memparse function more since we talked.
>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>> be very valuable.
>>>
>>> Agree.
>>>
>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>
>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>> It provides everything that a caller needs to manage the token that it
>>>> processes.
>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>
>>> Maybe strict_memparse would be better to protect such things so you
>>> could find several places to clean it up.
>>>
>>>>
>>>> The fact that other callers don't check the return pointer value to
>>>> see if only a null string was processed, is not its fault.
>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>> functions use it in a given manner does not mean that is correct -
>>>> nor that it is incorrect for that "knob". Some attributes could be just
>>>> as valid with null zeros.
>>>>
>>>> And you are correct, to disambiguate the zero is not required for the
>>>> limit feature.
>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>> the initial state that benefits from distinguishing an explicit zero
>>>> from a "default zero" when garbage is written.
>>>>
>>>> The final argument is that if we release this feature as is the undocumented
>>>> functionality could be relied upon, and when later fixed: user space breaks.
>>>
>>> I don't get it. Why does it break userspace?
>>> The sysfs-block-zram says "0" means disable the limit.
>>> If someone writes *garbage* but it works as if disabling the limit,
>>> it's not a right thing and he already broke although it worked
>>> so it would be not a problem if we fix later.
>>> (ie, we don't need to take care of broken userspace)
>>> Am I missing your point?
>>>
>>
>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>
>> Basically, if a facility works in a useful way, even if it was designed for
>> different usage, that becomes the "accepted" interface/usage.
>> The developer may not have intended that usage or may even have considered
>> it wrong and a broken usage, but it is what it is and people become
>> reliant on that behaviour.
>>
>> Case in point is memparse itself.
>>
>> The developer intentionally sets the return pointer because that is the
>> only value that can be validated for correct performance.
>> The return value allows -ve so the standard error message passing is not valid.
>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>> The developer could consider that absurd and fundamentally broken.
>> But to the user it is a valid situation, because (perhaps) it can't be
>> bothered to handle error cases.
>>
>> So, who is to blame.
>> You say memparse, that it is fundamentally broken,
>> because it didn't check to see that it was used correctly.
>> And I say mem_limit_store is fundamentally broken,
>> because it didn't check to see that it was used correctly.
>
> I think we should look at what the rest of the kernel does as far as
> checking memparse results. It appears to be a mix of some code
> checking memparse while others don't.
> The most common way to check
> appears to be to verify that memparse actually parsed at least 1
> character, e.g.:
>         oldp = p;
>         mem_size = memparse(p, &p);
>         if (p == oldp)
>                 return -EINVAL;
>
> although other places where 0 isn't valid can simply check for that:
>         mem_size = memparse(p, &p);
>         /* don't remove all of memory when handling "mem={invalid}" param */
>         if (mem_size == 0)
>                 return -EINVAL;
>
> or even the other memparse use in zram_drv.c:
>         disksize = memparse(buf, NULL);
>         if (!disksize)
>                 return -EINVAL;
>
>
> And there seem to be other places where (maybe?) there's no checking
> at all. However, it also seems like many cases of memparse usage are
> looking for a non-zero value, and therefore they can either
> immediately check for zero/invalid or (possibly) later code has checks
> to avoid using any zero value. In this case though, 0 is a valid
> value. So, while I agree that if a user passes an invalid (i.e.
> non-numeric) value it's clearly user error, it might be closer to the
> apparent (although unwritten AFAICT) memparse usage api to check the
> result for validity; in our case a simple check if at least 1 char was
> parsed is all that's needed, e.g.:
>
> {
>         u64 limit;
>         char *tmp = buf;
>         struct zram *zram = dev_to_zram(dev);
>
>         limit = memparse(buf, &tmp);
>         if (buf == tmp) /* no chars parsed, invalid input */
>                 return -EINVAL;
>         down_write(&zram->init_lock);
>         ...
>
>
> Separate from this patch, it would also help if the lib/cmdline.c
> memparse doc was at least updated to clarify when the result should be
> checked for validity

FWIW: I was pondering why I thought this was the wrong place.
On reflection the best explanation is that it is not validity - the program
does what it does quite well.
(although it does have flaws for use by sysfs
 1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
 2) it checks for a suffix in the null zero case
    (that means G,K,M are all valid memory size constants, and I think
    that should not be in the definition of valid mem parms)
 3) it does nothing to enforce termination of the input.
    Both simple_strtoull and its successor kstrtoull are not buffer
    overrun safe. And so neither is memparse.
    It may be the sysfs buffer management does some magic here
    - but I have not seen it documented nor in code.)

Rather than _validity_ it is _applicability_ that needs explaining.
And that is not documented in the function that does its thing.
But rather in the code that uses it, and more specifically in the
framework established for its specific use - as in sysfs for numeric
memory parameters.

> and how best to do that (e.g. if 0 is an invalid value, just check if
> the result is 0; if 0 is a possible valid value, check if any chars
> were parsed).
>
>
>>
>> The difference is that memparse cannot stop being abused
>> (C allows the NULL argument and extensive tricks are required to address that)
>> however, we can readily fix mem_limit_store and ensure
>> 1) no regression when the interface IS fixed and
>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>
>>
>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>> don't insist on an explicit zero we have the API wrong.
>>>>
>>>> I don't think you disagreed, just that the burden to get it correct
>>>> lay elsewhere.
>>>>
>>>> If that is the case it doesn't really matter, we cannot release this
>>>> interface until it is corrected wherever it must be.
>>>>
>>>> And my zero check was a poor hack.
>>>>
>>>> I should have explicitly checked the returned pointer value.
>>>>
>>>> I will send that proposed revision, and hopefully you will consider it
>>>> for inclusion.
>>>>
>>>>
>>>> >>
>>>> >> >
>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>>>> >> > ---
>>>> >> >  Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>> >> >  Documentation/blockdev/zram.txt            | 24 ++++++++++++++---
>>>> >> >  drivers/block/zram/zram_drv.c              | 41 ++++++++++++++++++++++++++++++
>>>> >> >  drivers/block/zram/zram_drv.h              |  5 ++++
>>>> >> >  4 files changed, 76 insertions(+), 4 deletions(-)
>>>> >> >
>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>> >> >                 efficiency can be calculated using compr_data_size and this
>>>> >> >                 statistic.
>>>> >> >                 Unit: bytes
>>>> >> > +
>>>> >> > +What:          /sys/block/zram<id>/mem_limit
>>>> >> > +Date:          August 2014
>>>> >> > +Contact:       Minchan Kim <minchan@kernel.org>
>>>> >> > +Description:
>>>> >> > +               The mem_limit file is read/write and specifies the amount
>>>> >> > +               of memory to be able to consume memory to store store
>>>> >> > +               compressed data. The limit could be changed in run time
>>>> >> > -               and "0" is default which means disable the limit.
>>>> >> > +               and "0" means disable the limit. No limit is the initial state.
>>>> >>
>>>> >> there should be no default in the API.
>>>> >
>>>> > Thanks.
>>>> >
>>>> >>
>>>> >> > +               Unit: bytes
>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>> >> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>> >> >  size of the disk when not in use so a huge zram is wasteful.
>>>> >> >
>>>> >> > -5) Activate:
>>>> >> > +5) Set memory limit: Optional
>>>> >> > +       Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>> >> > +       The value can be either in bytes or you can use mem suffixes.
>>>> >> > +       In addition, you could change the value in runtime.
>>>> >> > +       Examples:
>>>> >> > +           # limit /dev/zram0 with 50MB memory
>>>> >> > +           echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # Using mem suffixes
>>>> >> > +           echo 256K > /sys/block/zram0/mem_limit
>>>> >> > +           echo 512M > /sys/block/zram0/mem_limit
>>>> >> > +           echo 1G > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +           # To disable memory limit
>>>> >> > +           echo 0 > /sys/block/zram0/mem_limit
>>>> >> > +
>>>> >> > +6) Activate:
>>>> >> >         mkswap /dev/zram0
>>>> >> >         swapon /dev/zram0
>>>> >> >
>>>> >> >         mkfs.ext4 /dev/zram1
>>>> >> >         mount /dev/zram1 /tmp
>>>> >> >
>>>> >> > -6) Stats:
>>>> >> > +7) Stats:
>>>> >> >         Per-device statistics are exported as various nodes under
>>>> >> >         /sys/block/zram<id>/
>>>> >> >                 disksize
>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>> >> >                 compr_data_size
>>>> >> >                 mem_used_total
>>>> >> >
>>>> >> > -7) Deactivate:
>>>> >> > +8) Deactivate:
>>>> >> >         swapoff /dev/zram0
>>>> >> >         umount /dev/zram1
>>>> >> >
>>>> >> > -8) Reset:
>>>> >> > +9) Reset:
>>>> >> >         Write any positive value to 'reset' sysfs node
>>>> >> >         echo 1 > /sys/block/zram0/reset
>>>> >> >         echo 1 > /sys/block/zram1/reset
>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>> >> >         return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>> >> >  }
>>>> >> >
>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>> >> > +               struct device_attribute *attr, char *buf)
>>>> >> > +{
>>>> >> > +       u64 val;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       down_read(&zram->init_lock);
>>>> >> > +       val = zram->limit_pages;
>>>> >> > +       up_read(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>> >> > +}
>>>> >> > +
>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>> >> > +               struct device_attribute *attr, const char *buf, size_t len)
>>>> >> > +{
>>>> >> > +       u64 limit;
>>>> >> > +       struct zram *zram = dev_to_zram(dev);
>>>> >> > +
>>>> >> > +       limit = memparse(buf, NULL);
>>>> >>
>>>> >> if (limit = 0 && buf != "0")
>>>> >>         return -EINVAL
>>>> >>
>>>> >> > +       down_write(&zram->init_lock);
>>>> >> > +       zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>> >> > +       up_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       return len;
>>>> >> > +}
>>>> >> > +
>>>> >> >  static ssize_t max_comp_streams_store(struct device *dev,
>>>> >> >                 struct device_attribute *attr, const char *buf, size_t len)
>>>> >> >  {
>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>> >> >                 ret = -ENOMEM;
>>>> >> >                 goto out;
>>>> >> >         }
>>>> >> > +
>>>> >> > +       if (zram->limit_pages &&
>>>> >> > +               zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>> >> > +               zs_free(meta->mem_pool, handle);
>>>> >> > +               ret = -ENOMEM;
>>>> >> > +               goto out;
>>>> >> > +       }
>>>> >> > +
>>>> >> >         cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>> >> >
>>>> >> >         if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>> >> >         struct zram_meta *meta;
>>>> >> >
>>>> >> >         down_write(&zram->init_lock);
>>>> >> > +
>>>> >> > +       zram->limit_pages = 0;
>>>> >> > +
>>>> >> >         if (!init_done(zram)) {
>>>> >> >                 up_write(&zram->init_lock);
>>>> >> >                 return;
>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>> >> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>> >> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>> >> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>> >> > +               mem_limit_store);
>>>> >> >  static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>> >> >                 max_comp_streams_show, max_comp_streams_store);
>>>> >> >  static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>> >> >         &dev_attr_orig_data_size.attr,
>>>> >> >         &dev_attr_compr_data_size.attr,
>>>> >> >         &dev_attr_mem_used_total.attr,
>>>> >> > +       &dev_attr_mem_limit.attr,
>>>> >> >         &dev_attr_max_comp_streams.attr,
>>>> >> >         &dev_attr_comp_algorithm.attr,
>>>> >> >         NULL,
>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>> >> >         u64 disksize;   /* bytes */
>>>> >> >         int max_comp_streams;
>>>> >> >         struct zram_stats stats;
>>>> >> > +       /*
>>>> >> > +        * the number of pages zram can consume for storing compressed data
>>>> >> > +        */
>>>> >> > +       unsigned long limit_pages;
>>>> >> > +
>>>> >> >         char compressor[10];
>>>> >> >  };
>>>> >> >  #endif
>>>> >> > --
>>>> >> > 2.0.0
>>>> >> >
>>>> >
>>>> > --
>>>> > Kind regards,
>>>> > Minchan Kim
>>>
>>> --
>>> Kind regards,
>>> Minchan Kim

^ permalink raw reply [flat|nested] 44+ messages in thread
*bvec, u32 index, >>>> >> > ret = -ENOMEM; >>>> >> > goto out; >>>> >> > } >>>> >> > + >>>> >> > + if (zram->limit_pages && >>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>>> >> > + zs_free(meta->mem_pool, handle); >>>> >> > + ret = -ENOMEM; >>>> >> > + goto out; >>>> >> > + } >>>> >> > + >>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>>> >> > >>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>>> >> > struct zram_meta *meta; >>>> >> > >>>> >> > down_write(&zram->init_lock); >>>> >> > + >>>> >> > + zram->limit_pages = 0; >>>> >> > + >>>> >> > if (!init_done(zram)) { >>>> >> > up_write(&zram->init_lock); >>>> >> > return; >>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>>> >> > + mem_limit_store); >>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>>> >> > max_comp_streams_show, max_comp_streams_store); >>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>>> >> > &dev_attr_orig_data_size.attr, >>>> >> > &dev_attr_compr_data_size.attr, >>>> >> > &dev_attr_mem_used_total.attr, >>>> >> > + &dev_attr_mem_limit.attr, >>>> >> > &dev_attr_max_comp_streams.attr, >>>> >> > &dev_attr_comp_algorithm.attr, >>>> >> > NULL, >>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>>> >> > --- a/drivers/block/zram/zram_drv.h >>>> >> > +++ b/drivers/block/zram/zram_drv.h >>>> >> > @@ -112,6 +112,11 @@ 
struct zram { >>>> >> > u64 disksize; /* bytes */ >>>> >> > int max_comp_streams; >>>> >> > struct zram_stats stats; >>>> >> > + /* >>>> >> > + * the number of pages zram can consume for storing compressed data >>>> >> > + */ >>>> >> > + unsigned long limit_pages; >>>> >> > + >>>> >> > char compressor[10]; >>>> >> > }; >>>> >> > #endif >>>> >> > -- >>>> >> > 2.0.0 >>>> >> > >>>> >> >>>> >> -- >>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>>> >> see: http://www.linux-mm.org/ . >>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>> > >>>> > -- >>>> > Kind regards, >>>> > Minchan Kim >>>> >>>> -- >>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>> see: http://www.linux-mm.org/ . >>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>> >>> -- >>> Kind regards, >>> Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-26 4:28 ` David Horner @ 2014-08-26 13:40 ` Dan Streetman -1 siblings, 0 replies; 44+ messages in thread From: Dan Streetman @ 2014-08-26 13:40 UTC (permalink / raw) To: David Horner Cc: Minchan Kim, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings On Tue, Aug 26, 2014 at 12:28 AM, David Horner <ds2horner@gmail.com> wrote: > On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@ieee.org> wrote: >> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@gmail.com> wrote: >>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@kernel.org> wrote: >>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote: >>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@kernel.org> wrote: >>>>> > Hello David, >>>>> > >>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote: >>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote: >>>>> >> > Since zram has no control feature to limit memory usage, >>>>> >> > it makes hard to manage system memrory. >>>>> >> > >>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the >>>>> >> > a limit so that zram could fail allocation once it reaches >>>>> >> > the limit. >>>>> >> > >>>>> >> > In addition, user could change the limit in runtime so that >>>>> >> > he could manage the memory more dynamically. >>>>> >> > >>>>> >> - Default is no limit so it doesn't break old behavior. >>>>> >> + Initial state is no limit so it doesn't break old behavior. >>>>> >> >>>>> >> I understand your previous post now. >>>>> >> >>>>> >> I was saying that setting to either a null value or garbage >>>>> >> (which is interpreted as zero by memparse(buf, NULL);) >>>>> >> removes the limit. 
>>>>> >> >>>>> >> I think this is "surprise" behaviour and rather the null case should >>>>> >> return -EINVAL >>>>> >> The test below should be "good enough" though not catching all garbage. >>>>> > >>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself, >>>>> > not caller if it is really problem so I don't want to touch it in this >>>>> > patchset. It's not critical for adding the feature. >>>>> > >>>>> >>>>> I've looked into the memparse function more since we talked. >>>>> I do believe a wrapper function around it for the typical use by sysfs would >>>>> be very valuable. >>>> >>>> Agree. >>>> >>>>> However, there is nothing wrong with memparse itself that needs to be fixed. >>>>> >>>>> It does what it is documented to do very well (In My Uninformed Opinion). >>>>> It provides everything that a caller needs to manage the token that it >>>>> processes. >>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros. >>>> >>>> Maybe strict_memparse would be better to protect such things so you >>>> could find several places to clean it up. >>>> >>>>> >>>>> The fact that other callers don't check the return pointer value to >>>>> see if only a null >>>>> string was processed, is not its fault. >>>>> Nor that it may not be ideally suited to sysfs attributes; that other store >>>>> functions use it in a given manner does not means that is correct - >>>>> nor that it is >>>>> incorrect for that "knob". Some attributes could be just as valid with >>>>> null zeros. >>>>> >>>>> And you are correct, to disambiguate the zero is not required for the >>>>> limit feature. >>>>> Your original patch which disallowed zero was full feature for mem_limit. >>>>> It is the requested non-crucial feature to allow zero to reestablish >>>>> the initial state >>>>> that benefits from distinguishing an explicit zero from a "default zero' >>>>> when garbage is written. 
>>>>> >>>>> The final argument is that if we release this feature as is the undocumented >>>>> functionality could be relied upon, and when later fixed: user space breaks. >>>> >>>> I don't get it. Why does it break userspace? >>>> The sysfs-block-zram says "0" means disable the limit. >>>> If someone writes *garabge* but work as if disabling the limit, >>>> it's not a right thing and he already broke although it worked >>>> so it would be not a problem if we fix later. >>>> (ie, we don't need to take care of broken userspace) >>>> Am I missing your point? >>>> >>> >>> Perhaps you are missing my point, perhaps ignoring or dismissing. >>> >>> Basically, if a facility works in a useful way, even if it was designed for >>> different usage, that becomes the "accepted" interface/usage. >>> The developer may not have intended that usage or may even considered >>> it wrong and a broken usage, but it is what it is and people become >>> reliant on that behaviour. >>> >>> Case in point is memparse itself. >>> >>> The developer intentionally sets the return pointer because that is the >>> only value that can be validated for correct performance. >>> The return value allows -ve so the standard error message passing is not valid. >>> Unfortunately, C allows the user to pass a NULL value in the parameter. >>> The developer could consider that absurd and fundamentally broken. >>> But to the user it is a valid situation, because (perhaps) it can't be >>> bothered to handle error cases. >>> >>> So, who is to blame. >>> You say memparse, that it is fundamentally broken, >>> because it didn't check to see that it was used correctly. >>> And I say mem_limit_store is fundamentally broken, >>> because it didn't check to see that it was used correctly. >> >> I think we should look at what the rest of the kernel does as far as >> checking memparse results. It appears to be a mix of some code >> checking memparse while others don't. 
The most common way to check
>> appears to be to verify that memparse actually parsed at least 1
>> character, e.g.:
>>     oldp = p;
>>     mem_size = memparse(p, &p);
>>     if (p == oldp)
>>         return -EINVAL;
>>
>> although other places where 0 isn't valid can simply check for that:
>>     mem_size = memparse(p, &p);
>>     /* don't remove all of memory when handling "mem={invalid}" param */
>>     if (mem_size == 0)
>>         return -EINVAL;
>>
>> or even the other memparse use in zram_drv.c:
>>     disksize = memparse(buf, NULL);
>>     if (!disksize)
>>         return -EINVAL;
>>
>>
>> And there seem to be other places where (maybe?) there's no checking
>> at all. However, it also seems like many cases of memparse usage are
>> looking for a non-zero value, and therefore they can either
>> immediately check for zero/invalid or (possibly) later code has checks
>> to avoid using any zero value. In this case though, 0 is a valid
>> value. So, while I agree that if a user passes an invalid (i.e.
>> non-numeric) value it's clearly user error, it might be closer to the
>> apparent (although unwritten AFAICT) memparse usage api to check the
>> result for validity; in our case a simple check if at least 1 char was
>> parsed is all that's needed, e.g.:
>>
>> {
>>     u64 limit;
>>     char *tmp = buf;
>>     struct zram *zram = dev_to_zram(dev);
>>
>>     limit = memparse(buf, &tmp);
>>     if (buf == tmp) /* no chars parsed, invalid input */
>>         return -EINVAL;
>>     down_write(&zram->init_lock);
>>     ...
>>
>>
>> Separate from this patch, it would also help if the lib/cmdline.c
>> memparse doc was at least updated to clarify when the result should be
>> checked for validity
>
> FWIW:
> I was pondering why I thought this was the wrong place.
> On reflection the best explanation is that it is not validity -
> the program does what it does quite well.
> (although it does have flaws for use by sysfs
> 1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
> 2) it checks for a suffix in the null zero case
>    (that means G,K,M are all valid memory size constants,
>    and I think that should not be in the definition of
>    valid mem parms)
> 3) it does nothing to enforce termination of the input.
>    Both simple_strtoull and its successor kstrtoull are not
>    buffer overrun safe.
>    And so neither is memparse.
>    It may be the sysfs buffer management does some magic here
>    - but I have not seen it documented nor in code.)
>
> Rather than _validity_ it is _applicability_ that needs explaining.
> And that is not documented in the function that does its thing.
> But rather in the code that uses it, and more specifically in the framework
> established for its specific use - as in sysfs for numeric memory parameters.

Well, sysfs isn't the only user of memparse, over half of its usage is
from arch/, presumably for kernel boot param parsing. So the doc on its
usage shouldn't only be for sysfs.

>
>> and how best to do that (e.g. if 0 is an invalid value, just check if
>> the result is 0; if 0 is a possible valid value, check if any chars
>> were parsed).
>>
>>
>>>
>>> The difference is that memparse cannot stop being abused
>>> (C allows the NULL argument and extensive tricks are required to address that)
>>> however, we can readily fix mem_limit_store and ensure
>>> 1) no regression when the interface IS fixed and
>>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>>
>>>
>>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>>> don't insisting on
>>>>> an explicit zero we have the API wrong.
>>>>>
>>>>> I don't think you disagreed, just that the burden to get it correct
>>>>> lay elsewhere.
>>>>>
>>>>> If that is the case it doesn't really matter, we cannot release this
>>>>> interface until
>>>>> it is corrected wherever it must be.
>>>>> >>>>> And my zero check was a poor hack. >>>>> >>>>> I should have explicitly checked the returned pointer value. >>>>> >>>>> I will send that proposed revision, and hopefully you will consider it >>>>> for inclusion. >>>>> >>>>> >>>>> >>>>> >>>>> >> >>>>> >> > >>>>> >> > Signed-off-by: Minchan Kim <minchan@kernel.org> >>>>> >> > --- >>>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++ >>>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++--- >>>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++ >>>>> >> > drivers/block/zram/zram_drv.h | 5 ++++ >>>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-) >>>>> >> > >>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > index 70ec992514d0..b8c779d64968 100644 >>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram >>>>> >> > @@ -119,3 +119,13 @@ Description: >>>>> >> > efficiency can be calculated using compr_data_size and this >>>>> >> > statistic. >>>>> >> > Unit: bytes >>>>> >> > + >>>>> >> > +What: /sys/block/zram<id>/mem_limit >>>>> >> > +Date: August 2014 >>>>> >> > +Contact: Minchan Kim <minchan@kernel.org> >>>>> >> > +Description: >>>>> >> > + The mem_limit file is read/write and specifies the amount >>>>> >> > + of memory to be able to consume memory to store store >>>>> >> > + compressed data. The limit could be changed in run time >>>>> >> > - and "0" is default which means disable the limit. >>>>> >> > + and "0" means disable the limit. No limit is the initial state. >>>>> >> >>>>> >> there should be no default in the API. >>>>> > >>>>> > Thanks. 
>>>>> > >>>>> >> >>>>> >> > + Unit: bytes >>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>>>> >> > --- a/Documentation/blockdev/zram.txt >>>>> >> > +++ b/Documentation/blockdev/zram.txt >>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >>>>> >> > size of the disk when not in use so a huge zram is wasteful. >>>>> >> > >>>>> >> > -5) Activate: >>>>> >> > +5) Set memory limit: Optional >>>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>>>> >> > + The value can be either in bytes or you can use mem suffixes. >>>>> >> > + In addition, you could change the value in runtime. >>>>> >> > + Examples: >>>>> >> > + # limit /dev/zram0 with 50MB memory >>>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # Using mem suffixes >>>>> >> > + echo 256K > /sys/block/zram0/mem_limit >>>>> >> > + echo 512M > /sys/block/zram0/mem_limit >>>>> >> > + echo 1G > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # To disable memory limit >>>>> >> > + echo 0 > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > +6) Activate: >>>>> >> > mkswap /dev/zram0 >>>>> >> > swapon /dev/zram0 >>>>> >> > >>>>> >> > mkfs.ext4 /dev/zram1 >>>>> >> > mount /dev/zram1 /tmp >>>>> >> > >>>>> >> > -6) Stats: >>>>> >> > +7) Stats: >>>>> >> > Per-device statistics are exported as various nodes under >>>>> >> > /sys/block/zram<id>/ >>>>> >> > disksize >>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>>>>> >> > compr_data_size >>>>> >> > mem_used_total >>>>> >> > >>>>> >> > -7) Deactivate: >>>>> >> > +8) Deactivate: >>>>> >> > swapoff /dev/zram0 >>>>> >> > umount /dev/zram1 >>>>> >> > >>>>> >> > -8) Reset: >>>>> >> > +9) Reset: >>>>> >> > Write any positive value to 'reset' sysfs node >>>>> >> > echo 1 > /sys/block/zram0/reset >>>>> >> > echo 1 > /sys/block/zram1/reset >>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>>>> >> > index f0b8b30a7128..370c355eb127 100644 >>>>> >> > --- a/drivers/block/zram/zram_drv.c >>>>> >> > +++ b/drivers/block/zram/zram_drv.c >>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>>>> >> > } >>>>> >> > >>>>> >> > +static ssize_t mem_limit_show(struct device *dev, >>>>> >> > + struct device_attribute *attr, char *buf) >>>>> >> > +{ >>>>> >> > + u64 val; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + down_read(&zram->init_lock); >>>>> >> > + val = zram->limit_pages; >>>>> >> > + up_read(&zram->init_lock); >>>>> >> > + >>>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>>>> >> > +} >>>>> >> > + >>>>> >> > +static ssize_t mem_limit_store(struct device *dev, >>>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > +{ >>>>> >> > + u64 limit; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + limit = memparse(buf, NULL); >>>>> >> >>>>> >> if (limit = 0 && buf != "0") >>>>> >> return -EINVAL >>>>> >> >>>>> >> > + down_write(&zram->init_lock); >>>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>>>> >> > + up_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + return len; >>>>> >> > +} >>>>> >> > + >>>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>>>> >> > struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > { >>>>> >> > @@ -513,6 +540,14 @@ static 
int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >>>>> >> > ret = -ENOMEM; >>>>> >> > goto out; >>>>> >> > } >>>>> >> > + >>>>> >> > + if (zram->limit_pages && >>>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>>>> >> > + zs_free(meta->mem_pool, handle); >>>>> >> > + ret = -ENOMEM; >>>>> >> > + goto out; >>>>> >> > + } >>>>> >> > + >>>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>>>> >> > >>>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>>>> >> > struct zram_meta *meta; >>>>> >> > >>>>> >> > down_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + zram->limit_pages = 0; >>>>> >> > + >>>>> >> > if (!init_done(zram)) { >>>>> >> > up_write(&zram->init_lock); >>>>> >> > return; >>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>>>> >> > + mem_limit_store); >>>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>>>> >> > max_comp_streams_show, max_comp_streams_store); >>>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>>>> >> > &dev_attr_orig_data_size.attr, >>>>> >> > &dev_attr_compr_data_size.attr, >>>>> >> > &dev_attr_mem_used_total.attr, >>>>> >> > + &dev_attr_mem_limit.attr, >>>>> >> > &dev_attr_max_comp_streams.attr, >>>>> >> > &dev_attr_comp_algorithm.attr, >>>>> >> > NULL, >>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>>>> >> > --- 
a/drivers/block/zram/zram_drv.h >>>>> >> > +++ b/drivers/block/zram/zram_drv.h >>>>> >> > @@ -112,6 +112,11 @@ struct zram { >>>>> >> > u64 disksize; /* bytes */ >>>>> >> > int max_comp_streams; >>>>> >> > struct zram_stats stats; >>>>> >> > + /* >>>>> >> > + * the number of pages zram can consume for storing compressed data >>>>> >> > + */ >>>>> >> > + unsigned long limit_pages; >>>>> >> > + >>>>> >> > char compressor[10]; >>>>> >> > }; >>>>> >> > #endif >>>>> >> > -- >>>>> >> > 2.0.0 >>>>> >> > >>>>> >> >>>>> >> -- >>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>>>> >> see: http://www.linux-mm.org/ . >>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>>> > >>>>> > -- >>>>> > Kind regards, >>>>> > Minchan Kim >>>>> >>>>> -- >>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>>> see: http://www.linux-mm.org/ . >>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>> >>>> -- >>>> Kind regards, >>>> Minchan Kim ^ permalink raw reply [flat|nested] 44+ messages in thread
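The "check that at least one character was parsed" idea discussed in the message above can be tried outside the kernel. What follows is only a userspace sketch, not the code that was proposed or merged: parse_mem_limit() is a hypothetical name, strtoull() stands in for the kernel's number parsing, and only the K/M/G suffixes are handled. Two choices go beyond what the thread proposes and are assumptions of this sketch: the end pointer is compared before the suffix is consumed, so a bare "K" is rejected (the concern raised as point 2 in the quoted text), and anything left after an optional trailing newline is treated as an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace sketch of a strict "mem_limit" parser.
 * Returns 0 and stores the byte count in *limit on success, or
 * -EINVAL (leaving *limit untouched) when the input is not a number
 * with an optional K/M/G suffix.
 *
 * Note: strtoull() also accepts leading whitespace and a sign; this
 * sketch does not filter those out, and it ignores overflow.
 */
static int parse_mem_limit(const char *buf, unsigned long long *limit)
{
    char *end;
    unsigned long long val;

    assert(buf != NULL && limit != NULL);

    val = strtoull(buf, &end, 0);
    if (end == buf)         /* no digits consumed: "", "K", "garbage" */
        return -EINVAL;

    switch (*end) {
    case 'G': case 'g':
        val <<= 10;
        /* fall through */
    case 'M': case 'm':
        val <<= 10;
        /* fall through */
    case 'K': case 'k':
        val <<= 10;
        end++;
        break;
    default:
        break;
    }

    if (*end == '\n')       /* tolerate the newline that echo appends */
        end++;
    if (*end != '\0')       /* trailing garbage such as "12abc" */
        return -EINVAL;

    *limit = val;
    return 0;
}
```

With this shape an explicit "0" still means "disable the limit", while an empty write or non-numeric input returns -EINVAL instead of silently clearing the limit, which is the distinction the thread is arguing about.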
>>>>> > >>>>> >> >>>>> >> > + Unit: bytes >>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt >>>>> >> > index 0595c3f56ccf..82c6a41116db 100644 >>>>> >> > --- a/Documentation/blockdev/zram.txt >>>>> >> > +++ b/Documentation/blockdev/zram.txt >>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory >>>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the >>>>> >> > size of the disk when not in use so a huge zram is wasteful. >>>>> >> > >>>>> >> > -5) Activate: >>>>> >> > +5) Set memory limit: Optional >>>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'. >>>>> >> > + The value can be either in bytes or you can use mem suffixes. >>>>> >> > + In addition, you could change the value in runtime. >>>>> >> > + Examples: >>>>> >> > + # limit /dev/zram0 with 50MB memory >>>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # Using mem suffixes >>>>> >> > + echo 256K > /sys/block/zram0/mem_limit >>>>> >> > + echo 512M > /sys/block/zram0/mem_limit >>>>> >> > + echo 1G > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > + # To disable memory limit >>>>> >> > + echo 0 > /sys/block/zram0/mem_limit >>>>> >> > + >>>>> >> > +6) Activate: >>>>> >> > mkswap /dev/zram0 >>>>> >> > swapon /dev/zram0 >>>>> >> > >>>>> >> > mkfs.ext4 /dev/zram1 >>>>> >> > mount /dev/zram1 /tmp >>>>> >> > >>>>> >> > -6) Stats: >>>>> >> > +7) Stats: >>>>> >> > Per-device statistics are exported as various nodes under >>>>> >> > /sys/block/zram<id>/ >>>>> >> > disksize >>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful. 
>>>>> >> > compr_data_size >>>>> >> > mem_used_total >>>>> >> > >>>>> >> > -7) Deactivate: >>>>> >> > +8) Deactivate: >>>>> >> > swapoff /dev/zram0 >>>>> >> > umount /dev/zram1 >>>>> >> > >>>>> >> > -8) Reset: >>>>> >> > +9) Reset: >>>>> >> > Write any positive value to 'reset' sysfs node >>>>> >> > echo 1 > /sys/block/zram0/reset >>>>> >> > echo 1 > /sys/block/zram1/reset >>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c >>>>> >> > index f0b8b30a7128..370c355eb127 100644 >>>>> >> > --- a/drivers/block/zram/zram_drv.c >>>>> >> > +++ b/drivers/block/zram/zram_drv.c >>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev, >>>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val); >>>>> >> > } >>>>> >> > >>>>> >> > +static ssize_t mem_limit_show(struct device *dev, >>>>> >> > + struct device_attribute *attr, char *buf) >>>>> >> > +{ >>>>> >> > + u64 val; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + down_read(&zram->init_lock); >>>>> >> > + val = zram->limit_pages; >>>>> >> > + up_read(&zram->init_lock); >>>>> >> > + >>>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); >>>>> >> > +} >>>>> >> > + >>>>> >> > +static ssize_t mem_limit_store(struct device *dev, >>>>> >> > + struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > +{ >>>>> >> > + u64 limit; >>>>> >> > + struct zram *zram = dev_to_zram(dev); >>>>> >> > + >>>>> >> > + limit = memparse(buf, NULL); >>>>> >> >>>>> >> if (limit = 0 && buf != "0") >>>>> >> return -EINVAL >>>>> >> >>>>> >> > + down_write(&zram->init_lock); >>>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT; >>>>> >> > + up_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + return len; >>>>> >> > +} >>>>> >> > + >>>>> >> > static ssize_t max_comp_streams_store(struct device *dev, >>>>> >> > struct device_attribute *attr, const char *buf, size_t len) >>>>> >> > { >>>>> >> > @@ -513,6 +540,14 @@ static 
int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, >>>>> >> > ret = -ENOMEM; >>>>> >> > goto out; >>>>> >> > } >>>>> >> > + >>>>> >> > + if (zram->limit_pages && >>>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { >>>>> >> > + zs_free(meta->mem_pool, handle); >>>>> >> > + ret = -ENOMEM; >>>>> >> > + goto out; >>>>> >> > + } >>>>> >> > + >>>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); >>>>> >> > >>>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { >>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity) >>>>> >> > struct zram_meta *meta; >>>>> >> > >>>>> >> > down_write(&zram->init_lock); >>>>> >> > + >>>>> >> > + zram->limit_pages = 0; >>>>> >> > + >>>>> >> > if (!init_done(zram)) { >>>>> >> > up_write(&zram->init_lock); >>>>> >> > return; >>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL); >>>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); >>>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); >>>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); >>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, >>>>> >> > + mem_limit_store); >>>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, >>>>> >> > max_comp_streams_show, max_comp_streams_store); >>>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, >>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = { >>>>> >> > &dev_attr_orig_data_size.attr, >>>>> >> > &dev_attr_compr_data_size.attr, >>>>> >> > &dev_attr_mem_used_total.attr, >>>>> >> > + &dev_attr_mem_limit.attr, >>>>> >> > &dev_attr_max_comp_streams.attr, >>>>> >> > &dev_attr_comp_algorithm.attr, >>>>> >> > NULL, >>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h >>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644 >>>>> >> > --- 
a/drivers/block/zram/zram_drv.h >>>>> >> > +++ b/drivers/block/zram/zram_drv.h >>>>> >> > @@ -112,6 +112,11 @@ struct zram { >>>>> >> > u64 disksize; /* bytes */ >>>>> >> > int max_comp_streams; >>>>> >> > struct zram_stats stats; >>>>> >> > + /* >>>>> >> > + * the number of pages zram can consume for storing compressed data >>>>> >> > + */ >>>>> >> > + unsigned long limit_pages; >>>>> >> > + >>>>> >> > char compressor[10]; >>>>> >> > }; >>>>> >> > #endif >>>>> >> > -- >>>>> >> > 2.0.0 >>>>> >> > >>>>> >> >>>>> >> -- >>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> >> the body to majordomo@kvack.org. For more info on Linux MM, >>>>> >> see: http://www.linux-mm.org/ . >>>>> >> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>>> > >>>>> > -- >>>>> > Kind regards, >>>>> > Minchan Kim >>>>> >>>>> -- >>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>>>> the body to majordomo@kvack.org. For more info on Linux MM, >>>>> see: http://www.linux-mm.org/ . >>>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> >>>> >>>> -- >>>> Kind regards, >>>> Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
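Editorial sketch of the validation discussed above, as a userspace stand-in rather than the kernel code: it rejects input when the numeric conversion consumed no characters, which keeps "0" valid while refusing garbage. `parse_limit` is a hypothetical name, and `strtoull` is used in place of the kernel's `memparse`/`simple_strtoull`. Doing the check before the suffix is handled also rejects a bare "K", the case raised in point 2 of the reply. Rejecting trailing characters is an extra assumption that was not agreed in the thread.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): parse a size such as "0",
 * "4096", "256K", "512M" or "1G", optionally followed by a newline.
 * Returns 0 and stores the value on success, -EINVAL on bad input.
 */
static int parse_limit(const char *buf, uint64_t *limit)
{
    char *end;
    uint64_t value = strtoull(buf, &end, 0);

    /* No characters were converted: not a number at all. */
    if (end == buf)
        return -EINVAL;

    /* Optional binary suffix, mirroring the K/M/G examples in zram.txt. */
    switch (*end) {
    case 'G': case 'g':
        value <<= 10;
        /* fall through */
    case 'M': case 'm':
        value <<= 10;
        /* fall through */
    case 'K': case 'k':
        value <<= 10;
        end++;
        break;
    default:
        break;
    }

    /* Assumption of this sketch: allow one trailing newline, nothing else. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    *limit = value;
    return 0;
}
```

With this shape an explicit "0" still disables the limit, while "garbage", an empty string, or a lone suffix yields -EINVAL instead of silently acting as 0.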
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 4:37 ` Minchan Kim @ 2014-08-25 8:25 ` Dongsheng Song -1 siblings, 0 replies; 44+ messages in thread
From: Dongsheng Song @ 2014-08-25 8:25 UTC
To: Minchan Kim
Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

> +What: /sys/block/zram<id>/mem_limit
> +Date: August 2014
> +Contact: Minchan Kim <minchan@kernel.org>
> +Description:
> + The mem_limit file is read/write and specifies the amount
> + of memory to be able to consume memory to store store
> + compressed data. The limit could be changed in run time
> + and "0" means disable the limit. No limit is the initial state.

extra word 'store' ?
The mem_limit file is read/write and specifies the amount of memory to
be able to consume memory to store store compressed data.

maybe this better ?
The mem_limit file is read/write and specifies the amount of memory to
store compressed data.

--
Dongsheng
* Re: [PATCH v4 3/4] zram: zram memory size limitation 2014-08-25 8:25 ` Dongsheng Song @ 2014-08-26 4:51 ` Minchan Kim -1 siblings, 0 replies; 44+ messages in thread
From: Minchan Kim @ 2014-08-26 4:51 UTC
To: Dongsheng Song
Cc: David Horner, Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman

Hello,

On Mon, Aug 25, 2014 at 04:25:31PM +0800, Dongsheng Song wrote:
> > +What: /sys/block/zram<id>/mem_limit
> > +Date: August 2014
> > +Contact: Minchan Kim <minchan@kernel.org>
> > +Description:
> > + The mem_limit file is read/write and specifies the amount
> > + of memory to be able to consume memory to store store
> > + compressed data. The limit could be changed in run time
> > + and "0" means disable the limit. No limit is the initial state.
>
> extra word 'store' ?
> The mem_limit file is read/write and specifies the amount of memory to
> be able to consume memory to store store compressed data.
>
> maybe this better ?
> The mem_limit file is read/write and specifies the amount of memory to
> store compressed data.

Will fix. Thanks!

>
> --
> Dongsheng

--
Kind regards,
Minchan Kim
* [PATCH v4 4/4] zram: report maximum used memory 2014-08-22 0:42 ` Minchan Kim @ 2014-08-22 0:42 ` Minchan Kim -1 siblings, 0 replies; 44+ messages in thread From: Minchan Kim @ 2014-08-22 0:42 UTC (permalink / raw) To: Andrew Morton Cc: linux-mm, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, Dan Streetman, ds2horner, Minchan Kim Normally, zram user could get maximum memory usage zram consumed via polling mem_used_total with sysfs in userspace. But it has a critical problem because user can miss peak memory usage during update inverval of polling. For avoiding that, user should poll it with shorter interval(ie, 0.0000000001s) with mlocking to avoid page fault delay when memory pressure is heavy. It would be troublesome. This patch adds new knob "mem_used_max" so user could see the maximum memory usage easily via reading the knob and reset it via "echo 0 > /sys/block/zram0/mem_used_max". Signed-off-by: Minchan Kim <minchan@kernel.org> --- Documentation/ABI/testing/sysfs-block-zram | 10 +++++ Documentation/blockdev/zram.txt | 1 + drivers/block/zram/zram_drv.c | 60 +++++++++++++++++++++++++++++- drivers/block/zram/zram_drv.h | 1 + 4 files changed, 70 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram index b8c779d64968..7b8fca6a9b77 100644 --- a/Documentation/ABI/testing/sysfs-block-zram +++ b/Documentation/ABI/testing/sysfs-block-zram @@ -120,6 +120,16 @@ Description: statistic. Unit: bytes +What: /sys/block/zram<id>/mem_used_max +Date: August 2014 +Contact: Minchan Kim <minchan@kernel.org> +Description: + The mem_used_max file is read/write and specifies the amount + of maximum memory zram have consumed to store compressed data. + For resetting the value, you should write "0". Otherwise, + you could see -EINVAL. 
+ Unit: bytes + What: /sys/block/zram<id>/mem_limit Date: August 2014 Contact: Minchan Kim <minchan@kernel.org> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt index 82c6a41116db..7fcf9c6592ec 100644 --- a/Documentation/blockdev/zram.txt +++ b/Documentation/blockdev/zram.txt @@ -111,6 +111,7 @@ size of the disk when not in use so a huge zram is wasteful. orig_data_size compr_data_size mem_used_total + mem_used_max 8) Deactivate: swapoff /dev/zram0 diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 370c355eb127..1a2b3e320ea5 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -149,6 +149,41 @@ static ssize_t mem_limit_store(struct device *dev, return len; } +static ssize_t mem_used_max_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + u64 val = 0; + struct zram *zram = dev_to_zram(dev); + + down_read(&zram->init_lock); + if (init_done(zram)) + val = atomic_long_read(&zram->stats.max_used_pages); + up_read(&zram->init_lock); + + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT); +} + +static ssize_t mem_used_max_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + int err; + unsigned long val; + struct zram *zram = dev_to_zram(dev); + struct zram_meta *meta = zram->meta; + + err = kstrtoul(buf, 10, &val); + if (err || val != 0) + return -EINVAL; + + down_read(&zram->init_lock); + if (init_done(zram)) + atomic_long_set(&zram->stats.max_used_pages, + zs_get_total_pages(meta->mem_pool)); + up_read(&zram->init_lock); + + return len; +} + static ssize_t max_comp_streams_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) { @@ -461,6 +496,21 @@ out_cleanup: return ret; } +static inline void update_used_max(struct zram *zram, + const unsigned long pages) +{ + int old_max, cur_max; + + old_max = atomic_long_read(&zram->stats.max_used_pages); + + do { + cur_max = old_max; + if 
(pages > cur_max) + old_max = atomic_long_cmpxchg( + &zram->stats.max_used_pages, cur_max, pages); + } while (old_max != cur_max); +} + static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, int offset) { @@ -472,6 +522,7 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, struct zram_meta *meta = zram->meta; struct zcomp_strm *zstrm; bool locked = false; + unsigned long alloced_pages; page = bvec->bv_page; if (is_partial_io(bvec)) { @@ -541,13 +592,15 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index, goto out; } - if (zram->limit_pages && - zs_get_total_pages(meta->mem_pool) > zram->limit_pages) { + alloced_pages = zs_get_total_pages(meta->mem_pool); + if (zram->limit_pages && alloced_pages > zram->limit_pages) { zs_free(meta->mem_pool, handle); ret = -ENOMEM; goto out; } + update_used_max(zram, alloced_pages); + cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO); if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) { @@ -897,6 +950,8 @@ static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show, mem_limit_store); +static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show, + mem_used_max_store); static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR, max_comp_streams_show, max_comp_streams_store); static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR, @@ -926,6 +981,7 @@ static struct attribute *zram_disk_attrs[] = { &dev_attr_compr_data_size.attr, &dev_attr_mem_used_total.attr, &dev_attr_mem_limit.attr, + &dev_attr_mem_used_max.attr, &dev_attr_max_comp_streams.attr, &dev_attr_comp_algorithm.attr, NULL, diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h index b7aa9c21553f..c6ee271317f5 100644 --- a/drivers/block/zram/zram_drv.h +++ b/drivers/block/zram/zram_drv.h @@ -90,6 +90,7 @@ struct zram_stats 
{ atomic64_t notify_free; /* no. of swap slot free notifications */ atomic64_t zero_pages; /* no. of zero filled pages */ atomic64_t pages_stored; /* no. of pages currently stored */ + atomic_long_t max_used_pages; /* no. of maximum pages stored */ }; struct zram_meta { -- 2.0.0 ^ permalink raw reply related [flat|nested] 44+ messages in thread
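Editorial sketch of the compare-and-swap loop that `update_used_max` relies on, written with C11 `<stdatomic.h>` so it can be tried in userspace; it is an illustration under that assumption, not the kernel implementation. The counter is passed as a parameter here, whereas the patch reads `zram->stats.max_used_pages`. The locals are `unsigned long` to match the counter's width; the posted patch declares them `int`.

```c
#include <stdatomic.h>

/*
 * Raise *max to pages if pages is larger, without taking a lock.
 * Userspace analogue of the patch's atomic_long_cmpxchg() loop.
 */
static void update_used_max(atomic_ulong *max, unsigned long pages)
{
    unsigned long cur = atomic_load(max);

    /*
     * Retry only while our value is still the larger one. A failed
     * exchange stores the current value of *max into cur, so the
     * loop ends as soon as another writer has recorded a higher peak.
     */
    while (pages > cur &&
           !atomic_compare_exchange_weak(max, &cur, pages))
        ;
}
```

The common case, where the new reading does not exceed the recorded peak, costs a single atomic load and no write, which matches the v3 changelog item about removing unnecessary atomic instructions when updating the maximum.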
* Re: [PATCH v4 0/4] zram memory control enhance 2014-08-22 0:42 ` Minchan Kim @ 2014-08-22 19:15 ` Dan Streetman -1 siblings, 0 replies; 44+ messages in thread
From: Dan Streetman @ 2014-08-22 19:15 UTC
To: Minchan Kim
Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky, Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato, Nitin Gupta, Seth Jennings, David Horner

On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> Currently, zram has no feature to limit memory so theoretically
> zram can deplete system memory.
> Users have asked for a limit several times as even without exhaustion
> zram makes it hard to control memory usage of the platform.
> This patchset adds the feature.
>
> Patch 1 makes zs_get_total_size_bytes faster because it would be
> used frequently in later patches for the new feature.
>
> Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> to page so that zsmalloc doesn't need unnecessary operation(ie,
> << PAGE_SHIFT).
>
> Patch 3 adds new feature. I added the feature into zram layer,
> not zsmalloc because limiation is zram's requirement, not zsmalloc
> so any other user using zsmalloc(ie, zpool) shouldn't affected
> by unnecessary branch of zsmalloc. In future, if every users
> of zsmalloc want the feature, then, we could move the feature
> from client side to zsmalloc easily but vice versa would be
> painful.
>
> Patch 4 adds news facility to report maximum memory usage of zram
> so that this avoids user polling frequently via /sys/block/zram0/
> mem_used_total and ensures transient max are not missed.

FWIW, with the minor update to checking the memparse in patch 3 David
mentioned, feel free to add to all the patches:

Reviewed-by: Dan Streetman <ddstreet@ieee.org>

>
> * From v3
> * get_zs_total_size_byte function name change - Dan
> * clarifiction of the document - Dan
> * atomic account instead of introducing new lock in zsmalloc - David
> * remove unnecessary atomic instruction in updating max - David
>
> * From v2
> * introduce helper funcntion to update max_used_pages
> for readability - David
> * avoid unncessary zs_get_total_size call in updating loop
> for max_used_pages - David
>
> * From v1
> * rebased on next-20140815
> * fix up race problem - David, Dan
> * reset mem_used_max as current total_bytes, rather than 0 - David
> * resetting works with only "0" write for extensiblilty - David, Dan
>
> Minchan Kim (4):
> zsmalloc: move pages_allocated to zs_pool
> zsmalloc: change return value unit of zs_get_total_size_bytes
> zram: zram memory size limitation
> zram: report maximum used memory
>
> Documentation/ABI/testing/sysfs-block-zram | 20 ++++++
> Documentation/blockdev/zram.txt | 25 +++++--
> drivers/block/zram/zram_drv.c | 101 ++++++++++++++++++++++++++++-
> drivers/block/zram/zram_drv.h | 6 ++
> include/linux/zsmalloc.h | 2 +-
> mm/zsmalloc.c | 30 ++++-----
> 6 files changed, 158 insertions(+), 26 deletions(-)
>
> --
> 2.0.0
>
* Re: [PATCH v4 0/4] zram memory control enhance
From: Minchan Kim @ 2014-08-24 23:58 UTC
To: Dan Streetman
Cc: Andrew Morton, Linux-MM, linux-kernel, Sergey Senozhatsky,
    Jerome Marchand, juno.choi, seungho1.park, Luigi Semenzato,
    Nitin Gupta, Seth Jennings, David Horner

Hello Dan,

On Fri, Aug 22, 2014 at 03:15:36PM -0400, Dan Streetman wrote:
> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Currently, zram has no feature to limit memory so theoretically
> > zram can deplete system memory.
> > Users have asked for a limit several times as even without exhaustion
> > zram makes it hard to control memory usage of the platform.
> > This patchset adds the feature.
> >
> > Patch 1 makes zs_get_total_size_bytes faster because it would be
> > used frequently in later patches for the new feature.
> >
> > Patch 2 changes zs_get_total_size_bytes's return unit from bytes
> > to page so that zsmalloc doesn't need unnecessary operation(ie,
> > << PAGE_SHIFT).
> >
> > Patch 3 adds new feature. I added the feature into zram layer,
> > not zsmalloc because limiation is zram's requirement, not zsmalloc
> > so any other user using zsmalloc(ie, zpool) shouldn't affected
> > by unnecessary branch of zsmalloc. In future, if every users
> > of zsmalloc want the feature, then, we could move the feature
> > from client side to zsmalloc easily but vice versa would be
> > painful.
> >
> > Patch 4 adds news facility to report maximum memory usage of zram
> > so that this avoids user polling frequently via /sys/block/zram0/
> > mem_used_total and ensures transient max are not missed.
>
> FWIW, with the minor update to checking the memparse in patch 3 David
> mentioned, feel free to add to all the patches:

I replied to David's message: the check is not critical for the goal of
this patchset. And if we do fix it, the fix should go into memparse and
handle all of the cases, not only the null case.
So I will take your Reviewed-by for every patch except patch 3. :)

>
> Reviewed-by: Dan Streetman <ddstreet@ieee.org>

Thanks!

>
> >
> > * From v3
> >  * get_zs_total_size_byte function name change - Dan
> >  * clarifiction of the document - Dan
> >  * atomic account instead of introducing new lock in zsmalloc - David
> >  * remove unnecessary atomic instruction in updating max - David
> >
> > * From v2
> >  * introduce helper funcntion to update max_used_pages
> >    for readability - David
> >  * avoid unncessary zs_get_total_size call in updating loop
> >    for max_used_pages - David
> >
> > * From v1
> >  * rebased on next-20140815
> >  * fix up race problem - David, Dan
> >  * reset mem_used_max as current total_bytes, rather than 0 - David
> >  * resetting works with only "0" write for extensiblilty - David, Dan
> >
> > Minchan Kim (4):
> >   zsmalloc: move pages_allocated to zs_pool
> >   zsmalloc: change return value unit of zs_get_total_size_bytes
> >   zram: zram memory size limitation
> >   zram: report maximum used memory
> >
> >  Documentation/ABI/testing/sysfs-block-zram |  20 ++++++
> >  Documentation/blockdev/zram.txt            |  25 +++++--
> >  drivers/block/zram/zram_drv.c              | 101 ++++++++++++++++++++++++++++-
> >  drivers/block/zram/zram_drv.h              |   6 ++
> >  include/linux/zsmalloc.h                   |   2 +-
> >  mm/zsmalloc.c                              |  30 ++++-----
> >  6 files changed, 158 insertions(+), 26 deletions(-)
> >
> > --
> > 2.0.0
> >

--
Kind regards,
Minchan Kim
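As a rough illustration of the memparse remark above, here is a userspace C sketch of what handling more than the empty-input case of a size string could look like. It is not code from the patchset or from the kernel: `parse_size_strict` is a hypothetical name, and the accepted suffixes (K, M, G) and the tolerated trailing newline are assumptions made for the example. The kernel's `memparse()` returns the parsed value and an end pointer and leaves any such validation to its caller.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace helper (not kernel code): parse a size string such
 * as "512", "64K", "2M" or "1G" into bytes. It rejects empty input, unknown
 * suffixes, and trailing characters, and reports overflow.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
static int parse_size_strict(const char *s, unsigned long long *out)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	if (s == NULL || *s < '0' || *s > '9')
		return -EINVAL;	/* empty, signed, or non-numeric input */

	errno = 0;
	val = strtoull(s, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;	/* number itself does not fit */

	switch (*end) {
	case 'G': case 'g': shift = 30; end++; break;
	case 'M': case 'm': shift = 20; end++; break;
	case 'K': case 'k': shift = 10; end++; break;
	default: break;
	}

	if (*end == '\n')	/* assumption: allow the newline added by echo */
		end++;
	if (*end != '\0')
		return -EINVAL;	/* trailing garbage such as "10Q" */

	if (shift && val > (~0ULL >> shift))
		return -ERANGE;	/* applying the suffix would overflow */

	*out = val << shift;
	return 0;
}
```

With a helper of this shape, a caller could distinguish "4K" (accepted as 4096) from "" or "10Q" (both rejected) instead of silently treating malformed input as a valid limit.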