All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] zram memory tracking
@ 2018-03-27  6:50 Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, LKML, Minchan Kim

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. As well, it's pointless to store incompressible
pages to zram so better idea is app developers manages them directly
like free or mlock rather than remaining them on heap.

This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state
to represent each block's state so admin can investigate what memory is
cold|incompressible|same page with using pagemap once the pages are
swapped out.

The output is as follows,

   25064    .wh       100
   25065    s..       9
   25067    ..h       15

First column is zram's block index and second one represents symbol
(s: same page w: written page to backing store h: huge page) of the 
block state. Last one means number of seconds elapsed since the block
was last accessed. So above example means the 25064th block is accessed
100 second ago and it was huge so it was written to the backing store.

* From v1:
  * Do not propagate error number for debugfs fail - Greg KH
  * Add writeback and hugepage information - Sergey

Minchan Kim (4):
  zram: correct flag name of ZRAM_ACCESS
  zram: mark incompressible page as ZRAM_HUGE
  zram: record accessed second
  zram: introduce zram memory tracking

 Documentation/blockdev/zram.txt |   1 +
 drivers/block/zram/Kconfig      |  10 ++
 drivers/block/zram/zram_drv.c   | 157 +++++++++++++++++++++++++++++---
 drivers/block/zram/zram_drv.h   |  12 ++-
 4 files changed, 163 insertions(+), 17 deletions(-)

-- 
2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/4] zram: correct flag name of ZRAM_ACCESS
  2018-03-27  6:50 [PATCH v2 0/4] zram memory tracking Minchan Kim
@ 2018-03-27  6:50 ` Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, LKML, Minchan Kim

ZRAM_ACCESS is used for locking a slot of zram so correct the name.
It is also not a common flag to indicate status of the block so
move the declare position on top of the flag.
Lastly, let's move the function to the top of source code to be able to
use it easily without forward declaration.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 20 ++++++++++----------
 drivers/block/zram/zram_drv.h |  6 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd71230..18dadeab775b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,16 @@ static size_t huge_class_size;
 
 static void zram_free_page(struct zram *zram, size_t index);
 
+static void zram_slot_lock(struct zram *zram, u32 index)
+{
+	bit_spin_lock(ZRAM_LOCK, &zram->table[index].value);
+}
+
+static void zram_slot_unlock(struct zram *zram, u32 index)
+{
+	bit_spin_unlock(ZRAM_LOCK, &zram->table[index].value);
+}
+
 static inline bool init_done(struct zram *zram)
 {
 	return zram->disksize;
@@ -753,16 +763,6 @@ static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
 static DEVICE_ATTR_RO(debug_stat);
 
-static void zram_slot_lock(struct zram *zram, u32 index)
-{
-	bit_spin_lock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
-static void zram_slot_unlock(struct zram *zram, u32 index)
-{
-	bit_spin_unlock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
 	size_t num_pages = disksize >> PAGE_SHIFT;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index d71c8000a964..e540462fec6c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -44,9 +44,9 @@
 
 /* Flags for zram pages (table[page_no].value) */
 enum zram_pageflags {
-	/* Page consists the same element */
-	ZRAM_SAME = ZRAM_FLAG_SHIFT,
-	ZRAM_ACCESS,	/* page is now accessed */
+	/* zram slot is locked */
+	ZRAM_LOCK = ZRAM_FLAG_SHIFT,
+	ZRAM_SAME,	/* Page consists the same element */
 	ZRAM_WB,	/* page is stored on backing_device */
 
 	__NR_ZRAM_PAGEFLAGS,
-- 
2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/4] zram: mark incompressible page as ZRAM_HUGE
  2018-03-27  6:50 [PATCH v2 0/4] zram memory tracking Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
@ 2018-03-27  6:50 ` Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 3/4] zram: record accessed second Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 4/4] zram: introduce zram memory tracking Minchan Kim
  3 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, LKML, Minchan Kim

Mark incompressible pages so that we could investigate who is the
owner of the incompressible pages once the page is swapped out
via using upcoming zram memory tracker feature.

With it, we could prevent such pages to be swapped out by using
mlock. Otherwise we might remove them.

This patch exposes new stat for huge pages via mm_stat.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/blockdev/zram.txt |  1 +
 drivers/block/zram/zram_drv.c   | 17 ++++++++++++++---
 drivers/block/zram/zram_drv.h   |  2 ++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 257e65714c6a..78db38d02bc9 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -218,6 +218,7 @@ The stat file represents device's mm statistics. It consists of a single
  same_pages       the number of same element filled pages written to this disk.
                   No memory is allocated for such pages.
  pages_compacted  the number of pages freed during compaction
+ huge_pages	  the number of incompressible pages
 
 9) Deactivate:
 	swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 18dadeab775b..777fb3339f59 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -729,14 +729,15 @@ static ssize_t mm_stat_show(struct device *dev,
 	max_used = atomic_long_read(&zram->stats.max_used_pages);
 
 	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
+			"%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n",
 			orig_size << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.compr_data_size),
 			mem_used << PAGE_SHIFT,
 			zram->limit_pages << PAGE_SHIFT,
 			max_used << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.same_pages),
-			pool_stats.pages_compacted);
+			pool_stats.pages_compacted,
+			(u64)atomic64_read(&zram->stats.huge_pages));
 	up_read(&zram->init_lock);
 
 	return ret;
@@ -805,6 +806,11 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
 
+	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
+		zram_clear_flag(zram, index, ZRAM_HUGE);
+		atomic64_dec(&zram->stats.huge_pages);
+	}
+
 	if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) {
 		zram_wb_clear(zram, index);
 		atomic64_dec(&zram->stats.pages_stored);
@@ -973,6 +979,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	}
 
 	if (unlikely(comp_len >= huge_class_size)) {
+		comp_len = PAGE_SIZE;
 		if (zram_wb_enabled(zram) && allow_wb) {
 			zcomp_stream_put(zram->comp);
 			ret = write_to_bdev(zram, bvec, index, bio, &element);
@@ -984,7 +991,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 			allow_wb = false;
 			goto compress_again;
 		}
-		comp_len = PAGE_SIZE;
 	}
 
 	/*
@@ -1046,6 +1052,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	zram_slot_lock(zram, index);
 	zram_free_page(zram, index);
 
+	if (comp_len == PAGE_SIZE) {
+		zram_set_flag(zram, index, ZRAM_HUGE);
+		atomic64_inc(&zram->stats.huge_pages);
+	}
+
 	if (flags) {
 		zram_set_flag(zram, index, flags);
 		zram_set_element(zram, index, element);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index e540462fec6c..ef660212745c 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -48,6 +48,7 @@ enum zram_pageflags {
 	ZRAM_LOCK = ZRAM_FLAG_SHIFT,
 	ZRAM_SAME,	/* Page consists the same element */
 	ZRAM_WB,	/* page is stored on backing_device */
+	ZRAM_HUGE,	/* Incompressible page */
 
 	__NR_ZRAM_PAGEFLAGS,
 };
@@ -72,6 +73,7 @@ struct zram_stats {
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
 	atomic64_t same_pages;		/* no. of same element filled pages */
+	atomic64_t huge_pages;		/* no. of huge pages */
 	atomic64_t pages_stored;	/* no. of pages currently stored */
 	atomic_long_t max_used_pages;	/* no. of maximum pages stored */
 	atomic64_t writestall;		/* no. of write slow paths */
-- 
2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 3/4] zram: record accessed second
  2018-03-27  6:50 [PATCH v2 0/4] zram memory tracking Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
@ 2018-03-27  6:50 ` Minchan Kim
  2018-03-27  6:50 ` [PATCH v2 4/4] zram: introduce zram memory tracking Minchan Kim
  3 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, LKML, Minchan Kim

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. Better idea is app developers free them directly
rather than remaining them on heap.

This patch records last access time of each block of zram so that
With upcoming zram memory tracking, it could help userspace developers
to reduce memory footprint.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 16 ++++++++++++++++
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 777fb3339f59..aa48d3e0d4fc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -107,6 +107,16 @@ static inline void zram_set_element(struct zram *zram, u32 index,
 	zram->table[index].element = element;
 }
 
+static void zram_accessed(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = get_seconds();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = 0;
+}
+
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
 	return zram->table[index].element;
@@ -806,6 +816,8 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
 
+	zram_reset_access(zram, index);
+
 	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
 		zram_clear_flag(zram, index, ZRAM_HUGE);
 		atomic64_dec(&zram->stats.huge_pages);
@@ -1177,6 +1189,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	generic_end_io_acct(q, rw_acct, &zram->disk->part0, start_time);
 
+	zram_slot_lock(zram, index);
+	zram_accessed(zram, index);
+	zram_slot_unlock(zram, index);
+
 	if (unlikely(ret < 0)) {
 		if (!is_write)
 			atomic64_inc(&zram->stats.failed_reads);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index ef660212745c..f9943d623934 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -62,6 +62,7 @@ struct zram_table_entry {
 		unsigned long element;
 	};
 	unsigned long value;
+	unsigned long ac_time;
 };
 
 struct zram_stats {
-- 
2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 4/4] zram: introduce zram memory tracking
  2018-03-27  6:50 [PATCH v2 0/4] zram memory tracking Minchan Kim
                   ` (2 preceding siblings ...)
  2018-03-27  6:50 ` [PATCH v2 3/4] zram: record accessed second Minchan Kim
@ 2018-03-27  6:50 ` Minchan Kim
  2018-03-27  7:12   ` Greg KH
  2018-03-27  8:10   ` Sergey Senozhatsky
  3 siblings, 2 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  6:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, LKML, Minchan Kim, Greg KH

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. Better idea is app developers free them directly
rather than remaining them on heap.

This patch tell us last access time of each block of zram via
"cat /sys/kernel/debug/zram/zram0/block_state".

The output is as follows,

   25064    .wh       100
   25065    s..       9
   25067    ..h       15

First column is zram's block index and second one represents symbol
(s: same page w: written page to backing store h: huge page) of the
block state. Last one means number of seconds elapsed since the block
was last accessed. So above example means the 25065th block is accessed
100 second ago and it was huge so it was written to the backing store.

Admin can leverage this information to catch cold|incompressible pages
of process with *pagemap* once part of heaps are swapped out.

Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/Kconfig    |  10 +++
 drivers/block/zram/zram_drv.c | 124 +++++++++++++++++++++++++++++++---
 drivers/block/zram/zram_drv.h |   3 +
 3 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index ac3a31d433b2..fb0763133dd1 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
 	 /sys/block/zramX/backing_dev.
 
 	 See zram.txt for more infomration.
+
+config ZRAM_MEMORY_TRACKING
+       bool "Tracking zram block status"
+       depends on ZRAM
+       select DEBUG_FS
+       default n
+       help
+	 With this feature, admin can track the state of allocated block
+	 of zRAM. Admin could see the information via
+	 /sys/kernel/debug/zram/zramX/block_state.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aa48d3e0d4fc..11dcadc40cea 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -31,10 +31,15 @@
 #include <linux/err.h>
 #include <linux/idr.h>
 #include <linux/sysfs.h>
+#include <linux/debugfs.h>
 #include <linux/cpuhotplug.h>
 
 #include "zram_drv.h"
 
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+static struct dentry *zram_debugfs_root;
+#endif
+
 static DEFINE_IDR(zram_index_idr);
 /* idr index must be protected */
 static DEFINE_MUTEX(zram_index_mutex);
@@ -67,6 +72,13 @@ static inline bool init_done(struct zram *zram)
 	return zram->disksize;
 }
 
+static inline bool zram_allocated(struct zram *zram, u32 index)
+{
+
+	return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
+					zram->table[index].handle;
+}
+
 static inline struct zram *dev_to_zram(struct device *dev)
 {
 	return (struct zram *)dev_to_disk(dev)->private_data;
@@ -83,7 +95,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 }
 
 /* flag operations require table entry bit_spin_lock() being held */
-static int zram_test_flag(struct zram *zram, u32 index,
+static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
 	return zram->table[index].value & BIT(flag);
@@ -107,16 +119,6 @@ static inline void zram_set_element(struct zram *zram, u32 index,
 	zram->table[index].element = element;
 }
 
-static void zram_accessed(struct zram *zram, u32 index)
-{
-	zram->table[index].ac_time = get_seconds();
-}
-
-static void zram_reset_access(struct zram *zram, u32 index)
-{
-	zram->table[index].ac_time = 0;
-}
-
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
 	return zram->table[index].element;
@@ -620,6 +622,98 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
 static void zram_wb_clear(struct zram *zram, u32 index) {}
 #endif
 
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+static void zram_accessed(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = get_seconds();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = 0;
+}
+
+static ssize_t read_block_state(struct file *file, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	ssize_t index, written = 0;
+	struct zram *zram = file->private_data;
+	unsigned long last_access, nr_pages = zram->disksize >> PAGE_SHIFT;
+	char flags[__NR_ZRAM_PAGEFLAGS - ZRAM_FLAG_SHIFT];
+
+	kbuf = kvmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		kvfree(kbuf);
+		return -EINVAL;
+	}
+
+	for (index = *ppos; index < nr_pages; index++) {
+		int copied;
+
+		zram_slot_lock(zram, index);
+		if (!zram_allocated(zram, index))
+			goto next;
+
+		snprintf(flags, sizeof(flags), "%c%c%c",
+			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
+			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
+			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
+
+		last_access = get_seconds() - zram->table[index].ac_time;
+		copied = snprintf(kbuf + written, count, "%12lu %6s %12lu\n",
+						index, flags, last_access);
+		if (count < copied) {
+			zram_slot_unlock(zram, index);
+			break;
+		}
+		written += copied;
+		count -= copied;
+next:
+		zram_slot_unlock(zram, index);
+		*ppos += 1;
+	}
+
+	up_read(&zram->init_lock);
+	copy_to_user(buf, kbuf, written);
+	kvfree(kbuf);
+
+	return written;
+}
+
+static const struct file_operations proc_zram_block_state_op = {
+	.open = simple_open,
+	.read = read_block_state,
+	.llseek = default_llseek,
+};
+
+static void zram_debugfs_register(struct zram *zram)
+{
+	if (!zram_debugfs_root)
+		return;
+
+	zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
+						zram_debugfs_root);
+	debugfs_create_file("block_state", 0400, zram->debugfs_dir,
+				zram, &proc_zram_block_state_op);
+}
+
+static void zram_debugfs_unregister(struct zram *zram)
+{
+	debugfs_remove_recursive(zram->debugfs_dir);
+}
+#else
+static void zram_accessed(struct zram *zram, u32 index) {};
+static void zram_reset_access(struct zram *zram, u32 index) {};
+static unsigned long zram_access_time(struct zram *zram, u32 index) { return 0 };
+static void zram_debugfs_register(struct zram *zram) {};
+static void zram_debugfs_unregister(struct zram *zram) {};
+#endif
 
 /*
  * We switched to per-cpu streams and this attr is not needed anymore.
@@ -1604,6 +1698,7 @@ static int zram_add(void)
 	}
 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 
+	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
 
@@ -1637,6 +1732,7 @@ static int zram_remove(struct zram *zram)
 	zram->claim = true;
 	mutex_unlock(&bdev->bd_mutex);
 
+	zram_debugfs_unregister(zram);
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
 	 * store while we destroy the devices. This also helps during
@@ -1738,6 +1834,9 @@ static int zram_remove_cb(int id, void *ptr, void *data)
 static void destroy_devices(void)
 {
 	class_unregister(&zram_control_class);
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+	debugfs_remove_recursive(zram_debugfs_root);
+#endif
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
@@ -1760,6 +1859,9 @@ static int __init zram_init(void)
 		return ret;
 	}
 
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+	zram_debugfs_root = debugfs_create_dir("zram", NULL);
+#endif
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_err("Unable to get major number\n");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index f9943d623934..247d287d5054 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -111,5 +111,8 @@ struct zram {
 	unsigned long nr_pages;
 	spinlock_t bitmap_lock;
 #endif
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+	struct dentry *debugfs_dir;
+#endif
 };
 #endif
-- 
2.17.0.rc0.231.g781580f067-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/4] zram: introduce zram memory tracking
  2018-03-27  6:50 ` [PATCH v2 4/4] zram: introduce zram memory tracking Minchan Kim
@ 2018-03-27  7:12   ` Greg KH
  2018-03-27  7:23     ` Minchan Kim
  2018-03-27  8:10   ` Sergey Senozhatsky
  1 sibling, 1 reply; 8+ messages in thread
From: Greg KH @ 2018-03-27  7:12 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Sergey Senozhatsky, LKML

On Tue, Mar 27, 2018 at 03:50:33PM +0900, Minchan Kim wrote:
> zRam as swap is useful for small memory device. However, swap means
> those pages on zram are mostly cold pages due to VM's LRU algorithm.
> Especially, once init data for application are touched for launching,
> they tend to be not accessed any more and finally swapped out.
> zRAM can store such cold pages as compressed form but it's pointless
> to keep in memory. Better idea is app developers free them directly
> rather than remaining them on heap.
> 
> This patch tell us last access time of each block of zram via
> "cat /sys/kernel/debug/zram/zram0/block_state".
> 
> The output is as follows,
> 
>    25064    .wh       100
>    25065    s..       9
>    25067    ..h       15
> 
> First column is zram's block index and second one represents symbol
> (s: same page w: written page to backing store h: huge page) of the
> block state. Last one means number of seconds elapsed since the block
> was last accessed. So above example means the 25065th block is accessed
> 100 second ago and it was huge so it was written to the backing store.
> 
> Admin can leverage this information to catch cold|incompressible pages
> of process with *pagemap* once part of heaps are swapped out.

For debugging, yes, but remember that no system functionality should
ever depend on debugfs :)

> +config ZRAM_MEMORY_TRACKING
> +       bool "Tracking zram block status"
> +       depends on ZRAM
> +       select DEBUG_FS

depends on?

> +       default n

That's always the default, no need to list it.

> +       help
> +	 With this feature, admin can track the state of allocated block
> +	 of zRAM. Admin could see the information via
> +	 /sys/kernel/debug/zram/zramX/block_state.

Odd indentation, please use tabs everywhere here.

> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -31,10 +31,15 @@
>  #include <linux/err.h>
>  #include <linux/idr.h>
>  #include <linux/sysfs.h>
> +#include <linux/debugfs.h>
>  #include <linux/cpuhotplug.h>
>  
>  #include "zram_drv.h"
>  
> +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> +static struct dentry *zram_debugfs_root;
> +#endif
> +
>  static DEFINE_IDR(zram_index_idr);
>  /* idr index must be protected */
>  static DEFINE_MUTEX(zram_index_mutex);
> @@ -67,6 +72,13 @@ static inline bool init_done(struct zram *zram)
>  	return zram->disksize;
>  }
>  
> +static inline bool zram_allocated(struct zram *zram, u32 index)
> +{
> +
> +	return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
> +					zram->table[index].handle;
> +}
> +
>  static inline struct zram *dev_to_zram(struct device *dev)
>  {
>  	return (struct zram *)dev_to_disk(dev)->private_data;
> @@ -83,7 +95,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
>  }
>  
>  /* flag operations require table entry bit_spin_lock() being held */
> -static int zram_test_flag(struct zram *zram, u32 index,
> +static bool zram_test_flag(struct zram *zram, u32 index,
>  			enum zram_pageflags flag)
>  {
>  	return zram->table[index].value & BIT(flag);
> @@ -107,16 +119,6 @@ static inline void zram_set_element(struct zram *zram, u32 index,
>  	zram->table[index].element = element;
>  }
>  
> -static void zram_accessed(struct zram *zram, u32 index)
> -{
> -	zram->table[index].ac_time = get_seconds();
> -}
> -
> -static void zram_reset_access(struct zram *zram, u32 index)
> -{
> -	zram->table[index].ac_time = 0;
> -}
> -
>  static unsigned long zram_get_element(struct zram *zram, u32 index)
>  {
>  	return zram->table[index].element;
> @@ -620,6 +622,98 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
>  static void zram_wb_clear(struct zram *zram, u32 index) {}
>  #endif
>  
> +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> +static void zram_accessed(struct zram *zram, u32 index)
> +{
> +	zram->table[index].ac_time = get_seconds();
> +}
> +
> +static void zram_reset_access(struct zram *zram, u32 index)
> +{
> +	zram->table[index].ac_time = 0;
> +}
> +
> +static ssize_t read_block_state(struct file *file, char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	char *kbuf;
> +	ssize_t index, written = 0;
> +	struct zram *zram = file->private_data;
> +	unsigned long last_access, nr_pages = zram->disksize >> PAGE_SHIFT;
> +	char flags[__NR_ZRAM_PAGEFLAGS - ZRAM_FLAG_SHIFT];
> +
> +	kbuf = kvmalloc(count, GFP_KERNEL);
> +	if (!kbuf)
> +		return -ENOMEM;
> +
> +	down_read(&zram->init_lock);
> +	if (!init_done(zram)) {
> +		up_read(&zram->init_lock);
> +		kvfree(kbuf);
> +		return -EINVAL;
> +	}
> +
> +	for (index = *ppos; index < nr_pages; index++) {
> +		int copied;
> +
> +		zram_slot_lock(zram, index);
> +		if (!zram_allocated(zram, index))
> +			goto next;
> +
> +		snprintf(flags, sizeof(flags), "%c%c%c",
> +			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
> +			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
> +			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
> +
> +		last_access = get_seconds() - zram->table[index].ac_time;
> +		copied = snprintf(kbuf + written, count, "%12lu %6s %12lu\n",
> +						index, flags, last_access);
> +		if (count < copied) {
> +			zram_slot_unlock(zram, index);
> +			break;
> +		}
> +		written += copied;
> +		count -= copied;
> +next:
> +		zram_slot_unlock(zram, index);
> +		*ppos += 1;
> +	}
> +
> +	up_read(&zram->init_lock);
> +	copy_to_user(buf, kbuf, written);
> +	kvfree(kbuf);
> +
> +	return written;
> +}
> +
> +static const struct file_operations proc_zram_block_state_op = {
> +	.open = simple_open,
> +	.read = read_block_state,
> +	.llseek = default_llseek,
> +};
> +
> +static void zram_debugfs_register(struct zram *zram)
> +{
> +	if (!zram_debugfs_root)
> +		return;
> +
> +	zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
> +						zram_debugfs_root);
> +	debugfs_create_file("block_state", 0400, zram->debugfs_dir,
> +				zram, &proc_zram_block_state_op);
> +}

Much nicer code, thanks for the changes.

>  static void destroy_devices(void)
>  {
>  	class_unregister(&zram_control_class);
> +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> +	debugfs_remove_recursive(zram_debugfs_root);
> +#endif

No need for #ifdefs in .c code if your .h files are correct.

>  	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
>  	idr_destroy(&zram_index_idr);
>  	unregister_blkdev(zram_major, "zram");
> @@ -1760,6 +1859,9 @@ static int __init zram_init(void)
>  		return ret;
>  	}
>  
> +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> +	zram_debugfs_root = debugfs_create_dir("zram", NULL);
> +#endif

Same here. Put this in a wrapper function somewhere.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/4] zram: introduce zram memory tracking
  2018-03-27  7:12   ` Greg KH
@ 2018-03-27  7:23     ` Minchan Kim
  0 siblings, 0 replies; 8+ messages in thread
From: Minchan Kim @ 2018-03-27  7:23 UTC (permalink / raw)
  To: Greg KH; +Cc: Andrew Morton, Sergey Senozhatsky, LKML

Hi Greg,

On Tue, Mar 27, 2018 at 09:12:38AM +0200, Greg KH wrote:
> On Tue, Mar 27, 2018 at 03:50:33PM +0900, Minchan Kim wrote:
> > zRam as swap is useful for small memory device. However, swap means
> > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > Especially, once init data for application are touched for launching,
> > they tend to be not accessed any more and finally swapped out.
> > zRAM can store such cold pages as compressed form but it's pointless
> > to keep in memory. Better idea is app developers free them directly
> > rather than remaining them on heap.
> > 
> > This patch tell us last access time of each block of zram via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> > 
> > The output is as follows,
> > 
> >    25064    .wh       100
> >    25065    s..       9
> >    25067    ..h       15
> > 
> > First column is zram's block index and second one represents symbol
> > (s: same page w: written page to backing store h: huge page) of the
> > block state. Last one means number of seconds elapsed since the block
> > was last accessed. So above example means the 25065th block is accessed
> > 100 second ago and it was huge so it was written to the backing store.
> > 
> > Admin can leverage this information to catch cold|incompressible pages
> > of process with *pagemap* once part of heaps are swapped out.
> 
> For debugging, yes, but remember that no system functionality should
> ever depend on debugfs :)

Sure, any functionality of zram doesn't rely on the this feature.

> 
> > +config ZRAM_MEMORY_TRACKING
> > +       bool "Tracking zram block status"
> > +       depends on ZRAM
> > +       select DEBUG_FS
> 
> depends on?
> 
> > +       default n
> 
> That's always the default, no need to list it.

Thanks. I will clean it out.

> 
> > +       help
> > +	 With this feature, admin can track the state of allocated block
> > +	 of zRAM. Admin could see the information via
> > +	 /sys/kernel/debug/zram/zramX/block_state.
> 
> Odd indentation, please use tabs everywhere here.

Oops.

> 
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -31,10 +31,15 @@
> >  #include <linux/err.h>
> >  #include <linux/idr.h>
> >  #include <linux/sysfs.h>
> > +#include <linux/debugfs.h>
> >  #include <linux/cpuhotplug.h>
> >  
> >  #include "zram_drv.h"
> >  
> > +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> > +static struct dentry *zram_debugfs_root;
> > +#endif
> > +
> >  static DEFINE_IDR(zram_index_idr);
> >  /* idr index must be protected */
> >  static DEFINE_MUTEX(zram_index_mutex);
> > @@ -67,6 +72,13 @@ static inline bool init_done(struct zram *zram)
> >  	return zram->disksize;
> >  }
> >  
> > +static inline bool zram_allocated(struct zram *zram, u32 index)
> > +{
> > +
> > +	return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
> > +					zram->table[index].handle;
> > +}
> > +
> >  static inline struct zram *dev_to_zram(struct device *dev)
> >  {
> >  	return (struct zram *)dev_to_disk(dev)->private_data;
> > @@ -83,7 +95,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
> >  }
> >  
> >  /* flag operations require table entry bit_spin_lock() being held */
> > -static int zram_test_flag(struct zram *zram, u32 index,
> > +static bool zram_test_flag(struct zram *zram, u32 index,
> >  			enum zram_pageflags flag)
> >  {
> >  	return zram->table[index].value & BIT(flag);
> > @@ -107,16 +119,6 @@ static inline void zram_set_element(struct zram *zram, u32 index,
> >  	zram->table[index].element = element;
> >  }
> >  
> > -static void zram_accessed(struct zram *zram, u32 index)
> > -{
> > -	zram->table[index].ac_time = get_seconds();
> > -}
> > -
> > -static void zram_reset_access(struct zram *zram, u32 index)
> > -{
> > -	zram->table[index].ac_time = 0;
> > -}
> > -
> >  static unsigned long zram_get_element(struct zram *zram, u32 index)
> >  {
> >  	return zram->table[index].element;
> > @@ -620,6 +622,98 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
> >  static void zram_wb_clear(struct zram *zram, u32 index) {}
> >  #endif
> >  
> > +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> > +static void zram_accessed(struct zram *zram, u32 index)
> > +{
> > +	zram->table[index].ac_time = get_seconds();
> > +}
> > +
> > +static void zram_reset_access(struct zram *zram, u32 index)
> > +{
> > +	zram->table[index].ac_time = 0;
> > +}
> > +
> > +static ssize_t read_block_state(struct file *file, char __user *buf,
> > +				size_t count, loff_t *ppos)
> > +{
> > +	char *kbuf;
> > +	ssize_t index, written = 0;
> > +	struct zram *zram = file->private_data;
> > +	unsigned long last_access, nr_pages = zram->disksize >> PAGE_SHIFT;
> > +	char flags[__NR_ZRAM_PAGEFLAGS - ZRAM_FLAG_SHIFT];
> > +
> > +	kbuf = kvmalloc(count, GFP_KERNEL);
> > +	if (!kbuf)
> > +		return -ENOMEM;
> > +
> > +	down_read(&zram->init_lock);
> > +	if (!init_done(zram)) {
> > +		up_read(&zram->init_lock);
> > +		kvfree(kbuf);
> > +		return -EINVAL;
> > +	}
> > +
> > +	for (index = *ppos; index < nr_pages; index++) {
> > +		int copied;
> > +
> > +		zram_slot_lock(zram, index);
> > +		if (!zram_allocated(zram, index))
> > +			goto next;
> > +
> > +		snprintf(flags, sizeof(flags), "%c%c%c",
> > +			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
> > +			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
> > +			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
> > +
> > +		last_access = get_seconds() - zram->table[index].ac_time;
> > +		copied = snprintf(kbuf + written, count, "%12lu %6s %12lu\n",
> > +						index, flags, last_access);
> > +		if (count < copied) {
> > +			zram_slot_unlock(zram, index);
> > +			break;
> > +		}
> > +		written += copied;
> > +		count -= copied;
> > +next:
> > +		zram_slot_unlock(zram, index);
> > +		*ppos += 1;
> > +	}
> > +
> > +	up_read(&zram->init_lock);
> > +	copy_to_user(buf, kbuf, written);
> > +	kvfree(kbuf);
> > +
> > +	return written;
> > +}
> > +
> > +static const struct file_operations proc_zram_block_state_op = {
> > +	.open = simple_open,
> > +	.read = read_block_state,
> > +	.llseek = default_llseek,
> > +};
> > +
> > +static void zram_debugfs_register(struct zram *zram)
> > +{
> > +	if (!zram_debugfs_root)
> > +		return;
> > +
> > +	zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
> > +						zram_debugfs_root);
> > +	debugfs_create_file("block_state", 0400, zram->debugfs_dir,
> > +				zram, &proc_zram_block_state_op);
> > +}
> 
> Much nicer code, thanks for the changes.
> 
> >  static void destroy_devices(void)
> >  {
> >  	class_unregister(&zram_control_class);
> > +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> > +	debugfs_remove_recursive(zram_debugfs_root);
> > +#endif
> 
> No need for #ifdefs in .c code if your .h files are correct.
> 
> >  	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
> >  	idr_destroy(&zram_index_idr);
> >  	unregister_blkdev(zram_major, "zram");
> > @@ -1760,6 +1859,9 @@ static int __init zram_init(void)
> >  		return ret;
> >  	}
> >  
> > +#ifdef CONFIG_ZRAM_MEMORY_TRACKING
> > +	zram_debugfs_root = debugfs_create_dir("zram", NULL);
> > +#endif
> 
> Same here. Put this in a wrapper function somewhere.

Okay.
Thanks for the super fast review, Greg!

I bet there are several Greg in the world and each of them live
different time zone conutries.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 4/4] zram: introduce zram memory tracking
  2018-03-27  6:50 ` [PATCH v2 4/4] zram: introduce zram memory tracking Minchan Kim
  2018-03-27  7:12   ` Greg KH
@ 2018-03-27  8:10   ` Sergey Senozhatsky
  1 sibling, 0 replies; 8+ messages in thread
From: Sergey Senozhatsky @ 2018-03-27  8:10 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Sergey Senozhatsky, LKML, Greg KH

On (03/27/18 15:50), Minchan Kim wrote:
[..]
>
> +	char flags[__NR_ZRAM_PAGEFLAGS - ZRAM_FLAG_SHIFT];
>

Technically, you don't need this buffer...

> +		snprintf(flags, sizeof(flags), "%c%c%c",
> +			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
> +			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
> +			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
> +
> +		last_access = get_seconds() - zram->table[index].ac_time;
> +		copied = snprintf(kbuf + written, count, "%12lu %6s %12lu\n",
> +						index, flags, last_access);

... if you'll do flags decoding in place. But if you prefer it this way
then OK.

	-ss

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-27  8:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-27  6:50 [PATCH v2 0/4] zram memory tracking Minchan Kim
2018-03-27  6:50 ` [PATCH v2 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
2018-03-27  6:50 ` [PATCH v2 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
2018-03-27  6:50 ` [PATCH v2 3/4] zram: record accessed second Minchan Kim
2018-03-27  6:50 ` [PATCH v2 4/4] zram: introduce zram memory tracking Minchan Kim
2018-03-27  7:12   ` Greg KH
2018-03-27  7:23     ` Minchan Kim
2018-03-27  8:10   ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.