All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/4] zram memory tracking
@ 2018-04-16  3:31 Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  3:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Minchan Kim

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. As well, it's pointless to store incompressible
pages to zram so better idea is app developers manages them directly
like free or mlock rather than remaining them on heap.

This patch provides a debugfs /sys/kernel/debug/zram/zram0/block_state
to represent each block's state so admin can investigate what memory is
cold|incompressible|same page with using pagemap once the pages are
swapped out.


The output is as follows,
      300    75.033841 .wh
      301    63.806904 s..
      302    63.806919 ..h

First column is zram's block index and 3rh one represents symbol
(s: same page w: written page to backing store h: huge page) of the
block state. Second column represents usec time unit of the block
was last accessed. So above example means the 300th block is accessed
at 75.033851 second and it was huge so it was written to the backing
store.

* from v3:
  * use depends on selecting DEBUG_FS - Greg KH
  * Add acked-by from Greg
  * Fix null ptr access at module unload - Sergey
  * warning fix from copy_to_user - Sergey

* From v2:
  * debugfs and Kconfig cleanup - Greg KH
  * Remove unnecesarry buffer - Sergey
  * Change timestamp from sec to usec

* From v1:
  * Do not propagate error number for debugfs fail - Greg KH
  * Add writeback and hugepage information - Sergey

Minchan Kim (4):
  zram: correct flag name of ZRAM_ACCESS
  zram: mark incompressible page as ZRAM_HUGE
  zram: record accessed second
  zram: introduce zram memory tracking

 Documentation/blockdev/zram.txt |  25 +++++
 drivers/block/zram/Kconfig      |  10 ++
 drivers/block/zram/zram_drv.c   | 173 +++++++++++++++++++++++++++++---
 drivers/block/zram/zram_drv.h   |  14 ++-
 4 files changed, 205 insertions(+), 17 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/4] zram: correct flag name of ZRAM_ACCESS
  2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
@ 2018-04-16  3:31 ` Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  3:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Minchan Kim

ZRAM_ACCESS is used for locking a slot of zram so correct the name.
It is also not a common flag to indicate status of the block so
move the declare position on top of the flag.
Lastly, let's move the function to the top of source code to be able to
use it easily without forward declaration.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 20 ++++++++++----------
 drivers/block/zram/zram_drv.h |  6 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 0f3fadd71230..18dadeab775b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -52,6 +52,16 @@ static size_t huge_class_size;
 
 static void zram_free_page(struct zram *zram, size_t index);
 
+static void zram_slot_lock(struct zram *zram, u32 index)
+{
+	bit_spin_lock(ZRAM_LOCK, &zram->table[index].value);
+}
+
+static void zram_slot_unlock(struct zram *zram, u32 index)
+{
+	bit_spin_unlock(ZRAM_LOCK, &zram->table[index].value);
+}
+
 static inline bool init_done(struct zram *zram)
 {
 	return zram->disksize;
@@ -753,16 +763,6 @@ static DEVICE_ATTR_RO(io_stat);
 static DEVICE_ATTR_RO(mm_stat);
 static DEVICE_ATTR_RO(debug_stat);
 
-static void zram_slot_lock(struct zram *zram, u32 index)
-{
-	bit_spin_lock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
-static void zram_slot_unlock(struct zram *zram, u32 index)
-{
-	bit_spin_unlock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
 	size_t num_pages = disksize >> PAGE_SHIFT;
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 008861220723..8d8959ceabd1 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -43,9 +43,9 @@
 
 /* Flags for zram pages (table[page_no].value) */
 enum zram_pageflags {
-	/* Page consists the same element */
-	ZRAM_SAME = ZRAM_FLAG_SHIFT,
-	ZRAM_ACCESS,	/* page is now accessed */
+	/* zram slot is locked */
+	ZRAM_LOCK = ZRAM_FLAG_SHIFT,
+	ZRAM_SAME,	/* Page consists the same element */
 	ZRAM_WB,	/* page is stored on backing_device */
 
 	__NR_ZRAM_PAGEFLAGS,
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/4] zram: mark incompressible page as ZRAM_HUGE
  2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
@ 2018-04-16  3:31 ` Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 3/4] zram: record accessed second Minchan Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  3:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Minchan Kim

Mark incompressible pages so that we could investigate who is the
owner of the incompressible pages once the page is swapped out
via using upcoming zram memory tracker feature.

With it, we could prevent such pages to be swapped out by using
mlock. Otherwise we might remove them.

This patch exposes new stat for huge pages via mm_stat.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/blockdev/zram.txt |  1 +
 drivers/block/zram/zram_drv.c   | 17 ++++++++++++++---
 drivers/block/zram/zram_drv.h   |  2 ++
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 257e65714c6a..78db38d02bc9 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -218,6 +218,7 @@ The stat file represents device's mm statistics. It consists of a single
  same_pages       the number of same element filled pages written to this disk.
                   No memory is allocated for such pages.
  pages_compacted  the number of pages freed during compaction
+ huge_pages	  the number of incompressible pages
 
 9) Deactivate:
 	swapoff /dev/zram0
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 18dadeab775b..777fb3339f59 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -729,14 +729,15 @@ static ssize_t mm_stat_show(struct device *dev,
 	max_used = atomic_long_read(&zram->stats.max_used_pages);
 
 	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
+			"%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n",
 			orig_size << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.compr_data_size),
 			mem_used << PAGE_SHIFT,
 			zram->limit_pages << PAGE_SHIFT,
 			max_used << PAGE_SHIFT,
 			(u64)atomic64_read(&zram->stats.same_pages),
-			pool_stats.pages_compacted);
+			pool_stats.pages_compacted,
+			(u64)atomic64_read(&zram->stats.huge_pages));
 	up_read(&zram->init_lock);
 
 	return ret;
@@ -805,6 +806,11 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
 
+	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
+		zram_clear_flag(zram, index, ZRAM_HUGE);
+		atomic64_dec(&zram->stats.huge_pages);
+	}
+
 	if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) {
 		zram_wb_clear(zram, index);
 		atomic64_dec(&zram->stats.pages_stored);
@@ -973,6 +979,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	}
 
 	if (unlikely(comp_len >= huge_class_size)) {
+		comp_len = PAGE_SIZE;
 		if (zram_wb_enabled(zram) && allow_wb) {
 			zcomp_stream_put(zram->comp);
 			ret = write_to_bdev(zram, bvec, index, bio, &element);
@@ -984,7 +991,6 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 			allow_wb = false;
 			goto compress_again;
 		}
-		comp_len = PAGE_SIZE;
 	}
 
 	/*
@@ -1046,6 +1052,11 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
 	zram_slot_lock(zram, index);
 	zram_free_page(zram, index);
 
+	if (comp_len == PAGE_SIZE) {
+		zram_set_flag(zram, index, ZRAM_HUGE);
+		atomic64_inc(&zram->stats.huge_pages);
+	}
+
 	if (flags) {
 		zram_set_flag(zram, index, flags);
 		zram_set_element(zram, index, element);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 8d8959ceabd1..ff0547bdb586 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -47,6 +47,7 @@ enum zram_pageflags {
 	ZRAM_LOCK = ZRAM_FLAG_SHIFT,
 	ZRAM_SAME,	/* Page consists the same element */
 	ZRAM_WB,	/* page is stored on backing_device */
+	ZRAM_HUGE,	/* Incompressible page */
 
 	__NR_ZRAM_PAGEFLAGS,
 };
@@ -71,6 +72,7 @@ struct zram_stats {
 	atomic64_t invalid_io;	/* non-page-aligned I/O requests */
 	atomic64_t notify_free;	/* no. of swap slot free notifications */
 	atomic64_t same_pages;		/* no. of same element filled pages */
+	atomic64_t huge_pages;		/* no. of huge pages */
 	atomic64_t pages_stored;	/* no. of pages currently stored */
 	atomic_long_t max_used_pages;	/* no. of maximum pages stored */
 	atomic64_t writestall;		/* no. of write slow paths */
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 3/4] zram: record accessed second
  2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
@ 2018-04-16  3:31 ` Minchan Kim
  2018-04-16  3:31 ` [PATCH v4 4/4] zram: introduce zram memory tracking Minchan Kim
  2018-04-16  7:42 ` [PATCH v4 0/4] " Sergey Senozhatsky
  4 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  3:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Minchan Kim

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. Better idea is app developers free them directly
rather than remaining them on heap.

This patch records last access time of each block of zram so that
With upcoming zram memory tracking, it could help userspace developers
to reduce memory footprint.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 16 ++++++++++++++++
 drivers/block/zram/zram_drv.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 777fb3339f59..7fc10e2ad734 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -107,6 +107,16 @@ static inline void zram_set_element(struct zram *zram, u32 index,
 	zram->table[index].element = element;
 }
 
+static void zram_accessed(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = sched_clock();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = 0;
+}
+
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
 	return zram->table[index].element;
@@ -806,6 +816,8 @@ static void zram_free_page(struct zram *zram, size_t index)
 {
 	unsigned long handle;
 
+	zram_reset_access(zram, index);
+
 	if (zram_test_flag(zram, index, ZRAM_HUGE)) {
 		zram_clear_flag(zram, index, ZRAM_HUGE);
 		atomic64_dec(&zram->stats.huge_pages);
@@ -1177,6 +1189,10 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
 
 	generic_end_io_acct(q, rw_acct, &zram->disk->part0, start_time);
 
+	zram_slot_lock(zram, index);
+	zram_accessed(zram, index);
+	zram_slot_unlock(zram, index);
+
 	if (unlikely(ret < 0)) {
 		if (!is_write)
 			atomic64_inc(&zram->stats.failed_reads);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index ff0547bdb586..1075218e88b2 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -61,6 +61,7 @@ struct zram_table_entry {
 		unsigned long element;
 	};
 	unsigned long value;
+	u64 ac_time;
 };
 
 struct zram_stats {
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 4/4] zram: introduce zram memory tracking
  2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
                   ` (2 preceding siblings ...)
  2018-04-16  3:31 ` [PATCH v4 3/4] zram: record accessed second Minchan Kim
@ 2018-04-16  3:31 ` Minchan Kim
  2018-04-16  4:17   ` Randy Dunlap
  2018-04-16  7:42 ` [PATCH v4 0/4] " Sergey Senozhatsky
  4 siblings, 1 reply; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  3:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Minchan Kim, Greg Kroah-Hartman

zRam as swap is useful for small memory device. However, swap means
those pages on zram are mostly cold pages due to VM's LRU algorithm.
Especially, once init data for application are touched for launching,
they tend to be not accessed any more and finally swapped out.
zRAM can store such cold pages as compressed form but it's pointless
to keep in memory. Better idea is app developers free them directly
rather than remaining them on heap.

This patch tell us last access time of each block of zram via
"cat /sys/kernel/debug/zram/zram0/block_state".

The output is as follows,
      300    75.033841 .wh
      301    63.806904 s..
      302    63.806919 ..h

First column is zram's block index and 3rh one represents symbol
(s: same page w: written page to backing store h: huge page) of the
block state. Second column represents usec time unit of the block
was last accessed. So above example means the 300th block is accessed
at 75.033851 second and it was huge so it was written to the backing
store.

Admin can leverage this information to catch cold|incompressible pages
of process with *pagemap* once part of heaps are swapped out.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/blockdev/zram.txt |  24 ++++++
 drivers/block/zram/Kconfig      |  10 +++
 drivers/block/zram/zram_drv.c   | 140 +++++++++++++++++++++++++++++---
 drivers/block/zram/zram_drv.h   |   5 ++
 4 files changed, 168 insertions(+), 11 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 78db38d02bc9..45509c7d5716 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
 User should set up backing device via /sys/block/zramX/backing_dev
 before disksize setting.
 
+= memory tracking
+
+With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
+zram block. It could be useful to catch cold or incompressible
+pages of the proess with*pagemap.
+If you enable the feature, you could see block state via
+/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
+
+	  300    75.033841 .wh
+	  301    63.806904 s..
+	  302    63.806919 ..h
+
+First column is zram's block index.
+Second column is access time.
+Third column is state of the block.
+(s: same page
+w: written page to backing store
+h: huge page)
+
+First line of above example says 300th block is accessed at 75.033841sec
+and the block's state is huge so it is written back to the backing
+storage. It's a debugging feature so anyone shouldn't rely on it to work
+properly.
+
 Nitin Gupta
 ngupta@vflare.org
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index ac3a31d433b2..01090338fb47 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
 	 /sys/block/zramX/backing_dev.
 
 	 See zram.txt for more infomration.
+
+config ZRAM_MEMORY_TRACKING
+	bool "Tracking zram block status"
+	depends on ZRAM && DEBUG_FS
+	help
+	  With this feature, admin can track the state of allocated block
+	  of zRAM. Admin could see the information via
+	  /sys/kernel/debug/zram/zramX/block_state.
+
+	  See zram.txt for more information.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7fc10e2ad734..a999d9996a13 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -31,6 +31,7 @@
 #include <linux/err.h>
 #include <linux/idr.h>
 #include <linux/sysfs.h>
+#include <linux/debugfs.h>
 #include <linux/cpuhotplug.h>
 
 #include "zram_drv.h"
@@ -67,6 +68,13 @@ static inline bool init_done(struct zram *zram)
 	return zram->disksize;
 }
 
+static inline bool zram_allocated(struct zram *zram, u32 index)
+{
+
+	return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
+					zram->table[index].handle;
+}
+
 static inline struct zram *dev_to_zram(struct device *dev)
 {
 	return (struct zram *)dev_to_disk(dev)->private_data;
@@ -83,7 +91,7 @@ static void zram_set_handle(struct zram *zram, u32 index, unsigned long handle)
 }
 
 /* flag operations require table entry bit_spin_lock() being held */
-static int zram_test_flag(struct zram *zram, u32 index,
+static bool zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
 {
 	return zram->table[index].value & BIT(flag);
@@ -107,16 +115,6 @@ static inline void zram_set_element(struct zram *zram, u32 index,
 	zram->table[index].element = element;
 }
 
-static void zram_accessed(struct zram *zram, u32 index)
-{
-	zram->table[index].ac_time = sched_clock();
-}
-
-static void zram_reset_access(struct zram *zram, u32 index)
-{
-	zram->table[index].ac_time = 0;
-}
-
 static unsigned long zram_get_element(struct zram *zram, u32 index)
 {
 	return zram->table[index].element;
@@ -620,6 +618,122 @@ static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
 static void zram_wb_clear(struct zram *zram, u32 index) {}
 #endif
 
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+
+static struct dentry *zram_debugfs_root;
+
+static void zram_debugfs_create(void)
+{
+	zram_debugfs_root = debugfs_create_dir("zram", NULL);
+}
+
+static void zram_debugfs_destroy(void)
+{
+	debugfs_remove_recursive(zram_debugfs_root);
+}
+
+static void zram_accessed(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = sched_clock();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+	zram->table[index].ac_time = 0;
+}
+
+static long long ns2usecs(u64 nsec)
+{
+	nsec += 500;
+	do_div(nsec, 1000);
+	return nsec;
+}
+
+static ssize_t read_block_state(struct file *file, char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	ssize_t index, written = 0;
+	struct zram *zram = file->private_data;
+	u64 last_access;
+	unsigned long usec_rem;
+	unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
+
+	kbuf = kvmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		kvfree(kbuf);
+		return -EINVAL;
+	}
+
+	for (index = *ppos; index < nr_pages; index++) {
+		int copied;
+
+		zram_slot_lock(zram, index);
+		if (!zram_allocated(zram, index))
+			goto next;
+
+		last_access = ns2usecs(zram->table[index].ac_time);
+		usec_rem = do_div(last_access, USEC_PER_SEC);
+		copied = snprintf(kbuf + written, count,
+			"%12lu %5lu.%06lu %c%c%c\n",
+			index, (unsigned long)last_access, usec_rem,
+			zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
+			zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
+			zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
+
+		if (count < copied) {
+			zram_slot_unlock(zram, index);
+			break;
+		}
+		written += copied;
+		count -= copied;
+next:
+		zram_slot_unlock(zram, index);
+		*ppos += 1;
+	}
+
+	up_read(&zram->init_lock);
+	if (copy_to_user(buf, kbuf, written))
+		written = -EFAULT;
+	kvfree(kbuf);
+
+	return written;
+}
+
+static const struct file_operations proc_zram_block_state_op = {
+	.open = simple_open,
+	.read = read_block_state,
+	.llseek = default_llseek,
+};
+
+static void zram_debugfs_register(struct zram *zram)
+{
+	if (!zram_debugfs_root)
+		return;
+
+	zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
+						zram_debugfs_root);
+	debugfs_create_file("block_state", 0400, zram->debugfs_dir,
+				zram, &proc_zram_block_state_op);
+}
+
+static void zram_debugfs_unregister(struct zram *zram)
+{
+	debugfs_remove_recursive(zram->debugfs_dir);
+}
+#else
+static void zram_debugfs_create(void) {};
+static void zram_debugfs_destroy(void) {};
+static void zram_accessed(struct zram *zram, u32 index) {};
+static void zram_reset_access(struct zram *zram, u32 index) {};
+static void zram_debugfs_register(struct zram *zram) {};
+static void zram_debugfs_unregister(struct zram *zram) {};
+#endif
 
 /*
  * We switched to per-cpu streams and this attr is not needed anymore.
@@ -1604,6 +1718,7 @@ static int zram_add(void)
 	}
 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 
+	zram_debugfs_register(zram);
 	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return device_id;
 
@@ -1637,6 +1752,7 @@ static int zram_remove(struct zram *zram)
 	zram->claim = true;
 	mutex_unlock(&bdev->bd_mutex);
 
+	zram_debugfs_unregister(zram);
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
 	 * store while we destroy the devices. This also helps during
@@ -1739,6 +1855,7 @@ static void destroy_devices(void)
 {
 	class_unregister(&zram_control_class);
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
+	zram_debugfs_destroy();
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
 	cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
@@ -1760,6 +1877,7 @@ static int __init zram_init(void)
 		return ret;
 	}
 
+	zram_debugfs_create();
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_err("Unable to get major number\n");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 1075218e88b2..dbe459b300c0 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -61,7 +61,9 @@ struct zram_table_entry {
 		unsigned long element;
 	};
 	unsigned long value;
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
 	u64 ac_time;
+#endif
 };
 
 struct zram_stats {
@@ -110,5 +112,8 @@ struct zram {
 	unsigned long nr_pages;
 	spinlock_t bitmap_lock;
 #endif
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+	struct dentry *debugfs_dir;
+#endif
 };
 #endif
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 4/4] zram: introduce zram memory tracking
  2018-04-16  3:31 ` [PATCH v4 4/4] zram: introduce zram memory tracking Minchan Kim
@ 2018-04-16  4:17   ` Randy Dunlap
  2018-04-16  4:26     ` Minchan Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Randy Dunlap @ 2018-04-16  4:17 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: LKML, Sergey Senozhatsky, Greg Kroah-Hartman

On 04/15/2018 08:31 PM, Minchan Kim wrote:
> zRam as swap is useful for small memory device. However, swap means
> those pages on zram are mostly cold pages due to VM's LRU algorithm.
> Especially, once init data for application are touched for launching,
> they tend to be not accessed any more and finally swapped out.
> zRAM can store such cold pages as compressed form but it's pointless
> to keep in memory. Better idea is app developers free them directly
> rather than remaining them on heap.
> 
> This patch tell us last access time of each block of zram via
> "cat /sys/kernel/debug/zram/zram0/block_state".
> 
> The output is as follows,
>       300    75.033841 .wh
>       301    63.806904 s..
>       302    63.806919 ..h
> 
> First column is zram's block index and 3rh one represents symbol
> (s: same page w: written page to backing store h: huge page) of the
> block state. Second column represents usec time unit of the block
> was last accessed. So above example means the 300th block is accessed
> at 75.033851 second and it was huge so it was written to the backing
> store.
> 
> Admin can leverage this information to catch cold|incompressible pages
> of process with *pagemap* once part of heaps are swapped out.
> 
> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/blockdev/zram.txt |  24 ++++++
>  drivers/block/zram/Kconfig      |  10 +++
>  drivers/block/zram/zram_drv.c   | 140 +++++++++++++++++++++++++++++---
>  drivers/block/zram/zram_drv.h   |   5 ++
>  4 files changed, 168 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> index 78db38d02bc9..45509c7d5716 100644
> --- a/Documentation/blockdev/zram.txt
> +++ b/Documentation/blockdev/zram.txt
> @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
>  User should set up backing device via /sys/block/zramX/backing_dev
>  before disksize setting.
>  
> += memory tracking
> +
> +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> +zram block. It could be useful to catch cold or incompressible
> +pages of the proess with*pagemap.

?               process

> +If you enable the feature, you could see block state via
> +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> +
> +	  300    75.033841 .wh
> +	  301    63.806904 s..
> +	  302    63.806919 ..h
> +
> +First column is zram's block index.
> +Second column is access time.
> +Third column is state of the block.
> +(s: same page
> +w: written page to backing store
> +h: huge page)
> +
> +First line of above example says 300th block is accessed at 75.033841sec
> +and the block's state is huge so it is written back to the backing
> +storage. It's a debugging feature so anyone shouldn't rely on it to work
> +properly.
> +
>  Nitin Gupta
>  ngupta@vflare.org
> diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> index ac3a31d433b2..01090338fb47 100644
> --- a/drivers/block/zram/Kconfig
> +++ b/drivers/block/zram/Kconfig
> @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
>  	 /sys/block/zramX/backing_dev.
>  
>  	 See zram.txt for more infomration.
> +
> +config ZRAM_MEMORY_TRACKING
> +	bool "Tracking zram block status"

	bool "Track zram block status"

although sometimes it is zRam or zRAM.


> +	depends on ZRAM && DEBUG_FS
> +	help
> +	  With this feature, admin can track the state of allocated block

	                                                            blocks

> +	  of zRAM. Admin could see the information via
> +	  /sys/kernel/debug/zram/zramX/block_state.
> +
> +	  See zram.txt for more information.

	  See Documentation/blockdev/zram.txt for more information.


-- 
~Randy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 4/4] zram: introduce zram memory tracking
  2018-04-16  4:17   ` Randy Dunlap
@ 2018-04-16  4:26     ` Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  4:26 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Andrew Morton, LKML, Sergey Senozhatsky, Greg Kroah-Hartman

On Sun, Apr 15, 2018 at 09:17:45PM -0700, Randy Dunlap wrote:
> On 04/15/2018 08:31 PM, Minchan Kim wrote:
> > zRam as swap is useful for small memory device. However, swap means
> > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > Especially, once init data for application are touched for launching,
> > they tend to be not accessed any more and finally swapped out.
> > zRAM can store such cold pages as compressed form but it's pointless
> > to keep in memory. Better idea is app developers free them directly
> > rather than remaining them on heap.
> > 
> > This patch tell us last access time of each block of zram via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> > 
> > The output is as follows,
> >       300    75.033841 .wh
> >       301    63.806904 s..
> >       302    63.806919 ..h
> > 
> > First column is zram's block index and 3rh one represents symbol
> > (s: same page w: written page to backing store h: huge page) of the
> > block state. Second column represents usec time unit of the block
> > was last accessed. So above example means the 300th block is accessed
> > at 75.033851 second and it was huge so it was written to the backing
> > store.
> > 
> > Admin can leverage this information to catch cold|incompressible pages
> > of process with *pagemap* once part of heaps are swapped out.
> > 
> > Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  Documentation/blockdev/zram.txt |  24 ++++++
> >  drivers/block/zram/Kconfig      |  10 +++
> >  drivers/block/zram/zram_drv.c   | 140 +++++++++++++++++++++++++++++---
> >  drivers/block/zram/zram_drv.h   |   5 ++
> >  4 files changed, 168 insertions(+), 11 deletions(-)
> > 
> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
> > index 78db38d02bc9..45509c7d5716 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -243,5 +243,29 @@ to backing storage rather than keeping it in memory.
> >  User should set up backing device via /sys/block/zramX/backing_dev
> >  before disksize setting.
> >  
> > += memory tracking
> > +
> > +With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
> > +zram block. It could be useful to catch cold or incompressible
> > +pages of the proess with*pagemap.
> 
> ?               process
> 
> > +If you enable the feature, you could see block state via
> > +/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
> > +
> > +	  300    75.033841 .wh
> > +	  301    63.806904 s..
> > +	  302    63.806919 ..h
> > +
> > +First column is zram's block index.
> > +Second column is access time.
> > +Third column is state of the block.
> > +(s: same page
> > +w: written page to backing store
> > +h: huge page)
> > +
> > +First line of above example says 300th block is accessed at 75.033841sec
> > +and the block's state is huge so it is written back to the backing
> > +storage. It's a debugging feature so anyone shouldn't rely on it to work
> > +properly.
> > +
> >  Nitin Gupta
> >  ngupta@vflare.org
> > diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
> > index ac3a31d433b2..01090338fb47 100644
> > --- a/drivers/block/zram/Kconfig
> > +++ b/drivers/block/zram/Kconfig
> > @@ -26,3 +26,13 @@ config ZRAM_WRITEBACK
> >  	 /sys/block/zramX/backing_dev.
> >  
> >  	 See zram.txt for more infomration.
> > +
> > +config ZRAM_MEMORY_TRACKING
> > +	bool "Tracking zram block status"
> 
> 	bool "Track zram block status"
> 
> although sometimes it is zRam or zRAM.
> 
> 
> > +	depends on ZRAM && DEBUG_FS
> > +	help
> > +	  With this feature, admin can track the state of allocated block
> 
> 	                                                            blocks
> 
> > +	  of zRAM. Admin could see the information via
> > +	  /sys/kernel/debug/zram/zramX/block_state.
> > +
> > +	  See zram.txt for more information.
> 
> 	  See Documentation/blockdev/zram.txt for more information.

I just fix things. I will wait more feedback and then resend.
Thanks for the review!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/4] zram memory tracking
  2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
                   ` (3 preceding siblings ...)
  2018-04-16  3:31 ` [PATCH v4 4/4] zram: introduce zram memory tracking Minchan Kim
@ 2018-04-16  7:42 ` Sergey Senozhatsky
  2018-04-16  9:12   ` Minchan Kim
  4 siblings, 1 reply; 9+ messages in thread
From: Sergey Senozhatsky @ 2018-04-16  7:42 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, LKML, Sergey Senozhatsky

On (04/16/18 12:31), Minchan Kim wrote:
> Minchan Kim (4):
>   zram: correct flag name of ZRAM_ACCESS
>   zram: mark incompressible page as ZRAM_HUGE
>   zram: record accessed second
>   zram: introduce zram memory tracking

Looks OK to me. After Randy's comments addressed:

Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	-ss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 0/4] zram memory tracking
  2018-04-16  7:42 ` [PATCH v4 0/4] " Sergey Senozhatsky
@ 2018-04-16  9:12   ` Minchan Kim
  0 siblings, 0 replies; 9+ messages in thread
From: Minchan Kim @ 2018-04-16  9:12 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, LKML

On Mon, Apr 16, 2018 at 04:42:18PM +0900, Sergey Senozhatsky wrote:
> On (04/16/18 12:31), Minchan Kim wrote:
> > Minchan Kim (4):
> >   zram: correct flag name of ZRAM_ACCESS
> >   zram: mark incompressible page as ZRAM_HUGE
> >   zram: record accessed second
> >   zram: introduce zram memory tracking
> 
> Looks OK to me. After Randy's comments addressed:
> 
> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Thanks for the review, Sergey!

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-04-16  9:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-16  3:31 [PATCH v4 0/4] zram memory tracking Minchan Kim
2018-04-16  3:31 ` [PATCH v4 1/4] zram: correct flag name of ZRAM_ACCESS Minchan Kim
2018-04-16  3:31 ` [PATCH v4 2/4] zram: mark incompressible page as ZRAM_HUGE Minchan Kim
2018-04-16  3:31 ` [PATCH v4 3/4] zram: record accessed second Minchan Kim
2018-04-16  3:31 ` [PATCH v4 4/4] zram: introduce zram memory tracking Minchan Kim
2018-04-16  4:17   ` Randy Dunlap
2018-04-16  4:26     ` Minchan Kim
2018-04-16  7:42 ` [PATCH v4 0/4] " Sergey Senozhatsky
2018-04-16  9:12   ` Minchan Kim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.